How do we implement a kill switch pattern with LaunchDarkly across multiple microservices?

Most teams only care about a kill switch after a release goes sideways. The advantage of LaunchDarkly is that you can wire a single kill switch pattern across all your microservices and recover in seconds—no redeploys required, no 2am fire drill.

Quick Answer: A kill switch pattern with LaunchDarkly uses a shared feature flag that all relevant microservices evaluate at runtime. When something breaks, you flip (or auto-trigger) that flag once in LaunchDarkly, and every service reacts instantly—rolling back or disabling the risky behavior in under 200ms worldwide.

The Quick Overview

  • What It Is: A runtime kill switch pattern using LaunchDarkly feature flags and flag triggers to instantly disable risky code paths across multiple microservices.
  • Who It Is For: Engineering, SRE, platform, and DevOps teams running distributed systems that need fast, coordinated incident response without redeploys.
  • Core Problem Solved: Reduces blast radius and time-to-mitigation when a release or dependency fails, by providing a single control surface instead of per-service hotfixes.

How It Works

At a high level, you model the kill switch as a LaunchDarkly flag (or small set of flags) that every relevant microservice evaluates at runtime. Instead of hardwiring the behavior into your deployment pipeline, each service reads flag state from LaunchDarkly’s Flag Delivery Network via an SDK. When you or an automated trigger change the flag, all services see the update in <200ms and switch behavior immediately.

You’re essentially decoupling “turn this off” from “ship a new build.”

  1. Design the kill switch surface:

    • Decide what you’re turning off: one feature, an entire dependency, or an external integration.
    • Create a shared flag in LaunchDarkly (for example, payments-service.enabled or external-search.useNewIndex).
    • Define what “off” means in each microservice (fallback behavior, degraded mode, or full stop).
  2. Instrument every microservice with the flag:

    • Add the LaunchDarkly SDK in each service (pick from 25+ native SDKs: Java, .NET, Node.js, Go, Python, etc.).
    • Evaluate the flag on the hot path where the risky behavior occurs.
    • Implement branching logic: if flag is off, route to safe behavior; if on, run the new path.
  3. Operationalize controls and automation:

    • Wire the flag into your incident response runbooks and on-call playbooks.
    • Add flag triggers from your observability/APM stack so metrics can auto-toggle the kill switch.
    • Use policies, approvals, and audit logs in LaunchDarkly to govern who can flip what, and when.

How to Implement the Kill Switch Pattern Step-by-Step

1. Model the kill switch in LaunchDarkly

Think in terms of runtime behavior, not environment variables.

Typical patterns across microservices:

  • Global kill switch for a feature:
    One flag (e.g., checkout-new-flow.enabled) read by multiple services (API gateway, checkout service, email service).

  • Service-level kill switch:
    One flag per dependency/critical subsystem (e.g., payments-stripe.enabled, search-es-cluster.enabled).

  • Path-level kill switch:
    Flags around risky code paths (e.g., recommendations-ai.enabled, image-processor.newPipelineEnabled).

Key choices when creating the flag:

  • Flag key:
    Use a clear, stable naming convention, e.g., serviceName.capability.enabled.

  • Flag variations:

    • Boolean kill switch: true = enabled, false = disabled.
    • Optional: use multivariate flags if you need more complex behavior (e.g., normal, degraded, maintenance), but start with boolean for simplicity.
  • Environments:
    Mirror your deployment environments: dev, staging, prod.
    Test kill switch behavior in lower environments before you rely on it in production.
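For the multivariate option above, a small dispatch sketch shows what "normal", "degraded", and "maintenance" might mean in code. This is plain Python with hypothetical variation values; in production the mode string would come from a LaunchDarkly SDK evaluation:

```python
# Illustrative sketch: mapping multivariate kill-switch values to behaviors.
# "normal" / "degraded" / "maintenance" are hypothetical variation values.

def handle_search(query: str, mode: str) -> str:
    """Dispatch on the kill-switch variation for a search subsystem."""
    if mode == "maintenance":
        # Full stop: the subsystem is disabled entirely.
        return "search temporarily unavailable"
    if mode == "degraded":
        # Degraded mode: serve cached or simplified results.
        return f"cached results for {query!r}"
    # "normal" falls through to the full code path.
    return f"live results for {query!r}"

print(handle_search("coffee", "normal"))
print(handle_search("coffee", "maintenance"))
```

Starting with a boolean flag keeps this to a single if/else; the multivariate form only earns its complexity once you genuinely need a middle "degraded" state.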

2. Integrate the LaunchDarkly SDKs in each microservice

LaunchDarkly’s architecture uses server-sent events and a global Flag Delivery Network (100+ points of presence) to propagate changes in under 200ms, with 99.99% uptime. To benefit from this, each service should:

  • Initialize the SDK on startup using a service-specific SDK key.
  • Identify the evaluation context (often a system/user context with a key like service-payments).
  • Evaluate the kill switch in the part of the code you want to control.

Example pattern (pseudocode, applicable across languages):

// Initialization on service startup
ldClient = LDClient(sdkKey)

// Health check: ensure SDK is ready
if !ldClient.initializedWithin(timeout) {
  // Decide: fail fast, log, or run in safe mode
}

// Handling a request
function handleRequest(request) {
  context = LDContext(key="service-payments")

  stripeEnabled = ldClient.boolVariation("payments-stripe.enabled", context, false)

  if (!stripeEnabled) {
    // Flag off (or LaunchDarkly unreachable): use the safe fallback.
    // Note the default of false, so "can't evaluate" means "risky path off".
    return handlePaymentInSafeMode(request)
  }

  // Normal behavior
  return processPaymentViaStripe(request)
}

Best practices:

  • Default value: Choose the fallback for when LaunchDarkly is unreachable. For kill switches, that’s usually the safe behavior (e.g., default to false to disable the risky path).
  • Avoid per-request initialization: Initialize the client once and reuse it.
  • Graceful degradation: If the SDK can’t connect, define whether you prefer:
    • “Safe by default” (kill risky behavior), or
    • “Operational by default” (keep feature on).
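The graceful-degradation choice above can be sketched as a small wrapper. This is plain Python; `fetch_flag` is a hypothetical callable standing in for an SDK evaluation, and the try/except models "LaunchDarkly unreachable":

```python
# Minimal "safe by default" sketch: if flag evaluation fails for any reason,
# fall back to the documented safe value instead of guessing.

def evaluate_kill_switch(fetch_flag, flag_key, safe_default=False):
    """Return the flag value, or `safe_default` when evaluation fails."""
    try:
        return bool(fetch_flag(flag_key))
    except Exception:
        # "Safe by default": the risky path stays disabled.
        return safe_default

# A healthy flag source vs. one that cannot be reached:
flags = {"payments-stripe.enabled": True}

def broken(_key):
    raise ConnectionError("flag service unreachable")

print(evaluate_kill_switch(flags.__getitem__, "payments-stripe.enabled"))  # True
print(evaluate_kill_switch(broken, "payments-stripe.enabled"))             # False
```

If your team instead chooses "operational by default", the same wrapper works with `safe_default=True`; what matters is that the choice is explicit and tested, not accidental.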

3. Align behavior across microservices

A kill switch only works if services agree on what “off” means.

  • Document the behavior per service:

    • What APIs will return (error vs fallback).
    • What logs and metrics you emit when the switch is off.
    • Any client-side UX changes (e.g., hide a button, show a banner).
  • Add consistency checks:

    • Integration tests that simulate the flag being off and verify every service behaves as expected.
    • Contract tests between services to validate responses in kill-switch mode.
  • Use shared libraries or middlewares:

    • Wrap the LaunchDarkly evaluation in a shared helper for each language you use, so kill switch usage is consistent across codebases.
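One way to sketch such a shared helper (stdlib Python; the injected `evaluate` callable stands in for the SDK's boolean variation call, and the registry reuses the example flag keys from earlier):

```python
# Sketch of a shared kill-switch helper every service imports, so flag keys
# and safe defaults live in one place instead of being re-declared per service.

class KillSwitches:
    # One canonical registry of kill-switch keys and their safe defaults.
    REGISTRY = {
        "payments-stripe.enabled": False,    # safe default: risky path off
        "checkout-new-flow.enabled": False,
    }

    def __init__(self, evaluate):
        self._evaluate = evaluate  # callable(flag_key, default) -> bool

    def is_enabled(self, flag_key):
        default = self.REGISTRY.get(flag_key, False)
        try:
            return self._evaluate(flag_key, default)
        except Exception:
            return default

# Example wiring with an in-memory evaluator standing in for the SDK:
store = {"payments-stripe.enabled": True}
switches = KillSwitches(lambda key, default: store.get(key, default))
print(switches.is_enabled("payments-stripe.enabled"))    # True
print(switches.is_enabled("checkout-new-flow.enabled"))  # False (safe default)
```

In integration tests, the same helper takes a fake evaluator that always returns the "off" variation, which makes "simulate the flag being off" a one-line fixture.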

4. Add observability and flag triggers

The kill switch pattern is much more powerful when it’s wired directly to your telemetry:

  • Integrate with your monitoring/APM tools:
    LaunchDarkly integrates with popular observability and APM solutions so you can link flag changes to metrics, logs, traces, and dashboards.

  • Define thresholds for automated remediation:
    For example:

    • Error rate for /payments/charge > 3% for 5 minutes.
    • p95 latency for search-query exceeds 2 seconds.
    • Spike in 5xx responses for a single region.
  • Configure flag triggers:
    Flag triggers push the kill switch a step further by automatically deactivating broken code. When your monitoring tool detects an issue and fires an alert, that alert can call LaunchDarkly’s trigger URL to flip the flag—resolving the incident in near real time, without waiting on a human.

Pattern:

  1. Create a trigger for payments-stripe.enabled in LaunchDarkly.
  2. Configure your APM/monitoring alert to call that trigger when conditions are met.
  3. Verify in a non-prod environment that the alert flips the flag and all services react correctly.
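The trigger endpoint itself is just an HTTP POST. Here is a hedged stdlib sketch (the URL is a placeholder; each trigger gets its own unique URL from the LaunchDarkly UI), with an injectable sender so the call can be exercised without a live network:

```python
import urllib.request

# Placeholder: in practice, LaunchDarkly generates a unique URL per trigger,
# and that URL acts as the secret, so treat it like a credential.
TRIGGER_URL = "https://app.launchdarkly.com/webhook/triggers/EXAMPLE"

def fire_kill_switch(trigger_url, send=urllib.request.urlopen):
    """POST to a flag trigger URL; `send` is injectable for testing."""
    req = urllib.request.Request(trigger_url, data=b"", method="POST")
    return send(req)

# In a test, inject a fake sender instead of making a network call:
captured = []
fire_kill_switch(
    TRIGGER_URL,
    send=lambda req: captured.append((req.full_url, req.get_method())),
)
print(captured)
```

Most monitoring tools can call a webhook URL directly from an alert rule, so you often will not need custom code at all; this sketch is mainly useful for custom automation or runbook scripts.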

You’ve now automated part of your incident response.

5. Govern who can flip the kill switch

Across multiple microservices, flag changes are effectively production changes. You want speed and guardrails:

  • Use policies and custom roles:

    • Limit kill switch control to on-call SREs, senior engineers, or incident commanders.
    • Create roles like “Incident Responder” that can toggle specific flags but not modify targeting rules or other environments.
  • Require approvals for high-risk changes:

    • In LaunchDarkly, use approvals on production flags so non-emergency changes require review.
    • For kill switches, you might allow emergency changes but log them heavily.
  • Audit everything:

    • Use audit logs to track who toggled the kill switch, when, and why.
    • Add a standard “incident ID” or ticket link in flag change comments for traceability.
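As a rough illustration of the custom-role idea, an “Incident Responder” policy that can only toggle one production flag might look like the fragment below. The statement shape follows LaunchDarkly’s custom-role policy format, but verify the exact action and resource names against your account before relying on it:

```json
[
  {
    "effect": "allow",
    "actions": ["updateOn"],
    "resources": ["proj/*:env/production:flag/payments-stripe.enabled"]
  }
]
```

The point is the scoping: the role can flip this flag on or off in production but cannot edit targeting rules, delete flags, or touch other environments.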

6. Bake kill switches into your incident runbooks

A kill switch is only useful if people know it exists.

  • Runbook sections:

    • “Verify related flags in LaunchDarkly.”
    • “If X metric spikes, toggle serviceX.enabled off and observe recovery.”
    • “If automated flag trigger fires, confirm behavior and start RCA.”
  • Chaos drills and game days:

    • Simulate an outage and practice using the kill switch to recover.
    • Validate all services respond appropriately, dashboards update, and on-call can see the impact.
  • Training:

    • Walk new engineers through LaunchDarkly’s UI, how to find critical flags, and how to use previews and simulations safely.

Features & Benefits Breakdown

| Core Feature | What It Does | Primary Benefit |
| --- | --- | --- |
| Feature flags as kill switches | Provide a single flag that controls risky behavior in multiple microservices at runtime. | Reduce blast radius and time-to-mitigation without redeploying. |
| Flag triggers & observability integrations | Connect flags to metrics and alerts to auto-toggle when thresholds are breached. | Automate incident remediation in near real time, avoiding manual fire drills. |
| Global Flag Delivery Network | Propagates flag changes via streaming in <200ms globally with 99.99% uptime. | Ensure every service responds to the kill switch almost instantly, even at scale. |
| Policies, approvals, and audit logs | Govern who can change what, require approvals, and track every flag change. | Maintain control and compliance while preserving fast incident response. |
| Targeting & segmentation | Limit kill switch impact to specific segments, tenants, regions, or environments. | Run controlled rollbacks or partial disables instead of global shutdowns. |

Ideal Use Cases

  • Best for distributed, microservice-heavy architectures:
    Because it provides a unified control plane to coordinate behavior across dozens of services without code pushes or complex orchestration.

  • Best for high-risk features and external dependencies:
    Because it lets you instantly disable AI-powered features, payment flows, search backends, or third-party integrations when they misbehave—without touching the deployment pipeline.


Limitations & Considerations

  • Requires consistent adoption:
    The kill switch only protects the services that actually evaluate the flag. You need a rollout plan to integrate the SDK and kill switch logic into each microservice. Start with your most critical paths and expand.

  • Designing safe defaults is non-trivial:
    Choosing whether “flag not available” should mean “disable feature” or “keep it on” is a product and SRE decision. Document these choices and test them (including LaunchDarkly outage simulations) so you’re not surprised in production.


Pricing & Plans

LaunchDarkly offers plans aligned to team size, usage, and governance needs. While exact pricing depends on your scale and requirements, the common pattern for kill switch implementations looks like this:

  • Team or Growth-style plans: Best for product and engineering teams needing reliable feature flags, kill switches, and progressive rollouts across a handful of services and environments.

  • Enterprise plans: Best for organizations with many microservices, strict compliance requirements, and complex governance—needing custom roles, advanced policies, audit trails, and higher flag volume across multiple business units.

To understand which plan fits your microservices footprint, it’s usually helpful to map:

  • Number of services and environments you want covered by kill switches.
  • Expected flag volume and evaluation rate.
  • Governance and compliance needs (e.g., approvals, audit, SSO, custom roles).

Frequently Asked Questions

How many kill switch flags should we create for a microservices architecture?

Short Answer: Start with one kill switch per critical feature or dependency, not per service, and expand only when you have concrete reasons to be more granular.

Details:
In a typical microservices setup, you’ll have a combination of:

  • Feature-level flags (e.g., checkout-new-flow.enabled) evaluated by multiple services (frontend, API, payment, email).
  • Dependency-level flags (e.g., payments-stripe.enabled, search-es-cluster.enabled) that wrap external systems or risky internal subsystems.

If you create flags per service for the same behavior, you’ll end up with a fragmented control plane and inconsistent behavior during incidents. Instead:

  • Define the kill switch around the behavior or dependency you want to control.
  • Use that same flag across services that participate in that behavior.
  • Introduce per-service flags later only when you need a service-specific override or distinct degradation mode.

How do we avoid performance overhead from flag checks in hot paths?

Short Answer: Use LaunchDarkly’s in-memory SDKs, initialize once per service, and keep evaluations simple; you’re not making a network call per request.

Details:
LaunchDarkly SDKs are designed for hot-path use in production:

  • Local evaluation: SDKs maintain an in-memory cache of flag configurations, updated via streaming connections from the Flag Delivery Network. Evaluations are local, in-memory operations that typically take microseconds.
  • Single initialization: Initialize the SDK once when the service starts and reuse it. Don’t create a new client per request.
  • Efficient targeting: If you’re worried about additional logic, keep kill switch targeting simple (often global booleans). Use more complex targeting (segments, tenants, regions) only where it matters.
  • Monitoring: You can track latency and overhead via your existing observability stack and adjust placement or patterns if needed. For most teams, the tradeoff is heavily in favor of the control you gain versus the negligible evaluation cost.
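To make the local-evaluation point concrete, here is a stdlib sketch of the pattern (a simulation, not the LaunchDarkly SDK itself): a background update path writes flags into an in-memory store, and the hot path only does a locked dictionary read:

```python
import threading

# Why flag checks are cheap: the SDK keeps flag state in memory and a
# background stream updates it, so request-path evaluation never hits the
# network. This class simulates that separation.

class LocalFlagStore:
    def __init__(self):
        self._flags = {}
        self._lock = threading.Lock()

    def update(self, key, value):
        # Called by the streaming/update path, off the request path.
        with self._lock:
            self._flags[key] = value

    def bool_variation(self, key, default=False):
        # Hot-path evaluation: an in-memory lookup, no network call.
        with self._lock:
            return bool(self._flags.get(key, default))

store = LocalFlagStore()
store.update("payments-stripe.enabled", True)
print(store.bool_variation("payments-stripe.enabled"))    # True
print(store.bool_variation("search-es-cluster.enabled"))  # False (default)
```

Because the read is just a dictionary lookup under a lock, the cost per request is negligible compared with the I/O the request itself performs.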

Summary

Implementing a kill switch pattern with LaunchDarkly across multiple microservices gives you a centralized, runtime control surface for your riskiest behavior. Instead of patching each service during an incident, you flip (or auto-trigger) a single feature flag and rely on LaunchDarkly’s Flag Delivery Network to propagate the change globally in under 200ms—no redeploys required.

By modeling kill switches as shared flags, instrumenting each service with the SDK, wiring flags into your observability stack, and governing access with policies and audit logs, you turn “2am fire drills” into controlled, predictable operations. Releases can move faster because you have a reliable way to shrink blast radius and recover instantly when things go wrong.

Next Step

Get Started