
How do we implement a kill switch pattern with LaunchDarkly across multiple microservices?
Move fast with microservices, stay in control when they misbehave. That’s the whole point of a kill switch pattern with LaunchDarkly: a way to instantly turn off risky behavior in production, across services and regions, without a redeploy or a 2am fire drill.
Quick Answer: A kill switch pattern with LaunchDarkly is a shared feature flag (or set of flags) that each microservice checks at runtime to decide whether to run risky code paths. When something goes wrong, you flip the flag once in LaunchDarkly and every service stops the behavior in under 200ms worldwide—no redeploys required.
The Quick Overview
- What It Is: A runtime control pattern where you use LaunchDarkly feature flags as global or scoped kill switches that every relevant microservice evaluates in production to enable, disable, or degrade functionality instantly.
- Who It Is For: Engineering teams running distributed systems (microservices, edge services, background workers) that need a reliable, low-latency safety lever when new features or dependencies start failing.
- Core Problem Solved: Reduces blast radius and mean time to recovery (MTTR) for failures by decoupling “stop the bad behavior” from “ship another deploy,” allowing you to remediate incidents in real time.
How It Works
You implement kill switches by wiring LaunchDarkly SDKs into each microservice and standardizing how they evaluate flags before running risky logic. Instead of hard-coding behavior or relying on config files, you evaluate a flag at runtime and branch:
- If the flag is on, run the feature or behavior.
- If the flag is off (kill switch engaged), skip it, fall back, or degrade gracefully.
Because LaunchDarkly delivers flag changes through a global Flag Delivery Network (100+ points of presence, 99.99% uptime, <200ms flag changes worldwide), a single toggle propagates near-instantly to all services.
A typical microservice kill switch pattern follows three phases:
- Design & Model the Kill Switch:
  - Decide what you’re protecting: a feature, a dependency, a downstream service, or an entire traffic path.
  - Create one or more flags in LaunchDarkly (for example, payment-service.enable_stripe, global.checkout_enabled).
  - Define default behavior if LaunchDarkly is unavailable (safe off vs safe on).
- Instrument Microservices with SDKs:
  - Add LaunchDarkly’s native SDKs to each service (Node, Java, Go, .NET, Python, etc.).
  - Evaluate the kill switch flag as close as possible to the risky operation.
  - Implement fallbacks so the system fails safe when the flag is off.
- Operate & Automate Rollbacks:
  - Use the LaunchDarkly UI, API, CLI, or MCP to flip kill switches during incidents.
  - Configure flag triggers from your observability tooling so metrics can automatically turn a feature off.
  - Use audit logs, policies, and approvals to control who can pull the lever.
Phase 1: Design & Model the Kill Switch
Start from the blast radius. Ask: “When this fails, what do we need to stop—and where?”
Common kill switch scopes in microservices:
- Global product kill switch: global.app_enabled – disables a major feature or module across front-end and back-end services.
- Service-level kill switch: billing.service_enabled – turns off an entire microservice (e.g., bypass billing, respond with 503, or redirect).
- Dependency-level kill switch: payment.use_stripe – stops calling a flaky third-party API and falls back to another provider or queue.
- Feature-path kill switch: recommendations.v2_enabled – turns off a new algorithm path while keeping the rest of the service alive.
Key modeling decisions:
- Flag type: Use a boolean for straightforward kill switches (true = enabled, false = off).
- Naming: Keep it explicit and scoped: service.action_enabled is easier to reason about during an outage.
- Default behavior: If the SDK can’t reach LaunchDarkly, do you:
  - Fail closed (default to off: smallest blast radius, but may disable more than necessary)?
  - Fail open (default to on: more availability, but less safety)?
- Targeting scope: Decide if you need per-environment flags (prod, staging) or per-customer/segment flags (e.g., canary to 5% first).
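One way to keep these modeling decisions explicit is a small registry that every service imports. The sketch below is illustrative only: KILL_SWITCHES, failMode, and fallbackFor are hypothetical names rather than anything LaunchDarkly provides, and the fail modes assigned to each example flag are assumptions you would revisit per flag.

```javascript
// Hypothetical shared registry of kill switch flags (not a LaunchDarkly API).
// failMode records what a service should assume when the flag cannot be
// evaluated; the choices below are examples, not recommendations.
const KILL_SWITCHES = Object.freeze({
  'global.app_enabled': { scope: 'global', failMode: 'open' },
  'billing.service_enabled': { scope: 'service', failMode: 'open' },
  'payment.use_stripe': { scope: 'dependency', failMode: 'closed' },
  'recommendations.v2_enabled': { scope: 'feature-path', failMode: 'closed' },
});

// Translate the fail mode into the fallback value handed to the SDK:
// fail closed -> false (behavior off), fail open -> true (behavior on).
function fallbackFor(flagKey) {
  if (!Object.prototype.hasOwnProperty.call(KILL_SWITCHES, flagKey)) {
    // Refuse unknown keys so typos surface in tests, not during an incident.
    throw new Error(`Unregistered kill switch: ${flagKey}`);
  }
  return KILL_SWITCHES[flagKey].failMode === 'open';
}
```

A service would then pass fallbackFor('payment.use_stripe') as the default value of its flag evaluation, so the fail-open or fail-closed decision lives in one reviewed place instead of being repeated at every call site.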
Phase 2: Instrument Microservices with LaunchDarkly SDKs
Copy, paste, guard the risky path. Kill switches only work if every microservice evaluates them consistently.
1. Initialize the SDK in each service
Example in Node.js:
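// Assumes an ES module context, which top-level await requires.
import LDClient from 'launchdarkly-node-server-sdk';

// Create one client per process and reuse it for every evaluation.
const ldClient = LDClient.init(process.env.LD_SDK_KEY);
await ldClient.waitForInitialization();
console.log('LaunchDarkly initialized');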
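Each service initializes the SDK with the SDK key for its LaunchDarkly environment (e.g., prod, staging). SDK keys are scoped to an environment, so services that need to share a kill switch must connect to the same project and environment.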
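Waiting on initialization without a bound can stall service startup if LaunchDarkly is unreachable. The following is a minimal sketch of a bounded wait, assuming only that the client exposes waitForInitialization() as in the snippet above; initWithTimeout and the 5-second budget are hypothetical choices.

```javascript
// Wait for SDK initialization, but never block startup indefinitely.
// `client` is any object exposing waitForInitialization(), such as ldClient.
// Resolves true when the SDK is ready, false on timeout or failure.
async function initWithTimeout(client, timeoutMs = 5000) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    return await Promise.race([
      client.waitForInitialization().then(() => true),
      timeout,
    ]);
  } catch (err) {
    // Initialization was rejected (for example, because of an invalid SDK key).
    console.error('LaunchDarkly initialization failed:', err.message);
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

When this returns false the service can still start: flag evaluations fall back to the default value passed at each call site, which is why choosing safe defaults matters.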
2. Define a consistent context
Use a LaunchDarkly context that reflects how you want to target:
const context = {
  kind: 'service',
  key: 'payment-service',
  name: 'Payment Service',
  environment: 'prod'
};
If you want customer- or tenant-specific kill switches, include those attributes too:
const context = {
  kind: 'user',
  key: customerId,
  plan: customerPlan,
  region: customerRegion
};
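If a flag needs to be targetable by service identity and by customer attributes in the same evaluation, the two shapes above can be combined into a multi-context. This is a sketch under assumptions: buildContext and the shape of its customer argument (id, plan, region) are hypothetical, and the attribute names simply mirror the examples above.

```javascript
// Build a context that can be targeted by service and by customer at once.
// Returns a single "service" context when no customer is in scope
// (for example, in background workers or cron jobs).
function buildContext(serviceName, environment, customer) {
  const service = { key: serviceName, environment };
  if (!customer) {
    return { kind: 'service', ...service };
  }
  return {
    kind: 'multi', // multi-context: one nested object per context kind
    service,
    user: {
      key: String(customer.id),
      plan: customer.plan,
      region: customer.region,
    },
  };
}
```

Putting this helper in a shared library keeps context structure identical across services, so a targeting rule written once behaves the same wherever the flag is evaluated.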
3. Evaluate the kill switch at runtime
Wrap the risky call:
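// processPayment is an illustrative wrapper; the returns need a function body.
async function processPayment(order) {
  // true = Stripe path enabled; false = kill switch engaged
  const stripeEnabled = await ldClient.variation(
    'payment-service.enable_stripe',
    context,
    false // default value if flag not found or SDK not ready
  );
  if (!stripeEnabled) {
    // Kill switch engaged: fallback behavior
    return queuePaymentForLater(order);
  }
  // Risky path: Stripe call
  return chargeViaStripe(order);
}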
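Repeating this branch by hand in every service invites inconsistency, so it can help to wrap it once. Below is a minimal sketch of such a guard; withKillSwitch and its option names are hypothetical, and it assumes only that the client has an async variation(key, context, default) method like the one used above.

```javascript
// Run `risky` only while the flag evaluates to true; otherwise run `fallback`.
// failDefault is the value assumed when the flag cannot be evaluated.
async function withKillSwitch(client, flagKey, context, { risky, fallback, failDefault = false }) {
  let enabled = failDefault;
  try {
    enabled = await client.variation(flagKey, context, failDefault);
  } catch (err) {
    // Treat an evaluation error like an unreachable flag: keep the default.
    console.error(`Flag evaluation failed for ${flagKey}:`, err.message);
  }
  return enabled ? risky() : fallback();
}
```

With this helper, the payment example reduces to a call such as withKillSwitch(ldClient, 'payment-service.enable_stripe', context, { risky: () => chargeViaStripe(order), fallback: () => queuePaymentForLater(order) }), and each service only supplies its own fallback.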
Key practices:
- Check flags at the edge of the behavior, not once at boot. You want real-time changes, not stale decisions.
- Keep the default safe. If you’re protecting a brittle dependency, default to false (off) and explicitly turn it on.
- Cache where appropriate. For very hot paths, you can use LaunchDarkly’s built-in streaming (SSE) to get updates; avoid your own heavy polling.
4. Extend across languages and services
Because LaunchDarkly has 25+ native SDKs, you use the same flag across:
- API gateways and BFFs
- Core microservices
- Background workers and cron jobs
- Mobile and web clients (when appropriate)
Every service checks the same flag key, but may implement service-specific fallbacks.
Phase 3: Operate & Automate Rollbacks
Flip a kill switch if something breaks. With the pattern in place, you now operate via LaunchDarkly, not the deployment pipeline.
Manual control during incidents
When an incident is live:
- Open the flag in LaunchDarkly (payment-service.enable_stripe).
- Toggle it off for production (or reduce rollout to 0%).
- Observe behavior stabilizing across all services, with propagation within ~200ms globally.
- Use observability dashboards to verify error rates, latency, and user impact.
You’ve just decoupled incident remediation from redeploys.
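The same toggle can be scripted for runbooks or chat-ops. The sketch below builds a request against LaunchDarkly's REST API using a semantic patch with a turnFlagOff instruction; the endpoint path and header reflect my understanding of that API and should be checked against the current API reference. The function names, and the project, environment, and token values passed in, are placeholders.

```javascript
// Build the HTTP request that turns a flag off in one environment.
// All parameters are supplied by the caller; nothing here is hard-coded
// to a real project or token.
function buildTurnOffRequest({ projectKey, flagKey, environmentKey, apiToken, comment }) {
  const base = 'https://app.launchdarkly.com/api/v2/flags';
  return {
    url: `${base}/${encodeURIComponent(projectKey)}/${encodeURIComponent(flagKey)}`,
    options: {
      method: 'PATCH',
      headers: {
        Authorization: apiToken,
        // Marks the body as a semantic patch rather than a JSON patch.
        'Content-Type': 'application/json; domain-model=launchdarkly.semanticpatch',
      },
      body: JSON.stringify({
        environmentKey,
        comment, // appears in the audit log entry for the change
        instructions: [{ kind: 'turnFlagOff' }],
      }),
    },
  };
}

// Send the request. fetchImpl defaults to the global fetch (Node 18+) and
// can be replaced with a stub in tests.
async function turnFlagOff(params, fetchImpl = fetch) {
  const { url, options } = buildTurnOffRequest(params);
  const res = await fetchImpl(url, options);
  if (!res.ok) {
    throw new Error(`Flag update failed: HTTP ${res.status}`);
  }
  return true;
}
```

Separating request construction from sending keeps the incident script easy to unit test, and the comment field gives responders a place to record the incident ID alongside the change.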
Automated kill switches with flag triggers
To remove humans from the critical path, you can:
- Integrate LaunchDarkly with your APM/observability tools (Datadog, New Relic, etc.).
- Configure flag triggers so when a metric breaches a threshold (error rate, latency, CPU), a webhook tells LaunchDarkly to flip the flag.
For example:
- Error rate > 5% on payment-service in prod triggers: payment-service.enable_stripe → OFF
- Latency p95 > 1s on recommendations triggers: recommendations.v2_enabled → OFF
This is a kill switch that pulls itself, giving you near real-time automated rollback without redeploys.
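Normally the observability tool's own webhook calls the trigger URL. For teams without such an integration, the same logic can be approximated in a small watchdog. This is a sketch under assumptions: tripIfUnhealthy, minSamples, and the default threshold are hypothetical, and it assumes the trigger URL generated in LaunchDarkly accepts a plain POST.

```javascript
// Decide whether an error-rate sample should trip the kill switch and, if so,
// call the flag trigger URL. triggerUrl is the secret per-trigger URL and
// should be loaded from a secret store, never committed to source control.
async function tripIfUnhealthy({ errors, total, triggerUrl, fetchImpl, threshold = 0.05, minSamples = 100 }) {
  if (total < minSamples) {
    return false; // too little traffic to act on
  }
  const errorRate = errors / total;
  if (errorRate <= threshold) {
    return false; // healthy: leave the flag alone
  }
  const res = await fetchImpl(triggerUrl, { method: 'POST' });
  if (!res.ok) {
    throw new Error(`Trigger call failed: HTTP ${res.status}`);
  }
  return true; // trigger fired; the flag should now be off
}
```

The minSamples guard is a deliberate design choice: without it, one failed request out of three during a quiet period would read as a 33% error rate and switch the feature off.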
Governance and audit
In large organizations, you’ll want to control who can flip kill switches:
- Use custom roles, policies, and approvals so only on-call SREs or designated owners can turn off high-risk flags.
- Rely on audit logs to see:
- Who changed the flag
- When
- From which surface (UI, API, CLI, MCP)
- What targeting rules changed
This turns kill switches into a governed part of your release pipeline, not shadow config.
Features & Benefits Breakdown
| Core Feature | What It Does | Primary Benefit |
|---|---|---|
| Global kill switches | Boolean flags evaluated at runtime across services to enable/disable paths | Instantly stop bad behavior across microservices in prod |
| Flag triggers & integrations | Connect flags to observability metrics via webhooks | Automated rollback when error/latency thresholds breach |
| Flag Delivery Network | 100+ PoPs, SSE streaming, <200ms flag changes worldwide | Reliable, low-latency updates with 99.99% uptime |
| Targeting & segmentation | Roll out or roll back per environment, region, tenant, or user segment | Limit blast radius and run canaries safely |
| Governance & audit logs | Policies, approvals, and full change history on flags | Safe, compliant control over high-impact kill switches |
| 25+ native SDKs | SDKs for major languages and platforms, plus CLI, API, MCP support | Use the same pattern across all microservices and clients |
Ideal Use Cases
- Best for protecting cross-service features: Because a single kill switch (for example, checkout.flow_enabled) can be evaluated by the front-end, API gateway, payment service, and inventory service, you contain failures in a complex flow without coordinating redeploys.
- Best for guarding external dependencies: Because you can wrap risky third-party calls (payments, search, recommendations) with flags and automatically turn them off when latency or errors spike, you keep your core system healthy even when vendors are down.
- Best for progressive rollouts in microservices: Because you can roll out a feature to a small percentage of traffic across multiple services, then scale up as metrics stay healthy, you reduce blast radius while keeping your deployment pipeline simple.
Limitations & Considerations
- Not a substitute for good architecture: A kill switch won’t fix poor timeouts, retries, or circuit breakers. It complements them. Design your microservices to degrade gracefully, then use flags as the control plane for behavior.
- Requires consistent adoption: Kill switches only work across microservices if every relevant service checks the flag. Establish conventions (flag naming, context structure, default behavior) and bake them into templates or shared libraries.
- Adds operational surfaces: Feature flags are powerful, but they become their own form of technical debt if unmanaged. Use LaunchDarkly’s flag lifecycle tools (TTL, tags, cleanup) to retire kill switches when they’re no longer needed.
Pricing & Plans
LaunchDarkly is priced by environments, MAUs, and feature sets (Feature Flags, Experimentation, Observability, AI Configs). For kill switch patterns across microservices, you’re primarily using Feature Flags plus integrations.
Typical guidance:
- Team / Growth tiers: Best for product and platform teams needing reliable kill switches, gradual rollouts, and incident control for a few core services.
- Enterprise tiers: Best for organizations with many microservices and compliance requirements—needing advanced governance (custom roles, approvals), higher volume, SLAs, and deeper integration with observability and incident management tooling.
For current details, talk to LaunchDarkly directly or request a demo.
- Feature Flags-focused plan: Best for engineering teams needing runtime control, kill switches, and progressive rollouts without full experimentation or observability built in.
- Platform / Enterprise plan: Best for organizations needing an integrated runtime control plane across kill switches, experimentation, and observability with strong governance and scale.
Frequently Asked Questions
How many kill switches should we have across microservices?
Short Answer: Start coarse (per feature or dependency), then introduce more granular flags where incidents repeatedly hit.
Details:
Over-flagging can create confusion during incidents; under-flagging expands blast radius. A practical pattern:
- Global or product-level kill switch – for “big red button” scenarios.
- Per-service kill switches – for critical services like payments, search, checkout.
- Per-experiment/per-version kill switches – for high-risk algorithm changes or new flows.
Standardize this in your architecture guidelines so engineers know when to add a kill switch vs when a configuration or environment variable is enough.
How do we avoid performance issues from flag checks?
Short Answer: Use LaunchDarkly’s server-side SDKs with streaming; evaluations are in-memory and designed for high QPS.
Details:
LaunchDarkly’s architecture uses SSE streaming and a global Flag Delivery Network so that:
- Flag configurations are streamed to the SDK and cached in memory.
- Evaluations happen locally in the process, not over the network per request.
- Flag updates propagate within ~200ms worldwide, with 99.99% uptime.
Design tips:
- Initialize the SDK once per service process, not per request.
- Use built-in caching and streaming instead of custom polling.
- For ultra-hot paths, consider evaluating once per request lifecycle rather than multiple times in nested calls, unless you need per-call distinctions.
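The last tip can be implemented with a small per-request cache. The sketch below is one possible shape, not a LaunchDarkly feature: createRequestFlags and isEnabled are hypothetical names, and the client is assumed to expose the async variation(key, context, default) method shown earlier.

```javascript
// Per-request flag cache: each flag is evaluated at most once per request,
// so nested calls see one consistent value. Create a new instance for every
// incoming request and discard it when the request ends.
function createRequestFlags(client, context) {
  const cache = new Map();
  return {
    async isEnabled(flagKey, failDefault = false) {
      if (!cache.has(flagKey)) {
        // Store the promise itself so concurrent callers share one evaluation.
        cache.set(flagKey, client.variation(flagKey, context, failDefault));
      }
      return cache.get(flagKey);
    },
  };
}
```

The trade-off is that a flag flipped mid-request is not observed until the next request. That is usually acceptable for short-lived HTTP handlers, but long-running jobs should re-evaluate at each unit of work so the kill switch still takes effect promptly.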
Can kill switches be tenant-specific or region-specific?
Short Answer: Yes. Use LaunchDarkly’s targeting rules and context attributes to limit or expand the kill switch blast radius.
Details:
You can define rules like:
- Disable checkout.flow_enabled only for:
  - region == "eu-west-1", or
  - plan == "beta", or
  - specific Enterprise customers.
Each microservice passes a context that includes tenant, region, or plan. LaunchDarkly evaluates the flag per-context, so a kill switch can be:
- Global: Turn the feature off for everyone.
- Segmented: Turn it off only where it’s failing (a region, a cohort, a single large customer).
This lets you protect the majority of users while investigating issues in a constrained slice of traffic.
Summary
A kill switch pattern with LaunchDarkly turns your microservices architecture into something you can actually control in production:
- You decouple release from deploy, evaluating flags at runtime.
- You reduce blast radius by guarding risky paths with kill switches.
- You recover instantly by flipping one flag instead of coordinating redeploys across services.
- You automate rollback with flag triggers wired to your observability stack.
Instead of hoping your next deploy fixes the problem, you change behavior in production in under 200ms—no redeploys required.