How do we set up Guarded Rollouts in LaunchDarkly Guardian to auto-pause or roll back when guardrail metrics fail?

Moving fast in production should not mean gambling with your error budget. Guarded Rollouts in LaunchDarkly Guardian let you ship changes quickly while automatically pausing or rolling back when guardrail metrics fail—no redeploys required.

Quick Answer: You set up Guarded Rollouts in LaunchDarkly Guardian by (1) defining which feature flag rollout to guard, (2) attaching key metrics and performance thresholds, and (3) choosing automated actions (pause or rollback) when those thresholds are breached. Guardian then monitors your rollout in real time and reacts for you.

The Quick Overview

  • What It Is: Guarded Rollouts in LaunchDarkly Guardian are runtime-controlled, progressive feature rollouts that are continuously monitored against performance and reliability guardrails, with automatic pause or rollback when things go wrong.
  • Who It Is For: Engineering, SRE, and product teams that want to ship often, limit blast radius, and avoid 2am fire drills by tying releases directly to real-time metrics.
  • Core Problem Solved: Releases fail because teams don’t see regressions early enough and can’t react fast without redeploying. Guarded Rollouts close that loop: they detect guardrail breaches within the monitoring window you configure and revert automatically, before customers feel the full impact.

How It Works

Guarded Rollouts layer intelligent monitoring and automation on top of LaunchDarkly’s feature flags and progressive delivery. You configure which changes are “guarded,” connect them to critical metrics (errors, latency, business KPIs), define thresholds and windows, and tell Guardian what to do when those thresholds are exceeded.

Guardian then watches every evaluation in real time across your environments, using LaunchDarkly’s global network (100+ points of presence, 45T+ flag evals/day, <200ms flag changes worldwide) to detect regressions quickly and enforce your rules—without restarting services or shipping new code.

A typical flow looks like this:

  1. Plan & Configure the Guarded Rollout:
    Choose the flag, define the rollout pattern, and attach guardrails (metrics, thresholds, and monitoring windows).

  2. Release & Observe in Real Time:
    Gradually increase exposure while Guardian tracks performance thresholds, error rates, and other metrics per variation.

  3. React Automatically (Pause or Roll Back):
    If guardrail metrics fail, Guardian triggers your configured action: auto-pause the rollout, roll back to a safe variation, and notify the right teams.
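
The three steps above can be sketched as a small control loop. This is a minimal illustration in plain Python, not Guardian's implementation: `run_guarded_rollout`, `is_healthy`, and `STAGES` are hypothetical names, and the health check is a stand-in for whatever guardrail evaluation you configure.

```python
from typing import Callable, List, Tuple

# Hypothetical exposure stages, in percent of traffic.
STAGES = [1, 5, 25, 50, 100]


def run_guarded_rollout(
    stages: List[int],
    is_healthy: Callable[[int], bool],
    on_failure: str = "rollback",
) -> Tuple[int, List[Tuple[str, int]]]:
    """Advance through stages, stopping at the first failed health check.

    Returns the final exposure percentage and a log of (event, percent) pairs.
    """
    if on_failure not in ("pause", "rollback"):
        raise ValueError("on_failure must be 'pause' or 'rollback'")

    exposure = 0
    log: List[Tuple[str, int]] = []
    for percent in stages:
        # Step 2: release to the next stage, then observe.
        exposure = percent
        log.append(("advance", percent))
        if is_healthy(percent):
            continue
        # Step 3: react automatically.
        if on_failure == "rollback":
            exposure = 0  # everyone returns to the safe variation
        log.append((on_failure, exposure))
        break
    return exposure, log
```

With a health check that fails at the 25% stage, a "pause" policy leaves exposure frozen at 25%, while a "rollback" policy returns it to 0%.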

1. Plan & Configure the Guarded Rollout

This is where you decide what “safe” looks like.

  • Pick the feature flag and environment

    • Use an existing rollout flag (for example, checkout_v2_enabled) in your staging or production environment.
    • Ensure the flag is wired through one of LaunchDarkly’s 25+ SDKs or via MCP/CLI for your service.
  • Define the rollout strategy

    • Set up a progressive rollout: e.g., 1% → 5% → 25% → 50% → 100%.
    • Target by user or system attributes: plan type, region, OS, app version, or internal cohorts (e.g., staff canary).
    • This limits blast radius from the beginning.
  • Attach Guardian protection

    • Mark this rollout as “guarded” so Guardian monitors it explicitly, not just as a regular flag change.
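
To make the percentage stages concrete, here is one common way to implement a progressive rollout: hash each user into a stable bucket and compare it to the current percentage. This is a generic sketch, not LaunchDarkly's actual bucketing algorithm; `bucket`, `serves_new_variation`, and the `region` attribute are assumptions chosen for illustration.

```python
import hashlib
from typing import Dict, Optional, Set


def bucket(flag_key: str, user_key: str) -> float:
    """Map a (flag, user) pair to a stable value in [0, 100)."""
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode()).hexdigest()
    # First 32 bits of the hash, scaled into the range [0, 100).
    return int(digest[:8], 16) / 0x100000000 * 100


def serves_new_variation(
    flag_key: str,
    user: Dict[str, str],
    percent: float,
    regions: Optional[Set[str]] = None,
) -> bool:
    """Return True if this user falls inside the current rollout stage."""
    # Optional attribute targeting: only listed regions are eligible.
    if regions is not None and user.get("region") not in regions:
        return False
    return bucket(flag_key, user["key"]) < percent
```

Because the bucket is stable per user, anyone included at 5% stays included at 25%, so raising the percentage only adds users and never reshuffles them.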

2. Connect Guardrail Metrics & Thresholds

Guardian is only as effective as the metrics you give it. The goal is to track the signals that indicate “this release is hurting reliability or users.”

  • Choose sources for your metrics

    • LaunchDarkly Observability SDKs and events
    • OpenTelemetry pipelines
    • Error and performance tools like Sentry
    • Custom business events via SDK or API
  • Set performance thresholds that are evaluated in real time
    Common guardrail examples:

    • Error rate: “If 5xx error rate for the new variation exceeds 1% over 5 minutes, trigger rollback.”
    • Latency / performance: “If p95 response time increases by more than 200ms compared to baseline over 10 minutes, pause rollout.”
    • Availability: “If success rate drops by more than 0.5% relative to control, roll back.”
    • User experience: “If front-end LCP/INP/CLS crosses thresholds tracked by the observability SDK, pause.”

    LaunchDarkly gives you out-of-the-box templates and flexible monitoring windows—from multi-day slow-burn issues down to minutes for fast regressions.

  • Define monitoring windows and sensitivity

    • Short windows (5–15 minutes) for canaries and high-risk changes.
    • Longer windows (1–24 hours) for subtle performance drifts or regional rollouts.
    • You can tune thresholds to balance false positives vs. catching issues early.
  • Map metrics to variations

    • Ensure metrics are segmented by flag variation (on/off, variant A/B) so Guardian can distinguish “new code” behavior from baseline.
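
The error-rate and latency guardrails above can be expressed as a comparison between per-variation metrics for one monitoring window. The sketch below is illustrative only: `WindowStats` and `breached_guardrails` are hypothetical names, the thresholds mirror the examples in the text, and a real setup would take these numbers from your metric source rather than in-memory lists.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowStats:
    """Aggregated metrics for one flag variation over one monitoring window."""
    requests: int
    errors_5xx: int
    latencies_ms: List[float]


def error_rate(stats: WindowStats) -> float:
    return stats.errors_5xx / stats.requests if stats.requests else 0.0


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile; assumes a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def breached_guardrails(
    control: WindowStats,
    treatment: WindowStats,
    max_error_rate: float = 0.01,        # 5xx rate above 1%
    max_p95_increase_ms: float = 200.0,  # p95 more than 200ms over baseline
) -> List[str]:
    """Return the names of guardrails the new variation has breached."""
    breached = []
    if error_rate(treatment) > max_error_rate:
        breached.append("error_rate")
    if p95(treatment.latencies_ms) - p95(control.latencies_ms) > max_p95_increase_ms:
        breached.append("p95_latency")
    return breached
```

Comparing treatment against control, rather than against a fixed number, is what makes segmenting metrics by variation necessary: without it there is no baseline to measure the increase from.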

3. Configure Auto-Pause or Auto-Rollback Actions

With guardrails in place, you decide how aggressive Guardian should be.

  • Choose the automated action
    For each metric/threshold, configure:

    • Auto-pause:

      • Freeze the rollout at the current exposure.
      • No additional users receive the new variation while you investigate.
      • Useful when you suspect noise or want manual confirmation before rollback.
    • Auto-rollback:

      • Immediately switch all exposed traffic back to the safe variation.
      • Applies to current and future evaluations.
      • Best when the degradation is clear and you want “recover instantly” behavior.
  • Define scope and blast radius

    • Global: roll back for all users in the environment.
    • Scoped: roll back only in specific segments (for example, a single region or platform) where the metric is failing.
  • Set up notifications and incident hooks

    • Configure Regression Detection alerts to send to Slack or PagerDuty.
    • Optionally open incidents in your incident management tool so the on-call can investigate.
    • Because Guardian has already paused or rolled back, these incidents are about understanding the regression, not stopping an active outage.
  • Enforce governance and approvals

    • Use release pipelines, policies, approvals, and audit logs so that only authorized teams can change Guardian settings or override an auto-pause/rollback.
    • This ensures your guardrails can’t be casually disabled when release pressure is high.
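
One way to reason about these choices is as a policy table that maps each guardrail to an action, with the most severe action winning when several guardrails fail at once. The following is a hedged sketch, not Guardian's configuration format: `Action`, `POLICY`, `decide`, and `decide_per_segment` are hypothetical, and the "rollback beats pause" precedence is an assumption.

```python
from enum import Enum
from typing import Dict, List


class Action(Enum):
    NONE = "none"
    PAUSE = "pause"
    ROLLBACK = "rollback"


# Hypothetical policy: which action each guardrail triggers when breached.
POLICY: Dict[str, Action] = {
    "error_rate": Action.ROLLBACK,
    "p95_latency": Action.PAUSE,
}


def decide(breached: List[str], policy: Dict[str, Action]) -> Action:
    """Pick the most severe action among the breached guardrails."""
    actions = {policy[name] for name in breached if name in policy}
    if Action.ROLLBACK in actions:
        return Action.ROLLBACK
    if Action.PAUSE in actions:
        return Action.PAUSE
    return Action.NONE


def decide_per_segment(
    breaches_by_segment: Dict[str, List[str]],
    policy: Dict[str, Action],
) -> Dict[str, Action]:
    """Scoped variant: decide independently for each segment (e.g. region)."""
    return {
        segment: decide(breached, policy)
        for segment, breached in breaches_by_segment.items()
    }
```

For example, `decide_per_segment({"eu": ["error_rate"], "us": []}, POLICY)` rolls back only the `eu` segment and leaves `us` untouched, which corresponds to the scoped option described above.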

Features & Benefits Breakdown

| Core Feature | What It Does | Primary Benefit |
| --- | --- | --- |
| Guarded Rollouts | Links feature flag rollouts to real-time performance thresholds and metrics | Prevents risky releases from ever reaching most customers; minimizes blast radius. |
| Automated Pause & Rollback (Guardian) | Automatically pauses or reverts rollouts when guardrail metrics fail | Detects application issues and instantly recovers, without redeploying or manual intervention. |
| Real-Time Monitoring & Alerts | Tracks errors, latency, and custom metrics with regression alerts | Gives on-call clear, actionable insight tied directly to the release that caused the regression. |

Ideal Use Cases

  • Best for high-risk production changes:
    You ship behind feature flags, roll out progressively, and rely on Guardian to roll back automatically if error rates or latency spike.

  • Best for teams standardizing safe release practices:
    You get a consistent, policy-driven way to guard every rollout across services, teams, and environments, without needing every engineer to build their own canary logic.

Limitations & Considerations

  • Metrics coverage must be in place:
    Guardian can only guard what it can see. Make sure critical services emit the right events via SDKs, OpenTelemetry, or your error/performance tools before relying on auto-rollbacks.

  • Guardrails aren’t a substitute for testing:
    Guarded Rollouts catch regressions in production, but they don’t replace unit, integration, or load testing. Treat them as a last line of defense, not the only one.

Pricing & Plans

Guardian and Guarded Rollouts are part of the broader LaunchDarkly runtime control platform. Pricing typically aligns with:

  • Feature management and experimentation needs
  • Number of MAUs / flag evaluations
  • Governance and observability depth

For specific details, it’s best to talk to our team so we can match Guardian’s capabilities to your release workflows and environments.

  • Team / Growth-style plans: Best for product and engineering teams needing progressive delivery, basic Guarded Rollouts, and developer-first tooling to de-risk frequent releases.
  • Enterprise plans: Best for organizations needing Guardian at scale, with enterprise governance (custom roles, policies, audit logs), advanced observability integration, and support for complex, multi-team release pipelines.

Frequently Asked Questions

Do Guarded Rollouts only work with new feature launches?

Short Answer: No. You can guard any runtime change controlled by a feature flag.

Details:
Guarded Rollouts are most visible during new feature launches, but they work for any change behind a flag:

  • Backend behavior toggles
  • Configuration switches (e.g., rate limits, timeouts)
  • UI variants or layout changes
  • AI Configs and prompt/model switches

If a behavior is controlled by a flag and you can attach metrics to it, Guardian can monitor it and auto-pause or roll back when guardrails fail.


How fast can Guardian pause or roll back a failing rollout?

Short Answer: Flag changes propagate globally in under 200ms, so Guardian can react in effectively real time once a threshold is breached.

Details:
LaunchDarkly operates with 99.99% uptime, 45T+ daily flag evaluations, and 100+ points of presence. That means:

  • When Guardian decides to pause or roll back, it’s just a flag update.
  • That update propagates worldwide in under 200ms.
  • Your services, through 25+ SDKs, react on the next evaluation—no redeploys, no config file changes, no restarts.

In practice, this is the difference between a small, contained blip and a headline-making outage.
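
The "react on the next evaluation" point is easiest to see in code. The sketch below uses a hypothetical in-memory `FlagStore` as a stand-in for an SDK client (it is not the LaunchDarkly SDK); the idea it illustrates is that the service reads the flag on every request, so a flag update changes behavior without a restart.

```python
import threading
from typing import Any, Dict


class FlagStore:
    """Hypothetical in-memory flag cache; a stand-in for an SDK client."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        self._values = dict(initial)
        self._lock = threading.Lock()

    def update(self, key: str, value: Any) -> None:
        """Called when a flag change arrives (for example, a rollback)."""
        with self._lock:
            self._values[key] = value

    def variation(self, key: str, default: Any) -> Any:
        """Return the current value, or the default if the flag is unknown."""
        with self._lock:
            return self._values.get(key, default)


def handle_checkout(flags: FlagStore) -> str:
    # The flag is read on every request, so no restart is needed to switch paths.
    if flags.variation("checkout_v2_enabled", False):
        return "checkout v2"
    return "checkout v1"
```

Passing a safe default (`False` here) means the service falls back to the old path if the flag value is ever missing.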

Summary

Guarded Rollouts in LaunchDarkly Guardian turn release risk into a controlled, observable process. You ship behind feature flags, progressively increase exposure, and let Guardian enforce your guardrails with automated pause or rollback when metrics fail. Because everything happens at runtime—with sub-200ms flag updates—you can move at the pace your business demands while keeping 2am fire drills off the calendar.
