Flaky Test Quarantine

Flaky Test Quarantine is the process of identifying unstable tests and temporarily removing them from automated pipelines while maintaining visibility and accountability. This improves CI signal quality, reduces false failures, and helps engineering teams focus on relevant regressions. Quarantined tests are isolated from gating criteria but tracked for triage, fixing, or deletion.

Background and History of Flaky Test Quarantine

Flaky tests, those that fail inconsistently without code changes, have long been a pain point in test-driven development. As CI/CD pipelines became the norm, developers began ignoring test failures due to frequent false alarms, eroding trust in the system. Engineering organizations such as Google and Facebook were early to adopt automated detection and quarantine workflows to restore confidence in their pipelines.

Flaky test quarantine practices have evolved to include automatic re-runs, flake dashboards, and tagged quarantines within test frameworks. While the strategy doesn't eliminate flakes, it makes them manageable, preventing one unstable test from blocking dozens of engineers.

Goals of Flaky Test Quarantine

Quarantine practices improve CI signal quality and unblock teams by reducing the impact of unstable tests. Specifically, they address:

  • Flaky Tests Ignored, by preventing developers from dismissing all test failures as noise.
  • Pipeline Downtime, by reducing false red builds that stall merges or deploys.
  • Change Failure Rate, by preserving the value of CI as a regression safety net.
  • Developer Trust Erosion, where teams lose faith in test automation and delay responses.

The goal is to separate real failures from environmental noise without abandoning long-term quality standards.

Scope of Flaky Test Quarantine

Flaky test quarantine is typically applied to:

  • Unit, integration, or system tests that fail intermittently
  • Tests that pass after re-running without changes
  • Tests that fail inconsistently across environments

The process includes:

  • Detection through repeated test failure patterns or retry success
  • Quarantine tagging to exclude from CI gating or quality thresholds
  • Triage ownership and tracking via dashboards or work queues

Some teams define a threshold for flakiness (e.g., >5 percent fail rate across runs) or require N-of-M pass results before quarantine. A key principle is that quarantine is temporary, teams must define how and when to fix, stabilize, or remove the test.

Metrics to Track Flaky Test Quarantine Effectiveness

MetricPurpose
Flaky Tests Ignored Tracks how often failures are dismissed as non-actionable—quarantine should lower this.
Pipeline Success Rate Improves as flaky tests are removed from gating paths.
Pipeline Downtime Reduces when flaky test failures no longer block builds or merges.
Change Failure Rate Stabilizes when regressions are no longer hidden behind noise.

Teams may also track flake backlog size, median quarantine duration, or time to resolution.

Flaky Test Quarantine Implementation Steps

To quarantine effectively, engineering teams need a detection mechanism, tracking visibility, and clear ownership. Here’s how to get started:

  1. Enable retry-on-failure in CI – Configure rerun logic for tests that fail nondeterministically.
  2. Log retry patterns and mark flakes – Tag tests that pass after retries or have high variance across builds.
  3. Exclude confirmed flakes from gating – Remove them from merge blockers but not from execution or logging.
  4. Track quarantined test set – Use dashboards to surface flakes by frequency, ownership, and duration.
  5. Define fix or retire criteria – Set rules for triage (e.g., remove after 30 days if not fixed).
  6. Alert owners and link to test history – Make flaky test triage part of sprint or platform backlog grooming.
  7. Monitor CI health post-quarantine – Ensure signal-to-noise ratio improves without hiding real failures.

Gotchas in Flaky Test Quarantine

Without tight processes, quarantining can create new risks:

  • Using quarantine as a permanent fix – Avoid turning it into a graveyard for bad tests.
  • Failing to notify owners – Flaky tests need accountability, not invisibility.
  • Quarantining too aggressively – Some tests that fail for legitimate edge cases get filtered out.
  • Loss of observability – Hidden tests still consume time but no longer contribute meaningful signal.
  • CI masking regressions – When too many tests are muted, real bugs can slip through unnoticed.

Limitations of Flaky Test Quarantine

Flaky test quarantine is a mitigation, not a root cause fix. It doesn’t:

  • Eliminate test flakiness at the source (e.g., async bugs, infra variability)
  • Validate test correctness - flaky tests can still be invalid or redundant
  • Prevent all signal loss - quarantined tests no longer provide gating confidence

It’s also limited in environments without retryable CI or test traceability. In such cases, test flakiness remains a human triage burden.

Despite these limitations, when paired with flake detection and triage ownership, quarantine dramatically improves test signal quality and CI reliability.