Flaky Test Rate
Flaky Test Rate measures the percentage of automated tests that fail inconsistently, passing or failing under identical conditions. It reflects the stability of the test suite and its ability to provide reliable signals during development and CI/CD.
Calculation
A test is considered flaky when it fails intermittently without any change to the underlying code. These failures are typically detected through test retries, repeated builds, or patterns in test history.
The metric is calculated as:
Flaky Test Rate = (Number of Flaky Test Failures ÷ Total Test Executions) × 100
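As a minimal sketch of the calculation, assuming test results are available as (test_id, commit, passed) records, here is one way to compute it in Python. The record shape and function name are illustrative; a failure is counted as flaky when the same test both passed and failed on the same commit, i.e. under identical conditions:

```python
from collections import defaultdict

def flaky_test_rate(executions):
    """Percentage of test executions that are flaky failures.

    A failure counts as flaky when the same test both passed and failed
    on the same commit, i.e. with no change to the underlying code.
    """
    executions = list(executions)
    outcomes = defaultdict(set)  # (test_id, commit) -> set of observed outcomes
    for test_id, commit, passed in executions:
        outcomes[(test_id, commit)].add(passed)

    flaky_failures = sum(
        1
        for test_id, commit, passed in executions
        if not passed and outcomes[(test_id, commit)] == {True, False}
    )
    return 100.0 * flaky_failures / len(executions) if executions else 0.0

runs = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),   # same commit, opposite outcome -> flaky
    ("test_checkout", "abc123", True),
    ("test_checkout", "abc123", True),
]
print(f"{flaky_test_rate(runs):.1f}%")  # 25.0% -> 1 flaky failure / 4 executions
```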
Goals
Flaky Test Rate helps teams detect and reduce noise in their automated testing systems. It answers questions like:
- Are we wasting time re-running builds due to unreliable test outcomes?
- Are flaky tests eroding developer trust in the test suite?
- Which parts of our test infrastructure are unstable?
Reducing flakiness improves signal quality, speeds up feedback loops, and increases confidence in test results. For foundational insight, see Google’s research on test flakiness and its impact.
Variations
Flaky Test Rate may also be referred to as Test Stability, Test Flake Frequency, or Intermittent Failure Rate. Common breakdowns include:
- By test type, such as unit, integration, or end-to-end
- By framework, like Jest, JUnit, or Cypress
- By system or component, to identify where instability clusters
- By environment, such as local, CI, staging, or production
- By frequency of retry, to surface tests that consistently require multiple runs
Some teams also track Flake Density, which measures the number of flaky tests per 1,000 test cases or test runs.
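To illustrate these breakdowns and Flake Density together, a short sketch, assuming each test already carries a type label and a flaky flag from whatever detection method the team uses; all names and data are illustrative:

```python
from collections import Counter

# Illustrative records: (test_id, test_type, is_flaky), where the flaky flag
# comes from the team's detection method (retries, history analysis, ...).
tests = [
    ("test_cart_total", "unit", True),
    ("test_login", "unit", False),
    ("test_search", "integration", False),
    ("test_checkout_flow", "e2e", True),
]

# Breakdown by test type: where does instability cluster?
flaky_by_type = Counter(kind for _, kind, flaky in tests if flaky)
total_by_type = Counter(kind for _, kind, _ in tests)
for kind, total in total_by_type.items():
    print(f"{kind}: {flaky_by_type[kind]}/{total} flaky")

# Flake Density: flaky tests per 1,000 test cases.
density = 1000 * sum(1 for *_, flaky in tests if flaky) / len(tests)
print(f"Flake Density: {density:.0f} per 1,000 tests")
```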
Limitations
Flaky Test Rate identifies inconsistency, not severity. Some flaky tests may be harmless (e.g. timing delays), while others mask critical issues.
It also depends on the detection method: flakes caught by automated retries are easy to spot, while tests that fail only occasionally in CI but pass when re-run manually can go undetected.
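A retry-based detector is the simplest of these methods: a test that fails and then passes on a retry within the same CI run gets flagged. A minimal sketch, with illustrative run and test names:

```python
def flakes_from_retries(attempts):
    """Return test ids that failed and then passed on retry in the same run.

    `attempts` maps (run_id, test_id) to the ordered outcomes across retries;
    mixed outcomes within one run are the classic retry-detected flake.
    """
    return {
        test_id
        for (run_id, test_id), outcomes in attempts.items()
        if True in outcomes and False in outcomes
    }

attempts = {
    ("run-17", "test_payment"): [False, True],   # failed, then passed on retry -> flaky
    ("run-17", "test_signup"):  [False, False],  # fails consistently -> a real break
}
print(flakes_from_retries(attempts))  # {'test_payment'}
```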
To better understand how test flakiness affects delivery, combine this metric with:
| Complementary Metric | Why It's Relevant |
|---|---|
| Pipeline Success Rate | Highlights whether flaky tests are contributing to failed or blocked builds |
| First-Time Pass Rate | Reveals whether test flakiness is forcing developers to re-run pipelines |
| Review Latency | Shows whether test reliability is slowing down approvals and merge readiness |
Optimization
Improving Flaky Test Rate involves isolating unstable test behaviors, improving infrastructure, and enforcing hygiene practices in test automation.
- Log and track flaky tests. Maintain a flaky test dashboard or tagging system to flag repeat offenders and prioritize fixes
- Use retry logic with caution. Retrying builds may hide instability; treat a pass-on-retry as an indicator of flakiness, not a workaround
- Stabilize test environments. Ensure consistency in runtime environments, data, and infrastructure to reduce non-deterministic behavior
- Review test dependencies. Flaky tests often depend on shared state, network timing, or slow services; use mocks or isolation where possible
- Hold flaky tests to a high bar. If a test frequently fails for reasons unrelated to the code, disable it temporarily or mark it as non-blocking until it is fixed (see the tagging sketch after this list)
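As a concrete example of the tagging and non-blocking practices above, a pytest-based sketch: a custom quarantine marker (which would need to be registered under `markers` in pytest.ini) keeps the test visible in flake reports, while pytest's built-in `xfail` marker stops a known flake from blocking the build. The marker name, test, and ticket reference are all illustrative:

```python
import pytest

@pytest.mark.flaky_quarantine  # illustrative custom tag for flake dashboards/reports
@pytest.mark.xfail(reason="intermittent timeout under load (hypothetical ticket FLAKE-123)", strict=False)
def test_checkout_under_load():
    # Known flaky end-to-end check: a failure here no longer blocks the
    # pipeline, but the tags keep it visible until the root cause is fixed.
    ...
```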
Flaky Test Rate is more than a CI annoyance; it's a signal that reliability debt is accumulating. Reducing it ensures test results reflect reality and lets engineers ship with clarity and trust.