Flaky Test Rate
Flaky Test Rate measures the percentage of automated tests that fail inconsistently, passing or failing under identical conditions. It reflects the stability of the test suite and its ability to provide reliable signals during development and CI/CD.
Calculation
A test is considered flaky when it fails intermittently without any change to the underlying code. These failures are typically detected through test retries, repeated builds, or patterns in test history.
The metric is calculated as:
Flaky Test Rate = (Number of Flaky Test Failures ÷ Total Test Executions) × 100
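As a minimal sketch of the calculation, assuming test results are available as (test_id, commit, passed) records, here is one way to compute it in Python. The record shape and function name are illustrative; a failure is counted as flaky when the same test both passed and failed on the same commit, i.e. under identical conditions:

```python
from collections import defaultdict

def flaky_test_rate(executions):
    """Percentage of test executions that are flaky failures.

    A failure counts as flaky when the same test both passed and failed
    on the same commit, i.e. with no change to the underlying code.
    """
    executions = list(executions)
    outcomes = defaultdict(set)  # (test_id, commit) -> set of observed outcomes
    for test_id, commit, passed in executions:
        outcomes[(test_id, commit)].add(passed)

    flaky_failures = sum(
        1
        for test_id, commit, passed in executions
        if not passed and outcomes[(test_id, commit)] == {True, False}
    )
    return 100.0 * flaky_failures / len(executions) if executions else 0.0

runs = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),   # same commit, opposite outcome -> flaky
    ("test_checkout", "abc123", True),
    ("test_checkout", "abc123", True),
]
print(f"{flaky_test_rate(runs):.1f}%")  # 25.0% -> 1 flaky failure / 4 executions
```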
Goals
Flaky Test Rate helps teams detect and reduce noise in their automated testing systems. It answers questions like:
- Are we wasting time re-running builds due to unreliable test outcomes?
- Are flaky tests eroding developer trust in the test suite?
- Which parts of our test infrastructure are unstable?
Reducing flakiness improves signal quality, speeds up feedback loops, and increases confidence in test results. For foundational insight, see Google’s research on test flakiness and its impact.
Variations
Flaky Test Rate may also be referred to as Test Stability, Test Flake Frequency, or Intermittent Failure Rate. Common breakdowns include:
- By test type, such as unit, integration, or end-to-end
- By framework, like Jest, JUnit, or Cypress
- By system or component, to identify where instability clusters
- By environment, such as local, CI, staging, or production
- By frequency of retry, to surface tests that consistently require multiple runs
Some teams also track Flake Density, which measures the number of flaky tests per 1,000 test cases or test runs.
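To illustrate these breakdowns and Flake Density together, a short sketch, assuming each test already carries a type label and a flaky flag from whatever detection method the team uses; all names and data are illustrative:

```python
from collections import Counter

# Illustrative records: (test_id, test_type, is_flaky), where the flaky flag
# comes from the team's detection method (retries, history analysis, ...).
tests = [
    ("test_cart_total", "unit", True),
    ("test_login", "unit", False),
    ("test_search", "integration", False),
    ("test_checkout_flow", "e2e", True),
]

# Breakdown by test type: where does instability cluster?
flaky_by_type = Counter(kind for _, kind, flaky in tests if flaky)
total_by_type = Counter(kind for _, kind, _ in tests)
for kind, total in total_by_type.items():
    print(f"{kind}: {flaky_by_type[kind]}/{total} flaky")

# Flake Density: flaky tests per 1,000 test cases.
density = 1000 * sum(1 for *_, flaky in tests if flaky) / len(tests)
print(f"Flake Density: {density:.0f} per 1,000 tests")
```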
Limitations
Flaky Test Rate identifies inconsistency, not severity. Some flaky tests may be harmless (e.g. timing delays), while others mask critical issues.
It also depends on the detection method: flakes caught by automated retries are easy to spot, while tests that fail only occasionally in CI but pass when re-run manually can go undetected.
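A retry-based detector is the simplest of these methods: a test that fails and then passes on a retry within the same CI run gets flagged. A minimal sketch, with illustrative run and test names:

```python
def flakes_from_retries(attempts):
    """Return test ids that failed and then passed on retry in the same run.

    `attempts` maps (run_id, test_id) to the ordered outcomes across retries;
    mixed outcomes within one run are the classic retry-detected flake.
    """
    return {
        test_id
        for (run_id, test_id), outcomes in attempts.items()
        if True in outcomes and False in outcomes
    }

attempts = {
    ("run-17", "test_payment"): [False, True],   # failed, then passed on retry -> flaky
    ("run-17", "test_signup"):  [False, False],  # fails consistently -> a real break
}
print(flakes_from_retries(attempts))  # {'test_payment'}
```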
To better understand how test flakiness affects delivery, combine this metric with:
| Complementary Metric | Why It's Relevant |
|---|---|
| Pipeline Success Rate | Highlights whether flaky tests are contributing to failed or blocked builds |
| First-Time Pass Rate | Reveals whether test flakiness is forcing developers to re-run pipelines |
| Review Latency | Shows whether test reliability is slowing down approvals and merge readiness |
Optimization
Improving Flaky Test Rate involves isolating unstable test behaviors, improving infrastructure, and enforcing hygiene practices in test automation.
- Log and track flaky tests. Maintain a flaky test dashboard or tagging system to flag repeat offenders and prioritize fixes
- Use retry logic with caution. Retrying builds may hide instability; treat a pass-on-retry as an indicator of flakiness, not a workaround
- Stabilize test environments. Ensure consistency in runtime environments, data, and infrastructure to reduce non-deterministic behavior
- Review test dependencies. Flaky tests often depend on shared state, network timing, or slow services; use mocks or isolation where possible
- Hold flaky tests to a high bar. If a test frequently fails for reasons unrelated to the code, disable it temporarily or mark it as non-blocking until it is fixed (see the tagging sketch after this list)
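As a concrete example of the tagging and non-blocking practices above, a pytest-based sketch: a custom quarantine marker (which would need to be registered under `markers` in pytest.ini) keeps the test visible in flake reports, while pytest's built-in `xfail` marker stops a known flake from blocking the build. The marker name, test, and ticket reference are all illustrative:

```python
import pytest

@pytest.mark.flaky_quarantine  # illustrative custom tag for flake dashboards/reports
@pytest.mark.xfail(reason="intermittent timeout under load (hypothetical ticket FLAKE-123)", strict=False)
def test_checkout_under_load():
    # Known flaky end-to-end check: a failure here no longer blocks the
    # pipeline, but the tags keep it visible until the root cause is fixed.
    ...
```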
Flaky Test Rate is more than a CI annoyance; it's a signal that reliability debt is accumulating. Reducing it ensures test results reflect reality and lets engineers ship with clarity and trust.