Change Failure Rate

Change Failure Rate (CFR) measures the percentage of production deployments that result in a failure requiring remediation. It reflects how reliably teams ship changes and how often deployments trigger incidents, rollbacks, or hotfixes.

In the AI era, CFR still serves the same purpose, but “changes” may include more than application code. If model versions, prompts, retrieval configurations, or agent behaviors can alter production outcomes, they should be treated as deployable change types and included (or at least segmented) in CFR reporting.

How do you calculate Change Failure Rate (CFR)?

This metric is calculated by dividing the number of failed deployments by the total number of deployments over a defined reporting window.

A failed deployment is typically defined as one that results in service degradation, a rollback, a user-facing bug, or an incident requiring emergency remediation. Teams should establish a consistent definition of failure that matches their operational reality to ensure accurate comparisons over time.

CFR = (Failed Deployments ÷ Total Deployments) × 100
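
The formula above can be sketched as a small helper. The function name is an illustration, not a standard API; the zero-deployment case is handled explicitly to avoid division by zero.

```python
def change_failure_rate(failed_deployments: int, total_deployments: int) -> float:
    """Return CFR as the percentage of deployments that required remediation."""
    if total_deployments == 0:
        return 0.0  # no deployments in the window; report 0 rather than divide by zero
    return failed_deployments / total_deployments * 100

# e.g. 3 failed deployments out of 40 in the reporting window
print(round(change_failure_rate(3, 40), 1))  # → 7.5
```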

AI and agentic AI considerations (definition + counting):

  • Include non-code production changes when relevant. If you ship changes through model releases, prompt template updates, retrieval/index updates, policy/guardrail updates, or agent workflow changes, decide whether those count as “deployments” for CFR (or track a parallel CFR for AI changes).
  • Define “remediation” clearly for AI incidents. Remediation may look like a model rollback, prompt revert, disabling an agent capability, tightening guardrails, or turning off an automated action path rather than a traditional code hotfix.
  • Set an attribution window. Decide how long after a deployment you still attribute incidents to that change (e.g., within 24–72 hours), and apply it consistently. This matters more for AI changes where failure can show up as gradual quality degradation rather than an immediate outage.
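
The attribution-window idea can be sketched as follows. This is a minimal illustration under assumed inputs (lists of deployment and incident timestamps); the function and constant names are hypothetical, and real pipelines would attribute incidents via tracing or change provenance rather than timestamps alone.

```python
from datetime import datetime, timedelta

ATTRIBUTION_WINDOW = timedelta(hours=72)  # assumed window; tune to your context

def cfr_with_attribution(deploy_times: list[datetime],
                         incident_times: list[datetime]) -> float:
    """Attribute each incident to the most recent deployment within the window,
    then compute CFR over all deployments."""
    failed = set()
    for incident in incident_times:
        # Deployments that precede the incident and are still within the window
        candidates = [d for d in deploy_times
                      if d <= incident <= d + ATTRIBUTION_WINDOW]
        if candidates:
            failed.add(max(candidates))  # blame the most recent qualifying deploy
    return len(failed) / len(deploy_times) * 100 if deploy_times else 0.0

deploys = [datetime(2024, 1, 1), datetime(2024, 1, 5)]
incidents = [datetime(2024, 1, 2)]  # within 72h of the Jan 1 deploy only
print(cfr_with_attribution(deploys, incidents))  # → 50.0
```

Applying the same window to every change type keeps CFR comparable across code and AI changes.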

What is Change Failure Rate used for?

CFR helps teams monitor the reliability of their release process and assess the risk associated with frequent deployments. It answers questions like:

  • How often do changes we deploy introduce user-facing problems?
  • Are our current validation and quality gates catching issues early enough?
  • Is our delivery velocity undermining release stability?

Tracking CFR highlights whether the delivery process is producing predictable, safe outcomes or requiring frequent reactive fixes. For more background, see the DORA Metrics guide on Change Failure Rate.

AI and agentic AI considerations (what CFR reveals):

  • A rising CFR can indicate that automated change creation (AI-assisted coding, agent-generated PRs, auto-merges, auto-deploys) is outpacing your validation and rollout controls.
  • CFR can also surface gaps in evaluation coverage for AI-driven behavior (for example, when changes pass unit tests but fail in real user interactions).

What are common variations of Change Failure Rate?

CFR can be segmented to provide deeper insights:

  • By environment, comparing failure rates across production and pre-production stages
  • By deployment type, such as hotfixes versus planned releases
  • By severity, including only failures that trigger incidents or customer-visible outages

Some teams also calculate median or percentile CFR over multiple windows to reduce skew from outlier events. Others tie failures to specific repos, teams, or service owners to identify where release risk is concentrated.
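
A quick illustration of why median windows reduce outlier skew, using hypothetical weekly CFR percentages (the data is invented for the example):

```python
import statistics

# Hypothetical weekly CFR values (percent); one incident-heavy week skews the mean.
weekly_cfr = [5.0, 4.0, 25.0, 6.0, 5.5, 4.5]

print(round(statistics.mean(weekly_cfr), 2))  # → 8.33 (pulled up by the outlier)
print(statistics.median(weekly_cfr))          # → 5.25 (robust to the outlier)
```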

AI and agentic AI segmentations (often high-signal):

  • By change type: application code vs. infrastructure vs. model version vs. prompt/config vs. retrieval/index changes
  • By change origin: primarily human-authored vs. AI-assisted vs. agent-authored changes
  • By rollout strategy: all-at-once vs. progressive delivery (canary / phased rollout) vs. feature-flagged enablement
  • By automation level: manual approvals vs. “hands-off” pipelines (useful for detecting when autonomy outpaces controls)
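
The segmentations above can be computed from tagged deployment records. This is a sketch under assumed record fields ("change_type", "failed" are illustrative names, not a standard schema):

```python
from collections import defaultdict

# Hypothetical deployment records tagged by change type
deployments = [
    {"change_type": "code",   "failed": False},
    {"change_type": "code",   "failed": True},
    {"change_type": "prompt", "failed": False},
    {"change_type": "prompt", "failed": False},
    {"change_type": "model",  "failed": True},
]

def cfr_by_segment(records, key="change_type"):
    """Return CFR (percent) per segment value found under `key`."""
    totals, failures = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        failures[r[key]] += r["failed"]  # bool counts as 0/1
    return {seg: failures[seg] / totals[seg] * 100 for seg in totals}

print(cfr_by_segment(deployments))
# → {'code': 50.0, 'prompt': 0.0, 'model': 100.0}
```

The same function works for origin, rollout strategy, or automation level by changing `key`.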

What are the limitations of Change Failure Rate?

CFR tells you how often things go wrong after release, but not why. It does not differentiate between small glitches and major outages, nor does it account for how quickly teams recover from incidents.

The metric can also be misleading if failure definitions are too broad or too narrow. For example, some teams count toggled-off features as success, while others count every production hotfix as failure, even when harmless.

AI and agentic AI limitations to watch for:

  • Silent failures can be undercounted. AI-related failures (e.g., hallucinations, degraded answer quality, subtle policy misses) may not trigger incidents or rollbacks even when user impact is real, so CFR can look “good” while trust erodes.
  • Attribution is harder. If an agent takes actions across multiple systems, a downstream incident may be difficult to tie back to a single “deployment” unless you have strong tracing and change provenance.
  • The incentive risk increases with automation. If CFR becomes a target, teams (or agents) may avoid shipping meaningful changes, overuse flags without follow-through, or redefine failures to look better.

To address these limitations, use CFR alongside the following metrics:

  • Mean Time to Restore (MTTR): Captures how quickly teams recover from deployment failures once they occur.
  • Deployment Frequency: Contextualizes how often deployments happen, which affects how CFR trends are interpreted.
  • Lead Time for Changes: Surfaces process delays that may limit validation time or increase delivery pressure.

How can teams improve Change Failure Rate?

Improving Change Failure Rate involves reducing the frequency of failure-triggering changes and minimizing the scope of their impact when they do occur.

  • Deploy smaller, safer changes. Reducing batch size lowers the likelihood of shipping defects and makes troubleshooting easier. Trunk-Based Development helps limit the blast radius of each deployment.

  • Catch problems earlier in the lifecycle. Strengthen pre-release validation with Test-Driven Development, integration testing, and staging parity. These reduce the number of defects that escape into production.

  • Enforce better review standards. Adopting consistent Code Review Best Practices ensures peer input before changes go live and improves the quality of merged code.

  • Use Feature Flags to reduce exposure. Flags enable teams to disable problematic functionality without rolling back the entire deployment. This limits failure impact and may prevent an issue from counting as a failed deployment.

  • Clarify what counts as a failure. Misalignment on what constitutes a “failed deployment” undermines tracking. Ensure failures are tagged consistently in incidents, deployment logs, or release notes.

AI and agentic AI improvements (make CFR resilient to autonomy):

  • Add AI-specific quality gates where behavior is probabilistic. Maintain regression suites for prompts, retrieval, and model versions (golden sets, scenario tests, policy tests), and run them on every relevant change.
  • Roll out AI behavior progressively. Use canaries, limited cohorts, and gradual enablement for model/prompt/agent changes—especially when an agent can take irreversible actions.
  • Instrument and log provenance. Track which model/prompt/config version generated an output or action so remediation can be fast and attribution is credible.
  • Design fast “kill switches.” For agentic features, make it easy to disable high-risk capabilities (e.g., write actions, auto-merge, auto-deploy) without a full rollback.
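
A golden-set gate like the one described above can be sketched as follows. Everything here is illustrative: GOLDEN_SET, the pass threshold, and the callable under test are assumptions standing in for a real evaluation harness.

```python
# Minimal sketch of a "golden set" regression gate for prompt/model changes.
GOLDEN_SET = [
    {"input": "What is our refund policy?", "must_contain": "30 days"},
    {"input": "Cancel my subscription",     "must_contain": "confirm"},
]
PASS_THRESHOLD = 1.0  # require every golden case to pass before rollout

def evaluate(candidate_answer_fn) -> float:
    """Score a candidate (a callable mapping input text to answer text)
    against the golden set; returns the fraction of cases passed."""
    passed = sum(
        case["must_contain"] in candidate_answer_fn(case["input"])
        for case in GOLDEN_SET
    )
    return passed / len(GOLDEN_SET)

def gate(candidate_answer_fn) -> bool:
    """Return True only if the change may proceed to progressive rollout."""
    return evaluate(candidate_answer_fn) >= PASS_THRESHOLD
```

Run the gate on every model, prompt, or retrieval change before canarying; failures block the rollout rather than counting against production CFR.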

Reducing CFR is not about achieving perfection. It's about designing delivery systems that prevent failure when possible and enable teams to react quickly, learn, and improve when things go wrong.