Speed and Stability Are Not Opposites
The State of DevOps research is clear: optimizing for fast feedback improves stability too. A tour of the four DORA metrics and what they really measure.
We all hear customers complain when we take longer than expected to ship the perfect release — polishing and checking every edge case. And we all know the fear of breaking production when we ship fast. This is a continuation of my previous post, Measure What Truly Matters. If you haven't read it, I highly recommend it first.
In software engineering it's common to believe that speed and stability conflict. The main reason, I think, is that when delivery depends on manual work, shipping more releases means either spending more time or skipping some manual checks. But the State of DevOps report suggests otherwise: speed and stability are not a trade-off. When a delivery process is optimized for fast feedback cycles with the right engineering, its stability improves too. 🤯 This isn't just an opinion: the State of DevOps report is empirical, survey-based research on the state of the software industry, and a great place to get a data-driven view of software development around the world.
Accelerate, the book built on that research, suggests we measure delivery performance with four key metrics:
- Delivery Lead Time
- Deployment Frequency
- Time to Restore Service
- Change Fail Rate
These four metrics capture two key aspects of delivery — speed and stability — where lead time and deployment frequency measure speed, and time to restore service and change fail rate measure stability.
1. Lead Time
Lead time is the time it takes to go from a customer making a request to that request being satisfied. Accelerate focuses on the delivery part of it: the time it takes for work to be implemented, tested, and delivered, which in practice is measured from code being committed to code running in production. Shorter lead time is better, because it enables faster feedback cycles.
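To make the definition concrete, here is a minimal sketch of how delivery lead time could be computed, assuming you can export a commit timestamp and a production deploy timestamp for each change. The `CHANGES` data and the `median_lead_time` helper are made up for illustration; they are not part of any tool or of the report itself.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical records: (commit timestamp, production deploy timestamp).
CHANGES = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 15, 0)),   # 6 hours
    (datetime(2024, 3, 5, 10, 0), datetime(2024, 3, 6, 10, 0)),  # 24 hours
    (datetime(2024, 3, 6, 8, 0), datetime(2024, 3, 6, 10, 0)),   # 2 hours
]


def median_lead_time(changes):
    """Return the median commit-to-production time as a timedelta."""
    seconds = [
        (deployed - committed).total_seconds()
        for committed, deployed in changes
    ]
    return timedelta(seconds=median(seconds))


print(median_lead_time(CHANGES))
```

I used the median rather than the mean here so that one unusually slow change does not dominate the number; that is my choice for the sketch, not something the metric prescribes.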
2. Deployment Frequency
As the name suggests, deployment frequency measures how often changes are deployed to production. A single release may bundle multiple commits, so frequency also acts as a proxy for batch size: the more often you deploy, the smaller each batch tends to be. Higher deployment frequency is better: smaller batches reduce cycle times and variability in flow, accelerate feedback, reduce risk and overhead, improve efficiency, increase motivation and urgency, and reduce cost and schedule growth.
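As a rough sketch, deployment frequency can be derived from nothing more than a list of production deploy dates. The dates below and the `deployments_per_week` helper are hypothetical, and the choice of a weekly rate is only an assumption about what is convenient to read.

```python
from datetime import date

# Hypothetical production deploy dates pulled from a deploy log.
DEPLOY_DATES = [
    date(2024, 3, 4), date(2024, 3, 4), date(2024, 3, 6),
    date(2024, 3, 8), date(2024, 3, 12), date(2024, 3, 14),
    date(2024, 3, 15),
]


def deployments_per_week(deploy_dates, window_start, window_end):
    """Average number of deployments per 7 days within an inclusive window."""
    days_in_window = (window_end - window_start).days + 1
    in_window = [d for d in deploy_dates if window_start <= d <= window_end]
    return len(in_window) / (days_in_window / 7)


# Two-week window: 7 deployments over 14 days.
print(deployments_per_week(DEPLOY_DATES, date(2024, 3, 4), date(2024, 3, 17)))
```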
3. Time to Restore Service
This is the time it takes to recover a service after a failure. When teams build rapidly in a complex, changing environment, failure is inevitable, so one of the best defenses is the ability to recover quickly. The lower the time to restore service, the better.
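A small sketch of the calculation, assuming each incident is logged with the moment the failure started and the moment service was restored. The `INCIDENTS` records and the `mean_time_to_restore` name are invented for the example.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident log: (failure started, service restored).
INCIDENTS = [
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 10, 30)),   # 30 minutes
    (datetime(2024, 3, 9, 22, 15), datetime(2024, 3, 9, 23, 45)),  # 90 minutes
    (datetime(2024, 3, 20, 7, 0), datetime(2024, 3, 20, 8, 0)),    # 60 minutes
]


def mean_time_to_restore(incidents):
    """Return the average outage duration as a timedelta."""
    seconds = [
        (restored - started).total_seconds()
        for started, restored in incidents
    ]
    return timedelta(seconds=mean(seconds))


print(mean_time_to_restore(INCIDENTS))
```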
4. Change Fail Rate
This measures the percentage of releases to production that fail, that is, those that lead to a service outage or need a hotfix, rollback, or patch. It tells you whether efficiency improvements to the delivery process came at the expense of the system's stability.
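The arithmetic is simple enough to sketch: failed releases divided by total releases, as a percentage. The `DEPLOYMENTS` list and its `failed` flag are assumptions about how a team might record this; deciding what counts as "failed" is the hard part, and the code does not solve it.

```python
# Hypothetical release log: 20 production releases, 3 of which needed
# a hotfix, rollback, or patch, or caused an outage.
DEPLOYMENTS = [{"id": i, "failed": i in (4, 11, 17)} for i in range(1, 21)]


def change_fail_rate(deployments):
    """Percentage of production releases flagged as failed."""
    if not deployments:
        raise ValueError("need at least one deployment to compute a rate")
    failed = sum(1 for d in deployments if d["failed"])
    return 100 * failed / len(deployments)


print(change_fail_rate(DEPLOYMENTS))
```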
All of these metrics are contextual: their meaning differs between teams based on industry, team size, and the nature of the project. So they work less as absolute benchmarks and more as relative ones, which is why the report groups teams into Low, Medium, and High performers. The most useful comparison is a team against its own past: we can use the metrics to measure how things evolve over time, for example after a particular change to the delivery process.
Measuring is hard and takes effort. These metrics work best as a tool to improve the delivery process and culture. In organizations with learning cultures, they're powerful drivers of positive results. In organizations driven by fear, you'll just get the wrong numbers.
Related reading
Agility Beyond Stand-ups: An Engineering Approach to Results-Driven Agile
Real agility is not sprints and stand-ups — it is the engineering infrastructure (CI/CD, TDD, trunk-based development) that makes a fast build-measure-learn loop possible.
Measure What Truly Matters (With Heart Intact)
What gets measured gets done — but measuring the wrong things in software (hours, lines of code, estimates) produces the wrong results. Measure to improve, not to evaluate.