
Speed and Stability Are Not Opposites

The State of DevOps research is clear: optimizing for fast feedback improves stability too. A tour of the four DORA metrics and what they really measure.

October 7, 2024 3 min read

We have all heard customers complain when we take longer than expected to ship the perfect release, polishing and checking every edge case. And we all know the fear of breaking production when we ship fast. This post continues my previous one, Measure What Truly Matters; if you haven't read it yet, I recommend starting there.

In software engineering it's common to believe that speed and stability conflict. The main reason, I think, is that when a delivery process depends on manual work, shipping more releases means either spending more time or doing fewer manual checks. The State of DevOps report suggests otherwise: speed and stability are not opposite sides of the same coin. When a delivery process is engineered for fast feedback cycles, its stability improves too. 🤯 This isn't just an opinion: the State of DevOps report is empirical research on the state of the software industry, and a great place to get an objective view of how software is built around the world.

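Accelerate, the book in which Nicole Forsgren, Jez Humble, and Gene Kim present the research behind these reports, suggests measuring delivery performance with four key metrics: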

Delivery Lead Time
Deployment Frequency
Time to Restore Service
Change Fail Rate

These four metrics capture two key aspects of delivery, speed and stability: lead time and deployment frequency measure speed, while time to restore service and change fail rate measure stability.

1. Lead Time

Lead time is the time it takes to go from a customer making a request to that request being satisfied. Accelerate focuses on the delivery part of that journey: the time it takes for work to be implemented, tested, and delivered, measured from code being committed to code running in production. Less lead time is better, because it enables faster feedback cycles.
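As a rough sketch of how delivery lead time could be computed, assume you can pair each change's commit timestamp with the moment it reached production. The `Change` record and `delivery_lead_time` function below are hypothetical names for illustration, and taking the median (rather than the mean) is our own choice, so that a few long-lived changes don't dominate the number.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Change:
    """Hypothetical record of one change: when it was committed and when it reached production."""
    committed_at: datetime
    deployed_at: datetime


def delivery_lead_time(changes: list[Change]) -> timedelta:
    """Median time from commit to running in production (the median is an assumption)."""
    if not changes:
        raise ValueError("need at least one change")
    durations = [(c.deployed_at - c.committed_at).total_seconds() for c in changes]
    return timedelta(seconds=median(durations))


if __name__ == "__main__":
    changes = [
        Change(datetime(2024, 10, 1, 9, 0), datetime(2024, 10, 1, 15, 0)),   # 6 hours
        Change(datetime(2024, 10, 2, 10, 0), datetime(2024, 10, 3, 10, 0)),  # 24 hours
        Change(datetime(2024, 10, 3, 11, 0), datetime(2024, 10, 3, 13, 0)),  # 2 hours
    ]
    print(delivery_lead_time(changes))  # 6:00:00
```

Tracking this number over time, rather than per change, is what turns it into a signal about the delivery process.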

2. Deployment Frequency

As the name suggests, deployment frequency measures how often changes are deployed to production. A single release may bundle multiple commits, so it is deployments, not commits, that are counted. Higher deployment frequency is better because it acts as a proxy for smaller batch sizes, and smaller batches reduce cycle times and variability in flow, accelerate feedback, reduce risk and overhead, improve efficiency, increase motivation and urgency, and reduce cost and schedule growth.
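A minimal sketch of the counting rule, assuming each production deployment is recorded with its date and the commits it bundled. The `Deployment` record and `deployments_per_week` function are hypothetical, and normalizing to "per week" is an arbitrary choice of unit. Note that a deployment counts once however many commits it carries.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    day: date
    commits: list[str] = field(default_factory=list)  # commit SHAs bundled into this release


def deployments_per_week(deployments: list[Deployment], start: date, end: date) -> float:
    """Average production deployments per week in the inclusive window [start, end].

    Each deployment counts once, no matter how many commits it bundles.
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    count = sum(1 for d in deployments if start <= d.day <= end)
    return count / (days / 7)


if __name__ == "__main__":
    history = [
        Deployment(date(2024, 9, 30), ["a1", "b2"]),
        Deployment(date(2024, 10, 2), ["c3"]),
        Deployment(date(2024, 10, 4), ["d4", "e5"]),
        Deployment(date(2024, 10, 9), ["f6"]),  # outside the window below
    ]
    # Five commits, but only three deployments in this seven-day window.
    print(deployments_per_week(history, date(2024, 9, 30), date(2024, 10, 6)))  # 3.0
```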

3. Time to Restore Service

This is the time it takes to restore a service after a failure. When teams build rapidly in a complex, changing environment, failure is inevitable, so one of the best defenses is how quickly the system can recover. The lower the time to restore service, the better.
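One way to sketch this metric, assuming each incident records when the service was first impaired and when normal service came back. The `Incident` record and `mean_time_to_restore` function are hypothetical, and averaging with the mean (the familiar "MTTR") is an assumption; a median works just as well and is less sensitive to one very long outage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical record of one production failure."""
    started_at: datetime   # when the service was first impaired
    restored_at: datetime  # when normal service was restored


def mean_time_to_restore(incidents: list[Incident]) -> timedelta:
    """Mean time between impairment and restoration across incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((i.restored_at - i.started_at for i in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    incidents = [
        Incident(datetime(2024, 10, 1, 14, 0), datetime(2024, 10, 1, 14, 30)),  # 30 minutes
        Incident(datetime(2024, 10, 5, 9, 0), datetime(2024, 10, 5, 10, 30)),   # 90 minutes
    ]
    print(mean_time_to_restore(incidents))  # 1:00:00
```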

4. Change Fail Rate

This is the percentage of releases to production that fail, meaning they cause a service outage or require a hotfix, rollback, or patch. It tells you whether improvements to the speed of the delivery process came at the expense of the system's stability.
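The rate itself is simple arithmetic: failed releases divided by all releases, times 100. The sketch below assumes each release is tagged with the remediation flags listed above; the `Release` record and `change_fail_rate` function are hypothetical, and in practice deciding which releases "failed" is the hard part.

```python
from dataclasses import dataclass


@dataclass
class Release:
    """Hypothetical record of one release to production and any remediation it needed."""
    version: str
    caused_outage: bool = False
    needed_hotfix: bool = False
    rolled_back: bool = False
    patched: bool = False

    @property
    def failed(self) -> bool:
        # A release counts as failed if it needed any of the remediations above.
        return self.caused_outage or self.needed_hotfix or self.rolled_back or self.patched


def change_fail_rate(releases: list[Release]) -> float:
    """Percentage of releases to production that failed."""
    if not releases:
        raise ValueError("need at least one release")
    failures = sum(1 for r in releases if r.failed)
    return 100 * failures / len(releases)


if __name__ == "__main__":
    releases = [
        Release("1.4.0"),
        Release("1.4.1", rolled_back=True),
        Release("1.5.0"),
        Release("1.5.1"),
    ]
    print(change_fail_rate(releases))  # 25.0
```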

All of these metrics are contextual: what counts as a good value differs between teams depending on industry, team size, and the nature of the project. So they aren't meant for comparison in absolute terms but in relative ones, grouping teams into Low, Medium, and High performers. Within a team, we can use them to track how things evolve over time, for example the effect of a particular change to the delivery process.
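To illustrate comparing a team against its own past, here is a small sketch that labels each metric as improved, regressed, or unchanged between two periods. The metric names, units, and the `compare` function are hypothetical, and the numbers in the example are made up. The one fixed rule is the direction of improvement: lower is better for lead time, time to restore, and change fail rate, while higher is better for deployment frequency.

```python
from enum import Enum


class Better(Enum):
    HIGHER = "higher"
    LOWER = "lower"


# Direction of improvement for each metric (names and units are illustrative).
DIRECTION = {
    "lead_time_hours": Better.LOWER,
    "deploys_per_week": Better.HIGHER,
    "time_to_restore_hours": Better.LOWER,
    "change_fail_rate_pct": Better.LOWER,
}


def compare(before: dict[str, float], after: dict[str, float]) -> dict[str, str]:
    """Label each metric as improved, regressed, or unchanged between two periods of the same team."""
    report = {}
    for name, better in DIRECTION.items():
        old, new = before[name], after[name]
        if new == old:
            report[name] = "unchanged"
        elif (new < old) == (better is Better.LOWER):
            report[name] = "improved"
        else:
            report[name] = "regressed"
    return report


if __name__ == "__main__":
    # Made-up numbers for the same team before and after a delivery-process change.
    before = {"lead_time_hours": 48, "deploys_per_week": 2, "time_to_restore_hours": 5, "change_fail_rate_pct": 20}
    after = {"lead_time_hours": 24, "deploys_per_week": 5, "time_to_restore_hours": 6, "change_fail_rate_pct": 15}
    print(compare(before, after))
```

Looking at all four labels together keeps a gain in speed from hiding a loss in stability, or vice versa.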

Measuring is hard and takes effort. These metrics work best as a tool to improve the delivery process and culture. In organizations with learning cultures, they're powerful drivers of positive results. In organizations driven by fear, you'll just get the wrong numbers.



Related reading

Agility Beyond Stand-ups: An Engineering Approach to Results-Driven Agile

Measure What Truly Matters (With Heart Intact)