Most engineering leaders are measuring how busy their teams are. A much smaller number are measuring whether anything is actually shipping. These are the five metrics that tell you the truth.
In a recent piece on this blog, we broke down why AI-driven throughput gains aren't translating into faster releases and why the bottleneck was never the engineers. If you haven't read it, the short version is this: your team is producing more code than ever, and most of it is getting stuck somewhere between commit and production.
That piece raised a follow-up question worth answering directly: if velocity, story points, and lines of code are the wrong things to measure (and they are), then what should a CTO actually be looking at?
Here are the five metrics that give you an honest picture of your engineering organization's health. Not how hard your team is working or how many tickets they closed. Whether software is actually reaching users and what happens when it doesn't.
Velocity measures how much work a team completes in a sprint. Story points quantify estimated effort. Lines of code count output. All three have one thing in common: they measure activity inside the development cycle, and say almost nothing about what happens after code leaves an engineer's hands.
In an AI-augmented team, this problem compounds fast. When AI tools can increase individual throughput by 59%, as CircleCI's 2026 State of Software Delivery report found, activity metrics inflate without any corresponding improvement in delivery. Your dashboard looks healthier than ever. Your release cycle doesn't move. The metrics aren't broken. They were just never designed to measure what you actually care about.
The five metrics below are. Some come from the DORA (DevOps Research and Assessment) framework, one of the most rigorous bodies of research on engineering performance, validated across thousands of engineering organizations globally. Others come from the questions every engineering team should be asking about itself and rarely does. Together, they measure outcomes, not output, and the organizational health that makes delivery possible in the first place.
Not velocity. Not story points. Not the number of tickets closed in a sprint. The real question is how long it takes for a meaningful change to move from idea to production value and whether anyone in your organization actually knows the answer.
Time to useful delivery measures the full path: from the moment a request enters the system through clarification, design, development, review, QA, deployment, and actual use by a real user. It is the most complete picture of how your delivery system performs end to end.
For high-performing teams, the benchmark is days, not weeks. If you're measuring in months, the bottleneck is organizational.
What makes this metric powerful is what it exposes. A long time to useful delivery is rarely caused by slow engineers. It is caused by unclear requirements that force repeated clarification cycles, approval bottlenecks that nobody has bothered to redesign, overloaded senior engineers who are the only people who can unblock certain decisions, fragile QA environments that delay testing, or deployment processes so risky that teams avoid releasing until they absolutely have to.
Pick the last five features that shipped. How long did each one take from first request to production use? Where did the time go? If you can't answer that question in under ten minutes, that's the first thing to fix.
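One way to run that exercise is to record a timestamp for each stage and see where the days went. The sketch below is a minimal illustration, not a prescribed tool: the stage names, the dates, and the `FEATURE` record are all hypothetical, and in practice the timestamps would come from your ticketing and deployment systems.

```python
from datetime import datetime

# Hypothetical stage timestamps for one shipped feature (illustrative data).
# Stages must be listed in the order they happened.
FEATURE = {
    "requested": datetime(2025, 3, 3),
    "dev_started": datetime(2025, 3, 10),
    "review_opened": datetime(2025, 3, 14),
    "deployed": datetime(2025, 3, 24),
    "first_used": datetime(2025, 3, 25),
}


def stage_durations(stamps):
    """Return the days spent between each pair of consecutive stages."""
    names = list(stamps)
    return {
        f"{start} -> {end}": (stamps[end] - stamps[start]).days
        for start, end in zip(names, names[1:])
    }


def time_to_useful_delivery(stamps):
    """Days from the first request to first real use in production."""
    return (stamps["first_used"] - stamps["requested"]).days


durations = stage_durations(FEATURE)
total = time_to_useful_delivery(FEATURE)
slowest = max(durations, key=durations.get)

print(f"{total} days total; slowest stage: {slowest} ({durations[slowest]} days)")
# prints: 22 days total; slowest stage: review_opened -> deployed (10 days)
```

In this made-up example, nearly half the elapsed time sits between review and deployment, which is exactly the kind of finding the exercise is meant to surface.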
Change failure rate measures how often releases create defects, regressions, outages, rollbacks, support escalations, or emergency fixes. It is one of the healthiest counterbalances to the narrative that a team is performing well because it is shipping fast.
Elite and high performers keep their change failure rate between 0% and 15%. Anything above 30% signals a systemic quality problem that speed is making worse.
Shipping fast while breaking customer trust is not progress. It is debt being converted into visible pain, and it tends to accumulate quietly until it becomes impossible to ignore. Teams with a high change failure rate often have strong velocity numbers, which is exactly why this metric exists: to tell you what velocity can't.
In the age of AI-generated code, this metric deserves particular attention. AI tools write fast, plausible code, and fast, plausible code can pass a test suite and still behave unexpectedly in production. Teams that accelerate code generation without proportionally investing in review, testing, and QA tend to see their change failure rate climb silently while their velocity numbers look great.
Of everything deployed last quarter, what percentage required a follow-up fix within 48 hours? If you don't have that number, that's the first problem to solve, because you're flying blind on quality while optimizing hard for speed.
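If you have a deploy log, that percentage is a short calculation. The sketch below assumes a hypothetical list of records pairing each deploy time with the time of its first follow-up fix (or `None` if there was none); the 48-hour window is the one from the question above, and the data is invented for illustration.

```python
from datetime import datetime, timedelta

FOLLOW_UP_WINDOW = timedelta(hours=48)

# Hypothetical records: (deploy time, time of first follow-up fix or None).
DEPLOYS = [
    (datetime(2025, 4, 1, 10), None),
    (datetime(2025, 4, 2, 15), datetime(2025, 4, 3, 9)),    # fixed after 18 hours
    (datetime(2025, 4, 7, 11), datetime(2025, 4, 12, 11)),  # fixed after 5 days
    (datetime(2025, 4, 9, 16), None),
    (datetime(2025, 4, 14, 9), datetime(2025, 4, 14, 13)),  # fixed after 4 hours
]


def change_failure_rate(deploys, window=FOLLOW_UP_WINDOW):
    """Share of deploys that needed a follow-up fix within the window."""
    if not deploys:
        return 0.0
    failed = sum(
        1
        for deployed, fixed in deploys
        if fixed is not None and fixed - deployed <= window
    )
    return failed / len(deploys)


rate = change_failure_rate(DEPLOYS)
print(f"Change failure rate: {rate:.0%}")
# prints: Change failure rate: 40%
```

What counts as a "failure" is a definition your team has to agree on first; this sketch only counts quick follow-up fixes and ignores rollbacks, outages, and support escalations.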
Mean time to recovery measures how long it takes your team to understand an issue, respond to it, and restore service after a failure in production. It is the most honest measure of your engineering organization's resilience: not whether things go wrong, but how fast you can make them right when they do.
Elite performers recover in less than one hour, and high performers in less than one day. Anything measured in days is a signal that something structural needs attention.
Mature engineering organizations don't assume nothing will break. They assume failure will happen and build for recovery. The question this metric answers is whether your team can respond without panic, heroics, or tribal knowledge locked inside one exhausted senior engineer's head.
A long mean time to recovery usually signals one of three things:
When was the last production incident? How long did it take to resolve? Who drove the recovery? If those answers take more than five minutes to find, your observability and incident response processes need attention before your next hire does.
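Once incident start and end times are written down, the metric itself is simple arithmetic. The sketch below assumes a hypothetical incident log of (detected, restored) timestamp pairs and compares the result against the one-hour and one-day thresholds mentioned above; the tier labels are illustrative names, not an official classification.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (time detected, time service was restored).
INCIDENTS = [
    (datetime(2025, 5, 2, 9, 0), datetime(2025, 5, 2, 9, 40)),      # 40 minutes
    (datetime(2025, 5, 11, 22, 15), datetime(2025, 5, 12, 1, 15)),  # 3 hours
    (datetime(2025, 5, 20, 14, 0), datetime(2025, 5, 20, 14, 20)),  # 20 minutes
]


def mean_time_to_recovery(incidents):
    """Average time from detection to restored service, or None if no incidents."""
    if not incidents:
        return None
    total = sum(
        (restored - detected for detected, restored in incidents),
        timedelta(),
    )
    return total / len(incidents)


def recovery_tier(mttr):
    """Map an MTTR onto the thresholds used in this article."""
    if mttr < timedelta(hours=1):
        return "elite"
    if mttr < timedelta(days=1):
        return "high"
    return "needs attention"


mttr = mean_time_to_recovery(INCIDENTS)
print(mttr, recovery_tier(mttr))
# prints: 1:20:00 high
```

Note how a single three-hour incident pulls the average above an hour even though the other two were resolved quickly, so it is worth looking at the individual incidents and not only the mean.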
Where is engineering time actually going? Most CTOs can answer this question in broad strokes. Very few can answer it with enough precision to make good decisions from it, and that gap is expensive.
Engineering capacity allocation measures the rough breakdown of how engineering time is distributed across new product work, maintenance, support, technical debt, security, infrastructure, incident response, and rework. It is one of the most important executive-level metrics because it tells you whether your organization is actually investing in the future or simply burning capacity to keep the lights on.
There is no universal benchmark, but if less than 50% of engineering time is going toward new product work, the conversation about why needs to happen now.
A team can look fully utilized while having almost no real capacity for innovation. Engineers are busy, but they're busy fixing, maintaining, responding, and unblocking, not building. This is the scenario where leadership asks why AI adoption or modernization isn't moving faster, and the engineering team is too exhausted to explain that there's simply no room.
If you had to estimate how last month's engineering hours were actually spent across those categories, what would the breakdown look like? If that question takes a week to answer, that's the answer, and it's telling you that capacity visibility is the thing to fix first.
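A rough breakdown does not need precise time tracking; even estimated hours per category are enough to start. The sketch below assumes a hypothetical list of (category, hours) entries, for example from a monthly team estimate, and turns it into shares of total capacity. The categories and numbers are invented for illustration.

```python
from collections import Counter

# Hypothetical (category, hours) entries for one month.
ENTRIES = [
    ("new product work", 120),
    ("maintenance", 60),
    ("incident response", 30),
    ("technical debt", 40),
    ("new product work", 100),
    ("rework", 50),
]


def allocation_shares(entries):
    """Return each category's share of total engineering hours."""
    hours = Counter()
    for category, spent in entries:
        hours[category] += spent
    total = sum(hours.values())
    if total == 0:
        return {}
    return {category: spent / total for category, spent in hours.items()}


shares = allocation_shares(ENTRIES)
for category, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{category:<20} {share:.0%}")
```

With this sample data, new product work comes out at 55% of capacity, just above the 50% line discussed earlier.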
How much engineering work has to be done twice? This is where a significant amount of organizational waste hides.
Rework and defect escape rate captures how often work needs to be redone because it was misunderstood, poorly scoped, rushed, poorly tested, or built on assumptions that turned out to be wrong. It includes bugs that reach production, but it also includes product rework, repeated clarification cycles, redesigns, duplicated efforts, and the expensive category of having built the wrong thing entirely.
A healthy target is rework consuming less than 10–15% of engineering capacity. Above 25% is a signal that something upstream (in discovery, scoping, or alignment) is consistently broken.
What makes this metric particularly valuable is what it connects. High rework rates are rarely a pure engineering problem. They are usually a signal of weak product clarity, misaligned stakeholders, immature QA, or an underdeveloped discovery process. This metric doesn't let the organization pretend that every problem begins and ends with developers, because the data shows exactly where in the system work is being generated that shouldn't exist.
Every hour spent on rework is an hour that wasn't spent on something new. It is the least visible form of engineering waste and often the most fixable.
Of the bugs that reached production last quarter, how many could have been caught earlier and at what stage? Of the features that shipped, how many required significant rework after the fact? The answers will tell you where in the process to intervene, and it's almost never where you expect.
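Answering those questions mostly means tagging each bug with two facts: where it was found, and the earliest stage that could reasonably have caught it. The sketch below assumes that tagging has been done by hand in a retrospective. The bug records and stage names are hypothetical, and defining the escape rate as production bugs divided by all bugs found is one common convention, not the only one.

```python
from collections import Counter

# Hypothetical bug records tagged during a quarterly review.
BUGS = [
    {"found_in": "review", "catchable_at": "review"},
    {"found_in": "qa", "catchable_at": "qa"},
    {"found_in": "production", "catchable_at": "requirements"},
    {"found_in": "production", "catchable_at": "qa"},
    {"found_in": "qa", "catchable_at": "review"},
    {"found_in": "production", "catchable_at": "requirements"},
    {"found_in": "review", "catchable_at": "review"},
    {"found_in": "qa", "catchable_at": "qa"},
]


def defect_escape_rate(bugs):
    """Share of all recorded bugs that were only found in production."""
    if not bugs:
        return 0.0
    escaped = sum(1 for bug in bugs if bug["found_in"] == "production")
    return escaped / len(bugs)


def escapes_by_catchable_stage(bugs):
    """Count production bugs by the earliest stage that could have caught them."""
    return Counter(
        bug["catchable_at"] for bug in bugs if bug["found_in"] == "production"
    )


escape_rate = defect_escape_rate(BUGS)
by_stage = escapes_by_catchable_stage(BUGS)
print(f"Escape rate: {escape_rate:.1%}")
print(by_stage.most_common())
```

In this invented data set, most escaped bugs trace back to requirements, not to testing, which is the kind of upstream signal this metric is meant to expose.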
You don't need a new platform or a six-month implementation to start tracking these. Most of the data already exists in your project management tool, your version control system, your incident log, and your engineering team's own sense of where time goes. The work is in surfacing it, agreeing on definitions, and reviewing it consistently.
You'll have a working baseline within a week, and a much clearer picture of where your delivery system actually breaks down. From there, set targets, review the metrics in your engineering leadership meetings, and treat regressions the same way you'd treat a drop in revenue: as a signal that something in the system needs attention, not as a reason to push the team harder.
Measuring the right things doesn't automatically fix anything. But it does something just as important: it tells you where to look. And in an era where AI is inflating output metrics across the board, knowing where to look is the competitive advantage most engineering leaders don't yet have.
Your velocity dashboard will keep telling you everything is fine. These five metrics will tell you the truth and give you something to actually act on.
Want engineers who ship, not just engineers who code? At BetterEngineer, we help CTOs and engineering leaders build teams that close the gap between throughput and delivery by placing engineers who understand the full pipeline, from commit to production.