The 5 Engineering Metrics Every CTO Should Be Looking At

Written by BetterEngineer | May 18, 2026 6:19:45 PM

Most engineering leaders are measuring how busy their teams are. A much smaller number are measuring whether anything is actually shipping. These are the five metrics that tell you the truth.

In a recent piece on this blog, we broke down why AI-driven throughput gains aren't translating into faster releases and why the bottleneck was never the engineers. If you haven't read it, the short version is this: your team is producing more code than ever, and most of it is getting stuck somewhere between commit and production.

That piece raised a follow-up question worth answering directly: if velocity, story points, and lines of code are the wrong things to measure (and they are), then what should a CTO actually be looking at?

Here are the five metrics that give you an honest picture of your engineering organization's health. Not how hard your team is working or how many tickets they closed. Whether software is actually reaching users and what happens when it doesn't.

In the AI Era, the Metrics You Trust Are Measuring the Wrong Thing

Velocity measures how much work a team completes in a sprint. Story points quantify estimated effort. Lines of code count output. All three have one thing in common: they measure activity inside the development cycle, and say almost nothing about what happens after code leaves an engineer's hands.

In an AI-augmented team, this problem compounds fast. When AI tools can increase individual throughput by 59% — as CircleCI's 2026 State of Software Delivery report found — activity metrics inflate without any corresponding improvement in delivery. Your dashboard looks healthier than ever. Your release cycle doesn't move. The metrics aren't broken. They were just never designed to measure what you actually care about.

The five metrics below are. Some come from the DORA (DevOps Research and Assessment) framework, one of the most rigorous bodies of research on engineering performance, validated across thousands of engineering organizations globally. Others come from the questions every engineering team should be asking about itself and rarely does. Together, they measure outcomes, not output, and delivery, not activity, along with the organizational health that makes delivery possible in the first place.

1. Time to Useful Delivery

Not velocity. Not story points. Not the number of tickets closed in a sprint. The real question is how long it takes for a meaningful change to move from idea to production value, and whether anyone in your organization actually knows the answer.

Time to useful delivery measures the full path: from the moment a request enters the system through clarification, design, development, review, QA, deployment, and actual use by a real user. It is the most complete picture of how your delivery system performs end to end.

What good looks like:

Days, not weeks, for high-performing teams. If you're measuring in months, the bottleneck is organizational.

What makes this metric powerful is what it exposes. A long time to useful delivery is rarely caused by slow engineers. It is caused by unclear requirements that force repeated clarification cycles, approval bottlenecks that nobody has bothered to redesign, overloaded senior engineers who are the only people who can unblock certain decisions, fragile QA environments that delay testing, or deployment processes so risky that teams avoid releasing until they absolutely have to.

What to ask your team:

Pick the last five features that shipped. How long did each one take from first request to production use? Where did the time go? If you can't answer that question in under ten minutes, that's the first thing to fix.
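As a rough illustration of that exercise, the sketch below times one feature from request to first use and shows which stage consumed the most time. The stage names and dates are hypothetical placeholders, not fields from any particular tracker; substitute whatever milestones your own tools record.

```python
from datetime import datetime

# Hypothetical stage names, listed in chronological order (an assumption,
# not a standard). Replace them with the milestones your tools record.
STAGES = ["requested", "dev_started", "review_started", "deployed", "first_used"]

SECONDS_PER_DAY = 86400


def stage_durations(timestamps: dict) -> dict:
    """Return the days spent between each pair of consecutive stages."""
    parsed = [datetime.fromisoformat(timestamps[stage]) for stage in STAGES]
    return {
        f"{STAGES[i]} -> {STAGES[i + 1]}":
            (parsed[i + 1] - parsed[i]).total_seconds() / SECONDS_PER_DAY
        for i in range(len(STAGES) - 1)
    }


def time_to_useful_delivery(timestamps: dict) -> float:
    """Return the days from first request to first real use."""
    start = datetime.fromisoformat(timestamps[STAGES[0]])
    end = datetime.fromisoformat(timestamps[STAGES[-1]])
    return (end - start).total_seconds() / SECONDS_PER_DAY


# Illustrative dates for a single feature (made-up data).
feature = {
    "requested": "2026-03-02",
    "dev_started": "2026-03-09",
    "review_started": "2026-03-12",
    "deployed": "2026-03-20",
    "first_used": "2026-03-23",
}

durations = stage_durations(feature)
total_days = time_to_useful_delivery(feature)
slowest_stage = max(durations, key=durations.get)

print(f"Total: {total_days:.0f} days, slowest stage: {slowest_stage}")
```

Running the same calculation over the last five features is usually enough to see whether the delay sits in one stage or is spread across the pipeline.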

2. Change Failure Rate

Change failure rate measures how often releases create defects, regressions, outages, rollbacks, support escalations, or emergency fixes. It is one of the healthiest counterbalances to the narrative that a team is performing well because it is shipping fast.

What good looks like:

0–15% for elite and high performers. Anything above 30% signals a systemic quality problem that speed is making worse.

Shipping fast while breaking customer trust is not progress. It is debt being converted into visible pain, and it tends to accumulate quietly until it becomes impossible to ignore. Teams with a high change failure rate often have strong velocity numbers, which is exactly why this metric exists: to tell you what velocity can't.

In the age of AI-generated code, this metric deserves particular attention. AI tools write fast, plausible code, and fast, plausible code can pass a test suite and still behave unexpectedly in production. Teams that accelerate code generation without proportionally investing in review, testing, and QA tend to see their change failure rate climb silently while their velocity numbers look great.

What to ask your team:

Of everything deployed last quarter, what percentage required a follow-up fix within 48 hours? If you don't have that number, that's the first problem to solve, because you're flying blind on quality while optimizing hard for speed.
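One way to get that number, sketched under simplifying assumptions: a deployment counts as failed if a fix for the same service lands within 48 hours of it. The service names, timestamps, and the rule for matching fixes to deployments are all illustrative; your own definition of a failed change may include rollbacks, incidents, or support escalations.

```python
from datetime import datetime, timedelta


def change_failure_rate(deploys, fixes, window_hours=48):
    """Share of deployments followed by a fix to the same service within the window.

    deploys and fixes are lists of (service, timestamp) tuples. Matching a fix
    to a deployment by service name and time window is a simplifying assumption.
    """
    if not deploys:
        return 0.0
    window = timedelta(hours=window_hours)
    failed = 0
    for service, deployed_at in deploys:
        needs_fix = any(
            fix_service == service
            and timedelta(0) <= fixed_at - deployed_at <= window
            for fix_service, fixed_at in fixes
        )
        if needs_fix:
            failed += 1
    return failed / len(deploys)


# Made-up deployment log for illustration.
deploys = [
    ("billing", datetime(2026, 1, 5, 10)),
    ("billing", datetime(2026, 1, 12, 10)),
    ("search", datetime(2026, 1, 14, 9)),
    ("search", datetime(2026, 2, 2, 15)),
]
fixes = [
    ("billing", datetime(2026, 1, 6, 8)),   # 22 hours after a billing deploy
    ("search", datetime(2026, 2, 6, 15)),   # 96 hours after, outside the window
]

rate = change_failure_rate(deploys, fixes)
print(f"Change failure rate: {rate:.0%}")
```

In this made-up log, one of four deployments needed a follow-up fix inside the window, so the rate is 25%.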

3. Mean Time to Recovery

Mean time to recovery measures how long it takes your team to understand an issue, respond to it, and restore service after a failure in production. It is the most honest measure of your engineering organization's resilience: not whether things go wrong, but how fast you can make them right when they do.

What good looks like:

Less than one hour for elite performers. Less than one day for high performers. Anything measured in days is a signal that something structural needs attention.

Mature engineering organizations don't assume nothing will break. They assume failure will happen and build for recovery. The question this metric answers is whether your team can respond without panic, heroics, or tribal knowledge locked inside one exhausted senior engineer's head.

A long mean time to recovery usually signals one of three things:

Poor observability: you can't see what broke
Unclear ownership: nobody knows whose job it is to fix it
Insufficient on-call infrastructure: the people who can fix it aren't available when things break

All three are fixable. None of them are fixed by hiring more engineers.

What to ask your team:

When was the last production incident? How long did it take to resolve? Who drove the recovery? If those answers take more than five minutes to find, your observability and incident response processes need attention before your next hire does.
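If your incident log records when each failure was detected and when service was restored, the calculation itself is small. The sketch below assumes those two timestamps per incident (made-up values here) and applies the thresholds quoted above; the band labels are shorthand for this example, not official categories.

```python
from datetime import datetime


def mean_time_to_recovery_hours(incidents):
    """Average hours between detection and restored service.

    incidents is a list of (detected_at, resolved_at) datetime pairs.
    """
    if not incidents:
        raise ValueError("need at least one incident to compute a mean")
    hours = [
        (resolved_at - detected_at).total_seconds() / 3600
        for detected_at, resolved_at in incidents
    ]
    return sum(hours) / len(hours)


def recovery_band(mttr_hours):
    """Map an MTTR value onto the thresholds quoted in this article."""
    if mttr_hours < 1:
        return "elite"
    if mttr_hours < 24:
        return "high"
    return "needs structural attention"


# Made-up incident history for illustration.
incidents = [
    (datetime(2026, 1, 3, 14, 0), datetime(2026, 1, 3, 14, 30)),   # 0.5 hours
    (datetime(2026, 2, 11, 9, 0), datetime(2026, 2, 11, 11, 0)),   # 2 hours
    (datetime(2026, 3, 20, 22, 0), datetime(2026, 3, 21, 4, 30)),  # 6.5 hours
]

mttr = mean_time_to_recovery_hours(incidents)
band = recovery_band(mttr)
print(f"MTTR: {mttr:.1f} hours ({band})")
```

A mean can hide one very long outage, so it is worth looking at the individual durations as well as the average.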

4. Engineering Capacity Allocation

Where is engineering time actually going? Most CTOs can answer this question in broad strokes. Very few can answer it with enough precision to make good decisions from it, and that gap is expensive.

Engineering capacity allocation measures the rough breakdown of how engineering time is distributed across new product work, maintenance, support, technical debt, security, infrastructure, incident response, and rework. It is one of the most important executive-level metrics because it tells you whether your organization is actually investing in the future or simply burning capacity to keep the lights on.

What good looks like:

No universal benchmark, but if less than 50% of engineering time is going toward new product work, the conversation about why needs to happen now.

A team can look fully utilized while having almost no real capacity for innovation. Engineers are busy, but they're busy fixing, maintaining, responding, and unblocking, not building. This is the scenario where leadership asks why AI adoption or modernization isn't moving faster, and the engineering team is too exhausted to explain that there's simply no room.

What to ask your team:

If you had to estimate how last month's engineering hours were actually spent across those categories, what would the breakdown look like? If that question takes a week to answer, that's the answer, and it's telling you that capacity visibility is the thing to fix first.
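Once you have even rough hours per category, turning them into shares is straightforward. The sketch below uses made-up hours and hypothetical category names, and checks the result against the 50% figure mentioned above; treat that threshold as a conversation starter rather than a benchmark.

```python
def allocation_shares(hours_by_category):
    """Convert hours per category into fractions of total engineering time."""
    total = sum(hours_by_category.values())
    if total == 0:
        raise ValueError("no hours recorded")
    return {category: hours / total for category, hours in hours_by_category.items()}


# Made-up monthly hours; the category names are illustrative assumptions.
last_month = {
    "new_product": 360,
    "maintenance": 160,
    "support": 80,
    "technical_debt": 80,
    "incident_response": 40,
    "rework": 80,
}

shares = allocation_shares(last_month)
below_half_on_new_work = shares["new_product"] < 0.5

# Print the breakdown from largest share to smallest.
for category, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{category:<18} {share:.0%}")
```

In this example new product work gets 45% of the hours, which would put the team just under the line where the article suggests asking why.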

5. Rework and Defect Escape Rate

How much engineering work has to be done twice? This is where a significant amount of organizational waste hides.

Rework and defect escape rate captures how often work needs to be redone because it was misunderstood, poorly scoped, rushed, poorly tested, or built on assumptions that turned out to be wrong. It includes bugs that reach production, but it also includes product rework, repeated clarification cycles, redesigns, duplicated efforts, and the expensive category of having built the wrong thing entirely.

What good looks like:

Rework consuming less than 10–15% of engineering capacity. Above 25% is a signal that something upstream — in discovery, scoping, or alignment — is consistently broken.

What makes this metric particularly valuable is what it connects. High rework rates are rarely a pure engineering problem. They are usually a signal of weak product clarity, misaligned stakeholders, immature QA, or an underdeveloped discovery process. The metric doesn't let the organization pretend that every problem begins and ends with developers, because the data shows where in the system avoidable work is being generated.

Every hour spent on rework is an hour that wasn't spent on something new. It is the least visible form of engineering waste and often the most fixable.

What to ask your team:

Of the bugs that reached production last quarter, how many could have been caught earlier and at what stage? Of the features that shipped, how many required significant rework after the fact? The answers will tell you where in the process to intervene, and it's almost never where you expect.
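Those questions can be reduced to three small calculations, sketched below with made-up numbers. The definitions used here are assumptions for illustration: defect escape rate as the share of all known defects that were found in production, rework share as rework hours over total hours, and a simple tally of the stage where each escaped bug could have been caught.

```python
from collections import Counter


def defect_escape_rate(found_before_release, found_in_production):
    """Share of all known defects that were only found in production."""
    total = found_before_release + found_in_production
    return found_in_production / total if total else 0.0


def rework_share(rework_hours, total_hours):
    """Share of engineering hours spent redoing work."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return rework_hours / total_hours


# Made-up quarterly figures for illustration.
escape_rate = defect_escape_rate(found_before_release=45, found_in_production=15)
rework = rework_share(rework_hours=120, total_hours=800)

# For each escaped bug, the earliest stage that could have caught it,
# as judged in a retrospective (hypothetical labels).
could_have_caught_at = ["review", "qa", "qa", "discovery", "qa"]
stage, count = Counter(could_have_caught_at).most_common(1)[0]

print(f"Escape rate: {escape_rate:.0%}, rework share: {rework:.0%}")
print(f"Most common missed stage: {stage} ({count} bugs)")
```

The stage tally is the part that points to where to intervene; the two rates only tell you how large the problem is.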

How to Start Using These Metrics

You don't need a new platform or a six-month implementation to start tracking these. Most of the data already exists in your project management tool, your version control system, your incident log, and your engineering team's own sense of where time goes. The work is in surfacing it, agreeing on definitions, and reviewing it consistently.

A few practical starting points:

Map the last five features from request to production and time each stage.
Pull your change failure rate from the last quarter's deployment log.
Run a capacity allocation conversation with your engineering leads and see how close their estimates are to each other.
Check your incident history for mean time to recovery on the last five failures.
Ask your team directly where rework is showing up most often.
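For the capacity allocation conversation in particular, a quick way to see how close the estimates are is to compare the lowest and highest figures. The sketch below assumes each lead gives one percentage for new product work; the names and numbers are invented.

```python
def estimate_spread(estimates_by_lead):
    """Return (lowest, highest, spread) for a set of percentage estimates."""
    values = list(estimates_by_lead.values())
    if not values:
        raise ValueError("need at least one estimate")
    lowest, highest = min(values), max(values)
    return lowest, highest, highest - lowest


# Hypothetical answers to "what percentage of time goes to new product work?"
new_product_estimates = {"lead_a": 60, "lead_b": 45, "lead_c": 30}

low, high, spread = estimate_spread(new_product_estimates)
print(f"Estimates range from {low}% to {high}% (spread: {spread} points)")
```

A wide spread suggests the leads are working from different pictures of where time goes, which is worth resolving before setting any targets.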

You'll have a working baseline within a week, and a much clearer picture of where your delivery system actually breaks down. From there, set targets, review the metrics in your engineering leadership meetings, and treat regressions the same way you'd treat a drop in revenue: as a signal that something in the system needs attention, not as a reason to push the team harder.

The Metrics Are the Map. The Territory Is Still Yours to Fix.

Measuring the right things doesn't automatically fix anything. But it does something just as important: it tells you where to look. And in an era where AI is inflating output metrics across the board, knowing where to look is the competitive advantage most engineering leaders don't yet have.

Your velocity dashboard will keep telling you everything is fine. These five metrics will tell you the truth and give you something to actually act on.

Want engineers who ship, not just engineers who code? At BetterEngineer, we help CTOs and engineering leaders build teams that close the gap between throughput and delivery by placing engineers who understand the full pipeline, from commit to production. Talk to us.
