The Real Numbers Behind Feature Delivery

There is a number that should bother every CTO who has ever presented a velocity report to a board: 70–90%.

That is the percentage of a feature’s total delivery time spent waiting. Not being built. Not being tested. Not being deployed. Waiting. Waiting for a handoff, waiting for a review, waiting for clarification on a requirement that was ambiguous from the start.

Donald Reinertsen documented this in The Principles of Product Development Flow back in 2009. It was true then. It is, remarkably, still true now, even in organizations that have adopted CI/CD, trunk-based development, and all the other practices the industry has spent fifteen years evangelizing.

The question worth asking is: why?


We got very good at measuring the wrong end of the pipeline

DORA metrics—Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Mean Time to Recovery—have been transformative for the industry. The research from Forsgren, Humble, and Kim gave engineering leaders a shared vocabulary and a defensible framework for measuring delivery performance. Elite teams deploy on demand, with lead times under a day and change failure rates below 5%.

But here is the problem: DORA metrics measure what happens after someone decides to build something. They are lagging indicators of delivery health. They tell you how fast code moves through a pipeline. They do not tell you whether the right code entered the pipeline in the first place.

And that distinction matters enormously, because the most expensive failure mode in software delivery is not slow deployment. It is rework.

The rework tax nobody quantifies

Capers Jones, in Applied Software Measurement, put the number at 30–50% of total engineering effort. That is not a typo. Across industries and team sizes, roughly a third to half of all engineering work goes to fixing, revising, or rebuilding things that were built wrong the first time.

Let that settle for a moment. If you run a 40-person engineering team with a fully loaded cost of $200K per engineer, you are spending somewhere between $2.4M and $4M per year on rework. That is not a line item in your budget. It does not show up in your sprint reports. It is buried in the gap between what was specified and what was understood.
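The arithmetic is easy to rerun with your own headcount and cost figures. Here is a minimal sketch. The function name is illustrative, and it takes integer percentages so the result stays in whole dollars without floating-point rounding:

```python
def annual_rework_cost(headcount: int, loaded_cost: int, rework_pct: int) -> int:
    """Annual spend on rework: headcount x fully loaded cost x rework percentage."""
    return headcount * loaded_cost * rework_pct // 100


team, cost = 40, 200_000
low = annual_rework_cost(team, cost, 30)   # lower bound of the 30–50% range
high = annual_rework_cost(team, cost, 50)  # upper bound
print(f"${low:,} to ${high:,} per year")   # → $2,400,000 to $4,000,000 per year
```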

McKinsey’s Developer Velocity Index (2020) found that top-quartile organizations grow revenue four to five times faster than bottom-quartile organizations. The instinct is to attribute that gap to better tools, better talent, or better processes. The rework numbers point to a less flattering explanation: top-quartile teams waste less. They do not build faster. They rebuild less.

The anatomy of a delivery cycle most teams never examine

A typical feature delivery in a mid-market SaaS company touches 7–15 handoffs between ideation and production. Product writes a brief. Design interprets it. Engineering asks questions. Design revises. Engineering builds. QA finds edge cases that were never specified. Product clarifies intent. Engineering reworks.

Each handoff introduces latency, latency erodes context, and lost context introduces rework risk. Reinertsen’s 70–90% wait time is not a scheduling failure. It is a structural one. The process is designed to generate handoffs, and handoffs reliably generate misunderstanding.

In a traditional workflow, the idea-to-production timeline runs 4–16 weeks. The actual construction time—hands on keyboards writing code—is a small fraction of that. The rest is coordination overhead: meetings to align on requirements, reviews to catch misunderstandings, rework to fix what was misunderstood anyway.
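To make the split between waiting and working concrete, here is a minimal sketch that computes the share of a feature's elapsed time spent waiting. This is the complement of what Kanban practitioners call flow efficiency (active time divided by total time). The `Stage` record and the timeline below are hypothetical. In practice the segments would come from status transitions in an issue tracker.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One segment of a feature's timeline (hypothetical schema)."""
    name: str
    hours: float
    active: bool  # True if someone was working on it; False if it sat in a queue


def wait_share(stages: list[Stage]) -> float:
    """Fraction of total elapsed time spent waiting rather than being worked on."""
    total = sum(s.hours for s in stages)
    waiting = sum(s.hours for s in stages if not s.active)
    return waiting / total if total else 0.0


# Hypothetical timeline for one feature, in working hours.
# Rework counts as active time here; only queue time counts as waiting.
timeline = [
    Stage("brief in backlog", 56, active=False),
    Stage("design", 16, active=True),
    Stage("waiting for eng review", 24, active=False),
    Stage("build", 24, active=True),
    Stage("waiting for QA", 16, active=False),
    Stage("QA", 8, active=True),
    Stage("waiting for product clarification", 32, active=False),
    Stage("rework", 8, active=True),
    Stage("waiting for deploy window", 16, active=False),
]

print(f"wait share: {wait_share(timeline):.0%}")  # → wait share: 72%
```

In this made-up timeline, the 24 hours of building are 12% of the 200 elapsed hours.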

This is not an argument against process. It is an argument against the specific process most teams run, where ambiguity in the upstream specification compounds through every downstream step.

The leading indicator nobody tracks

Here is what I have learned from building agentic development workflows and running AI readiness assessments across PE-backed SaaS companies: the single highest-leverage metric most teams do not track is spec quality.

Not story point velocity. Not deployment frequency. Not sprint burndown. Spec quality.

A spec is the upstream artifact that determines everything downstream. When a spec is precise—when it defines the behavior, the edge cases, the acceptance criteria, and the constraints before code starts—the entire delivery pipeline accelerates. Not because engineers type faster, but because they do not have to stop, ask, wait, and redo.

When a spec is vague—“as a user, I want to see my dashboard”—it triggers the full cascade: interpretation differences, design-engineering misalignment, QA-discovered gaps, product-initiated rework. Every ambiguity in a spec becomes a handoff, and every handoff becomes a wait state, and every wait state becomes part of that 70–90% Reinertsen documented.

The reason nobody tracks spec quality is that it is harder to measure than deployment frequency. You cannot pull it from a CI/CD dashboard. But the organizations I work with that have started measuring it—tracking rework rates by spec completeness, tracking clarification requests per feature, tracking the ratio of “build” time to “wait + rework” time—have found it to be the single most predictive metric of delivery outcome.
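Here is a minimal sketch of those three measurements. It assumes each feature is logged with a spec-completeness bucket (from whatever rubric the team adopts), a count of clarification requests, and hours spent building, waiting, and reworking. The `Feature` record, the bucket names, and the definition of rework rate as rework hours over total hands-on hours are all illustrative assumptions, not a standard.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feature:
    # Hypothetical per-feature record; field names are illustrative.
    spec_bucket: str     # e.g. "complete" or "vague", per the team's own rubric
    clarifications: int  # clarification requests raised after work started
    build_hours: float
    wait_hours: float
    rework_hours: float


def _ratio(num: float, den: float) -> float:
    """Division that returns NaN (undefined) instead of raising on a zero denominator."""
    return num / den if den else float("nan")


def summarize(features: list[Feature]) -> dict[str, dict[str, float]]:
    """Per spec bucket: clarifications per feature, rework rate, build : (wait + rework)."""
    groups: dict[str, list[Feature]] = defaultdict(list)
    for f in features:
        groups[f.spec_bucket].append(f)

    summary = {}
    for bucket, fs in groups.items():
        build = sum(f.build_hours for f in fs)
        wait = sum(f.wait_hours for f in fs)
        rework = sum(f.rework_hours for f in fs)
        summary[bucket] = {
            "clarifications_per_feature": sum(f.clarifications for f in fs) / len(fs),
            "rework_rate": _ratio(rework, build + rework),  # share of hands-on time redone
            "build_to_wait_rework": _ratio(build, wait + rework),
        }
    return summary


# Hypothetical sample: two well-specified features, two vaguely specified ones.
features = [
    Feature("complete", 0, 30.0, 20.0, 2.0),
    Feature("complete", 1, 26.0, 18.0, 4.0),
    Feature("vague", 4, 28.0, 70.0, 18.0),
    Feature("vague", 3, 24.0, 60.0, 14.0),
]
summary = summarize(features)
for bucket, stats in summary.items():
    print(bucket, {name: round(value, 2) for name, value in stats.items()})
```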

Rework is not one problem. It is four.

One of the things that makes rework so persistent is that teams treat it as a single category. “We had to rework that feature” could mean any of four very different things:

Spec-driven rework: The specification was ambiguous or incomplete, and the team built something that matched their interpretation but not the actual intent.
Integration-driven rework: The feature worked in isolation but failed when integrated with the broader system—an architecture or dependency problem.
Quality-driven rework: The implementation had bugs, performance issues, or security vulnerabilities that were caught in review or testing.
Scope-driven rework: Requirements changed after implementation started, invalidating work that was correct at the time it was done.

These four categories have completely different root causes and completely different solutions. Lumping them together as “rework” makes it impossible to reduce any of them systematically.

In the teams I have worked with, spec-driven rework is consistently the largest category—often 40–60% of total rework. It is also the most preventable. You cannot eliminate integration surprises entirely. You cannot stop requirements from evolving. But you can write a better spec. And the return on that investment is disproportionate, because spec quality compounds: a clear spec reduces clarification handoffs, which reduces wait states, which reduces context loss, which reduces quality rework.
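Combining the two ranges gives a rough sense of scale. This is a simplification: it assumes Capers Jones's 30–50% rework share and the 40–60% spec-driven share apply to the same team, even though they come from different sources. Under that assumption, spec-driven rework alone accounts for:

```latex
\text{spec-driven share of effort} = \text{rework share} \times \text{spec-driven share of rework}
= 0.30 \times 0.40 = 0.12 \;\text{ to }\; 0.50 \times 0.60 = 0.30
```

For the 40-person team above ($8M in fully loaded cost), that is roughly $0.96M to $2.4M per year.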

What this means for mid-market SaaS

If you are a CTO at a PE-backed SaaS company in the $25M–$500M ARR range, this data should reframe how you think about engineering investment.

Most board conversations about engineering productivity focus on output metrics: features shipped, velocity trends, deployment frequency. These are not wrong to track. But they are insufficient, because they cannot explain why your team burns 30–50% of its capacity on rework, or why features that should take two weeks take eight.

The uncomfortable answer is almost always upstream. It is in the quality of what your team is asked to build, not in how fast they can build it.

This is also why AI-assisted development tools (copilots, code generation, autonomous coding agents) deliver inconsistent results across organizations. These tools accelerate construction. They do not fix specification. If you feed an AI coding agent a vague spec, you get vague code faster. The rework still happens. It just starts sooner.

The organizations that get disproportionate value from AI in their engineering workflows are the ones that invest in spec quality first. They use AI to accelerate execution against a clear target, not to generate code against an ambiguous one. The spec is what makes AI useful, not the other way around.

The metric to start tracking tomorrow

If I could convince every engineering leader to add one metric to their delivery dashboard, it would be this: rework rate by root cause category.

Tag every rework instance—every bug fix that traces back to a misunderstood requirement, every feature revision that was actually a spec gap, every “we built the wrong thing” conversation—with one of the four categories above. Run it for a quarter. The distribution will tell you more about your delivery health than any velocity chart.
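Here is a minimal sketch of the tally. It assumes each rework instance has already been tagged by hand with one of the four categories. The `ReworkCause` enum and the sample tags are hypothetical. In practice the tags would live as a label or custom field in the issue tracker.

```python
from collections import Counter
from enum import Enum


class ReworkCause(Enum):
    """The four root-cause categories described above."""
    SPEC = "spec-driven"
    INTEGRATION = "integration-driven"
    QUALITY = "quality-driven"
    SCOPE = "scope-driven"


def distribution(tags: list[ReworkCause]) -> dict[ReworkCause, float]:
    """Share of tagged rework instances per root cause, sorted largest first."""
    if not tags:
        return {}
    counts = Counter(tags)
    shares = {cause: counts[cause] / len(tags) for cause in ReworkCause}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


# Hypothetical quarter: 20 rework instances, each tagged by hand.
quarter_tags = (
    [ReworkCause.SPEC] * 11
    + [ReworkCause.QUALITY] * 4
    + [ReworkCause.INTEGRATION] * 3
    + [ReworkCause.SCOPE] * 2
)

for cause, share in distribution(quarter_tags).items():
    print(f"{cause.value:<20}{share:.0%}")
```

Counting instances treats a one-line fix and a two-week rebuild the same way. If the tracker records time, weighting each tag by hours spent is a straightforward variant.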

For most teams, the result is humbling. The majority of rework traces back to the spec. And the majority of delivery time traces back to wait states caused by spec ambiguity. Reinertsen’s 70–90% and Capers Jones’s 30–50% are not independent findings. They are two measurements of the same upstream failure.

The good news is that this is fixable. Not with better tooling. Not with more process. With better specs—and with the discipline to measure whether those specs are actually getting better over time.

The teams that figure this out will not just deliver faster. They will deliver predictably. And in a PE context, where predictability is what drives valuation confidence, that distinction is worth more than any deployment frequency number you can put on a slide.

Where does your organization stand?

The free AI readiness assessment benchmarks you across six dimensions — including specification maturity and delivery health.


For the full framework, see the Agentic Transformation Best Practices guide. Related: The Uncomfortable Question DORA Can’t Answer.

Reinertsen, Donald G. The Principles of Product Development Flow: Second Generation Lean Product Development. Celeritas Publishing, 2009.
Jones, Capers. Applied Software Measurement: Global Analysis of Productivity and Quality. McGraw-Hill, 2008.
DORA. Accelerate State of DevOps Report, 2023.
McKinsey & Company. Developer Velocity: How Software Excellence Fuels Business Performance, 2020.