Your team deployed 47 times last week. Lead time is under an hour. Change failure rate is 3%. By every DORA measure, you are elite.
So why did your last three features miss what customers actually needed?
DORA tells you how fast you shipped. It does not tell you whether you should have shipped it at all. This is not a critique of DORA -- it is a recognition that the world DORA was built for is changing. When AI agents handle implementation and deployment, the bottleneck moves upstream. And our metrics have not followed.
DORA Measures the Tail End
DORA's four keys -- deployment frequency, lead time for changes, change failure rate, and mean time to recovery -- all measure what happens after someone decides what to build and how to build it. They were designed for a world where humans wrote every line of code, and shipping was the hard part.
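To make concrete what the four keys actually observe, here is a minimal sketch that computes them from a list of deployment records. The `Deployment` record, its field names, and the `four_keys` helper are hypothetical and chosen only for illustration; real pipelines will define these events differently. Note that every input is a timestamp or flag recorded after the decision about what to build.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; field names are assumptions for this sketch.
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def four_keys(deployments, period_days):
    """Compute the four DORA keys over a window of `period_days` days."""
    if not deployments:
        return None
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Recovery time is measured from the failed deployment to restoration.
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


t0 = datetime(2025, 1, 6, 9, 0)
deployments = [
    Deployment(t0, t0 + timedelta(minutes=30)),
    Deployment(
        t0 + timedelta(hours=1),
        t0 + timedelta(hours=1, minutes=40),
        failed=True,
        restored_at=t0 + timedelta(hours=2),
    ),
    Deployment(t0 + timedelta(hours=2), t0 + timedelta(hours=2, minutes=50)),
    Deployment(t0 + timedelta(hours=3), t0 + timedelta(hours=3, minutes=20)),
]
keys = four_keys(deployments, period_days=1)
print(keys)
```

Nothing in these records says whether the change solved the right problem, which is the point of the sections that follow.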
In an agentic world, that equation inverts. Deployment frequency and lead time approach commodity -- agents can ship continuously. When the automated part is no longer the bottleneck, the differentiator moves upstream to what you specified and why.
The Gap Nobody Has Named
Two separate conversations are happening in engineering leadership right now, and nobody has connected them.
The first: DORA metrics are not sufficient. DX, Oobeya, and RedMonk all argue this, and DORA's own evolving methodology reflects it. The DX Core 4 framework supplements DORA with developer experience. SPACE adds satisfaction and well-being. Both are valuable -- and both still measure outputs or experience, not input quality.
The second: spec quality matters more than ever. Thoughtworks identifies spec-driven development as "one of 2025's key new engineering practices." GitHub's Spec Kit has over 71,000 stars. Amazon built an entire IDE around it. An arXiv paper found that human-refined specs reduce LLM-generated code errors by up to 50%.
But Thoughtworks also acknowledges the gap: "there is not yet a systematic way to evaluate specs."
That is the gap. DORA measures the output. Existing supplements measure the experience. Nobody measures the input -- the quality of the specification that determines everything downstream.
Three Proposed Leading Indicators
These are proposed metrics that complement DORA, not replace it. They are emerging patterns based on what we are seeing in practice, not established standards. The industry needs to validate them -- and that starts by measuring.
Spec completeness rate
What percentage of specs ship without mid-flight rework? A low rate means the team is specifying ambiguously -- the agent builds what was asked, not what was needed. This is the CPO's and CTO's scorecard: are we defining problems well enough for agents to solve them?
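One way to compute this rate, as a rough sketch: count shipped specs that needed no mid-flight rework and divide by all shipped specs. The `Spec` record and its fields are hypothetical, and what counts as "mid-flight rework" is an assumption each team would need to define for itself.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    # Hypothetical record; in practice this would come from your tracker.
    spec_id: str
    shipped: bool
    reworked_mid_flight: bool


def spec_completeness_rate(specs):
    """Share of shipped specs that needed no mid-flight rework."""
    shipped = [s for s in specs if s.shipped]
    if not shipped:
        return None  # nothing shipped yet, so the rate is undefined
    clean = sum(1 for s in shipped if not s.reworked_mid_flight)
    return clean / len(shipped)


specs = [
    Spec("SPEC-1", shipped=True, reworked_mid_flight=False),
    Spec("SPEC-2", shipped=True, reworked_mid_flight=True),
    Spec("SPEC-3", shipped=True, reworked_mid_flight=False),
    Spec("SPEC-4", shipped=False, reworked_mid_flight=False),  # still in flight
]
rate = spec_completeness_rate(specs)
print(f"{rate:.0%}")  # prints 67%
```

Unshipped specs are excluded from the denominator here so that work in progress does not distort the rate; that is a design choice, not a requirement of the metric.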
Rework attribution
When something breaks, trace it back to one of four root causes: a spec gap (the spec did not cover it), an agent failure (the agent misinterpreted a clear spec), a validation miss (the review process did not catch it), or a misunderstood customer need (the spec was technically correct but solved the wrong problem). This changes where you invest -- more spec rigor, better agents, stronger review, or deeper customer research.
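The four root causes and the investment each one points to can be captured in a small tally, sketched below. The enum, the `INVESTMENT` mapping, and the sample events are illustrative assumptions; the categories and their matching investments are taken from the paragraph above.

```python
from collections import Counter
from enum import Enum


class RootCause(Enum):
    SPEC_GAP = "spec gap"
    AGENT_FAILURE = "agent failure"
    VALIDATION_MISS = "validation miss"
    MISUNDERSTOOD_NEED = "misunderstood customer need"


# Where each root cause suggests investing, per the text above.
INVESTMENT = {
    RootCause.SPEC_GAP: "more spec rigor",
    RootCause.AGENT_FAILURE: "better agents",
    RootCause.VALIDATION_MISS: "stronger review",
    RootCause.MISUNDERSTOOD_NEED: "deeper customer research",
}


def attribution_shares(events):
    """Return each root cause's share of all rework events."""
    counts = Counter(events)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cause: counts[cause] / total for cause in RootCause}


# Hypothetical rework log: one entry per rework cycle, tagged after asking "why?".
events = [
    RootCause.SPEC_GAP,
    RootCause.SPEC_GAP,
    RootCause.AGENT_FAILURE,
    RootCause.VALIDATION_MISS,
    RootCause.SPEC_GAP,
    RootCause.MISUNDERSTOOD_NEED,
]
shares = attribution_shares(events)
top = max(shares, key=shares.get)
print(f"Top cause: {top.value} ({shares[top]:.0%}); invest in {INVESTMENT[top]}")
```

Forcing every rework event into exactly one category is a simplification; some teams may prefer to allow a primary and a secondary cause.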
Customer problem fidelity
Does the spec trace to a validated customer problem? This is an established product management practice that becomes newly critical when agents execute specs without human judgment. Without this link, you are building the wrong thing faster. Measure problem-to-spec traceability to ensure that what ships is what matters.
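Problem-to-spec traceability could be measured with something as simple as the sketch below: the share of specs that link to a problem on a validated list. The identifiers, the `specs` mapping, and the idea of a `validated_problems` set are all hypothetical stand-ins for whatever your product tooling records.

```python
def traceability(specs, validated_problems):
    """Return (rate, untraced_spec_ids) for a mapping of spec id -> problem id.

    A spec counts as traced only if it links to a problem that has been
    validated; a missing link (None) or an unvalidated problem does not count.
    """
    if not specs:
        return None, []
    untraced = [
        spec_id
        for spec_id, problem_id in specs.items()
        if problem_id not in validated_problems
    ]
    rate = 1 - len(untraced) / len(specs)
    return rate, untraced


# Hypothetical data for illustration only.
specs = {
    "SPEC-1": "PROB-7",
    "SPEC-2": None,       # no customer problem linked at all
    "SPEC-3": "PROB-9",   # linked, but the problem was never validated
    "SPEC-4": "PROB-2",
}
validated_problems = {"PROB-7", "PROB-2"}

rate, untraced = traceability(specs, validated_problems)
print(f"Traceability: {rate:.0%}; untraced specs: {untraced}")
```

The list of untraced specs is arguably more useful than the rate itself, since it names the work that should be questioned before an agent executes it.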
We recommend starting with one: rework attribution. It is the easiest to implement -- just ask "why?" after each rework cycle -- and the most immediately revealing.
The Iterative Objection
The strongest counter-argument comes from Birgitta Böckeler, writing on Martin Fowler's site: upfront specification contradicts iterative development. If we over-specify, we lose the ability to learn and adapt.
The concern is valid -- but it misframes the choice. Specs in a spec-driven world are not waterfall requirements documents. They are living contracts that evolve through iteration. The question is not "spec or iterate." It is "iterate on what?" Without a spec, you are iterating on code. With a spec, you are iterating on understanding. One produces better code. The other produces better products.
The alternative -- no spec, just prompt and iterate -- carries its own risk: 95% of GenAI pilots fail to deliver measurable P&L impact (MIT 2025). We believe spec-first approaches address this because they force the team to articulate what "done" looks like before the first line is generated.
Where does your organization stand?
Find out whether your DORA scores might be hiding upstream problems. The free AI readiness assessment benchmarks you across six dimensions -- including specification maturity.
Take the Free Assessment →
For the full framework across all nine practice areas, see the Agentic Transformation Best Practices guide.