pith.

arxiv: 2606.27045 · v1 · pith:UZLIEIUB · submitted 2026-06-25 · 💻 cs.SE · cs.AI

The Spec Growth Engine: Spec-Anchored, Code-Coupled, Drift-Enforced Architecture for AI-Assisted Software Development

Pith reviewed 2026-06-26 03:37 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords: spec graph · context explosion · spec-code drift · AI coding agents · drift gate · vertical slice · software architecture · context scoping

The pith

A spec graph with contract-design separation, ownership-path context scoping, hardest-first vertical slices, and a blocking drift gate prevents context explosion and silent spec-code drift for AI coding agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that AI coding agents suffer from context explosion when forced to reason over whole repositories and from silent spec-code drift when implementations diverge from specifications without detection. It proposes the Spec Growth Engine as a lightweight countermeasure built from a machine-readable spec graph whose nodes separate contracts from designs, a Spine context assembler that limits agent reasoning to an ownership path, a vertical-slice growth protocol that forces hardest-first ordering, and a drift gate that treats any spec-code mismatch as a merge blocker. The approach deliberately recombines familiar practices such as information hiding and fitness functions into a single code-coupled, machine-enforced system rather than layering on heavy process frameworks. If the claim holds, AI-assisted development could scale to larger codebases while keeping specifications and implementations visibly aligned throughout the process.

Core claim

The Spec Growth Engine is a lightweight framework that addresses context explosion and silent spec-code drift through a machine-readable spec graph whose nodes carry explicit contract/design separation, a Spine context assembler that scopes agent context to an ownership path, a vertical-slice growth protocol that enforces hardest-first ordering, and a drift gate that makes spec-code divergence a blocking merge condition; the design synthesises established principles such as Parnas information hiding, C4, ADRs, Walking Skeleton, Reflexion Models, and Fitness Functions into a lean, code-coupled, machine-enforced whole without the overhead of heavyweight frameworks.
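The paper's actual node schema is not reproduced on this page, so what follows is only a minimal sketch under stated assumptions of what "explicit contract/design separation" could look like. Each node records an owner (its parent on an ownership tree), a contract that other nodes may rely on, and a design that stays private, in the spirit of Parnas information hiding. The names `SpecNode`, `SpecGraph`, `owner`, `contract`, `design`, `public_view`, and `children` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SpecNode:
    """One node of a hypothetical spec graph (names are illustrative only)."""
    node_id: str
    owner: Optional[str]   # parent on the ownership tree; None for the root
    contract: List[str]    # public promises other nodes may rely on
    design: str = ""       # internal design notes, hidden from other nodes


class SpecGraph:
    """Stores nodes and requires every owner to exist before its children."""

    def __init__(self) -> None:
        self.nodes: Dict[str, SpecNode] = {}

    def add(self, node: SpecNode) -> None:
        if node.node_id in self.nodes:
            raise ValueError(f"duplicate node {node.node_id!r}")
        if node.owner is not None and node.owner not in self.nodes:
            raise KeyError(f"unknown owner {node.owner!r} for {node.node_id!r}")
        self.nodes[node.node_id] = node

    def public_view(self, node_id: str) -> List[str]:
        """What any other node may see: the contract, never the design."""
        return list(self.nodes[node_id].contract)

    def children(self, node_id: str) -> List[str]:
        """Ids of nodes directly owned by `node_id`, in insertion order."""
        return [n.node_id for n in self.nodes.values() if n.owner == node_id]


# Hypothetical two-node example.
graph = SpecGraph()
graph.add(SpecNode("billing", None, ["charge(account_id, cents) -> Receipt"],
                   design="retries use idempotency keys"))
graph.add(SpecNode("billing.invoices", "billing", ["render(invoice_id) -> bytes"]))
```

Rejecting a node whose owner is unknown keeps the ownership tree connected, which any assembler that walks ownership paths would rely on.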

What carries the argument

The spec graph whose nodes separate contracts from designs, together with the Spine context assembler, the vertical-slice growth protocol, and the drift gate that blocks merges on divergence.
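The Spine's exact inclusion rules are not given on this page. The sketch below assumes that an agent working on a target node receives the contracts of every node on the path from the root down to the target, plus the target's own design, and nothing at all from nodes off that path. The function names `ownership_path` and `assemble_spine_context` and the Markdown-style section headers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SpecNode:
    node_id: str
    owner: Optional[str]   # parent on the ownership tree; None for the root
    contract: List[str]    # public interface other nodes may rely on
    design: str = ""       # internal design, shown only when this node is the target


def ownership_path(nodes: Dict[str, SpecNode], target: str) -> List[str]:
    """Return node ids from the root down to `target`, rejecting ownership cycles."""
    path: List[str] = []
    seen = set()
    current: Optional[str] = target
    while current is not None:
        if current in seen:
            raise ValueError(f"ownership cycle at {current!r}")
        seen.add(current)
        path.append(current)
        current = nodes[current].owner
    return list(reversed(path))


def assemble_spine_context(nodes: Dict[str, SpecNode], target: str) -> str:
    """Hypothetical Spine: ancestors contribute contracts only; the target adds its design.

    Siblings and other off-path nodes are left out entirely.
    """
    sections = []
    for node_id in ownership_path(nodes, target):
        node = nodes[node_id]
        sections.append(f"## {node_id} contract\n" + "\n".join(node.contract))
        if node_id == target:
            sections.append(f"## {node_id} design\n{node.design}")
    return "\n\n".join(sections)


# Hypothetical three-node repository.
nodes = {
    "app": SpecNode("app", None, ["serves the public HTTP API"], "single deployable"),
    "app.billing": SpecNode("app.billing", "app", ["charge(account_id, cents)"],
                            "wraps the payment provider"),
    "app.search": SpecNode("app.search", "app", ["query(text) -> hits"], "inverted index"),
}
context = assemble_spine_context(nodes, "app.billing")
```

Here the billing agent sees the root's contract and its own full spec, while everything about `app.search` and the root's design stays out of its context.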

If this is right

  • AI agents can work across growing repositories without output quality dropping from full-context overload.
  • Specifications remain visibly coupled to code because divergence is treated as a blocking condition at merge time.
  • Development follows a hardest-first vertical-slice order that keeps structural decisions visible early.
  • The same machinery can be applied without adopting heavy process frameworks such as RUP or MDA.
  • Context supplied to agents is restricted to an ownership path, limiting the information each agent must process.
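Neither the paper's difficulty metric nor its layer set is stated here. Assuming each candidate slice carries a team-assigned difficulty score and the set of architectural layers it touches, a vertical-slice planner might first reject slices that do not cut through every layer and then schedule the rest hardest-first. `LAYERS`, `Slice`, and `plan_growth` are hypothetical names, and the four-layer split is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Assumed layer set; a real project would define its own.
LAYERS: FrozenSet[str] = frozenset({"ui", "api", "domain", "storage"})


@dataclass(frozen=True)
class Slice:
    name: str
    layers: FrozenSet[str]   # architectural layers the slice cuts through
    difficulty: int          # hypothetical estimate; higher means harder or riskier


def plan_growth(slices: List[Slice]) -> List[str]:
    """Reject non-vertical slices, then order the remainder hardest-first."""
    for s in slices:
        missing = LAYERS - s.layers
        if missing:
            raise ValueError(f"{s.name!r} is not a vertical slice; missing {sorted(missing)}")
    # Sort on (-difficulty, name) so ties are broken deterministically by name.
    ordered = sorted(slices, key=lambda s: (-s.difficulty, s.name))
    return [s.name for s in ordered]
```

For example, `plan_growth([Slice("login", LAYERS, 2), Slice("payment", LAYERS, 5), Slice("export", LAYERS, 3)])` returns `["payment", "export", "login"]`, so the riskiest structural decision surfaces first.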

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The drift-gate idea could be ported to conventional non-AI workflows to catch divergence earlier in review cycles.
  • The vertical-slice ordering rule might improve project predictability even when no AI agent is present.
  • Automated extraction of the initial spec graph from an existing codebase would be a natural next implementation step.
  • Regulated domains that require traceable alignment between requirements and code could adopt the drift gate as an audit point.

Load-bearing premise

Combining the spec graph, Spine assembler, vertical-slice protocol, and drift gate will remove context explosion and silent drift in real projects without creating new failure modes or unacceptable overhead.

What would settle it

Run the framework on a multi-module application with an AI agent, then check whether context-window usage stays low as the repository grows and whether any spec-code divergence reaches a merge without being rejected by the drift gate.
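The first half of that test can be simulated before any agent is involved. The sketch below grows a synthetic, balanced ownership tree and compares a whole-repository context (every spec node) with a path-scoped context (only the nodes on the ownership path to the deepest node). Counting spec nodes instead of tokens, the branching factor of 3, and the balanced shape are all assumptions. A real measurement would apply the agent's own tokenizer to real spec files. `build_tree`, `path_length`, and `context_sizes` are hypothetical helpers.

```python
from typing import Dict, Optional, Tuple


def build_tree(depth: int, branching: int) -> Dict[str, Optional[str]]:
    """Synthetic ownership tree mapping node id -> owner id (None for the root)."""
    owners: Dict[str, Optional[str]] = {"root": None}
    frontier = ["root"]
    for _ in range(depth):
        next_frontier = []
        for parent in frontier:
            for i in range(branching):
                child = f"{parent}.{i}"
                owners[child] = parent
                next_frontier.append(child)
        frontier = next_frontier
    return owners


def path_length(owners: Dict[str, Optional[str]], node: str) -> int:
    """Number of nodes on the ownership path from `node` up to the root, inclusive."""
    count = 0
    current: Optional[str] = node
    while current is not None:
        count += 1
        current = owners[current]
    return count


def context_sizes(depth: int, branching: int = 3) -> Tuple[int, int]:
    """Return (whole-repo node count, worst-case path-scoped node count)."""
    owners = build_tree(depth, branching)
    deepest = max(owners, key=lambda n: n.count("."))
    return len(owners), path_length(owners, deepest)
```

In this idealised tree the whole-repository count grows geometrically with depth (4, 13, 40 nodes for depths 1 to 3), while the path count grows by one per level; real repositories are unbalanced, which is what the proposed experiment would need to measure.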

Figures

Figures reproduced from arXiv: 2606.27045 by Hartwig Grabowski.

Figure 1. Context scoping via the Spine. Without the Spine (a), an agent working on …
Figure 2. The silent drift cycle. A passing test suite does not guarantee that the specification and …
Figure 3. The two-layer growth rule. Layer 1 invariants are specified up front. Layer 2 features grow …
Figure 4. High-level architecture of the Spec Growth Engine. The Spec Graph feeds the Context …
Figure 5. The drift validation gate. The engine derives the Intent Graph from SPEC.md files and …
Figure 6. The spec graph growing with working software. Layer-1 invariants (coral) are seeded …
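Figure 5 describes a drift validation gate that derives an Intent Graph from SPEC.md files, and the paper cites Reflexion Models [22] as one of its sources. Because the caption is cut off on this page, the other side of the comparison is an assumption here: a set of dependency edges extracted from the code (for example by import analysis, not shown). The sketch classifies edges with the reflexion-model categories of convergence, divergence, and absence, then blocks on any mismatch. Treating absences as blocking is also an assumption, and a real gate might tolerate them for slices not yet built. `reflexion_diff` and `drift_gate` are hypothetical names.

```python
import sys
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (source module, target module) dependency


def reflexion_diff(intended: Iterable[Edge], implemented: Iterable[Edge]) -> Dict[str, Set[Edge]]:
    """Classify dependency edges in the style of reflexion models."""
    spec, code = set(intended), set(implemented)
    return {
        "convergences": spec & code,   # declared in the spec and present in code
        "divergences": code - spec,    # present in code but never declared
        "absences": spec - code,       # declared but not present in code
    }


def drift_gate(intended: Iterable[Edge], implemented: Iterable[Edge]) -> int:
    """Return a process exit status: 0 lets the merge proceed, 1 blocks it."""
    report = reflexion_diff(intended, implemented)
    drift = report["divergences"] | report["absences"]
    for src, dst in sorted(drift):
        print(f"DRIFT: {src} -> {dst}", file=sys.stderr)
    return 1 if drift else 0
```

In a CI job the return value would be passed to `sys.exit`, so a non-zero status fails the required check and the merge is blocked until either the spec or the code is brought back into line.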
Original abstract

AI coding agents dramatically accelerate implementation speed but introduce two structural failure modes that existing spec-driven approaches do not fully solve: (1) context explosion: the agent must reason over an entire repository at once, degrading output quality as the context window fills; and (2) silent spec-code drift: code evolves, the specification does not, and the divergence becomes invisible until it is costly to repair. We present the Spec Growth Engine, a lightweight framework that addresses both failure modes through a machine-readable spec graph whose nodes carry explicit contract/design separation, a Spine context assembler that scopes agent context to an ownership path, a vertical-slice growth protocol that enforces hardest-first ordering, and a drift gate that makes spec-code divergence a blocking merge condition. The design synthesises well-established software engineering principles (Parnas information hiding, C4, ADRs, Walking Skeleton, Reflexion Models, Fitness Functions) into a lean, code-coupled, machine-enforced whole, without the overhead of heavyweight frameworks such as RUP or MDA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this section is the friction.

Referee Report

1 major / 0 minor

Summary. The paper claims that the Spec Growth Engine framework addresses context explosion and silent spec-code drift in AI-assisted development. The framework consists of a machine-readable spec graph with explicit contract/design separation, a Spine context assembler that scopes agent context to an ownership path, a vertical-slice growth protocol enforcing hardest-first ordering, and a drift gate that blocks merges on spec-code divergence. It is presented as a synthesis of established principles (Parnas information hiding, C4, ADRs, Walking Skeleton, Reflexion Models, Fitness Functions) into a lightweight, code-coupled, machine-enforced system without the overhead of heavy frameworks like RUP or MDA.

Significance. If the proposed synthesis of these components can be shown to deliver the claimed scoping and enforcement effects, the work could provide a practical, enforceable architecture for AI-assisted software development that maintains spec-code alignment and manages context without introducing unacceptable overhead or new failure modes.

major comments (1)
  1. [Abstract] Abstract (second paragraph): the central claim that the spec graph, Spine assembler, vertical-slice protocol, and drift gate 'address both failure modes' rests entirely on an untested synthesis assumption; the manuscript supplies no reasoning, interaction analysis, worked example, or validation demonstrating why this specific combination produces the claimed reductions in context explosion and silent drift without new overhead or failure modes.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful review and for highlighting the need for stronger justification of the framework's claims. We address the major comment point by point below.

Point-by-point responses
  1. Referee: [Abstract] Abstract (second paragraph): the central claim that the spec graph, Spine assembler, vertical-slice protocol, and drift gate 'address both failure modes' rests entirely on an untested synthesis assumption; the manuscript supplies no reasoning, interaction analysis, worked example, or validation demonstrating why this specific combination produces the claimed reductions in context explosion and silent drift without new overhead or failure modes.

    Authors: We agree that the current manuscript, being a conceptual design paper, does not include empirical validation or a detailed worked example demonstrating the interactions. The claims are grounded in the synthesis of established principles (Parnas information hiding for scoping, C4 and ADRs for structure, Walking Skeleton and vertical slices for growth, Reflexion Models and Fitness Functions for drift enforcement). However, we acknowledge the absence of explicit reasoning on their combined effects. In the revised manuscript, we will add a new section providing a step-by-step worked example of applying the framework to a sample project, including analysis of how the components interact to mitigate context explosion (via Spine scoping) and silent drift (via the drift gate), and discuss why this synthesis does not introduce unacceptable overhead given the lightweight nature of the components. This will strengthen the justification for the central claim.
    Revision: yes

Circularity Check

0 steps flagged

No circularity; purely conceptual synthesis of external principles

full rationale

The manuscript proposes a framework by combining named external principles (Parnas information hiding, C4, ADRs, Walking Skeleton, Reflexion Models, Fitness Functions) into a spec graph, Spine assembler, vertical-slice protocol, and drift gate. No equations, fitted parameters, derivations, or predictions exist. No self-citations appear as load-bearing justification for the central claim; the synthesis is presented as an untested design assumption rather than a result forced by prior author work or definitional loops. The paper is self-contained as a proposal and does not reduce any claimed outcome to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 4 invented entities

The central claim rests on the premise that the four named components will solve the two failure modes. No free parameters are introduced. Four new components are postulated without independent evidence. One domain assumption is stated explicitly.

axioms (1)
  • domain assumption: AI coding agents suffer from context explosion and silent spec-code drift as primary structural failure modes that existing spec-driven approaches do not fully solve.
    This premise is stated in the first sentence of the abstract and motivates the entire framework.
invented entities (4)
  • Spec graph with explicit contract/design separation (no independent evidence)
    purpose: To structure specifications in a machine-readable form that supports drift detection
    Introduced as the framework's core data structure, whose nodes carry the contract/design split; no independent evidence supplied.
  • Spine context assembler (no independent evidence)
    purpose: To scope agent context to an ownership path and avoid context explosion
    New assembler component proposed; no independent evidence supplied.
  • Vertical-slice growth protocol (no independent evidence)
    purpose: To enforce hardest-first ordering during development
    New protocol proposed; no independent evidence supplied.
  • Drift gate (no independent evidence)
    purpose: To make spec-code divergence a blocking merge condition
    New enforcement mechanism proposed; no independent evidence supplied.

pith-pipeline@v0.9.1-grok · 5711 in / 1598 out tokens · 66346 ms · 2026-06-26T03:37:29.193059+00:00 · methodology


Reference graph

Works this paper leans on

29 extracted references · 7 canonical work pages

  [1] Amazon Web Services. AWS Kiro: Spec-driven AI IDE, 2025. https://kiro.dev
  [2] Birgitta Böckeler. Exploring Gen AI: The tools of spec-driven development. martinfowler.com, 2025. https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html
  [3] Barry W. Boehm. A spiral model of software development and enhancement. ACM SIGSOFT Software Engineering Notes, 11(4):14–24, 1986. doi: 10.1145/12944.12948
  [4] Simon Brown. Software Architecture for Developers, Vol. 2: Visualise, Document and Explore Your Software Architecture. Leanpub, 2018. https://c4model.com
  [5] Chroma Research. Context rot: How increasing input tokens impacts LLM performance, 2025. https://www.trychroma.com/research/context-rot
  [6] Alistair Cockburn. Crystal Clear: A human-powered methodology for small teams, 2001. Walking Skeleton pattern.
  [7] Robert G. Cooper. Stage-gate systems: A new tool for managing new products. Business Horizons, 33(3):44–54, 1990. doi: 10.1016/0007-6813(90)90040-I
  [8] Eric Evans. Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley, 2003. ISBN 978-0321125217
  [9] Neal Ford, Rebecca Parsons, Patrick Kua, and Pramod Sadalage. Building Evolutionary Architectures, 2nd Edition. O’Reilly Media, 2022. ISBN 978-1492097532
  [10] Nicole Forsgren, Jez Humble, and Gene Kim. Accelerate: The Science of Lean Software and DevOps. IT Revolution Press, 2018. ISBN 978-1942788331
  [11] Martin Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999. ISBN 978-0201485677
  [12] Steve Freeman and Nat Pryce. Growing Object-Oriented Software, Guided by Tests. Addison-Wesley, 2009. ISBN 978-0321503626
  [13] GitHub. Spec Kit: Toolkit for spec-driven development, 2025. https://github.com/github/spec-kit
  [14] Dexter Horthy. No vibes allowed: Engineering with coding agents. Talk, AI Engineer, 2025. Popularises the “Dumb Zone” heuristic for coding agents. https://www.youtube.com/watch?v=rmvDxxNubIg
  [15] Jez Humble and David Farley. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. Addison-Wesley, 2010. ISBN 978-0321601919
  [16] Andrew Hunt and David Thomas. The Pragmatic Programmer. Addison-Wesley, 1999. ISBN 978-0201616224
  [17] IEEE Computer Society. Guide to the Software Engineering Body of Knowledge (SWEBOK), Version 4.0, 2024. https://www.computer.org/education/bodies-of-knowledge/software-engineering
  [18] Vlad Khononov. Learning Domain-Driven Design. O’Reilly Media, 2021. ISBN 978-1098100131
  [19] Philippe Kruchten. The 4+1 view model of architecture. IEEE Software, 12(6):42–50, 1995. doi: 10.1109/52.469759
  [20] Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12:157–173, 2024. doi: 10.1162/tacl_a_00638
  [21] Robert C. Martin. Agile Software Development: Principles, Patterns, and Practices. Prentice Hall, 2002. ISBN 978-0135974445
  [22] Gail C. Murphy, David Notkin, and Kevin Sullivan. Software reflexion models: Bridging the gap between design and implementation. In Proceedings of the 3rd ACM SIGSOFT Symposium on Foundations of Software Engineering, pages 18–28, 1995. doi: 10.1145/222124.222136
  [23] Michael T. Nygard. Documenting architecture decisions, 2011. https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions
  [24] David L. Parnas. On the criteria to be used in decomposing systems into modules. Communications of the ACM, 15(12):1053–1058, 1972. doi: 10.1145/361598.361623
  [25] David L. Parnas et al. The modular structure of complex systems. Technical report, Naval Research Laboratory, 1979. A-7E project module guide.
  [26] Dewayne E. Perry and Alexander L. Wolf. Foundations for the study of software architecture. ACM SIGSOFT Software Engineering Notes, 17(4):40–52, 1992. doi: 10.1145/141874.141884
  [27] Stefano Rando et al. LongCodeBench: Evaluating coding LLMs at 1M context windows. arXiv preprint arXiv:2505.07897, 2025. https://arxiv.org/abs/2505.07897
  [28] Ken Schwaber and Jeff Sutherland. The Scrum Guide, 2020. https://scrumguides.org
  [29] Tessl. Tessl framework, 2025. Private beta; see [2].