pith. machine review for the scientific record.

arxiv: 2604.03515 · v2 · submitted 2026-04-03 · 💻 cs.SE · cs.AI · cs.ET


Inside the Scaffold: A Source-Code Taxonomy of Coding Agent Architectures


Pith reviewed 2026-05-13 17:57 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.ET
keywords coding agents · LLM scaffolding · agent architecture · source code taxonomy · loop primitives · ReAct · control loops · software engineering

The pith

Coding agents compose five loop primitives, such as ReAct and tree search, in varying combinations rather than relying on a single fixed control structure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes the source code of 13 open-source coding agents to build a taxonomy across control architecture, tool interfaces, and resource management. It establishes that five loop primitives function as reusable building blocks that most agents layer together. A sympathetic reader would care because the actual scaffold code determines agent behavior in ways that capability-based surveys miss. The study grounds all claims in specific file paths and line numbers from pinned commits. This reveals both convergence under external constraints and open design questions in areas like context compaction.

Core claim

Source-code examination of 13 coding agent scaffolds shows that five loop primitives—ReAct, generate-test-repair, plan-execute, multi-attempt retry, and tree search—act as composable building blocks. Eleven of the thirteen agents combine multiple primitives rather than relying on one control structure. Architectures diverge widely where design questions remain open: tool counts range from zero to 37, and context compaction spans seven distinct strategies. Dimensions converge where external constraints such as edit formats and execution isolation dominate.
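The paper does not publish reference code for the primitives, but the composability claim can be pictured as higher-order loop functions that wrap one another. The sketch below is hypothetical: every name, signature, and stub here is invented for illustration, not taken from any of the 13 scaffolds.

```python
# Hypothetical sketch (not code from the paper): loop primitives as
# composable higher-order functions that can be layered.

def multi_attempt_retry(inner, attempts=3):
    """Wrap any inner loop in the multi-attempt retry primitive."""
    def loop(task):
        result = {"ok": False}
        for _ in range(attempts):
            result = inner(task)
            if result["ok"]:
                break
        return result
    return loop

def generate_test_repair(generate, run_tests, repair, max_rounds=2):
    """The generate-test-repair primitive: patch, test, repair on failure."""
    def loop(task):
        patch = generate(task)
        for _ in range(max_rounds):
            report = run_tests(patch)
            if report["passed"]:
                return {"ok": True, "patch": patch}
            patch = repair(patch, report)
        return {"ok": False, "patch": patch}
    return loop

# Stub model calls so the sketch runs end to end; the second test
# round passes.
def fake_generate(task):
    return {"rounds": 0}

def fake_run_tests(patch):
    patch["rounds"] += 1
    return {"passed": patch["rounds"] >= 2}

def fake_repair(patch, report):
    return patch

# Layering one primitive inside another, the pattern the taxonomy
# reports in 11 of the 13 agents.
agent = multi_attempt_retry(
    generate_test_repair(fake_generate, fake_run_tests, fake_repair),
    attempts=3,
)
```

Swapping the inner loop for a plan-execute or tree-search primitive would change the agent's control structure without touching the retry wrapper, which is the sense in which the primitives compose.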

What carries the argument

A 12-dimension source-code taxonomy divided into control architecture, tool and environment interface, and resource management layers, with all observations tied to concrete file paths and line numbers.
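One taxonomy entry can be imagined as a nested record, one sub-mapping per layer, each assignment backed by a code citation. The three layer names come from the paper; the dimension names shown are only those the abstract mentions, and the agent name, values, and evidence path are invented for illustration.

```python
# Hypothetical taxonomy entry for one agent. Layer names are from the
# paper; the example agent, values, and file path are invented.
AGENT_ENTRY = {
    "agent": "example-agent",
    "commit": "pinned-hash",
    "control_architecture": {
        "control_strategy": "ReAct + multi-attempt retry",
        "loop_primitives": ["react", "multi_attempt_retry"],
    },
    "tool_environment_interface": {
        "tool_count": 12,
        "edit_format": "unified diff",
        "execution_isolation": "container",
    },
    "resource_management": {
        "context_compaction": "summarize-on-overflow",
        "state_management": "append-only transcript",
        "multi_model_routing": "single model",
    },
    # The paper ties every assignment to evidence of this shape.
    "evidence": {"file": "src/agent/loop.py", "lines": "88-140"},
}
```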

If this is right

  • New agents can be classified by which subset of the five primitives they employ and in what order.
  • Designers can deliberately mix primitives to create hybrids that target specific failure modes.
  • Performance differences between agents can be traced to particular primitive combinations rather than opaque overall architectures.
  • Standardization efforts can focus on the divergent dimensions such as context compaction while leaving constrained areas untouched.
  • Researchers gain a shared reference for comparing agent behavior at the implementation level.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The composability finding suggests that future agent improvements may come more from better selection and ordering of existing primitives than from inventing entirely new control loops.
  • Extending the same code-level analysis to closed-source agents could test whether the same five primitives dominate outside open repositories.
  • The taxonomy could support automated search over primitive combinations to optimize agents for particular codebases or bug types.
  • Similar decompositions might apply to agent scaffolds in non-coding domains where loop structures recur.

Load-bearing premise

The 13 chosen open-source agents at their pinned commits represent the broader space of coding agent architectures well enough for the observed patterns to generalize.

What would settle it

Identification of a coding agent whose control loop cannot be expressed as any combination of the five listed primitives.

Figures

Figures reproduced from arXiv: 2604.03515 by Benjamin Rombaut.

Figure 1. Taxonomy overview: 12 dimensions organized into three architectural layers.
Original abstract

LLM-based coding agents can localize bugs, generate patches, and run tests with diminishing human oversight, yet the scaffolding code that surrounds the language model (the control loop, tool definitions, state management, and context strategy) remains poorly understood. Existing surveys classify agents by abstract capabilities (tool use, planning, reflection) that cannot distinguish between architecturally distinct systems, and trajectory studies observe what agents do without examining the scaffold code that determines why. This paper presents a source-code-level architectural taxonomy derived from analysis of 13 open-source coding agent scaffolds at pinned commit hashes. Each agent is characterized across 12 dimensions organized into three layers: control architecture, tool and environment interface, and resource management. The analysis reveals that scaffold architectures resist discrete classification: control strategies range from fixed pipelines to Monte Carlo Tree Search, tool counts range from 0 to 37, and context compaction spans seven distinct strategies. Five loop primitives (ReAct, generate-test-repair, plan-execute, multi-attempt retry, tree search) function as composable building blocks that agents layer in different combinations; 11 of 13 agents compose multiple primitives rather than relying on a single control structure. Dimensions converge where external constraints dominate (tool capability categories, edit formats, execution isolation) and diverge where open design questions remain (context compaction, state management, multi-model routing). All taxonomic claims are grounded in file paths and line numbers, providing a reusable reference for researchers studying agent behavior and practitioners designing new scaffolds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript presents a source-code taxonomy of 13 open-source LLM-based coding agent architectures at pinned commits. Each agent is analyzed across 12 dimensions grouped into three layers (control architecture, tool and environment interface, resource management). The central claim is that five loop primitives—ReAct, generate-test-repair, plan-execute, multi-attempt retry, and tree search—act as composable building blocks, with 11 of 13 agents layering multiple primitives rather than relying on a single control structure. All taxonomic assignments are tied to explicit file paths and line numbers.

Significance. If the taxonomy holds, the work supplies a verifiable, code-grounded reference that distinguishes architecturally distinct scaffolds where prior surveys rely on abstract capabilities. The composability finding and the observed convergence on externally constrained dimensions (tool categories, edit formats) versus divergence on open questions (context compaction, state management) provide a reusable framework for researchers studying agent behavior and for practitioners designing new systems. The explicit pinning of commits and line-level grounding are particular strengths.

major comments (1)
  1. [§3.1] The selection criteria for the 13 agents (e.g., GitHub popularity thresholds or activity filters) are described at a high level but not quantified; this judgment affects how strongly the 11/13 multiple-primitive pattern can be taken as representative of the broader space.
minor comments (3)
  1. [Table 2] The per-agent primitive assignments are clear, but adding a column or footnote that explicitly flags the two single-primitive agents and their control structures would make the composability claim easier to verify at a glance.
  2. [§4.2] The seven context-compaction strategies are enumerated but lack a compact summary table mapping each strategy to the agents that employ it; this would improve readability without altering the analysis.
  3. [Figure 1] The three-layer diagram would benefit from explicit arrows or labels showing how the five loop primitives map onto the control-architecture layer.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment, detailed summary, and recommendation for minor revision. The single major comment is addressed point-by-point below.

Point-by-point responses
  1. Referee: [§3.1] The selection criteria for the 13 agents (e.g., GitHub popularity thresholds or activity filters) are described at a high level but not quantified; this judgment affects how strongly the 11/13 multiple-primitive pattern can be taken as representative of the broader space.

    Authors: We agree that the selection criteria in §3.1 are stated at a high level and that explicit quantification would allow readers to better evaluate the representativeness of the 11/13 composability result. In the revised manuscript we will expand §3.1 with concrete thresholds: agents were required to (i) exceed 200 GitHub stars at the time of selection, (ii) have at least one commit in the preceding 12 months, and (iii) be explicitly described in their README or paper as an LLM-based coding agent. We will also insert a new table (Table 1) listing each of the 13 repositories, their star counts, last commit dates, and pinned commit hashes. These additions will clarify that the observed pattern applies to prominent, actively maintained open-source scaffolds rather than to the entire space of possible implementations. revision: yes
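The three thresholds the rebuttal promises to add can be expressed as a simple predicate. A minimal sketch, assuming a hypothetical candidate-repo record format; the thresholds (over 200 stars, a commit within 12 months, an explicit self-description as an LLM-based coding agent) are taken from the rebuttal text.

```python
from datetime import datetime, timedelta

def selectable(repo, now=None):
    """Return True if a candidate repo meets all three selection criteria.

    The record keys here are hypothetical; the thresholds mirror the
    rebuttal: >200 stars, activity within the last 12 months, and an
    explicit LLM-coding-agent self-description.
    """
    now = now or datetime(2026, 4, 1)
    return (
        repo["stars"] > 200
        and now - repo["last_commit"] <= timedelta(days=365)
        and repo["describes_llm_coding_agent"]
    )
```

A filter like this over the 22-agent candidate pool would make the 13-agent sample reproducible from public repository metadata.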

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper is a descriptive taxonomy based on direct inspection of source code from 13 external open-source repositories at pinned commits. All claims about loop primitives, control architectures, and composability are grounded in observable code patterns with explicit file paths and line numbers. There are no equations, fitted parameters, predictions, or self-referential definitions that reduce the findings to the paper's own inputs. The five primitives are identified from control-flow structures in the analyzed agents rather than defined circularly within the taxonomy. No self-citation chains or ansatzes support the core results; every claim is checkable directly against the external repositories.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The taxonomy rests on the domain assumption that code-level inspection of a modest sample of open-source agents reveals stable architectural distinctions; no free parameters are fitted and no new entities are postulated.

axioms (1)
  • domain assumption The 13 selected open-source coding agents at their pinned commits are representative enough to support general statements about scaffold architectures.
    The paper derives all taxonomic claims from analysis of these specific agents.

pith-pipeline@v0.9.0 · 5564 in / 1308 out tokens · 44652 ms · 2026-05-13T17:57:54.196830+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents

    cs.CL · 2026-05 · unverdicted · novelty 5.0

    An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.
