pith. machine review for the scientific record.

arxiv: 2604.03515 · v2 · submitted 2026-04-03 · 💻 cs.SE · cs.AI · cs.ET


Inside the Scaffold: A Source-Code Taxonomy of Coding Agent Architectures


Pith reviewed 2026-05-13 17:57 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.ET
keywords coding agents · LLM scaffolding · agent architecture · source code taxonomy · loop primitives · ReAct · control loops · software engineering

The pith

Coding agents compose five loop primitives, such as ReAct and tree search, in varying combinations rather than relying on a single fixed control structure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes the source code of 13 open-source coding agents to build a taxonomy across control architecture, tool interfaces, and resource management. It establishes that five loop primitives function as reusable building blocks that most agents layer together. A sympathetic reader would care because the actual scaffold code determines agent behavior in ways that capability-based surveys miss. The study grounds all claims in specific file paths and line numbers from pinned commits. This reveals both convergence under external constraints and open design questions in areas like context compaction.

Core claim

Source-code examination of 13 coding agent scaffolds shows that five loop primitives—ReAct, generate-test-repair, plan-execute, multi-attempt retry, and tree search—act as composable building blocks. Eleven of the thirteen agents combine multiple primitives rather than relying on one control structure. Architectures diverge widely where design questions remain open: tool counts range from zero to 37, and context compaction spans seven distinct strategies. Dimensions converge where external constraints such as edit formats and execution isolation dominate.
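The paper does not publish reference code for the primitives, but the composability claim can be pictured as higher-order loop functions that wrap one another. The sketch below is hypothetical: every name, signature, and stub here is invented for illustration, not taken from any of the 13 scaffolds.

```python
# Hypothetical sketch (not code from the paper): loop primitives as
# composable higher-order functions that can be layered.

def multi_attempt_retry(inner, attempts=3):
    """Wrap any inner loop in the multi-attempt retry primitive."""
    def loop(task):
        result = {"ok": False}
        for _ in range(attempts):
            result = inner(task)
            if result["ok"]:
                break
        return result
    return loop

def generate_test_repair(generate, run_tests, repair, max_rounds=2):
    """The generate-test-repair primitive: patch, test, repair on failure."""
    def loop(task):
        patch = generate(task)
        for _ in range(max_rounds):
            report = run_tests(patch)
            if report["passed"]:
                return {"ok": True, "patch": patch}
            patch = repair(patch, report)
        return {"ok": False, "patch": patch}
    return loop

# Stub model calls so the sketch runs end to end; the second test
# round passes.
def fake_generate(task):
    return {"rounds": 0}

def fake_run_tests(patch):
    patch["rounds"] += 1
    return {"passed": patch["rounds"] >= 2}

def fake_repair(patch, report):
    return patch

# Layering one primitive inside another, the pattern the taxonomy
# reports in 11 of the 13 agents.
agent = multi_attempt_retry(
    generate_test_repair(fake_generate, fake_run_tests, fake_repair),
    attempts=3,
)
```

Swapping the inner loop for a plan-execute or tree-search primitive would change the agent's control structure without touching the retry wrapper, which is the sense in which the primitives compose.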

What carries the argument

A 12-dimension source-code taxonomy divided into control architecture, tool and environment interface, and resource management layers, with all observations tied to concrete file paths and line numbers.
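One taxonomy entry can be imagined as a nested record, one sub-mapping per layer, each assignment backed by a code citation. The three layer names come from the paper; the dimension names shown are only those the abstract mentions, and the agent name, values, and evidence path are invented for illustration.

```python
# Hypothetical taxonomy entry for one agent. Layer names are from the
# paper; the example agent, values, and file path are invented.
AGENT_ENTRY = {
    "agent": "example-agent",
    "commit": "pinned-hash",
    "control_architecture": {
        "control_strategy": "ReAct + multi-attempt retry",
        "loop_primitives": ["react", "multi_attempt_retry"],
    },
    "tool_environment_interface": {
        "tool_count": 12,
        "edit_format": "unified diff",
        "execution_isolation": "container",
    },
    "resource_management": {
        "context_compaction": "summarize-on-overflow",
        "state_management": "append-only transcript",
        "multi_model_routing": "single model",
    },
    # The paper ties every assignment to evidence of this shape.
    "evidence": {"file": "src/agent/loop.py", "lines": "88-140"},
}
```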

If this is right

  • New agents can be classified by which subset of the five primitives they employ and in what order.
  • Designers can deliberately mix primitives to create hybrids that target specific failure modes.
  • Performance differences between agents can be traced to particular primitive combinations rather than opaque overall architectures.
  • Standardization efforts can focus on the divergent dimensions such as context compaction while leaving constrained areas untouched.
  • Researchers gain a shared reference for comparing agent behavior at the implementation level.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The composability finding suggests that future agent improvements may come more from better selection and ordering of existing primitives than from inventing entirely new control loops.
  • Extending the same code-level analysis to closed-source agents could test whether the same five primitives dominate outside open repositories.
  • The taxonomy could support automated search over primitive combinations to optimize agents for particular codebases or bug types.
  • Similar decompositions might apply to agent scaffolds in non-coding domains where loop structures recur.

Load-bearing premise

The 13 chosen open-source agents at their pinned commits represent the broader space of coding agent architectures well enough for the observed patterns to generalize.

What would settle it

Identification of a coding agent whose control loop cannot be expressed as any combination of the five listed primitives.

Figures

Figures reproduced from arXiv: 2604.03515 by Benjamin Rombaut.

Figure 1. Taxonomy overview: 12 dimensions organized into three architectural layers.
Original abstract

LLM-based coding agents can localize bugs, generate patches, and run tests with diminishing human oversight, yet the scaffolding code that surrounds the language model (the control loop, tool definitions, state management, and context strategy) remains poorly understood. Existing surveys classify agents by abstract capabilities (tool use, planning, reflection) that cannot distinguish between architecturally distinct systems, and trajectory studies observe what agents do without examining the scaffold code that determines why. This paper presents a source-code-level architectural taxonomy derived from analysis of 13 open-source coding agent scaffolds at pinned commit hashes. Each agent is characterized across 12 dimensions organized into three layers: control architecture, tool and environment interface, and resource management. The analysis reveals that scaffold architectures resist discrete classification: control strategies range from fixed pipelines to Monte Carlo Tree Search, tool counts range from 0 to 37, and context compaction spans seven distinct strategies. Five loop primitives (ReAct, generate-test-repair, plan-execute, multi-attempt retry, tree search) function as composable building blocks that agents layer in different combinations; 11 of 13 agents compose multiple primitives rather than relying on a single control structure. Dimensions converge where external constraints dominate (tool capability categories, edit formats, execution isolation) and diverge where open design questions remain (context compaction, state management, multi-model routing). All taxonomic claims are grounded in file paths and line numbers, providing a reusable reference for researchers studying agent behavior and practitioners designing new scaffolds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript presents a source-code taxonomy of 13 open-source LLM-based coding agent architectures at pinned commits. Each agent is analyzed across 12 dimensions grouped into three layers (control architecture, tool and environment interface, resource management). The central claim is that five loop primitives—ReAct, generate-test-repair, plan-execute, multi-attempt retry, and tree search—act as composable building blocks, with 11 of 13 agents layering multiple primitives rather than relying on a single control structure. All taxonomic assignments are tied to explicit file paths and line numbers.

Significance. If the taxonomy holds, the work supplies a verifiable, code-grounded reference that distinguishes architecturally distinct scaffolds where prior surveys rely on abstract capabilities. The composability finding and the observed convergence on externally constrained dimensions (tool categories, edit formats) versus divergence on open questions (context compaction, state management) provide a reusable framework for researchers studying agent behavior and for practitioners designing new systems. The explicit pinning of commits and line-level grounding are particular strengths.

major comments (1)
  1. [§3.1] The selection criteria for the 13 agents (e.g., GitHub popularity thresholds or activity filters) are described at a high level but not quantified; this judgment affects how strongly the 11/13 multiple-primitive pattern can be taken as representative of the broader space.
minor comments (3)
  1. [Table 2] The per-agent primitive assignments are clear, but adding a column or footnote that explicitly flags the two single-primitive agents and their control structures would make the composability claim easier to verify at a glance.
  2. [§4.2] The seven context-compaction strategies are enumerated but lack a compact summary table mapping each strategy to the agents that employ it; this would improve readability without altering the analysis.
  3. [Figure 1] The three-layer diagram would benefit from explicit arrows or labels showing how the five loop primitives map onto the control-architecture layer.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment, detailed summary, and recommendation for minor revision. The single major comment is addressed point-by-point below.

Point-by-point responses
  1. Referee: [§3.1] The selection criteria for the 13 agents (e.g., GitHub popularity thresholds or activity filters) are described at a high level but not quantified; this judgment affects how strongly the 11/13 multiple-primitive pattern can be taken as representative of the broader space.

    Authors: We agree that the selection criteria in §3.1 are stated at a high level and that explicit quantification would allow readers to better evaluate the representativeness of the 11/13 composability result. In the revised manuscript we will expand §3.1 with concrete thresholds: agents were required to (i) exceed 200 GitHub stars at the time of selection, (ii) have at least one commit in the preceding 12 months, and (iii) be explicitly described in their README or paper as an LLM-based coding agent. We will also insert a new table (Table 1) listing each of the 13 repositories, their star counts, last commit dates, and pinned commit hashes. These additions will clarify that the observed pattern applies to prominent, actively maintained open-source scaffolds rather than to the entire space of possible implementations. revision: yes
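The three thresholds the rebuttal promises to add can be expressed as a simple predicate. A minimal sketch, assuming a hypothetical candidate-repo record format; the thresholds (over 200 stars, a commit within 12 months, an explicit self-description as an LLM-based coding agent) are taken from the rebuttal text.

```python
from datetime import datetime, timedelta

def selectable(repo, now=None):
    """Return True if a candidate repo meets all three selection criteria.

    The record keys here are hypothetical; the thresholds mirror the
    rebuttal: >200 stars, activity within the last 12 months, and an
    explicit LLM-coding-agent self-description.
    """
    now = now or datetime(2026, 4, 1)
    return (
        repo["stars"] > 200
        and now - repo["last_commit"] <= timedelta(days=365)
        and repo["describes_llm_coding_agent"]
    )
```

A filter like this over the 22-agent candidate pool would make the 13-agent sample reproducible from public repository metadata.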

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper is a descriptive taxonomy based on direct inspection of source code from 13 external open-source repositories at pinned commits. All claims about loop primitives, control architectures, and composability are grounded in observable code patterns with explicit file paths and line numbers. There are no equations, fitted parameters, predictions, or self-referential definitions that reduce the findings to the paper's own inputs. The five primitives are identified from control-flow structures in the analyzed agents rather than defined circularly within the taxonomy. No self-citation chains or ansatzes support the core results; every claim is checkable directly against the external repositories.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The taxonomy rests on the domain assumption that code-level inspection of a modest sample of open-source agents reveals stable architectural distinctions; no free parameters are fitted and no new entities are postulated.

axioms (1)
  • domain assumption The 13 selected open-source coding agents at their pinned commits are representative enough to support general statements about scaffold architectures.
    The paper derives all taxonomic claims from analysis of these specific agents.

pith-pipeline@v0.9.0 · 5564 in / 1308 out tokens · 44652 ms · 2026-05-13T17:57:54.196830+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents

    cs.CL · 2026-05 · unverdicted · novelty 5.0

    An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.
