From Prompt to Process: a Process Taxonomy and Comparative Assessment of Frameworks Supporting AI Software Development Agents

Sanderson Oliveira de Macedo

arxiv: 2606.04967 · v1 · pith:3SERPYG4new · submitted 2026-06-03 · 💻 cs.SE · cs.AI

From Prompt to Process: a Process Taxonomy and Comparative Assessment of Frameworks Supporting AI Software Development Agents

Sanderson Oliveira de Macedo This is my paper

Pith reviewed 2026-06-28 05:14 UTC · model grok-4.3

classification 💻 cs.SE cs.AI

keywords AI software developmentprocess taxonomydevelopment frameworksagent coordinationspecification-driven developmentcomparative assessmentsoftware agents

0 comments

The pith

AI software development frameworks shift from isolated prompts to structured processes using artifacts and reviews, but none covers all six key dimensions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a taxonomy to assess how frameworks turn AI capabilities into repeatable development processes rather than single prompts. It examines six frameworks and identifies convergence on persistent artifacts, work contracts, traceability, and human review as ways to reduce ambiguity and coordinate agents. A sympathetic reader cares because this clarifies the move from chat-based tools to full process support in software engineering. The assessment shows a structural trade-off where deeper process coverage reduces portability across different agents. It also flags recurring risks like specification drift and lack of benchmarks for entire workflows.

Core claim

We introduce a six-dimension process taxonomy—specification, context, roles, execution, validation and portability—with a scoring rubric that turns it into a replicable instrument. We apply it to six frameworks: GitHub Spec Kit, OpenSpec, BMAD Method, Get Shit Done, Spec Kitty and Reversa, plus an out-of-sample case Spec-Flow. Among frameworks that adopt some process there is convergence: the isolated prompt loses centrality, and persistent artifacts, work contracts, traceability and human review become mechanisms that reduce ambiguity and coordinate agents. No framework strongly covers all six dimensions, exposing a structural trade-off between process depth and portability across agents.

What carries the argument

Six-dimension process taxonomy with scoring rubric: specification, context, roles, execution, validation and portability.

If this is right

Isolated prompts lose centrality as persistent artifacts and human review coordinate agents.
Work contracts and traceability become central mechanisms for reducing ambiguity.
A structural trade-off exists between process depth and portability across agents.
Recurring risks include drift between specification and code plus excessive trust in generated artifacts.
Empirical evaluation should focus on intermediate-quality metrics, context governance, installation security and reproducibility.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The taxonomy could guide design of hybrid frameworks that modularize dimensions to ease the depth-portability trade-off.
Platform dependence and fragility of community extensions suggest value in studying standardized interfaces between frameworks.
Risks around generated artifacts point to potential value of automated drift-detection tools as a testable extension.

Load-bearing premise

The directed search of primary sources using a functional inclusion criterion and traction measurement produced a representative sample of frameworks that accurately reflects the current landscape without significant omission or bias.

What would settle it

Identification of a widely adopted framework that scores strongly on all six dimensions while maintaining high portability across agents would challenge the claimed trade-off.

Figures

Figures reproduced from arXiv: 2606.04967 by Sanderson Oliveira de Macedo.

read the original abstract

AI tools for programming are no longer just autocomplete or chat assistants: they organize themselves as development frameworks, with process, roles, artifacts and verification. Recent surveys map agents and LLMs for software engineering, but a study centered on the operational frameworks that turn these capabilities into process is missing. We ran a directed search of primary sources, with a functional inclusion criterion and traction measurement, and selected six frameworks: GitHub Spec Kit, OpenSpec, BMAD Method, Get Shit Done (GSD), Spec Kitty and Reversa. Each attacks AI development through a different path: spec-driven development in full and lightweight variants, agent-driven agile planning, context engineering over the agent, worktree isolation and review, and recovery of operational specifications from legacy systems. Our central contribution is a six-dimension process taxonomy: specification, context, roles, execution, validation and portability, with a scoring rubric that turns it into a replicable instrument. We apply it to the six frameworks and an out-of-sample case, Spec-Flow. Two results stand out. Among frameworks that already adopt some process there is convergence: the isolated prompt loses centrality, and persistent artifacts, work contracts, traceability and human review become mechanisms that reduce ambiguity and coordinate agents. And no framework strongly covers all six dimensions, exposing a structural trade-off between process depth and portability across agents. We also found recurring risks: drift between specification and code, excessive trust in generated artifacts, fragility of community extensions, platform dependence and a lack of benchmarks for the complete process. We close with a research agenda for empirical evaluation, focused on intermediate-quality metrics, context governance, installation security and reproducibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The taxonomy is a real new instrument but the sample selection and missing scores leave the convergence and trade-off claims hard to verify.

read the letter

The paper's main contribution is a six-dimension taxonomy—specification, context, roles, execution, validation, portability—plus a scoring rubric for comparing AI software development frameworks. It applies this to six named frameworks plus one out-of-sample case and surfaces patterns like the move from isolated prompts to persistent artifacts, work contracts, traceability, and human review, along with risks such as drift and lack of benchmarks. That organizing effort is useful on its own and distinct from prior surveys of agents and LLMs in software engineering.

The taxonomy and the risk list are the parts that hold up best. They give practitioners and researchers a replicable way to look at how these frameworks actually structure work instead of just listing capabilities.

The soft spots are in the evidence. The methods describe a directed search using functional inclusion and traction measurement, yet supply no search strings, database list, candidate count, or exclusion rules. That leaves the representativeness concern from the stress-test note standing: if the criteria tilt toward frameworks already heavy on specification or context engineering, both the convergence observation and the claim that no framework covers all six dimensions could be artifacts of the sample rather than general properties. The abstract also gives high-level results without the raw dimension scores, inter-rater details, or rubric validation steps, so the comparative findings cannot be checked directly.

This is for people building, choosing, or studying AI agent frameworks in software engineering who need a structured evaluation lens. A reader focused on tool design or standardization could extract value from the taxonomy even if the specific rankings need more backing.

It deserves peer review because the taxonomy is a concrete, new instrument in a fast-moving area, though the paper will need clearer methods and the actual scores before the stronger claims can be assessed.

Referee Report

3 major / 2 minor

Summary. The paper introduces a six-dimension process taxonomy (specification, context, roles, execution, validation, portability) with an associated scoring rubric for evaluating AI software development frameworks. It applies this instrument via a directed search to six frameworks (GitHub Spec Kit, OpenSpec, BMAD Method, GSD, Spec Kitty, Reversa) plus an out-of-sample case (Spec-Flow), concluding that frameworks converge on persistent artifacts, work contracts, traceability and human review while exhibiting a structural trade-off between process depth and portability; it also catalogs recurring risks and outlines a research agenda.

Significance. If the taxonomy and rubric prove replicable and the sample representative, the work supplies a needed structured lens for comparing operational AI development processes rather than isolated agents or LLMs, with explicit credit for defining a scoring rubric that converts the taxonomy into a replicable instrument and for including an out-of-sample validation case.

major comments (3)

[Abstract (methods paragraph)] Abstract (methods paragraph): the directed search is characterized only as using 'a functional inclusion criterion and traction measurement' with no search strings, databases, candidate pool size, or explicit exclusion rules supplied. This under-specification is load-bearing for the representativeness claim that underpins both the convergence observation and the 'no framework strongly covers all six dimensions' conclusion.
[Results] Results (application of taxonomy): the manuscript supplies no raw per-framework dimension scores, inter-rater reliability statistics, or explicit validation steps for the scoring rubric. Without these, the mapping from rubric to the reported convergence and depth-portability trade-off cannot be independently verified.
[Framework selection and scoring] § on framework selection and scoring: the choice of the six primary frameworks plus Spec-Flow is presented without evidence that the functional inclusion criterion systematically avoids favoring specification-heavy approaches or omitting frameworks that might achieve broader dimension coverage, directly affecting the generality of the structural trade-off claim.

minor comments (2)

[Taxonomy definition] The six dimensions are introduced at a high level; explicit operational definitions or example scoring anchors for each would improve replicability of the instrument.
[Risks and research agenda] Risks section lists platform dependence and lack of benchmarks but does not connect them quantitatively back to the dimension scores.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments. We address each of the major comments point by point below, indicating the revisions we will make to improve the manuscript's transparency and replicability.

read point-by-point responses

Referee: [Abstract (methods paragraph)] Abstract (methods paragraph): the directed search is characterized only as using 'a functional inclusion criterion and traction measurement' with no search strings, databases, candidate pool size, or explicit exclusion rules supplied. This under-specification is load-bearing for the representativeness claim that underpins both the convergence observation and the 'no framework strongly covers all six dimensions' conclusion.

Authors: We agree that the abstract's description of the search method is concise and could benefit from additional detail to support replicability. In the revised manuscript, we will expand the Methods section to explicitly list the search strings employed (e.g., terms related to 'AI software development frameworks', 'spec-driven AI agents'), the primary sources and databases consulted (GitHub repositories, arXiv preprints, developer forums), the size of the initial candidate pool, and the full set of exclusion rules applied. This will provide a clearer basis for the representativeness of the selected frameworks. revision: yes
Referee: [Results] Results (application of taxonomy): the manuscript supplies no raw per-framework dimension scores, inter-rater reliability statistics, or explicit validation steps for the scoring rubric. Without these, the mapping from rubric to the reported convergence and depth-portability trade-off cannot be independently verified.

Authors: This is a valid observation regarding transparency. We will add a supplementary table presenting the raw scores for each framework across the six dimensions. Since the scoring was conducted by the author team using the defined rubric, we will explicitly state that it was a single-rater process and discuss this as a limitation, while noting plans for future multi-rater studies. Additionally, we will elaborate on the validation steps, including how the out-of-sample Spec-Flow case was used to test the taxonomy's applicability. revision: partial
Referee: [Framework selection and scoring] § on framework selection and scoring: the choice of the six primary frameworks plus Spec-Flow is presented without evidence that the functional inclusion criterion systematically avoids favoring specification-heavy approaches or omitting frameworks that might achieve broader dimension coverage, directly affecting the generality of the structural trade-off claim.

Authors: We selected frameworks based on their explicit definition of end-to-end processes for AI-assisted software development and measurable traction indicators such as community adoption and documentation quality. The sample includes a range of approaches beyond pure specification (e.g., GSD's agent-driven agile planning and Spec Kitty's context engineering). To strengthen this, we will revise the relevant section to provide more detailed justification of the inclusion criteria, list examples of excluded frameworks with reasons, and discuss potential selection biases. The inclusion of the out-of-sample Spec-Flow case already serves to partially address generalizability concerns. revision: yes

Circularity Check

0 steps flagged

No circularity: qualitative taxonomy and scoring are self-contained

full rationale

The paper introduces a novel six-dimension taxonomy (specification, context, roles, execution, validation, portability) with an explicit scoring rubric, then applies it to six frameworks selected by directed search plus one out-of-sample case. No equations, fitted parameters, or derivations exist. The convergence claim and depth-vs-portability trade-off are direct empirical observations from the rubric scores, not reductions to prior self-referential quantities or self-citations. The methods description of the search is high-level but does not create definitional or load-bearing circularity in the assessment results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central contribution rests on the domain assumption that the six chosen dimensions adequately capture the essential aspects of AI development processes; no free parameters or invented entities are introduced.

axioms (1)

domain assumption The dimensions specification, context, roles, execution, validation and portability comprehensively and usefully describe AI software development processes.
The taxonomy is introduced as the central contribution without derivation or external validation of why these exact six dimensions were selected.

pith-pipeline@v0.9.1-grok · 5827 in / 1512 out tokens · 38698 ms · 2026-06-28T05:14:22.996334+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 15 canonical work pages · 8 internal anchors

[1]

6 Best Spec-Driven Development Tools for AI Coding in 2026.https: //www.augmentcode.com/tools/best-spec-driven-development-tools, 2026

Augment Code. 6 Best Spec-Driven Development Tools for AI Coding in 2026.https: //www.augmentcode.com/tools/best-spec-driven-development-tools, 2026. Accessed 2026-05-26

2026
[2]

Spec-Driven Development (SDD): The Definitive 2026 Guide.https://thebcms

BCMS. Spec-Driven Development (SDD): The Definitive 2026 Guide.https://thebcms. com/blog/spec-driven-development, 2026. Accessed 2026-05-26

2026
[3]

Agentic AI in the Software Development Lifecycle: Architecture, Empirical Evidence, and the Reshaping of Software Engineering

Happy Bhati. Agentic AI in the Software Development Lifecycle: Architecture, Empirical Evidence, and the Reshaping of Software Engineering, 2026. doi: 10.48550/arXiv.2604.26275

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.26275 2026
[4]

BMAD Method Documentation

BMad Code. BMAD Method Documentation. https://docs.bmad-method.org/, 2026. Accessed 2026-05-26

2026
[5]

BMAD-METHOD: Breakthrough Method for Agile AI Driven Development

BMad Code Org. BMAD-METHOD: Breakthrough Method for Agile AI Driven Development. https://github.com/bmad-code-org/BMAD-METHOD, 2026. Accessed 2026-05-26

2026
[6]

Chikofsky and James H

Elliot J. Chikofsky and James H. Cross. Reverse Engineering and Design Recovery: A Taxonomy.IEEE Software, 7(1):13–17, 1990. doi: 10.1109/52.43044

work page doi:10.1109/52.43044 1990
[7]

Spec-Driven Development Framework Patterns

David Daniel. Spec-Driven Development Framework Patterns. https://daviddaniel. tech/research/papers/sdd-frameworks/, 2026. Accessed 2026-05-26. 15

2026
[8]

Fission-AI/OpenSpec: A Lightweight Specification Framework for Human-AI Alignment Before Coding

Fission AI. Fission-AI/OpenSpec: A Lightweight Specification Framework for Human-AI Alignment Before Coding. https://github.com/Fission-AI/OpenSpec, 2026. Accessed 2026-05-28

2026
[9]

Vahid Garousi, Michael Felderer, and Mika V. Mäntylä. Guidelines for including grey literature and conducting multivocal literature reviews in software engineering.Information and Software Technology, 106:101–121, 2019

2019
[10]

github/spec-kit: Toolkit to Help You Get Started with Spec-Driven Development

GitHub. github/spec-kit: Toolkit to Help You Get Started with Spec-Driven Development. https://github.com/github/spec-kit, 2026. Accessed 2026-05-26

2026
[11]

Spec Kit: A Specification-Driven Development Toolkit.https://github.github

GitHub. Spec Kit: A Specification-Driven Development Toolkit.https://github.github. io/spec-kit/, 2026. Accessed 2026-05-26

2026
[12]

Get Shit Done (GSD): A Lightweight Meta-Prompting, Context Engineering and Spec-Driven Development System for Claude Code.https://github.com/gsd-build/ get-shit-done, 2026

GSD Build. Get Shit Done (GSD): A Lightweight Meta-Prompting, Context Engineering and Spec-Driven Development System for Claude Code.https://github.com/gsd-build/ get-shit-done, 2026. Accessed 2026-05-28. Project relocated tohttps://github.com/ open-gsd/gsd-core

2026
[13]

ACM Transactions on Software Engineering and Methodology , volume=

Junda He, Christoph Treude, and David Lo. LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision, and the Road Ahead.ACM Transactions on Software Engineering and Methodology, 34(5):1–35, 2025. doi: 10.1145/3712003

work page doi:10.1145/3712003 2025
[14]

Large Language Models for Software Engineering: A Systematic Literature Review.ACM Transactions on Software Engineering and Methodology, 33(8):1–79, 2024

Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. Large Language Models for Software Engineering: A Systematic Literature Review.ACM Transactions on Software Engineering and Methodology, 33(8):1–79, 2024. doi: 10.1145/3695988

work page doi:10.1145/3695988 2024
[15]

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. SWE-bench: Can Language Models Resolve Real-World GitHub Issues?, 2024. doi: 10.48550/arXiv.2310.06770

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2310.06770 2024
[16]

From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future, 2024

Haolin Jin et al. From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future, 2024. doi: 10.48550/arXiv.2408.02479

work page doi:10.48550/arxiv.2408.02479 2024
[17]

Pearl Brereton, David Budgen, Mark Turner, John Bailey, and Stephen Linkman

Barbara Kitchenham, O. Pearl Brereton, David Budgen, Mark Turner, John Bailey, and Stephen Linkman. Systematic literature reviews in software engineering: A systematic literature review.Information and Software Technology, 51(1):7–15, 2009

2009
[18]

Large Language Model-Based Agents for Software Engineering: A Survey,

Junwei Liu et al. Large Language Model-Based Agents for Software Engineering: A Survey,
[19]

doi: 10.48550/arXiv.2409.02977

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2409.02977
[20]

Reversa: A Reverse Documentation Engineering Framework for Converting Legacy Software into Operational Specifications for AI Agents

Sanderson Oliveira de Macedo and Ronaldo Martins da Costa. Reversa: A Reverse Documen- tationEngineeringFrameworkforConvertingLegacySoftwareintoOperationalSpecifications for AI Agents, 2026. doi: 10.48550/arXiv.2605.18684

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2605.18684 2026
[21]

Spec-Flow: Spec-Driven Development workflow for Claude Code

marcusgoll. Spec-Flow: Spec-Driven Development workflow for Claude Code. https: //github.com/marcusgoll/Spec-Flow, 2026. Accessed 2026-05-29

2026
[22]

OpenSpec: A Lightweight Spec-Driven Framework.https://openspec.dev/,

OpenSpec. OpenSpec: A Lightweight Spec-Driven Framework.https://openspec.dev/,
[23]

OpenSpec vs Spec Kit vs BMAD vs Kiro: Complete Comparison Guide

OpenSpec. OpenSpec vs Spec Kit vs BMAD vs Kiro: Complete Comparison Guide. https://openspec.pro/comparison/, 2026. Accessed 2026-05-26

2026
[24]

Spec-Driven Development: From Code to Contract in the Age of AI Coding Assistants, 2026

Deepak Babu Piskala. Spec-Driven Development: From Code to Contract in the Age of AI Coding Assistants, 2026. doi: 10.48550/arXiv.2602.00180. 16

work page doi:10.48550/arxiv.2602.00180 2026
[25]

Spec Kitty: Spec-Driven Development for AI Coding Agents.https://github

Priivacy AI. Spec Kitty: Spec-Driven Development for AI Coding Agents.https://github. com/Priivacy-ai/spec-kitty, 2026. Accessed 2026-05-28

2026
[26]

ChatDev: Communicative Agents for Software Development

Chen Qian, Xin Cong, Cheng Yang, Weize Chen, Yusheng Su, Juyuan Xu, Zhiyuan Liu, and Maosong Sun. ChatDev: Communicative Agents for Software Development, 2024. doi: 10.48550/arXiv.2307.07924

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2307.07924 2024
[27]

Meta-Engineering Harnesses for AI-Native Software Production: A Contract-Driven Adversarial Verification Architecture with Early Deployment Report

Satadru Sengupta, Tamunokorite Briggs, and Ivan Myshakivskyi. Meta-Engineering Har- nesses for AI-Native Software Production: A Contract-Driven Adversarial Verification Architecture with Early Deployment Report, 2026. doi: 10.48550/arXiv.2605.25665

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2605.25665 2026
[28]

Spec Kit Agents: Context-Grounded Agentic Workflows

Pardis Taghavi and Santosh Bhavani. Spec Kit Agents: Context-Grounded Agentic Work- flows, 2026. doi: 10.48550/arXiv.2604.05278

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.05278 2026
[29]

The Impact of Traceability on Software Maintenance and Evolution: A Mapping Study.Journal of Software: Evolution and Process, 33(10), 2021

Fangchao Tian, Tianlu Wang, Peng Liang, Chong Wang, Arif Ali Khan, and Muhammad Ali Babar. The Impact of Traceability on Software Maintenance and Evolution: A Mapping Study.Journal of Software: Evolution and Process, 33(10), 2021. doi: 10.1002/smr.2374

work page doi:10.1002/smr.2374 2021
[30]

Agents in Software Engineering: Survey, Landscape, and Vision, 2024

Yanlin Wang et al. Agents in Software Engineering: Survey, Landscape, and Vision, 2024. doi: 10.48550/arXiv.2409.09030

work page doi:10.48550/arxiv.2409.09030 2024
[31]

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering, 2024. doi: 10.48550/arXiv.2405.15793. 17

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2405.15793 2024

[1] [1]

6 Best Spec-Driven Development Tools for AI Coding in 2026.https: //www.augmentcode.com/tools/best-spec-driven-development-tools, 2026

Augment Code. 6 Best Spec-Driven Development Tools for AI Coding in 2026.https: //www.augmentcode.com/tools/best-spec-driven-development-tools, 2026. Accessed 2026-05-26

2026

[2] [2]

Spec-Driven Development (SDD): The Definitive 2026 Guide.https://thebcms

BCMS. Spec-Driven Development (SDD): The Definitive 2026 Guide.https://thebcms. com/blog/spec-driven-development, 2026. Accessed 2026-05-26

2026

[3] [3]

Agentic AI in the Software Development Lifecycle: Architecture, Empirical Evidence, and the Reshaping of Software Engineering

Happy Bhati. Agentic AI in the Software Development Lifecycle: Architecture, Empirical Evidence, and the Reshaping of Software Engineering, 2026. doi: 10.48550/arXiv.2604.26275

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.26275 2026

[4] [4]

BMAD Method Documentation

BMad Code. BMAD Method Documentation. https://docs.bmad-method.org/, 2026. Accessed 2026-05-26

2026

[5] [5]

BMAD-METHOD: Breakthrough Method for Agile AI Driven Development

BMad Code Org. BMAD-METHOD: Breakthrough Method for Agile AI Driven Development. https://github.com/bmad-code-org/BMAD-METHOD, 2026. Accessed 2026-05-26

2026

[6] [6]

Chikofsky and James H

Elliot J. Chikofsky and James H. Cross. Reverse Engineering and Design Recovery: A Taxonomy.IEEE Software, 7(1):13–17, 1990. doi: 10.1109/52.43044

work page doi:10.1109/52.43044 1990

[7] [7]

Spec-Driven Development Framework Patterns

David Daniel. Spec-Driven Development Framework Patterns. https://daviddaniel. tech/research/papers/sdd-frameworks/, 2026. Accessed 2026-05-26. 15

2026

[8] [8]

Fission-AI/OpenSpec: A Lightweight Specification Framework for Human-AI Alignment Before Coding

Fission AI. Fission-AI/OpenSpec: A Lightweight Specification Framework for Human-AI Alignment Before Coding. https://github.com/Fission-AI/OpenSpec, 2026. Accessed 2026-05-28

2026

[9] [9]

Vahid Garousi, Michael Felderer, and Mika V. Mäntylä. Guidelines for including grey literature and conducting multivocal literature reviews in software engineering.Information and Software Technology, 106:101–121, 2019

2019

[10] [10]

github/spec-kit: Toolkit to Help You Get Started with Spec-Driven Development

GitHub. github/spec-kit: Toolkit to Help You Get Started with Spec-Driven Development. https://github.com/github/spec-kit, 2026. Accessed 2026-05-26

2026

[11] [11]

Spec Kit: A Specification-Driven Development Toolkit.https://github.github

GitHub. Spec Kit: A Specification-Driven Development Toolkit.https://github.github. io/spec-kit/, 2026. Accessed 2026-05-26

2026

[12] [12]

Get Shit Done (GSD): A Lightweight Meta-Prompting, Context Engineering and Spec-Driven Development System for Claude Code.https://github.com/gsd-build/ get-shit-done, 2026

GSD Build. Get Shit Done (GSD): A Lightweight Meta-Prompting, Context Engineering and Spec-Driven Development System for Claude Code.https://github.com/gsd-build/ get-shit-done, 2026. Accessed 2026-05-28. Project relocated tohttps://github.com/ open-gsd/gsd-core

2026

[13] [13]

ACM Transactions on Software Engineering and Methodology , volume=

Junda He, Christoph Treude, and David Lo. LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision, and the Road Ahead.ACM Transactions on Software Engineering and Methodology, 34(5):1–35, 2025. doi: 10.1145/3712003

work page doi:10.1145/3712003 2025

[14] [14]

Large Language Models for Software Engineering: A Systematic Literature Review.ACM Transactions on Software Engineering and Methodology, 33(8):1–79, 2024

Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. Large Language Models for Software Engineering: A Systematic Literature Review.ACM Transactions on Software Engineering and Methodology, 33(8):1–79, 2024. doi: 10.1145/3695988

work page doi:10.1145/3695988 2024

[15] [15]

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. SWE-bench: Can Language Models Resolve Real-World GitHub Issues?, 2024. doi: 10.48550/arXiv.2310.06770

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2310.06770 2024

[16] [16]

From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future, 2024

Haolin Jin et al. From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future, 2024. doi: 10.48550/arXiv.2408.02479

work page doi:10.48550/arxiv.2408.02479 2024

[17] [17]

Pearl Brereton, David Budgen, Mark Turner, John Bailey, and Stephen Linkman

Barbara Kitchenham, O. Pearl Brereton, David Budgen, Mark Turner, John Bailey, and Stephen Linkman. Systematic literature reviews in software engineering: A systematic literature review.Information and Software Technology, 51(1):7–15, 2009

2009

[18] [18]

Large Language Model-Based Agents for Software Engineering: A Survey,

Junwei Liu et al. Large Language Model-Based Agents for Software Engineering: A Survey,

[19] [19]

doi: 10.48550/arXiv.2409.02977

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2409.02977

[20] [20]

Reversa: A Reverse Documentation Engineering Framework for Converting Legacy Software into Operational Specifications for AI Agents

Sanderson Oliveira de Macedo and Ronaldo Martins da Costa. Reversa: A Reverse Documen- tationEngineeringFrameworkforConvertingLegacySoftwareintoOperationalSpecifications for AI Agents, 2026. doi: 10.48550/arXiv.2605.18684

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2605.18684 2026

[21] [21]

Spec-Flow: Spec-Driven Development workflow for Claude Code

marcusgoll. Spec-Flow: Spec-Driven Development workflow for Claude Code. https: //github.com/marcusgoll/Spec-Flow, 2026. Accessed 2026-05-29

2026

[22] [22]

OpenSpec: A Lightweight Spec-Driven Framework.https://openspec.dev/,

OpenSpec. OpenSpec: A Lightweight Spec-Driven Framework.https://openspec.dev/,

[23] [23]

OpenSpec vs Spec Kit vs BMAD vs Kiro: Complete Comparison Guide

OpenSpec. OpenSpec vs Spec Kit vs BMAD vs Kiro: Complete Comparison Guide. https://openspec.pro/comparison/, 2026. Accessed 2026-05-26

2026

[24] [24]

Spec-Driven Development: From Code to Contract in the Age of AI Coding Assistants, 2026

Deepak Babu Piskala. Spec-Driven Development: From Code to Contract in the Age of AI Coding Assistants, 2026. doi: 10.48550/arXiv.2602.00180. 16

work page doi:10.48550/arxiv.2602.00180 2026

[25] [25]

Spec Kitty: Spec-Driven Development for AI Coding Agents.https://github

Priivacy AI. Spec Kitty: Spec-Driven Development for AI Coding Agents.https://github. com/Priivacy-ai/spec-kitty, 2026. Accessed 2026-05-28

2026

[26] [26]

ChatDev: Communicative Agents for Software Development

Chen Qian, Xin Cong, Cheng Yang, Weize Chen, Yusheng Su, Juyuan Xu, Zhiyuan Liu, and Maosong Sun. ChatDev: Communicative Agents for Software Development, 2024. doi: 10.48550/arXiv.2307.07924

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2307.07924 2024

[27] [27]

Meta-Engineering Harnesses for AI-Native Software Production: A Contract-Driven Adversarial Verification Architecture with Early Deployment Report

Satadru Sengupta, Tamunokorite Briggs, and Ivan Myshakivskyi. Meta-Engineering Har- nesses for AI-Native Software Production: A Contract-Driven Adversarial Verification Architecture with Early Deployment Report, 2026. doi: 10.48550/arXiv.2605.25665

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2605.25665 2026

[28] [28]

Spec Kit Agents: Context-Grounded Agentic Workflows

Pardis Taghavi and Santosh Bhavani. Spec Kit Agents: Context-Grounded Agentic Work- flows, 2026. doi: 10.48550/arXiv.2604.05278

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.05278 2026

[29] [29]

The Impact of Traceability on Software Maintenance and Evolution: A Mapping Study.Journal of Software: Evolution and Process, 33(10), 2021

Fangchao Tian, Tianlu Wang, Peng Liang, Chong Wang, Arif Ali Khan, and Muhammad Ali Babar. The Impact of Traceability on Software Maintenance and Evolution: A Mapping Study.Journal of Software: Evolution and Process, 33(10), 2021. doi: 10.1002/smr.2374

work page doi:10.1002/smr.2374 2021

[30] [30]

Agents in Software Engineering: Survey, Landscape, and Vision, 2024

Yanlin Wang et al. Agents in Software Engineering: Survey, Landscape, and Vision, 2024. doi: 10.48550/arXiv.2409.09030

work page doi:10.48550/arxiv.2409.09030 2024

[31] [31]

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering, 2024. doi: 10.48550/arXiv.2405.15793. 17

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2405.15793 2024