pith. machine review for the scientific record.

arxiv: 2604.04990 · v1 · submitted 2026-04-05 · 💻 cs.SE · cs.AI

Recognition: no theorem link

Architecture Without Architects: How AI Coding Agents Shape Software Architecture

Phongsakon Mark Konrad, Riccardo Terrenzi, Serkan Ayvaz, Tim Lukas Adam

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 17:16 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords AI coding agents · software architecture · prompt engineering · implicit decisions · vibe architecting · coupling patterns · infrastructure scaffolding

The pith

AI coding agents make implicit architectural decisions based on prompt wording alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

AI coding agents select frameworks, scaffold infrastructure, and wire integrations as part of responding to natural language prompts, and these actions amount to architectural decisions made without human review. The paper identifies five mechanisms through which agents arrive at these choices and introduces six prompt-architecture coupling patterns that link specific prompt features to the infrastructure they produce. Some patterns are expected to weaken as models improve while others remain fundamental. An illustrative demonstration shows that rewording the same task produces structurally different systems. The authors call this vibe architecting and outline review practices, decision records, and tooling needed to bring the choices under control.

Core claim

AI coding agents select frameworks, scaffold infrastructure, and wire integrations, often in seconds. These constitute architectural decisions made implicitly through five mechanisms. Six prompt-architecture coupling patterns map natural-language prompt features to the infrastructure they require, ranging from contingent couplings (such as structured output validation) that may weaken as models improve to fundamental ones (such as tool-call orchestration) that persist regardless of model capability. An illustrative demonstration confirms that prompt wording alone produces structurally different systems for the same task. The phenomenon is termed vibe architecting.

What carries the argument

Six prompt-architecture coupling patterns that map natural-language prompt features to the infrastructure they require.

If this is right

  • Architectural review of AI-generated code must examine the prompt features that triggered framework and integration choices.
  • Some coupling patterns will weaken with model improvements while others remain independent of capability gains.
  • Decision records should capture the prompt elements that shaped infrastructure decisions.
  • Tooling is required to surface and govern these previously hidden choices during development.
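The decision-record point above can be made concrete with a lightweight record type that stores the prompt features alongside the infrastructure they triggered. A minimal sketch in Python; the field names and example values are illustrative, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class PromptArchitectureRecord:
    """Hypothetical decision record linking prompt wording to the
    infrastructure an AI coding agent introduced in response to it."""
    task: str                      # what the prompt asked for
    prompt_features: list[str]     # wording elements that triggered choices
    components_added: list[str]    # frameworks, services, integrations
    coupling_pattern: str          # which coupling pattern applied
    human_reviewed: bool = False   # has an architect signed off yet?

# Illustrative usage with made-up values:
rec = PromptArchitectureRecord(
    task="RAG chatbot with tool access",
    prompt_features=["asked for validated JSON output"],
    components_added=["schema validator", "retry wrapper"],
    coupling_pattern="structured output validation",
)
```

A record like this would let a reviewer trace each component back to the prompt wording that produced it, which is the visibility the paper argues is currently missing.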

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Prompt templates could be standardized to produce consistent architectural outcomes across similar tasks.
  • The same mechanisms may interact with model training data biases in ways that compound over successive generations of code.
  • Developers could treat prompt variation as an explicit design variable to explore alternative architectures before committing to one.

Load-bearing premise

That differences in generated systems for the same task are caused primarily by prompt wording rather than model stochasticity, training data, or other uncontrolled factors, and that these differences qualify as architectural decisions.

What would settle it

Re-running the illustrative demonstration with identical prompts but fixed random seeds across multiple model calls and checking whether structurally different systems still appear.
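The variance comparison behind this test can be sketched directly: extract the set of architectural components from each generated system, then check whether systems produced by different prompt wordings differ more than replicates of the same prompt. A minimal sketch, with made-up component sets standing in for real agent outputs:

```python
from itertools import combinations

def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|: 0 for identical component sets, 1 for disjoint."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0

def mean_within(replicates: list) -> float:
    """Mean pairwise distance among replicates of one prompt."""
    pairs = list(combinations(replicates, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical replicates: component sets extracted from generated systems.
prompt_a = [{"fastapi", "postgres"}, {"fastapi", "postgres"}, {"fastapi", "sqlite"}]
prompt_b = [{"flask", "redis", "celery"}, {"flask", "redis"}, {"flask", "redis", "celery"}]

within = (mean_within(prompt_a) + mean_within(prompt_b)) / 2
between = sum(jaccard_distance(a, b) for a in prompt_a for b in prompt_b) / (
    len(prompt_a) * len(prompt_b)
)
# Prompt wording matters if between-prompt distance clearly exceeds
# within-prompt distance; this toy data gives within ≈ 0.33, between = 1.0.
```

With real replicates, a between-prompt distance near the within-prompt baseline would suggest sampling noise rather than prompt features is driving the structural differences.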

Figures

Figures reproduced from arXiv: 2604.04990 by Phongsakon Mark Konrad, Riccardo Terrenzi, Serkan Ayvaz, Tim Lukas Adam.

Figure 1. Architectural components introduced by each prompt variant. Darker shading indicates components added by increasing prompt specificity.
Figure 2. Composition of coupling patterns in a single system (RAG chatbot with tool access). Pattern IDs match the catalog in Section IV.
Figure 3. Three-layer framework for architecture-aware AI-assisted development.
Original abstract

AI coding agents select frameworks, scaffold infrastructure, and wire integrations, often in seconds. These are architectural decisions, yet almost no one reviews them as such. We identify five mechanisms by which agents make implicit architectural choices and propose six prompt-architecture coupling patterns that map natural-language prompt features to the infrastructure they require. The patterns range from contingent couplings (structured output validation) that may weaken as models improve to fundamental ones (tool-call orchestration) that persist regardless of model capability. An illustrative demonstration confirms that prompt wording alone produces structurally different systems for the same task. We term the phenomenon vibe architecting, architecture shaped by prompts rather than deliberate design, and outline review practices, decision records, and tooling to bring these hidden decisions under governance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that AI coding agents make implicit architectural decisions via five mechanisms, proposes six prompt-architecture coupling patterns (ranging from contingent to fundamental) that map natural-language prompt features to infrastructure requirements, demonstrates via illustration that prompt wording alone yields structurally different systems for identical tasks, introduces the term 'vibe architecting', and outlines governance practices including review processes and tooling.

Significance. If the mechanisms and patterns hold under controlled conditions, the work would be significant for software engineering by surfacing how prompt engineering influences architecture in AI-assisted development and by proposing actionable governance approaches. The conceptual framing is timely and identifies a previously under-examined phenomenon, though its current reliance on an uncontrolled illustration limits immediate impact.

major comments (1)
  1. [Illustrative Demonstration] The illustrative demonstration (abstract and associated section) asserts that 'prompt wording alone produces structurally different systems' but provides no details on temperature, top-p, seeds, number of replicates per prompt, or statistical tests comparing within-prompt versus between-prompt variance. This omission is load-bearing for the five mechanisms and six coupling patterns, as observed differences may stem from sampling noise rather than the claimed prompt features.
minor comments (1)
  1. [Abstract] The abstract introduces 'vibe architecting' without a concise definition; a one-sentence operational definition would improve readability for readers unfamiliar with the framing.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the timeliness of examining how AI coding agents influence software architecture. We agree that the illustrative demonstration requires greater methodological transparency and rigor to support the five mechanisms and six coupling patterns. We will revise the manuscript to address this by adding the requested details on generation parameters, replicates, and variance analysis.

Point-by-point responses
  1. Referee: [Illustrative Demonstration] The illustrative demonstration (abstract and associated section) asserts that 'prompt wording alone produces structurally different systems' but provides no details on temperature, top-p, seeds, number of replicates per prompt, or statistical tests comparing within-prompt versus between-prompt variance. This omission is load-bearing for the five mechanisms and six coupling patterns, as observed differences may stem from sampling noise rather than the claimed prompt features.

    Authors: We acknowledge that the current illustrative demonstration does not report the generation hyperparameters or perform replicates with statistical validation, which limits the strength of the evidence. While the demonstration was designed to show qualitative structural differences arising from prompt variations rather than to serve as a controlled empirical study, we agree this creates ambiguity about sampling noise. In the revised manuscript we will: specify the exact model settings (temperature, top-p, and seed where applicable), generate a minimum of five replicates per prompt variant for the same task, document the observed architectural differences across replicates, and include a comparison of within-prompt versus between-prompt structural variance (using both qualitative descriptions and, where feasible, simple similarity metrics). These additions will make the support for the coupling patterns more robust without altering the illustrative nature of the section.
    revision: yes

Circularity Check

0 steps flagged

No circularity: observational patterns with no derivations or self-referential reductions

full rationale

The paper identifies five mechanisms and proposes six prompt-architecture coupling patterns as direct observations from an illustrative demonstration. No equations, fitted parameters, or closed derivation chains exist. The patterns are presented as mappings derived from prompt wording differences rather than outputs forced by construction from prior inputs or self-citations. The demonstration is described as confirmatory but not as a statistical model whose outputs are renamed as predictions. This is a standard non-circular conceptual proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the domain assumption that prompt wording is the dominant driver of architectural structure in agent-generated code and that the observed differences constitute meaningful architectural variation.

axioms (1)
  • domain assumption: AI coding agents make decisions that qualify as architectural when they select frameworks, scaffold infrastructure, or wire integrations.
    Invoked in the identification of the five mechanisms and the definition of vibe architecting.

pith-pipeline@v0.9.0 · 5425 in / 1132 out tokens · 35397 ms · 2026-05-13T17:16:17.003313+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Formal Architecture Descriptors as Navigation Primitives for AI Coding Agents

    cs.SE · 2026-04 · unverdicted · novelty 5.0

    Formal architecture descriptors reduce AI coding agent navigation steps by 33-44% and behavioral variance by 52% in controlled and observational studies.
