A Deterministic Control Plane for LLM Coding Agents
Pith reviewed 2026-06-26 03:54 UTC · model grok-4.3
The pith
LLM coding agent configurations propagate as unmanaged duplicates across repositories and require a deterministic control plane to enforce supply-chain integrity and permissions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Agent configurations propagate as undeclared shared components: 10.1 percent of tracked paths are SHA-256 exact duplicates across independent repositories, with 75.5 percent of clone pairs crossing organisational boundaries, 58 percent single-commit histories, and less than 1 percent declaring permission boundaries. The central claim is that these gaps are addressed by a deterministic control plane that treats definitions as a managed supply chain with content addressing and audit logs, enforces tiered permissions and blocklists, gates work through a requirement-to-file-to-test state machine, compiles one canonical definition to seven IDE targets, and detects prompt drift via Jaccard similarity.
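The exact-duplicate measurement can be illustrated with a minimal sketch. It assumes configuration files have already been collected as (repository, path, bytes) tuples; the function name `find_exact_duplicates`, the input shape, and the sample data are hypothetical and not taken from the paper, whose pipeline also applies a fork adjustment that is not shown here.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(files):
    """Group files by SHA-256 digest; keep digests seen in more than one repository.

    `files` is an iterable of (repo, path, content_bytes) tuples (assumed shape).
    """
    repos_by_digest = defaultdict(set)
    for repo, _path, content in files:
        digest = hashlib.sha256(content).hexdigest()
        repos_by_digest[digest].add(repo)
    # A digest shared by two or more repositories is a byte-for-byte clone.
    return {d: sorted(r) for d, r in repos_by_digest.items() if len(r) > 1}


# Illustrative data only: two repositories carry the same rules file.
sample = [
    ("org-a/app", "rules.md", b"Always run tests before committing.\n"),
    ("org-b/service", "rules.md", b"Always run tests before committing.\n"),
    ("org-c/tool", "rules.md", b"Prefer small, reviewed changes.\n"),
]
duplicates = find_exact_duplicates(sample)
```

Because the comparison is on digests of raw bytes, the result does not depend on any similarity threshold, which is consistent with the "threshold-independent" wording in the abstract.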
What carries the argument
Rel(AI)Build deterministic control plane, which provides a one-to-one mapping from the identified configuration gaps to supply-chain primitives, tiered permissions, and state-machine gating.
If this is right
- Agent definitions receive SHA-256 content addressing, HMAC-stamped lockfiles, and hash-chained audit logs.
- Tiered permissions and attack-derived blocklists are enforced before any LLM invocation occurs.
- Feature work is gated by a phase state machine that maintains requirement-to-file-to-test traceability.
- A single canonical definition compiles to seven different IDE targets.
- Prompt drift is detected automatically through Jaccard similarity on the definitions.
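As a rough illustration of enforcing permissions before any LLM invocation, the sketch below checks a requested action against a tier table and a blocklist using only deterministic code. The tier names, action names, and blocklist patterns are all assumptions made for this example; the paper's actual tiers and attack-derived blocklist entries are not reproduced on this page.

```python
from enum import IntEnum
from fnmatch import fnmatchcase


class Tier(IntEnum):
    # Hypothetical tiers, ordered from least to most privileged.
    READ_ONLY = 0
    WRITE_WORKSPACE = 1
    SHELL = 2


# Hypothetical blocklist patterns (shell-style wildcards).
BLOCKLIST = ["rm -rf *", "curl * | sh", "*.env", "*/.ssh/*"]

# Hypothetical mapping from action to the minimum tier it requires.
REQUIRED_TIER = {
    "read_file": Tier.READ_ONLY,
    "write_file": Tier.WRITE_WORKSPACE,
    "run_shell": Tier.SHELL,
}


def authorize(agent_tier, action, target):
    """Return (allowed, reason); no model call is involved in the decision."""
    # The blocklist is checked first so that no tier can override it.
    if any(fnmatchcase(target, pattern) for pattern in BLOCKLIST):
        return False, "blocklisted target"
    required = REQUIRED_TIER.get(action)
    if required is None:
        return False, "unknown action"
    if agent_tier < required:
        return False, "insufficient tier"
    return True, "ok"
```

Checking the blocklist ahead of the tier comparison is a design choice of this sketch: it means even the most privileged tier cannot reach a blocklisted target.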
Where Pith is reading between the lines
- The same supply-chain and gating approach could extend to configuration files for non-coding LLM agents.
- Widespread use would likely lower the observed cross-repository duplication rate by making definitions versioned and unique.
- Integration points with existing CI/CD systems would allow agent setups to inherit the same governance level as workflow files.
- Developer productivity and security metrics from actual deployments would be required to quantify the practical gains beyond the conformance tests.
Load-bearing premise
That the specific combination of supply-chain primitives, tiered permissions, and state-machine gating will produce better real-world outcomes than continued reliance on LLM orchestration or ad-hoc config management.
What would settle it
A controlled comparison measuring rates of unauthorized file access, configuration drift, or security incidents in projects that adopt the control plane versus matched projects that continue with unmanaged configurations.
Original abstract
LLM coding harnesses grant agents broad file and shell access, yet the configuration layer that steers them -- rules files, agent definitions, IDE-specific markdown -- is largely unmanaged. A prevalence study of 10,008 public GitHub repositories (n=6,145 agent config files) finds that agent configurations propagate as undeclared shared components: 10.1% of tracked paths are SHA-256 exact duplicates across independent repositories (fork-adjusted, threshold-independent), with 75.5% of clone pairs crossing organisational boundaries. Two further patterns are indicative: configurations are rarely revised (58% single-commit; 0.4 vs 0.6 commits/month age-normalised against CI/CD workflows), and rarely declare permission boundaries (<1% of agent configs vs 33% of Actions workflows, n=31 true positives). We propose a deterministic control plane above the harness that maps one-to-one to these gaps. Rel(AI)Build treats agent definitions as a managed supply chain (SHA-256 content addressing, HMAC-stamped lockfiles, hash-chained audit logs); enforces tiered permissions and attack-derived blocklists before LLM invocation; gates feature work through a phase state machine with requirement-to-file-to-test traceability; compiles a single canonical definition to seven IDE targets; and detects prompt drift via Jaccard similarity. Conformance tests on injected violations confirm each mechanism enforces its stated invariant; developer outcomes remain future work. Governance of this layer must be deterministic and tool-agnostic -- not delegated to further LLM orchestration.
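The phase-gating idea in the abstract can be sketched as a small state machine that refuses to complete a feature while any requirement lacks a linked file or test. The phase names, the `Feature` structure, and the rule that the gate applies only at the final transition are assumptions for illustration, not the paper's specification.

```python
from dataclasses import dataclass, field

# Hypothetical phase order; the paper's actual phases are not listed here.
PHASES = ["requirements", "design", "implementation", "testing", "done"]


@dataclass
class Feature:
    name: str
    phase: str = "requirements"
    # requirement id -> {"files": [...], "tests": [...]}
    trace: dict = field(default_factory=dict)


def untraced(feature):
    """Requirement ids that lack either a linked file or a linked test."""
    return sorted(
        req
        for req, links in feature.trace.items()
        if not links.get("files") or not links.get("tests")
    )


def advance(feature):
    """Move to the next phase, or raise ValueError if the gate fails."""
    idx = PHASES.index(feature.phase)
    if idx == len(PHASES) - 1:
        raise ValueError("feature is already done")
    nxt = PHASES[idx + 1]
    # Gate: a feature may only finish when every requirement is traced.
    if nxt == "done" and (not feature.trace or untraced(feature)):
        raise ValueError(f"cannot finish: untraced requirements {untraced(feature)}")
    feature.phase = nxt
    return feature.phase
```

A feature whose only requirement has a file but no test can advance as far as `testing`, after which `advance` raises until a test is linked.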
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports a prevalence study across 10,008 public GitHub repositories (yielding 6,145 agent config files) that identifies three gaps in LLM coding agent configuration management: 10.1% of tracked paths are SHA-256 exact duplicates across independent repositories (75.5% crossing organisational boundaries), configurations are rarely revised (58% single-commit; 0.4 vs 0.6 commits/month age-normalised), and permission boundaries are rarely declared (<1% of agent configs vs 33% of Actions workflows). It proposes the Rel(AI)Build deterministic control plane that maps one-to-one to these gaps via supply-chain primitives (SHA-256 content addressing, HMAC-stamped lockfiles, hash-chained audit logs), tiered permissions and attack-derived blocklists, a phase state machine with requirement-to-file-to-test traceability, compilation to seven IDE targets, and Jaccard-based prompt-drift detection. Conformance tests on injected violations are reported to confirm that each mechanism enforces its stated invariant; developer outcomes and real-world efficacy are explicitly scoped as future work.
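To make the notion of a conformance test on an injected violation concrete, here is a minimal sketch of a hash-chained audit log in which a tampered entry is detected on verification. The entry layout, the all-zero genesis value, and the event fields are assumptions for this example and are not drawn from the paper.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first entry's predecessor


def _digest(prev, event):
    payload = json.dumps({"prev": prev, "event": event}, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def append_entry(log, event):
    """Append an event whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "event": event, "hash": _digest(prev, event)})


def verify_chain(log):
    """Return the index of the first invalid entry, or None if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return i
        prev = entry["hash"]
    return None


log = []
for action in ("install", "update", "remove"):
    append_entry(log, {"action": action, "agent": "reviewer"})
intact = verify_chain(log)

# Injected violation: rewrite the middle event without recomputing hashes.
log[1]["event"]["action"] = "noop"
tampered_at = verify_chain(log)
```

Here `intact` is `None` and `tampered_at` is `1`, the index of the modified entry.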
Significance. If the design and its conformance properties hold, the work supplies a concrete, tool-agnostic alternative to ad-hoc or LLM-orchestrated configuration management for LLM coding agents. The prevalence statistics provide empirical motivation for the three gaps, the use of standard cryptographic primitives is parameter-free in the stated sense, and the explicit scoping of claims to the mechanisms (rather than asserted outcome improvements) is a strength. The approach could influence secure configuration practices in the growing LLM-agent tooling ecosystem.
minor comments (2)
- The abstract is unusually long and dense; a shorter version focused on the three gaps, the one-to-one mapping, and the scope limitation would improve readability while preserving all technical content.
- A high-level architecture diagram of the control plane (showing the relationship between the supply-chain layer, permission gate, state machine, and multi-target compiler) would help readers visualise the one-to-one mapping claimed in the abstract.
Simulated Author's Rebuttal
We thank the referee for the detailed summary of the manuscript, the positive assessment of its significance, and the recommendation of minor revision. No major comments were raised in the report.
Circularity Check
No significant circularity detected
The paper reports an empirical prevalence study (n=10,008 repos) that surfaces three observable patterns in agent configs, then presents an explicit design proposal (Rel(AI)Build) that addresses those patterns using standard, externally defined primitives (SHA-256 addressing, HMAC lockfiles, phase state machines, Jaccard drift detection). Conformance is verified by synthetic injection tests that check stated invariants. No equations, fitted parameters, or predictions appear; the mapping is by construction of the proposal itself rather than a reduction. No self-citations are load-bearing, no uniqueness theorems are invoked, and no ansatz or renaming of known results is used. The derivation chain is therefore self-contained against external benchmarks.
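Jaccard-based drift detection, one of the externally defined primitives named above, can be sketched as a set-overlap comparison between a canonical definition and a deployed copy. The word-level tokenisation and the 0.8 threshold are assumptions chosen for illustration; the page does not state the paper's tokeniser or threshold.

```python
import re


def tokens(text):
    """Lowercased word tokens as a set (assumed tokenisation)."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| over token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty documents are treated as identical
    return len(ta & tb) / len(ta | tb)


def has_drifted(canonical, deployed, threshold=0.8):
    """Flag drift when similarity falls below the (hypothetical) threshold."""
    return jaccard(canonical, deployed) < threshold


canonical = "never edit files outside the workspace"
deployed = "edit any files you like"
score = jaccard(canonical, deployed)  # 2 shared tokens out of 9 distinct
```

In this example the two texts share only `edit` and `files`, so the score is 2/9 and `has_drifted(canonical, deployed)` returns `True`.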
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: SHA-256 and HMAC provide reliable content addressing and tamper evidence for configuration files.
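The axiom can be made concrete with a minimal sketch of an HMAC-stamped lockfile: each definition is addressed by its SHA-256 digest, and the set of digests is authenticated with HMAC-SHA256 under a secret key. The lockfile layout, function names, and key handling below are assumptions for illustration, not the paper's format.

```python
import hashlib
import hmac
import json


def _entries(definitions):
    """Map each definition name to the SHA-256 digest of its content."""
    return {name: hashlib.sha256(content).hexdigest() for name, content in definitions.items()}


def _stamp(entries, key):
    body = json.dumps(entries, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def build_lockfile(definitions, key):
    """Return a lockfile dict with content digests and an HMAC over them."""
    entries = _entries(definitions)
    return {"entries": entries, "hmac": _stamp(entries, key)}


def verify_lockfile(lockfile, definitions, key):
    """True only if the stamp is valid and every definition matches its digest."""
    expected = _stamp(lockfile["entries"], key)
    # compare_digest avoids leaking mismatch position through timing.
    if not hmac.compare_digest(expected, lockfile["hmac"]):
        return False
    return _entries(definitions) == lockfile["entries"]


key = b"demo-key"  # illustrative only; a real key would come from a secret store
definitions = {"reviewer.md": b"Review diffs only.\n"}
lock = build_lockfile(definitions, key)
```

Verification fails both when a definition's bytes change after locking and when the recorded digests are edited without access to the key.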
invented entities (1)
- Rel(AI)Build control plane: no independent evidence