A Deterministic Control Plane for LLM Coding Agents

Padmaraj Madatha

arXiv: 2606.26924 · v1 · pith:JH7WLF3Tnew · submitted 2026-06-25 · 💻 cs.SE · cs.AI · cs.CR

Pith reviewed 2026-06-26 03:54 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.CR

keywords LLM coding agents · agent configuration files · supply chain management · deterministic control plane · permission enforcement · state machine gating · prompt drift detection · GitHub repository analysis

The pith

LLM coding agent configurations propagate as unmanaged duplicates across repositories and require a deterministic control plane to enforce supply-chain integrity and permissions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

A study of 10,008 public GitHub repositories finds that 10.1 percent of tracked agent configuration paths are exact duplicates across independent projects, that most duplicate pairs cross organizational boundaries, that fewer than 1 percent of configurations declare permission boundaries, and that revisions are rare. The paper proposes a deterministic control plane that maps directly onto these gaps by treating agent definitions as a managed supply chain. The design includes SHA-256 content addressing, HMAC-stamped lockfiles, tiered permissions enforced before LLM invocation, a phase state machine for traceability from requirements to tests, compilation to multiple IDE targets, and Jaccard-based drift detection. A sympathetic reader would care because the layer sits above the harness and aims to replace ad-hoc or LLM-based management with tool-agnostic, enforceable invariants.
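The duplication measurement can be illustrated with a minimal sketch: hash every configuration file with SHA-256 and count the paths whose hash also appears in a different repository. The function names, the `owner/name` repository format, and the sample data are assumptions made for illustration; the paper's fork adjustment is not modelled here.

```python
import hashlib
from collections import defaultdict


def sha256_hex(data: bytes) -> str:
    """Content hash used as the identity of a configuration file."""
    return hashlib.sha256(data).hexdigest()


def cross_repo_duplicates(files: dict) -> set:
    """Return the (repo, path) keys whose exact content also occurs in another repo.

    `files` maps (repo, path) -> file bytes. Copies inside one repo do not count.
    """
    repos_by_hash = defaultdict(set)
    for (repo, _path), content in files.items():
        repos_by_hash[sha256_hex(content)].add(repo)
    return {
        key
        for key, content in files.items()
        if len(repos_by_hash[sha256_hex(content)]) > 1
    }


def duplicate_rate(files: dict) -> float:
    """Fraction of tracked paths that are cross-repository exact duplicates."""
    if not files:
        return 0.0
    return len(cross_repo_duplicates(files)) / len(files)


def crosses_org(repo_a: str, repo_b: str) -> bool:
    """True when two 'owner/name' repositories have different owners."""
    return repo_a.split("/")[0] != repo_b.split("/")[0]


# Illustrative data only; repository names and file contents are made up.
sample = {
    ("acme/api", "AGENTS.md"): b"Always run tests.\n",
    ("globex/web", "AGENTS.md"): b"Always run tests.\n",
    ("acme/cli", ".cursorrules"): b"Prefer small diffs.\n",
    ("initech/app", "CLAUDE.md"): b"Use type hints.\n",
}
```

In this sample two of the four tracked paths share a hash across repositories owned by different organisations, so both the duplicate rate and the cross-organisation check can be exercised on it.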

Core claim

Agent configurations propagate as undeclared shared components: 10.1 percent of tracked paths are SHA-256 exact duplicates across independent repositories, with 75.5 percent of clone pairs crossing organisational boundaries, 58 percent single-commit histories, and less than 1 percent declaring permission boundaries. The central claim is that these gaps are addressed by a deterministic control plane that treats definitions as a managed supply chain with content addressing and audit logs, enforces tiered permissions and blocklists, gates work through a requirement-to-file-to-test state machine, compiles one canonical definition to seven IDE targets, and detects prompt drift via Jaccard similarity.
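One way to picture permissions enforced before LLM invocation is a deterministic gate that runs ahead of the model call. The sketch below is an assumption-laden illustration: the tier names, the tool-to-tier table, the two blocklist patterns, and the `guarded_invoke` helper are hypothetical and do not come from the paper.

```python
import re
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class Tier(IntEnum):
    """Hypothetical permission tiers, ordered from least to most privileged."""
    READ_ONLY = 0
    WORKSPACE_WRITE = 1
    SHELL = 2


# Hypothetical mapping from a tool to the minimum tier that may use it.
REQUIRED_TIER = {
    "read_file": Tier.READ_ONLY,
    "write_file": Tier.WORKSPACE_WRITE,
    "run_shell": Tier.SHELL,
}

# Hypothetical blocklist; a real list would be derived from observed attacks.
BLOCKLIST = [
    re.compile(r"rm\s+-rf\s+/"),
    re.compile(r"curl[^|]*\|\s*(ba)?sh"),
]


@dataclass(frozen=True)
class Request:
    tier: Tier
    tool: str
    argument: str


def authorize(req: Request) -> None:
    """Raise PermissionError unless the request passes tier and blocklist checks."""
    required = REQUIRED_TIER.get(req.tool)
    if required is None:
        raise PermissionError(f"unknown tool: {req.tool}")
    if req.tier < required:
        raise PermissionError(f"{req.tool} requires tier {required.name}")
    for pattern in BLOCKLIST:
        if pattern.search(req.argument):
            raise PermissionError(f"argument matches blocklist: {pattern.pattern}")


def guarded_invoke(req: Request, llm_call: Callable[[Request], str]) -> str:
    """Run the deterministic checks first; the model is only called if they pass."""
    authorize(req)
    return llm_call(req)
```

The point of the ordering is that a denied request never reaches the model, so the decision does not depend on model output.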

What carries the argument

The Rel(AI)Build deterministic control plane provides a one-to-one mapping from the identified configuration gaps to supply-chain primitives, tiered permissions, and state-machine gating.
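State-machine gating can be illustrated with a small sketch in which work moves through a fixed sequence of phases and each transition needs explicit approval. The phase names and the `PhaseMachine` class are hypothetical; this summary does not give the paper's actual phases or gate semantics.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical lifecycle phases; the real phase names are not given here."""
    SPEC = "spec"
    PLAN = "plan"
    IMPLEMENT = "implement"
    VERIFY = "verify"
    DONE = "done"


ORDER = [Phase.SPEC, Phase.PLAN, Phase.IMPLEMENT, Phase.VERIFY, Phase.DONE]


class PhaseMachine:
    """Allows only the next phase in ORDER, and only with explicit approval."""

    def __init__(self) -> None:
        self.phase = Phase.SPEC
        self.history = [Phase.SPEC]

    def advance(self, target: Phase, approved: bool) -> None:
        idx = ORDER.index(self.phase)
        # Skipping ahead, moving backwards, or leaving DONE is rejected.
        if idx + 1 >= len(ORDER) or ORDER[idx + 1] is not target:
            raise ValueError(f"illegal transition {self.phase.name} -> {target.name}")
        # Assumption: every transition is a gate that a human must approve.
        if not approved:
            raise PermissionError(f"gate before {target.name} not approved")
        self.phase = target
        self.history.append(target)
```

Because the allowed transitions are a fixed table rather than a model decision, the same inputs always produce the same accept or reject outcome.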

If this is right

Agent definitions receive SHA-256 content addressing, HMAC-stamped lockfiles, and hash-chained audit logs.
Tiered permissions and attack-derived blocklists are enforced before any LLM invocation occurs.
Feature work is gated by a phase state machine that maintains requirement-to-file-to-test traceability.
A single canonical definition compiles to seven different IDE targets.
Prompt drift is detected automatically through Jaccard similarity on the definitions.
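The first item in this list can be sketched with Python's standard `hashlib` and `hmac` modules. The lockfile layout, the `sha256:` prefix, and the function names are assumptions for illustration only; the paper's actual lockfile format is not shown in this summary.

```python
import hashlib
import hmac
import json


def content_address(definition: bytes) -> str:
    """Identify a definition by the SHA-256 digest of its bytes."""
    return "sha256:" + hashlib.sha256(definition).hexdigest()


def _canonical(entries: dict) -> bytes:
    """Serialise entries deterministically so the MAC is reproducible."""
    return json.dumps(entries, sort_keys=True, separators=(",", ":")).encode("utf-8")


def stamp_lockfile(definitions: dict, key: bytes) -> dict:
    """Build a lockfile (hypothetical layout) and stamp it with HMAC-SHA-256."""
    entries = {name: content_address(body) for name, body in definitions.items()}
    mac = hmac.new(key, _canonical(entries), hashlib.sha256).hexdigest()
    return {"entries": entries, "hmac": mac}


def verify_lockfile(lock: dict, definitions: dict, key: bytes) -> bool:
    """Check the stamp first, then check that every definition still matches."""
    expected = hmac.new(key, _canonical(lock["entries"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, lock["hmac"]):
        return False  # lockfile itself was altered, or the key is wrong
    current = {name: content_address(body) for name, body in definitions.items()}
    return current == lock["entries"]
```

Verification fails in two distinct ways here: a changed definition no longer matches its recorded address, and a rewritten lockfile no longer matches its stamp unless the writer holds the key.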

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same supply-chain and gating approach could extend to configuration files for non-coding LLM agents.
Widespread use would likely lower the observed cross-repository duplication rate by making definitions versioned and unique.
Integration points with existing CI/CD systems would allow agent setups to inherit the same governance level as workflow files.
Developer productivity and security metrics from actual deployments would be required to quantify the practical gains beyond the conformance tests.

Load-bearing premise

That the specific combination of supply-chain primitives, tiered permissions, and state-machine gating will produce better real-world outcomes than continued reliance on LLM orchestration or ad-hoc config management.

What would settle it

A controlled comparison measuring rates of unauthorized file access, configuration drift, or security incidents in projects that adopt the control plane versus matched projects that continue with unmanaged configurations.

Figures

Figures reproduced from arXiv: 2606.26924 by Padmaraj Madatha.

**Figure 2.** The deterministic control plane architecture. Four horizontal layers …

**Figure 3.** The agent-definition install pipeline. Three independent integrity …

**Figure 4.** Conceptual illustration of tokenisation drift risk zones. Thresholds …

**Figure 5.** The deterministic phase-gated lifecycle. Four HITL gates (pause …

**Figure 6.** Requirement→file→test traceability. Each acceptance criterion (AC) in the content-addressed spec maps to implementing files and verifying tests; a verified AC is green, an unverified AC is amber. A file changed outside any AC scope surfaces as a spec-drift warning (red), making scope creep detectable automatically. File-to-AC linkage requires cooperative agent invocation of trace-update (§4.5 trust bounda…

**Figure 7.** Threat → deterministic control mapping. Seven identified threats (red, left) each have a primary deterministic control (teal, right); T-numbers correspond to …

**Figure 8.** Raw lifetime version-control depth by file category. AI agent …

**Figure 9.** Median commits/month by category. Left panel: all files; …
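The traceability rule described in the Figure 6 caption can be sketched as a small data model. The `AcceptanceCriterion` class, the definition of "verified" (at least one implementing file, at least one test, all tests passing), and the file names are assumptions for illustration; the caption does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class AcceptanceCriterion:
    """One acceptance criterion (AC) with its implementing files and tests."""
    ac_id: str
    files: set = field(default_factory=set)     # implementing file paths
    tests: dict = field(default_factory=dict)   # test name -> passed?


def ac_status(ac: AcceptanceCriterion) -> str:
    """Green when the AC is verified, amber otherwise (assumed definition)."""
    verified = bool(ac.files) and bool(ac.tests) and all(ac.tests.values())
    return "green" if verified else "amber"


def spec_drift(criteria: list, changed_files: list) -> list:
    """Files changed outside every AC scope; these would be flagged red."""
    in_scope = set()
    for ac in criteria:
        in_scope |= ac.files
    return sorted(set(changed_files) - in_scope)


# Illustrative data only.
criteria = [
    AcceptanceCriterion("AC-1", {"src/login.py"}, {"test_login": True}),
    AcceptanceCriterion("AC-2", {"src/reset.py"}, {"test_reset": False}),
]
changed = ["src/login.py", "src/billing.py"]
```

With this data `AC-1` is green, `AC-2` is amber because its test has not passed, and `src/billing.py` is reported as drift because no acceptance criterion claims it.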

Original abstract

LLM coding harnesses grant agents broad file and shell access, yet the configuration layer that steers them -- rules files, agent definitions, IDE-specific markdown -- is largely unmanaged. A prevalence study of 10,008 public GitHub repositories (n=6,145 agent config files) finds that agent configurations propagate as undeclared shared components: 10.1% of tracked paths are SHA-256 exact duplicates across independent repositories (fork-adjusted, threshold-independent), with 75.5% of clone pairs crossing organisational boundaries. Two further patterns are indicative: configurations are rarely revised (58% single-commit; 0.4 vs 0.6 commits/month age-normalised against CI/CD workflows), and rarely declare permission boundaries (<1% of agent configs vs 33% of Actions workflows, n=31 true positives). We propose a deterministic control plane above the harness that maps one-to-one to these gaps. Rel(AI)Build treats agent definitions as a managed supply chain (SHA-256 content addressing, HMAC-stamped lockfiles, hash-chained audit logs); enforces tiered permissions and attack-derived blocklists before LLM invocation; gates feature work through a phase state machine with requirement-to-file-to-test traceability; compiles a single canonical definition to seven IDE targets; and detects prompt drift via Jaccard similarity. Conformance tests on injected violations confirm each mechanism enforces its stated invariant; developer outcomes remain future work. Governance of this layer must be deterministic and tool-agnostic -- not delegated to further LLM orchestration.
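Drift detection via Jaccard similarity can be illustrated with a short sketch that compares a canonical definition with a deployed copy. The whitespace tokenisation, the `0.8` threshold, and the function names are assumptions; the abstract does not state how the paper tokenises definitions or which thresholds it uses.

```python
def tokens(text: str) -> set:
    """Assumed tokenisation: lowercase, split on whitespace, deduplicate."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over the two token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty documents are treated as identical
    return len(ta & tb) / len(ta | tb)


def has_drifted(canonical: str, deployed: str, threshold: float = 0.8) -> bool:
    """Flag drift when similarity falls below a hypothetical threshold."""
    return jaccard(canonical, deployed) < threshold


canonical = "always run the test suite before committing"
deployed = "always run the test suite"
```

Here the deployed copy keeps five of the seven canonical tokens and adds none, so the similarity is 5/7 and the copy is flagged under the assumed threshold.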

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a concrete design for governing LLM agent configs plus a GitHub prevalence study, but stops short of any real-user results.

The core contribution is Rel(AI)Build, a control plane that treats agent definitions as a supply chain with content addressing, HMAC lockfiles, tiered permissions, a state machine for traceability, and Jaccard drift detection. It compiles one definition to seven IDE targets and claims each piece enforces an invariant via synthetic conformance tests.

The prevalence numbers are the clearest new data: across 10,008 repositories the author found that 10.1% of tracked config paths are exact SHA-256 duplicates (with 75.5% of clone pairs crossing organisational boundaries), that revision rates are very low, and that almost no configs declare permissions. That part grounds the motivation.

The design itself is new in its specific combination and the one-to-one mapping to the three gaps they measured. Standard crypto primitives are used sensibly and the tests on injected violations are a reasonable first check.

The soft spot is obvious and acknowledged: developer outcomes and real efficacy are future work. No implementation artifacts or usage data are shown, so the claim that deterministic governance beats continued LLM orchestration or ad-hoc files rests on the design logic alone. The study methodology details are also thin in what is visible, which limits how much weight the numbers can carry.

This is for readers already working on LLM coding harnesses or supply-chain tooling for agents. A serious referee could usefully pressure the study methods, test coverage, and whether the invariants actually translate outside synthetic cases.

I would send it to peer review.

Referee Report

0 major / 2 minor

Summary. The manuscript reports a prevalence study across 10,008 public GitHub repositories (yielding 6,145 agent config files) that identifies three gaps in LLM coding agent configuration management: 10.1% of tracked paths are SHA-256 exact duplicates across independent repositories (75.5% crossing organisational boundaries), configurations are rarely revised (58% single-commit; 0.4 vs 0.6 commits/month age-normalised), and permission boundaries are rarely declared (<1% of agent configs vs 33% of Actions workflows). It proposes the Rel(AI)Build deterministic control plane that maps one-to-one to these gaps via supply-chain primitives (SHA-256 content addressing, HMAC-stamped lockfiles, hash-chained audit logs), tiered permissions and attack-derived blocklists, a phase state machine with requirement-to-file-to-test traceability, compilation to seven IDE targets, and Jaccard-based prompt-drift detection. Conformance tests on injected violations are reported to confirm that each mechanism enforces its stated invariant; developer outcomes and real-world efficacy are explicitly scoped as future work.

Significance. If the design and its conformance properties hold, the work supplies a concrete, tool-agnostic alternative to ad-hoc or LLM-orchestrated configuration management for LLM coding agents. The prevalence statistics provide empirical motivation for the three gaps, the use of standard cryptographic primitives is parameter-free in the stated sense, and the explicit scoping of claims to the mechanisms (rather than asserted outcome improvements) is a strength. The approach could influence secure configuration practices in the growing LLM-agent tooling ecosystem.

minor comments (2)

The abstract is unusually long and dense; a shorter version focused on the three gaps, the one-to-one mapping, and the scope limitation would improve readability while preserving all technical content.
A high-level architecture diagram of the control plane (showing the relationship between the supply-chain layer, permission gate, state machine, and multi-target compiler) would help readers visualise the one-to-one mapping claimed in the abstract.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed summary of the manuscript, the positive assessment of its significance, and the recommendation of minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

The paper reports an empirical prevalence study (n=10,008 repos) that surfaces three observable patterns in agent configs, then presents an explicit design proposal (Rel(AI)Build) that addresses those patterns using standard, externally defined primitives (SHA-256 addressing, HMAC lockfiles, phase state machines, Jaccard drift detection). Conformance is verified by synthetic injection tests that check stated invariants. No equations, fitted parameters, or predictions appear; the mapping is by construction of the proposal itself rather than a reduction. No self-citations are load-bearing, no uniqueness theorems are invoked, and no ansatz or renaming of known results is used. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Relies on standard cryptographic assumptions and introduces one new system entity without fitted parameters or additional axioms beyond domain conventions for permissions and hashing.

axioms (1)

standard math: SHA-256 and HMAC provide reliable content addressing and tamper evidence for configuration files.
Invoked for lockfiles and audit logs.
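The tamper-evidence property invoked for audit logs can be illustrated with a minimal hash chain, in which every record commits to the hash of the record before it. The record layout, the all-zero genesis value, and the function names are assumptions for illustration; the paper's actual log format is not shown in this summary.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first record


def _digest(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash and a canonical encoding of the event."""
    payload = json.dumps(
        {"prev": prev_hash, "event": event}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append a record that links to the hash of the current last record."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "event": event, "hash": _digest(prev, event)})


def verify_chain(log: list) -> bool:
    """Recompute every link; any edit, removal, or reordering breaks the chain."""
    prev = GENESIS
    for record in log:
        if record["prev"] != prev or record["hash"] != _digest(prev, record["event"]):
            return False
        prev = record["hash"]
    return True
```

A plain hash chain detects edits to earlier records but does not stop someone from rewriting the whole log; pairing it with a keyed stamp or an external anchor is a common way to close that gap.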

invented entities (1)

Rel(AI)Build control plane (no independent evidence)
purpose: deterministic management layer for LLM agent configurations
New system proposed to address identified gaps.

pith-pipeline@v0.9.1-grok · 5801 in / 1245 out tokens · 60257 ms · 2026-06-26T03:54:38.229057+00:00 · methodology


[1] [1]

Software engineering for machine learning: A case study

Saleema Amershi et al. Software engineering for machine learning: A case study. In IEEE/ACM Intl. Conf. on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 2019

2019

[2] [2]

The debugging decay index: Rethinking debugging strategies for code llms

Anonymous . The debugging decay index: Rethinking debugging strategies for code llms. arXiv preprint arXiv:2506.18403, 2025. doi:10.48550/arXiv.2506.18403

work page doi:10.48550/arxiv.2506.18403 2025

[3] [3]

Responsible scaling policy

Anthropic . Responsible scaling policy. https://www.anthropic.com/news/anthropics-responsible-scaling-policy, 2023

2023

[4] [4]

Barrett, Jessica Newman, Brandie Nonnecke, Nada Madkour, Dan Hendrycks, Evan R

Anthony M. Barrett, Jessica Newman, Brandie Nonnecke, Nada Madkour, Dan Hendrycks, Evan R. Murphy, Krystal Jackson, and Deepika Raman. Ai risk-management standards profile for general-purpose ai (gpai) and foundation models, version 1.2. Technical report, UC Berkeley Center for Long-Term Cybersecurity (CLTC), 2026. arXiv:2506.23949; https://cltc.berkeley....

arXiv 2026

[5] [5]

Andrei Z. Broder. On the resemblance and containment of documents. In Compression and Complexity of Sequences (SEQUENCES), 1997

1997

[6] [6]

Robert G. Cooper. Stage-gate systems: A new tool for managing new products. Business Horizons, 33 0 (3): 0 44--54, 1990

1990

[7] [7]

Dennis and Earl C

Jack B. Dennis and Earl C. Van Horn. Programming semantics for multiprogrammed computations. Communications of the ACM, 9 0 (3): 0 143--155, 1966

1966

[8] [8]

Minijail and seccomp-bpf

Will Drewry and Tavis Ormandy. Minijail and seccomp-bpf. In Linux Security Summit, 2012

2012

[9] [9]

Aider: Ai pair programming in your terminal

Paul Gauthier. Aider: Ai pair programming in your terminal. Open-source project, https://aider.chat/, 2024

2024

[10] [10]

Prompt cache: Modular attention reuse for low-latency inference

In Gim, Guojun Chen, Seung-seob Lee, Nikhil Sarda, Anurag Khandelwal, and Lin Zhong. Prompt cache: Modular attention reuse for low-latency inference. In Proceedings of Machine Learning and Systems (MLSys); arXiv:2311.04934, 2024. doi:10.48550/arXiv.2311.04934

work page doi:10.48550/arxiv.2311.04934 2024

[11] [11]

Supply-chain levels for software artifacts (slsa), v1.0

Google and others . Supply-chain levels for software artifacts (slsa), v1.0. https://slsa.dev/, 2023

2023

[12] [12]

Guardrails: A toolkit for building safe and reliable llm applications

Guardrails AI . Guardrails: A toolkit for building safe and reliable llm applications. Open-source project, https://github.com/guardrails-ai/guardrails, 2024

2024

[13] [13]

Scott Stornetta

Stuart Haber and W. Scott Stornetta. How to time-stamp a digital document. Journal of Cryptology, 3 0 (2): 0 99--111, 1991

1991

[14] [14]

C. A. R. Hoare. Communicating sequential processes. Communications of the ACM, 21 0 (8): 0 666--677, 1978

1978

[15] [15]

Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation

Jez Humble and David Farley. Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation. Addison-Wesley, 2010

2010

[16] [16]

Iso/iec 27001:2022 information security management systems --- requirements, 2022

ISO/IEC . Iso/iec 27001:2022 information security management systems --- requirements, 2022

2022

[17] [17]

The promises and perils of mining github

Eirini Kalliamvakou et al. The promises and perils of mining github. In Working Conference on Mining Software Repositories (MSR), 2014

2014

[18] [18]

Taxonomy of attacks on open-source software supply chains

Piergiorgio Ladisa et al. Taxonomy of attacks on open-source software supply chains. In IEEE Symposium on Security and Privacy (S&P), 2023

2023

[19] [19]

Can you trust chatgpt's package recommendations? Vendor research report (grey literature), 2024

Bar Lanyado et al. Can you trust chatgpt's package recommendations? Vendor research report (grey literature), 2024

2024

[20] [20]

Certificate transparency

Ben Laurie, Adam Langley, and Emilia Kasper. Certificate transparency. RFC 6962, IETF, 2013

2013

[21] [21]

Agentbench: Evaluating llms as agents

Xiao Liu et al. Agentbench: Evaluating llms as agents. In International Conference on Learning Representations (ICLR), 2024

2024

[22] [22]

``your ai, my shell'': Demystifying prompt injection attacks on agentic ai coding editors

Yuhao Liu, Yiyang Zhao, Yiyang Lyu, Tianyi Zhang, Haoyu Wang, and David Lo. ``your ai, my shell'': Demystifying prompt injection attacks on agentic ai coding editors. arXiv preprint arXiv:2509.22040, 2025

Pith/arXiv arXiv 2025

[23] [23]

Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity

Yao Lu et al. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. In Annual Meeting of the Association for Computational Linguistics (ACL), 2022

2022

[24] [24]

Github agent-configuration prevalence study: Dataset and reproduction scripts

Padmaraj Madatha. Github agent-configuration prevalence study: Dataset and reproduction scripts. Zenodo (DOI to be assigned) and arXiv ancillary files [Author artifact, not peer-reviewed], 2026 a

2026

[25] [25]

Rel(ai)build enterprise compliance alignment: Sox itgc, iso 27001 annex a, and nist ai rmf sub-function mappings

Padmaraj Madatha. Rel(ai)build enterprise compliance alignment: Sox itgc, iso 27001 annex a, and nist ai rmf sub-function mappings. Happiest Minds Technologies, companion document [Author artifact, not peer-reviewed], 2026 b

2026

[26] [26]

Cursor vs cursor+rel(ai)build illustrative build walkthrough

Padmaraj Madatha. Cursor vs cursor+rel(ai)build illustrative build walkthrough. arXiv ancillary files [Author artifact, not peer-reviewed], 2026 c

2026

[27] [27]

Mark S. Miller. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis, Johns Hopkins University, 2006

2006

[28] [28]

Reframing instructional prompts to gptk's language

Swaroop Mishra et al. Reframing instructional prompts to gptk's language. In Findings of the Association for Computational Linguistics (ACL Findings), 2022

2022

[29] [29]

Infrastructure as Code: Managing Servers in the Cloud

Kief Morris. Infrastructure as Code: Managing Servers in the Cloud. O'Reilly Media, 2016

2016

[30] [30]

Petri nets: Properties, analysis and applications

Tadao Murata. Petri nets: Properties, analysis and applications. Proceedings of the IEEE, 77 0 (4): 0 541--580, 1989

1989

[31] [31]

Bitcoin: A peer-to-peer electronic cash system, 2008

Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2008

2008

[32]

Sigstore: Software signing for everybody

Zachary Newman, John Speed Meyers, and Santiago Torres-Arias. Sigstore: Software signing for everybody. In ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 2353--2367, 2022. doi:10.1145/3548606.3560596

2022

[33]

Artificial intelligence risk management framework (AI RMF 1.0)

NIST. Artificial intelligence risk management framework (AI RMF 1.0). Technical Report NIST AI 100-1, National Institute of Standards and Technology, 2023

2023

[34]

Artificial intelligence risk management framework: Generative artificial intelligence profile

NIST. Artificial intelligence risk management framework: Generative artificial intelligence profile. Technical Report NIST AI 600-1, National Institute of Standards and Technology, 2024

2024

[35]

Cybersecurity framework profile for artificial intelligence (Cyber AI Profile)

NIST. Cybersecurity framework profile for artificial intelligence (Cyber AI Profile). Technical Report NIST IR 8596 (initial public draft), National Institute of Standards and Technology, 2025. https://csrc.nist.gov/pubs/ir/8596/iprd

2025

[36]

AI agent standards initiative

NIST CAISI. AI agent standards initiative. https://www.nist.gov/artificial-intelligence/ai-agent-standards-initiative, 2026

2026

[37]

The minimum elements for a software bill of materials (SBOM)

NTIA. The minimum elements for a software bill of materials (SBOM). Technical report, U.S. Department of Commerce, National Telecommunications and Information Administration, 2021

2021

[38]

Backstabber's knife collection: A review of open source software supply chain attacks

Marc Ohm et al. Backstabber's knife collection: A review of open source software supply chain attacks. In Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), 2020

2020

[39]

Preparedness framework

OpenAI. Preparedness framework. https://openai.com/index/openai-preparedness-framework/, 2023

2023

[40]

Top 10 for large language model applications

OWASP. Top 10 for large language model applications, 2025

2025

[41]

Security smells in Ansible and Chef scripts: A replication study

Akond Rahman et al. Security smells in Ansible and Chef scripts: A replication study. ACM Transactions on Software Engineering and Methodology (TOSEM), 30(1):1--31, 2021

2021

[42]

NeMo Guardrails: A toolkit for controllable and safe LLM applications with programmable rails

Traian Rebedea et al. NeMo Guardrails: A toolkit for controllable and safe LLM applications with programmable rails. arXiv preprint arXiv:2310.10561, 2023

2023

[43]

The protection of information in computer systems

Jerome H. Saltzer and Michael D. Schroeder. The protection of information in computer systems. Proceedings of the IEEE, 63(9):1278--1308, 1975

1975

[44]

Quantifying language models' sensitivity to spurious features in prompt design

Melanie Sclar et al. Quantifying language models' sensitivity to spurious features in prompt design. arXiv preprint arXiv:2310.11324, 2023

2023

[45]

Executive order 14028: Improving the nation's cybersecurity

The White House. Executive order 14028: Improving the nation's cybersecurity. Federal Register Vol. 86, No. 93, 2021

2021

[46]

Pulling Strings with Puppet: Configuration Management Made Easy

James Turnbull and Jeffrey McCune. Pulling Strings with Puppet: Configuration Management Made Easy. Apress, 2008

2008

[47]

Wil M. P. van der Aalst. Workflow verification: Finding control-flow errors using Petri-net-based techniques. In Business Process Management, LNCS 1806, pages 161--183, 2000

2000

[48]

AI package hallucination

Vulcan Cyber. AI package hallucination. Vendor research report (grey literature), 2024

2024

[49]

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

Xingyao Wang et al. OpenHands: An open platform for AI software developers as generalist agents (formerly OpenDevin). arXiv preprint arXiv:2407.16741, 2024. doi:10.48550/arXiv.2407.16741

2024

[50]

AutoGen: Enabling next-gen LLM applications via multi-agent conversation

Qingyun Wu et al. AutoGen: Enabling next-gen LLM applications via multi-agent conversation. arXiv preprint arXiv:2308.08155, 2023

2023

[51]

SWE-agent: Agent--computer interfaces enable automated software engineering

John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: Agent--computer interfaces enable automated software engineering. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, 2024

2024

[52]

What are weak links in the npm supply chain?

Nusrat Zahan et al. What are weak links in the npm supply chain? In IEEE/ACM Intl. Conf. on Software Engineering: Software Engineering in Society (ICSE-SEIS), 2022

2022