arxiv: 2604.21744 · v1 · submitted 2026-04-23 · 💻 cs.SE · cs.AI· q-bio.BM

Recognition: unknown

Agentic AI-assisted coding offers a unique opportunity to instill epistemic grounding during software development

Magnus Palmblad , Jared M. Ragland , Benjamin A. Neely

Authors on Pith no claims yet

Pith reviewed 2026-05-09 21:11 UTC · model grok-4.3

classification 💻 cs.SE cs.AIq-bio.BM

keywords agentic AIAI-assisted codingepistemic groundingscientific softwareproteomicshard constraintscommunity governancesoftware validity

0 comments

The pith

A community-governed grounding document can direct agentic AI to generate scientifically valid code by overriding user prompts with hard constraints and conventions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes creating field-scoped documents such as GROUNDING.md to embed epistemic grounding into agentic AI coding workflows. Using mass spectrometry-based proteomics as the running example, the document would list non-negotiable Hard Constraints required for scientific correctness alongside community Convention Parameters. These rules are meant to take precedence over any other instructions the AI receives, allowing non-experts to produce reliable software while domain experts retain influence by maintaining the shared file. The approach matters because agentic systems are expected to follow explicit guidelines more consistently than human developers, supporting higher-quality output in democratized scientific tool creation.

Core claim

By establishing a community-governed, field-scoped epistemic grounding document such as GROUNDING.md for mass spectrometry-based proteomics, which encodes Hard Constraints as non-negotiable validity invariants empirically required for scientific correctness and Convention Parameters as community-agreed defaults, agentic AI systems can be forced to generate code, tools, and software that adhere to best practices at the ground level regardless of user prompts. This setup provides confidence to the software developer as well as to reviewers and end users, and it keeps domain experts in the loop through ongoing maintenance of the document.

What carries the argument

The GROUNDING.md document, which encodes Hard Constraints (non-negotiable validity invariants) and Convention Parameters (community-agreed defaults) that override all other contexts to enforce scientific validity.

If this is right

Non-domain experts can generate field-specific software that incorporates scientific best practices from the outset.
Domain experts continue to shape outcomes by updating and maintaining the grounding document.
Reviewers and users gain greater assurance that the delivered code meets validity requirements.
Organizations can create bespoke scientific tools more quickly while retaining epistemic safeguards.
AI adherence to explicit rules offers a practical advantage over relying on human developers to follow guidelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Comparable grounding documents could be written for other experimental fields such as bioinformatics or analytical chemistry.
Compliance could be measured by comparing code outputs produced with and without access to the grounding file.
The method might reduce downstream validation effort for AI-generated scientific software across disciplines.
Integration into agent scaffolds could include automatic loading of the relevant GROUNDING.md for each task domain.

Load-bearing premise

Communities can reach and sustain sufficient consensus on the grounding document contents, and agentic AI systems will reliably prioritize and follow those rules over conflicting user instructions.

What would settle it

A controlled test in which an agentic AI is given a proteomics coding task, a GROUNDING.md file that states a specific hard constraint, and a user prompt that directly conflicts with it, followed by inspection of whether the generated code obeys the constraint or the prompt.

read the original abstract

The capabilities of AI-assisted coding are progressing at breakneck speed. Chat-based vibe coding has evolved into fully fledged AI-assisted, agentic software development using agent scaffolds where the human developer creates a plan that agentic AIs implement. One current trend is utilizing documents beyond this plan document, such as project and method-scoped documents. Here we propose GROUNDING$.$md, a community-governed, field-scoped epistemic grounding document, using mass spectrometry-based proteomics as an example. This explicit field-scoped grounding document encodes Hard Constraints (non-negotiable validity invariants empirically required for scientific correctness) and Convention Parameters (community-agreed defaults) that override all other contexts to enforce validity, regardless of what the user prompts. In practice, this will empower a non-domain expert to generate code, tools, and software that have best practices baked in at the ground level, providing confidence to the software developer but also to those reviewing or using the final product. Undoubtedly it is easier to have agentic AIs adhere to guidelines than humans, and this opportunity allows for organizations to develop epistemic grounding documents in such a way as to keep domain experts in the loop in a future of democratized generation of bespoke software solutions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a clean proposal for field-specific GROUNDING.md files that hard-code scientific rules for agentic AI coders, but it stays at the level of untested assertions.

read the letter

The main point is that agentic AI coding tools could be steered toward reliable scientific software by community documents that list non-negotiable validity rules and agreed defaults, with mass-spec proteomics as the concrete case. The authors frame GROUNDING.md as something that overrides user prompts and other context, which is a modest but clear extension of existing prompt and guideline practices. They rightly note that agents may prove easier to constrain than human developers, potentially letting domain experts stay involved while non-experts generate tools. That observation is useful and worth keeping in mind for anyone building or using these systems in specialized fields. The proteomics example helps ground the idea without overclaiming. The central weakness is the complete absence of any test, prototype, or even a sketch of how current agent scaffolds would actually read and prioritize such a file over conflicting instructions. The paper also does not address how communities would reach and maintain consensus on the hard constraints, which is likely to be the hardest practical step. No data on agent compliance or governance friction is supplied, so the claim that this will produce correct code rests on hope rather than evidence. Readers working on AI-assisted scientific software or on proteomics tooling might find the suggestion worth discussing in a reading group or workshop. It does not contain new methods or results that would change daily practice, but the workflow idea is coherent enough to merit referee comments on implementation details and adoption barriers.

Referee Report

1 major / 1 minor

Summary. The paper proposes GROUNDING.md, a community-governed, field-scoped document for agentic AI-assisted coding. Using mass spectrometry-based proteomics as an example, it encodes Hard Constraints (non-negotiable validity invariants required for scientific correctness) and Convention Parameters (community-agreed defaults) that are intended to override all other contexts, including user prompts, to ensure epistemic grounding and best practices in generated code and tools. The central claim is that this approach leverages AI agents' greater adherence to guidelines compared to humans, enabling non-domain experts to produce reliable scientific software while maintaining domain-expert oversight through community governance.

Significance. If the proposed documents can be maintained via consensus and reliably prioritized by agentic systems, the framework could meaningfully improve the reliability of AI-generated code in scientific domains by embedding validity invariants at the context level. This addresses a timely challenge in democratized software development and highlights a potential advantage of agentic AI over traditional human-driven processes.

major comments (1)

[Abstract] Abstract: The claim that the GROUNDING.md document 'override[s] all other contexts to enforce validity, regardless of what the user prompts' is load-bearing for the proposal but is presented without any analysis of agent architectures, context-window management, or enforcement mechanisms that would make such overriding feasible in current or near-future systems.

minor comments (1)

[Abstract] Abstract: 'GROUNDING$.$md' is a typesetting artifact and should be rendered as GROUNDING.md.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive review and for recognizing the potential significance of the GROUNDING.md proposal. We address the single major comment below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses

Referee: The claim that the GROUNDING.md document 'override[s] all other contexts to enforce validity, regardless of what the user prompts' is load-bearing for the proposal but is presented without any analysis of agent architectures, context-window management, or enforcement mechanisms that would make such overriding feasible in current or near-future systems.

Authors: We agree that the overriding property is central to the proposal's value and that the original text presented it without sufficient discussion of implementation details. The manuscript's emphasis is on the epistemic and community-governance dimensions, positing that agentic systems' superior adherence to explicit guidelines (as noted in the abstract) creates an opportunity for field-scoped invariants. However, we accept that feasibility in practice requires elaboration. In revision we will qualify the claim in the abstract, add a short discussion of relevant mechanisms such as system-prompt injection, hierarchical context management in agent scaffolds, and persistent memory architectures, and reference existing patterns from current agent frameworks where high-priority instructions can be enforced. This will make the load-bearing aspect more transparent without shifting the paper's conceptual focus. revision: yes

Circularity Check

0 steps flagged

No significant circularity; purely conceptual proposal without derivations

full rationale

The manuscript is a forward-looking conceptual proposal advocating for community-governed GROUNDING.md documents that encode hard constraints and convention parameters for AI-assisted coding in fields like mass spectrometry-based proteomics. No equations, quantitative predictions, empirical results, or derivation chains are present in the text. The central claims rest on the feasibility of consensus maintenance and AI prioritization but make no attempt to derive these from prior results or self-referential definitions within the paper itself. The argument is self-contained as a normative suggestion and does not reduce any asserted outcome to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The proposal assumes without independent evidence that agentic AI systems can be made to treat external documents as higher priority than user prompts, that communities can reach consensus on hard constraints, and that such constraints can be encoded unambiguously for AI interpretation.

axioms (2)

domain assumption Agentic AI systems will reliably adhere to field-scoped grounding documents over other context or user instructions
Invoked in the description of how the document overrides all other contexts
domain assumption Communities can define and maintain unambiguous Hard Constraints and Convention Parameters for scientific validity
Central to the community-governed aspect of GROUNDING.md

invented entities (1)

GROUNDING.md no independent evidence
purpose: Field-scoped epistemic grounding document that enforces validity invariants in AI coding
New document type proposed to solve the problem of ensuring scientific correctness in agent-generated code

pith-pipeline@v0.9.0 · 5530 in / 1325 out tokens · 35375 ms · 2026-05-09T21:11:50.829573+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

25 extracted references · 3 canonical work pages · 2 internal anchors

[1]

Vibe coding omics data analysis applications.Journal of proteome research, 25(2):1191–1197, 2026

Jesse G Meyer. Vibe coding omics data analysis applications.Journal of proteome research, 25(2):1191–1197, 2026

2026
[2]

The adolescence of ai.https://www.darioamodei.com/essay/ the-adolescence-of-technology, 2024

Dario Amodei. The adolescence of ai.https://www.darioamodei.com/essay/ the-adolescence-of-technology, 2024

2024
[3]

Agents.md.https://agents.md/
[4]

Skill.md.https://skill.md/
[5]

Proteomics standards initiative at twenty years: current activities and future work

Eric W Deutsch, Juan Antonio Vizca´ ıno, Andrew R Jones, Pierre-Alain Binz, Henry Lam, Joshua Klein, Wout Bittremieux, Yasset Perez-Riverol, David L Tabb, Mathias Walzer, et al. Proteomics standards initiative at twenty years: current activities and future work. Journal of Proteome Research, 22(2):287–301, 2023

2023
[6]

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. Constitu- tional ai: Harmlessness from ai feedback.arXiv preprint arXiv:2212.08073, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[7]

Concrete Problems in AI Safety

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Man´ e. Concrete problems in ai safety.arXiv preprint arXiv:1606.06565, 2016

work page internal anchor Pith review arXiv 2016
[8]

Zenodo repository, 2024

EVERSE Research Software Quality Kit (RSQKit). Zenodo repository, 2024

2024
[9]

Openai skills.https://github.com/openai/skills/tree/main
[10]

Anthropic claude agent skills overview.https://platform.claude.com/docs/en/ agents-and-tools/agent-skills/overview
[11]

Anatomy of the claude folder.https://blog.dailydoseofds.com/p/ anatomy-of-the-claude-folder
[12]

Mass spectrometry-based plasma proteomics: considerations from sample collection to achieving translational data

Vera Ignjatovic, Philipp E Geyer, Krishnan K Palaniappan, Jessica E Chaaban, Gilbert S Omenn, Mark S Baker, Eric W Deutsch, and Jochen M Schwenk. Mass spectrometry-based plasma proteomics: considerations from sample collection to achieving translational data. Journal of proteome research, 18(12):4085–4097, 2019

2019
[13]

Quality control in the mass spectrometry proteomics core: a practical primer.Journal of biomolecular techniques: JBT, 35(3):3fc1f5fe–42308a9a, 2024

Benjamin A Neely, Yasset Perez-Riverol, and Magnus Palmblad. Quality control in the mass spectrometry proteomics core: a practical primer.Journal of biomolecular techniques: JBT, 35(3):3fc1f5fe–42308a9a, 2024. 8

2024
[14]

A proteomics sample metadata representation for multiomics integration and big data analysis.Nature Communications, 12(1):5854, 2021

Chengxin Dai, Anja F¨ ullgrabe, Julianus Pfeuffer, Elizaveta M Solovyeva, Jingwen Deng, Pablo Moreno, Selvakumar Kamatchinathan, Deepti Jaiswal Kundu, Nancy George, Silvie Fexova, et al. A proteomics sample metadata representation for multiomics integration and big data analysis.Nature Communications, 12(1):5854, 2021

2021
[15]

Data standardization and sharing—the work of the hupo-psi.Biochimica et Biophysica Acta (BBA)-Proteins and Proteomics, 1844(1):82–87, 2014

Sandra Orchard. Data standardization and sharing—the work of the hupo-psi.Biochimica et Biophysica Acta (BBA)-Proteins and Proteomics, 1844(1):82–87, 2014

2014
[16]

The minimum information about a proteomics experiment (miape).Nature biotechnology, 25(8):887–893, 2007

Chris F Taylor, Norman W Paton, Kathryn S Lilley, Pierre-Alain Binz, Randall K Julian Jr, Andrew R Jones, Weimin Zhu, Rolf Apweiler, Ruedi Aebersold, Eric W Deutsch, et al. The minimum information about a proteomics experiment (miape).Nature biotechnology, 25(8):887–893, 2007

2007
[17]

On the quality of protein identification by mass spectrometry

Jan Eriksson, David Feny¨ o, and Brian Chait. On the quality of protein identification by mass spectrometry. InPoster WPH 261 at ASMS 1999, 1999

1999
[18]

A model of random mass-matching and its use for auto- mated significance testing in mass spectrometric proteome analysis.Proteomics, 2(3):262– 270, 2002

Jan Eriksson and David Feny¨ o. A model of random mass-matching and its use for auto- mated significance testing in mass spectrometric proteome analysis.Proteomics, 2(3):262– 270, 2002

2002
[19]

Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment.Nature Methods, 22(7):1454–1463, 2025

Bo Wen, Jack Freestone, Michael Riffle, Michael J MacCoss, William S Noble, and Uri Keich. Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment.Nature Methods, 22(7):1454–1463, 2025

2025
[20]

Evaluating agents

Thibaud Gloaguen, Niels M¨ undler, Mark M¨ uller, Veselin Raychev, and Martin Vechev. Evaluating agents. md: Are repository-level context files helpful for coding agents?arXiv preprint arXiv:2602.11988, 2026

work page arXiv 2026
[21]

Spec kit.https://github.com/github/spec-kit

Den Delimarsky and Manfred Riem. Spec kit.https://github.com/github/spec-kit
[22]

Dario amodei - dwarkesh podcast.https://www.dwarkesh.com/p/ dario-amodei-2, 2026

Dwarkesh Patel. Dario amodei - dwarkesh podcast.https://www.dwarkesh.com/p/ dario-amodei-2, 2026

2026
[23]

Proteobench: the community-curated platform for comparing proteomics data analysis workflows.bioRxiv, pages 2025–12, 2025

Robbe Devreese, Caroline Jachmann, Bart Van Puyvelde, Holda A Anagho-Mattanovich, Witold E Wolski, Henry Webel, Matthias Anagho-Mattanovich, Wout Bittremieux, Karima Chaoui, Cristina Chiva, et al. Proteobench: the community-curated platform for comparing proteomics data analysis workflows.bioRxiv, pages 2025–12, 2025

2025
[24]

Interpretation of the dome recommenda- tions for machine learning in proteomics and metabolomics.Journal of proteome research, 21(4):1204–1207, 2022

Magnus Palmblad, Sebastian Bocker, Sven Degroeve, Oliver Kohlbacher, Lukas Kall, William Stafford Noble, and Mathias Wilhelm. Interpretation of the dome recommenda- tions for machine learning in proteomics and metabolomics.Journal of proteome research, 21(4):1204–1207, 2022

2022
[25]

Introducing the fair principles for research software

Michelle Barker, Neil P Chue Hong, Daniel S Katz, Anna-Lena Lamprecht, Carlos Martinez-Ortiz, Fotis Psomopoulos, Jennifer Harrow, Leyla Jael Castro, Morane Gruen- peter, Paula Andrea Martinez, et al. Introducing the fair principles for research software. Scientific data, 9(1):622, 2022. 9

2022