paper.json: A Coordination Convention for LLM-Agent-Actionable Papers
Pith reviewed 2026-05-19 16:56 UTC · model grok-4.3
The pith
A companion JSON file turns academic papers into documents LLM agents can act on directly.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a paper.json file using stable claim IDs, a does-not-claim list, per-figure shell commands, and stable definition IDs makes the paper LLM-agent-actionable, and that minimum viable compliance is achievable by hand in under an hour without touching the prose output.
What carries the argument
The paper.json companion file, which encodes stable claim IDs, an explicit does-not-claim list, exact per-figure shell commands, and stable definition IDs so that agents can extract and act on them directly.
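To make the four content conventions concrete, here is a minimal sketch of what a paper.json might hold, written as a Python dict and round-tripped through JSON. The field names (`claims`, `does_not_claim`, `figures`, `definitions`), the ID formats, the figure script `make_fig1.py`, and all entry text are hypothetical placeholders. The actual schema is defined in the paper-json repository.

```python
import json

# Hypothetical layout: field names and values are illustrative, not the repo's schema.
paper = {
    # C1: stable claim IDs an agent can cite at sub-paper granularity
    "claims": [
        {"id": "C1", "text": "Stable claim IDs let agents cite individual sub-claims."},
    ],
    # C2: explicit list of things the paper does not claim
    "does_not_claim": [
        "That agents are guaranteed to read paper.json.",
    ],
    # C3: one exact shell command per figure (script name is hypothetical)
    "figures": [
        {"id": "fig1", "command": "uv run make_fig1.py"},
    ],
    # C5: stable definition IDs that can be referenced across papers and sessions
    "definitions": [
        {"id": "D1", "term": "agent-actionable",
         "text": "Extractable and usable by an agent without parsing the prose."},
    ],
}

REQUIRED_SECTIONS = ("claims", "does_not_claim", "figures", "definitions")


def missing_sections(doc: dict) -> list[str]:
    """Return the required top-level sections absent from a paper.json-like dict."""
    return [key for key in REQUIRED_SECTIONS if key not in doc]


# Serialize as it would be written next to the PDF, then parse it back.
serialized = json.dumps(paper, indent=2)
assert missing_sections(json.loads(serialized)) == []
```

C4 concerns authoring effort rather than content, so it has no field of its own in this sketch.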
If this is right
- Agents can cite sub-claims at stable granularity using the provided IDs.
- Scope overextension is limited by the explicit does-not-claim list.
- Figure reproduction steps become directly available from the paper file itself.
- Definitions remain trackable across multiple papers and agent sessions.
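The first two consequences can be illustrated from the agent side: citing a sub-claim by its stable ID, and refusing statements the paper explicitly disclaims. This is a sketch under stated assumptions. The field names `claims` and `does_not_claim` and the citation format are hypothetical, and the scope check is a naive normalized exact match, so paraphrased overextensions would slip through; a real agent would need semantic matching.

```python
# Hypothetical paper.json content; field names are assumptions for illustration.
doc = {
    "claims": [{"id": "C2", "text": "An explicit does-not-claim list limits scope overextension."}],
    "does_not_claim": ["That agents are guaranteed to read paper.json."],
}


def cite(doc: dict, claim_id: str) -> str:
    """Format a citation for a stable claim ID; unknown IDs raise instead of being guessed."""
    claims = {c["id"]: c["text"] for c in doc["claims"]}
    if claim_id not in claims:
        raise KeyError(f"unknown claim ID: {claim_id}")
    return f"[{claim_id}] {claims[claim_id]}"


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not matter."""
    return " ".join(text.lower().split())


def overextends(doc: dict, statement: str) -> bool:
    """Naive scope check: True if the statement matches a does-not-claim entry
    after normalization. Paraphrases are not detected."""
    excluded = {_normalize(s) for s in doc["does_not_claim"]}
    return _normalize(statement) in excluded
```

Raising on an unknown ID, rather than falling back to the nearest claim, keeps an agent from inventing citations that the paper never made.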
Where Pith is reading between the lines
- Widespread use could prompt publishing platforms to accept and validate paper.json as standard supplementary material.
- Authors might integrate the conventions into existing authoring tools so the JSON is generated automatically.
- This approach could support new automated review processes that check claim boundaries and reproduction commands.
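An automated check of reproduction commands could start as simply as running each figure's command and recording its exit status. A minimal sketch, assuming each figure entry is a dict with hypothetical `id` and `command` keys. It only confirms that commands exit cleanly, not that they regenerate the published figure, and because it executes shell strings taken from a file it should run inside a sandbox.

```python
import subprocess


def rerun_figures(figures: list[dict], timeout: float = 600.0) -> dict:
    """Run each figure's shell command; map figure ID to exit code, or None on timeout."""
    results = {}
    for fig in figures:
        try:
            proc = subprocess.run(
                fig["command"],
                shell=True,  # C3 commands are shell strings, so run them through the shell
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            results[fig["id"]] = proc.returncode
        except subprocess.TimeoutExpired:
            results[fig["id"]] = None
    return results
```

On a POSIX shell, `rerun_figures([{"id": "fig1", "command": "exit 0"}, {"id": "fig2", "command": "exit 3"}])` returns `{"fig1": 0, "fig2": 3}`; a reviewer tool would flag any nonzero or `None` entry.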
Load-bearing premise
LLM agents will read and act on the paper.json file rather than continuing to parse only the prose PDF.
What would settle it
An experiment in which agents given both the PDF and a compliant paper.json produce inaccurate sub-claim citations or scope overextensions no less often than agents given the PDF alone would show that the convention fails to deliver the claimed actionability.
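One way to score the citation half of such an experiment is to compare accuracy across the two conditions. The sketch assumes each agent run yields (statement, cited claim ID) pairs and that human annotators supply a gold claim ID per statement; the data shape and the `margin` threshold are hypothetical. A real study would need many papers, repeated runs, and a significance test rather than a single difference, plus a separate measure for scope overextension.

```python
def citation_accuracy(pairs: list[tuple[str, str]], gold: dict[str, str]) -> float:
    """Fraction of (statement, cited_id) pairs whose cited ID matches the gold label."""
    if not pairs:
        return 0.0
    hits = sum(1 for statement, cited in pairs if gold.get(statement) == cited)
    return hits / len(pairs)


def convention_helps(pdf_only, pdf_plus_json, gold, margin: float = 0.0) -> bool:
    """True if adding paper.json raises citation accuracy by more than `margin`."""
    gain = citation_accuracy(pdf_plus_json, gold) - citation_accuracy(pdf_only, gold)
    return gain > margin
```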
Original abstract
LLM agents routinely serve as first (and sometimes only) readers of academic papers, skimming for sub-claims, extracting reproducibility steps, and generalizing scope. Standard prose papers produce recurring failures in this role: sub-claims that cannot be cited at sub-paper granularity, scope overextension beyond what the paper tests, and figure commands buried in codebases rather than the paper itself. We propose `paper.json`, a companion JSON file that travels with the PDF and addresses each failure with a lightweight convention: stable claim IDs (C1), an explicit does-not-claim list (C2), exact per-figure shell commands (C3), and stable definition IDs (C5). A fifth convention (C4) holds that minimum viable compliance, hand-written JSON alongside the PDF, is achievable in under an hour for a finished paper without touching the human-readable output. C1, C2, C3, and C5 are open invitations: an agent that reads a compliant paper and acts on it produces evidence for or against them. This paper is itself compliant: `uv run validator.py paper.json --against paper.typ` passes. Repo: https://github.com/arquicanedo/paper-json
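The abstract does not say what `validator.py` checks. As a hedged illustration only, a consistency check between paper.json and the prose source might verify that claim IDs are unique, that each claim ID is mentioned in the source text, and that every figure entry has a non-empty command. These checks, the field names, and the `check`/`main` functions are assumptions, not the repository's implementation.

```python
import argparse
import json
import re
import sys
from typing import Optional


def check(doc: dict, source: str) -> list[str]:
    """Return a list of problems; an empty list means the check passes."""
    problems = []
    ids = [c["id"] for c in doc.get("claims", [])]
    # Stable IDs must be unique to be citable.
    for dup in sorted({i for i in ids if ids.count(i) > 1}):
        problems.append(f"duplicate claim ID {dup}")
    # Assumption: the prose should mention each claim ID as a whole word.
    for cid in dict.fromkeys(ids):
        if not re.search(rf"\b{re.escape(cid)}\b", source):
            problems.append(f"claim {cid} not referenced in source")
    # Each figure needs an exact, non-empty shell command.
    for fig in doc.get("figures", []):
        if not fig.get("command", "").strip():
            problems.append(f"figure {fig.get('id', '?')} has no command")
    return problems


def main(argv: Optional[list[str]] = None) -> int:
    """CLI shaped like `validator.py paper.json --against paper.typ`; returns an exit code."""
    parser = argparse.ArgumentParser(description="Sketch of a paper.json consistency check")
    parser.add_argument("paper_json")
    parser.add_argument("--against", required=True, help="prose source, e.g. paper.typ")
    args = parser.parse_args(argv)
    with open(args.paper_json, encoding="utf-8") as f:
        doc = json.load(f)
    with open(args.against, encoding="utf-8") as f:
        source = f.read()
    problems = check(doc, source)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0

# A real script would end with: if __name__ == "__main__": sys.exit(main())
```

Calling `main(["paper.json", "--against", "paper.typ"])` returns 0 when no problems are found, mirroring a passing validator run.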
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes paper.json, a companion JSON file that accompanies academic PDFs and makes papers more actionable for LLM agents. It defines four core conventions: stable claim IDs (C1), an explicit does-not-claim list (C2), exact per-figure shell commands (C3), and stable definition IDs (C5). A fifth convention (C4) states that minimum viable compliance can be achieved by hand in under one hour without altering the human-readable paper. The authors supply a validator script, demonstrate self-compliance on their own paper (`uv run validator.py paper.json --against paper.typ` passes), and release the conventions openly so agents can produce evidence for or against them.
Significance. If adopted, the convention could improve the precision of LLM-agent interactions with scholarly work by enabling granular claim citation, explicit scope boundaries, and direct reproducibility steps. The provision of a working validator and a self-compliant example constitutes a concrete, falsifiable starting point that lowers the barrier for initial uptake. These elements give the proposal practical value beyond a purely theoretical suggestion.
major comments (1)
- [The proposed conventions] The manuscript does not describe any discovery mechanism by which an LLM agent would locate the paper.json file (e.g., PDF metadata link, standard filename convention next to the PDF, or recommended agent prompt template). This is load-bearing for the central claim that the conventions render papers LLM-agent-actionable, because without reliable discovery the JSON remains inaccessible and the four conventions cannot be acted upon.
minor comments (1)
- [Abstract] The abstract lists C1–C3 and C5, then introduces C4 as a 'fifth convention'; the main text should explicitly clarify the numbering and state whether C4 is a meta-convention or part of the core set.
Simulated Author's Rebuttal
We thank the referee for their constructive review and for recognizing the practical value of the paper.json proposal, including the validator and self-compliant example. We address the single major comment below and will revise the manuscript to incorporate the suggested improvements.
Point-by-point responses
Referee: [The proposed conventions] The manuscript does not describe any discovery mechanism by which an LLM agent would locate the paper.json file (e.g., PDF metadata link, standard filename convention next to the PDF, or recommended agent prompt template). This is load-bearing for the central claim that the conventions render papers LLM-agent-actionable, because without reliable discovery the JSON remains inaccessible and the four conventions cannot be acted upon.
Authors: We agree that an explicit discovery mechanism is necessary to make the conventions reliably actionable for LLM agents and that its absence weakens the central claim. The current manuscript focuses on the content and structure of paper.json (with the assumption that it travels alongside the PDF) but does not specify how agents locate it. In the revised version we will add a dedicated subsection (likely under 'Conventions' or a new 'Adoption and Discovery' section) that recommends: (1) a standard filename convention of 'paper.json' placed in the same directory or repository as the PDF, (2) embedding a URI or link to the JSON file in PDF metadata or as an embedded annotation, and (3) a minimal agent prompt template instructing the model to first search for and load a co-located paper.json before processing the paper content. These additions will be presented as non-mandatory but strongly encouraged practices that preserve the lightweight nature of the convention while directly addressing discoverability. We will also include a brief discussion of how these mechanisms can be implemented without modifying the human-readable paper itself. revision: yes
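The first and third recommendations in this response can be sketched in a few lines. The code assumes the filename `paper.json`, searches the PDF's directory and then its parents up to the enclosing repository root (taken, as an assumption, to be the first directory containing `.git`), and fills a hypothetical prompt template. Reading a URI from PDF metadata, the second recommendation, would need a PDF library and is omitted.

```python
from pathlib import Path
from typing import Optional, Union

# Hypothetical prompt template; the wording is illustrative, not from the paper.
PROMPT_TEMPLATE = (
    "Before reading the paper, load the companion file at {path}. "
    "Cite claims only by their stable IDs and do not assert anything "
    "listed in its does-not-claim section."
)


def find_companion(pdf_path: Union[str, Path]) -> Optional[Path]:
    """Return the nearest paper.json beside the PDF or in a parent directory,
    stopping at the repository root (assumed: first directory containing .git)."""
    here = Path(pdf_path).resolve().parent
    for directory in (here, *here.parents):
        candidate = directory / "paper.json"
        if candidate.is_file():
            return candidate
        if (directory / ".git").exists():
            break  # do not search above the repository root
    return None


def agent_preamble(pdf_path: Union[str, Path]) -> str:
    """Build the instruction an agent would receive before reading the PDF."""
    found = find_companion(pdf_path)
    if found is None:
        return "No paper.json found; fall back to the PDF prose."
    return PROMPT_TEMPLATE.format(path=found)
```

Preferring the nearest file means a per-paper paper.json beside the PDF wins over a repository-level one. Outside any repository the search continues up to the filesystem root.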
Circularity Check
No circularity: proposal of a coordination convention with no derivation chain or self-referential reductions
Full rationale
The paper proposes a lightweight JSON companion format (paper.json) with four open conventions (stable claim IDs C1, does-not-claim list C2, per-figure shell commands C3, stable definition IDs C5) plus a minimum-viable-compliance rule C4. No equations, fitted parameters, predictions, or mathematical derivations appear in the provided text. The manuscript simply defines the conventions, states that this paper itself satisfies them via a validator command, and invites external agents to test them. There are no self-citations, uniqueness theorems, or ansatzes imported from prior work. The central claim is a coordination proposal whose success depends on external adoption and agent behavior, not on any internal reduction of outputs to inputs by construction. This is a self-contained proposal document with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: LLM agents will read and act upon an accompanying paper.json file when it is present.
- Domain assumption: authors of finished papers can produce the minimal JSON in under one hour without altering the human-readable output.