Computer Science Conferences Should Require Nonrepudiable Experimental Results
Pith reviewed 2026-05-12 00:49 UTC · model grok-4.3
The pith
Computer science conferences should require tamper-evident attestations that bind reported experimental results to actual code executions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that experiment nonrepudiation must become a required property of published work, where any compliant protocol binds the numbers reported in a paper to an actual executed computation through tamper-evident and nonrepudiable means. Existing systems based on author-controlled logging and voluntary code release do not satisfy the formal security properties needed to prevent denial or post-execution changes. The authors demonstrate that the problem is solvable by building K-Veritas, a Go implementation that produces signed reports of computations without requiring access to training data, and they call for the community to develop an open standard.
What carries the argument
Experiment nonrepudiation, the protocol property that cryptographically binds reported results to actual executions so the author cannot later deny or alter the connection.
If this is right
- Reviewers would receive cryptographic assurance that the numbers in a submission match a real run of the code the paper describes.
- Authors would need to generate compliant signed reports as part of the submission process.
- Conferences could treat nonrepudiation as a mandatory check comparable to other integrity requirements.
- The community would need to maintain open, independent standards for generating and verifying the attestations.
Where Pith is reading between the lines
- Such attestations could provide persistent records that support long-term reproducibility checks beyond the initial review.
- The approach might extend to other computational fields where data access or post-hoc result changes are concerns.
- Widespread use would likely require community-supported tools to avoid disadvantaging smaller research groups.
Load-bearing premise
That a practical, low-overhead protocol can be standardized and adopted by conferences without creating new barriers to publishing or excluding researchers without access to specialized tooling.
What would settle it
A red-team test in which authors attempt to submit altered or fabricated experimental results through the attestation protocol: the proposal fails if the system either accepts the false claims or imposes requirements that prevent legitimate papers from being published.
Figures
Original abstract
This position paper argues that computer science conferences should require tamper-evident, nonrepudiable attestations of experimental results. We name the underlying problem experiment nonrepudiation: a compliant protocol must bind the numbers in a paper to an actual executed computation in a way the author cannot later alter or deny. The current system relies on self-reported checklists, optional code sharing, and author-controlled logging. None of these mechanisms answer the question a reviewer cannot check: did the code the paper describes produce the numbers the paper reports? We define the problem formally, state the security properties any compliant protocol must satisfy, and describe a threat model that includes attacks current approaches do not prevent. To show that the problem is solvable, we built K-Veritas, a reference implementation in Go that produces signed reports without accessing training data. K-Veritas is a testbed, not a finished answer. We call on conferences and the community to treat nonrepudiation as a first-class requirement and to help build an open, independent standard for it.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This position paper argues that computer science conferences should require tamper-evident, nonrepudiable attestations of experimental results. It formalizes the problem of 'experiment nonrepudiation,' defines the security properties any compliant protocol must satisfy, presents a threat model including attacks not prevented by current self-reported methods, and introduces K-Veritas, a Go reference implementation that produces signed reports without accessing training data, as a testbed to demonstrate solvability.
Significance. If the central proposal holds, it could meaningfully strengthen reproducibility and trust in experimental computer science by binding reported numbers to executed computations in a verifiable way. The formalization of the problem and the construction of a reference implementation that avoids direct data access are clear strengths; they move the discussion beyond checklists toward cryptographic nonrepudiation.
Major comments (1)
- [K-Veritas implementation] The K-Veritas implementation section states that the tool produces signed reports without accessing training data and positions it as a testbed for standardization, but supplies no overhead measurements, no runtime or storage benchmarks, and no compatibility evaluation with common workflows (Python scripts, Jupyter notebooks, or cloud-based training). This is load-bearing for the claim that a practical, low-overhead protocol can be adopted without creating new barriers to publishing.
Minor comments (1)
- [Abstract and Introduction] The abstract and introduction could more explicitly cross-reference the exact security properties (e.g., binding, non-repudiation, tamper-evidence) defined later in the formal section to improve readability for readers unfamiliar with the threat model.
Simulated Author's Rebuttal
We thank the referee for the constructive review and positive remarks on the formalization of experiment nonrepudiation. We respond to the single major comment below.
Point-by-point responses
- Referee: [K-Veritas implementation] The K-Veritas implementation section states that the tool produces signed reports without accessing training data and positions it as a testbed for standardization, but supplies no overhead measurements, no runtime or storage benchmarks, and no compatibility evaluation with common workflows (Python scripts, Jupyter notebooks, or cloud-based training). This is load-bearing for the claim that a practical, low-overhead protocol can be adopted without creating new barriers to publishing.
Authors: We agree that the K-Veritas section contains no overhead measurements, runtime or storage benchmarks, or compatibility evaluations with Python, Jupyter, or cloud workflows. However, we do not view these as load-bearing for the paper's claims. The manuscript explicitly frames K-Veritas as 'a reference implementation in Go' and 'a testbed, not a finished answer' whose sole purpose is to demonstrate solvability: that a protocol can produce signed reports meeting the defined security properties without accessing training data. The position paper argues that conferences should treat nonrepudiation as a requirement and that the problem is technically solvable; it makes no assertion that K-Veritas itself is a practical, low-overhead system ready for adoption or that it avoids new barriers. Performance and integration details are appropriate for subsequent engineering or standardization work, which the paper invites. We therefore see no need to expand the testbed description with benchmarks in this position paper.
Revision: no
Circularity Check
No significant circularity; the argument rests on standard cryptographic definitions and a reference implementation.
Full rationale
The paper is a position paper that formally defines experiment nonrepudiation, states required security properties, and describes a threat model using established cryptographic notions of tamper-evidence and nonrepudiation. It then presents K-Veritas as an independent reference implementation in Go to demonstrate solvability. No derivation chain reduces any claim to its own inputs by construction, no parameters are fitted and relabeled as predictions, and no load-bearing premises depend on self-citations to prior work by the same authors. The central proposal remains independent of the reference implementation's specific details.
Axiom & Free-Parameter Ledger
Axioms (1)
- [standard math] Standard cryptographic assumptions for digital signatures and tamper-evidence hold.