pith. machine review for the scientific record.

arxiv: 2605.08586 · v1 · submitted 2026-05-09 · 💻 cs.CR

Recognition: no theorem link

Computer Science Conferences Should Require Nonrepudiable Experimental Results

Mamadou K. Keita, Christopher Homan

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 00:49 UTC · model grok-4.3

classification 💻 cs.CR
keywords experiment nonrepudiation · reproducibility · tamper-evident logging · conference policies · scientific integrity · cryptographic attestation · experimental validation

The pith

Computer science conferences should require tamper-evident attestations that bind reported experimental results to actual code executions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This position paper argues that conferences must mandate nonrepudiable attestations for experimental results because current practices like self-reported checklists and optional code sharing fail to confirm whether the numbers in a paper were actually produced by the described code. The authors define experiment nonrepudiation as the requirement that any protocol must cryptographically link results to executions in a way authors cannot later alter or deny. They outline the necessary security properties and a threat model that includes attacks not stopped by existing methods, then present a working prototype to show the approach is feasible. If adopted, this change would make published experimental claims verifiable rather than reliant on author trust alone.

Core claim

The paper claims that experiment nonrepudiation must become a required property of published work, where any compliant protocol binds the numbers reported in a paper to an actual executed computation through tamper-evident and nonrepudiable means. Existing systems based on author-controlled logging and voluntary code release do not satisfy the formal security properties needed to prevent denial or post-execution changes. The authors demonstrate that the problem is solvable by building K-Veritas, a Go implementation that produces signed reports of computations without requiring access to training data, and they call for the community to develop an open standard.

What carries the argument

Experiment nonrepudiation, the protocol property that cryptographically binds reported results to actual executions so the author cannot later deny or alter the connection.

If this is right

  • Reviewers would receive cryptographic assurance that the numbers in a submission match a real run of the code the paper describes.
  • Authors would need to generate compliant signed reports as part of the submission process.
  • Conferences could treat nonrepudiation as a mandatory check comparable to other integrity requirements.
  • The community would need to maintain open, independent standards for generating and verifying the attestations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such attestations could provide persistent records that support long-term reproducibility checks beyond the initial review.
  • The approach might extend to other computational fields where data access or post-hoc result changes are concerns.
  • Widespread use would likely require community-supported tools to avoid disadvantaging smaller research groups.

Load-bearing premise

That a practical, low-overhead protocol can be standardized and adopted by conferences without creating new barriers to publishing or excluding researchers without access to specialized tooling.

What would settle it

A test in which authors attempt to submit altered or fabricated experimental results under the attestation protocol. The proposal fails if the system either accepts the false claims or imposes requirements so burdensome that legitimate papers cannot be published.

Figures

Figures reproduced from arXiv: 2605.08586 by Christopher Homan, Mamadou K. Keita.

Figure 1
Figure 1: K-Veritas verification workflow. The user runs experiments with K-Veritas commands, writes the paper, attaches the signed report, and submits everything for review. K-Veritas is used here as a testbed for a more general protocol.
read the original abstract

This position paper argues that computer science conferences should require tamper-evident, nonrepudiable attestations of experimental results. We name the underlying problem experiment nonrepudiation: a compliant protocol must bind the numbers in a paper to an actual executed computation in a way the author cannot later alter or deny. The current system relies on self-reported checklists, optional code sharing, and author-controlled logging. None of these mechanisms answer the question a reviewer cannot check: did the code the paper describes produce the numbers the paper reports? We define the problem formally, state the security properties any compliant protocol must satisfy, and describe a threat model that includes attacks current approaches do not prevent. To show that the problem is solvable, we built K-Veritas, a reference implementation in Go that produces signed reports without accessing training data. K-Veritas is a testbed, not a finished answer. We call on conferences and the community to treat nonrepudiation as a first-class requirement and to help build an open, independent standard for it.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. This position paper argues that computer science conferences should require tamper-evident, nonrepudiable attestations of experimental results. It formalizes the problem of 'experiment nonrepudiation,' defines the security properties any compliant protocol must satisfy, presents a threat model including attacks not prevented by current self-reported methods, and introduces K-Veritas, a Go reference implementation that produces signed reports without accessing training data, as a testbed to demonstrate solvability.

Significance. If the central proposal holds, it could meaningfully strengthen reproducibility and trust in experimental computer science by binding reported numbers to executed computations in a verifiable way. The formalization of the problem and the construction of a reference implementation that avoids direct data access are strengths that credit the authors for moving beyond checklists toward cryptographic nonrepudiation.

major comments (1)
  1. [K-Veritas implementation] The K-Veritas implementation section states that the tool produces signed reports without accessing training data and positions it as a testbed for standardization, but supplies no overhead measurements, no runtime or storage benchmarks, and no compatibility evaluation with common workflows (Python scripts, Jupyter notebooks, or cloud-based training). This is load-bearing for the claim that a practical, low-overhead protocol can be adopted without creating new barriers to publishing.
minor comments (1)
  1. [Abstract and Introduction] The abstract and introduction could more explicitly cross-reference the exact security properties (e.g., binding, non-repudiation, tamper-evidence) defined later in the formal section to improve readability for readers unfamiliar with the threat model.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and positive remarks on the formalization of experiment nonrepudiation. We respond to the single major comment below.

read point-by-point responses
  1. Referee: [K-Veritas implementation] The K-Veritas implementation section states that the tool produces signed reports without accessing training data and positions it as a testbed for standardization, but supplies no overhead measurements, no runtime or storage benchmarks, and no compatibility evaluation with common workflows (Python scripts, Jupyter notebooks, or cloud-based training). This is load-bearing for the claim that a practical, low-overhead protocol can be adopted without creating new barriers to publishing.

    Authors: We agree that the K-Veritas section contains no overhead measurements, runtime or storage benchmarks, or compatibility evaluations with Python, Jupyter, or cloud workflows. However, we do not view these as load-bearing for the paper's claims. The manuscript explicitly frames K-Veritas as 'a reference implementation in Go' and 'a testbed, not a finished answer' whose sole purpose is to demonstrate solvability: that a protocol can produce signed reports meeting the defined security properties without accessing training data. The position paper argues that conferences should treat nonrepudiation as a requirement and that the problem is technically solvable; it makes no assertion that K-Veritas itself is a practical, low-overhead system ready for adoption or that it avoids new barriers. Performance and integration details are appropriate for subsequent engineering or standardization work, which the paper invites. We therefore see no need to expand the testbed description with benchmarks in this position paper. revision: no

Circularity Check

0 steps flagged

No significant circularity; argument rests on standard cryptographic definitions and a reference implementation

full rationale

The paper is a position paper that formally defines experiment nonrepudiation, states required security properties, and describes a threat model using established cryptographic notions of tamper-evidence and nonrepudiation. It then presents K-Veritas as an independent reference implementation in Go to demonstrate solvability. No derivation chain reduces any claim to its own inputs by construction, no parameters are fitted and relabeled as predictions, and no load-bearing premises depend on self-citations to prior work by the same authors. The central proposal remains independent of the reference implementation's specific details.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The argument relies on standard cryptographic assumptions for signatures and tamper-evidence; no free parameters or new invented entities are introduced.

axioms (1)
  • [standard math] Standard cryptographic assumptions for digital signatures and tamper-evidence hold.
    The protocol definition depends on these established properties to achieve nonrepudiation.

pith-pipeline@v0.9.0 · 5473 in / 1012 out tokens · 47302 ms · 2026-05-12T00:49:13.404076+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages
