pith. machine review for the scientific record.

arxiv: 2605.08586 · v1 · submitted 2026-05-09 · 💻 cs.CR

Recognition: no theorem link

Computer Science Conferences Should Require Nonrepudiable Experimental Results

Mamadou K. Keita, Christopher Homan

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 00:49 UTC · model grok-4.3

classification 💻 cs.CR
keywords experiment nonrepudiation · reproducibility · tamper-evident logging · conference policies · scientific integrity · cryptographic attestation · experimental validation

The pith

Computer science conferences should require tamper-evident attestations that bind reported experimental results to actual code executions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This position paper argues that conferences must mandate nonrepudiable attestations for experimental results because current practices like self-reported checklists and optional code sharing fail to confirm whether the numbers in a paper were actually produced by the described code. The authors define experiment nonrepudiation as the requirement that any protocol must cryptographically link results to executions in a way authors cannot later alter or deny. They outline the necessary security properties and a threat model that includes attacks not stopped by existing methods, then present a working prototype to show the approach is feasible. If adopted, this change would make published experimental claims verifiable rather than reliant on author trust alone.

Core claim

The paper claims that experiment nonrepudiation must become a required property of published work, where any compliant protocol binds the numbers reported in a paper to an actual executed computation through tamper-evident and nonrepudiable means. Existing systems based on author-controlled logging and voluntary code release do not satisfy the formal security properties needed to prevent denial or post-execution changes. The authors demonstrate that the problem is solvable by building K-Veritas, a Go implementation that produces signed reports of computations without requiring access to training data, and they call for the community to develop an open standard.

What carries the argument

Experiment nonrepudiation, the protocol property that cryptographically binds reported results to actual executions so the author cannot later deny or alter the connection.

If this is right

  • Reviewers would receive cryptographic assurance that the numbers in a submission match a real run of the code the paper describes.
  • Authors would need to generate compliant signed reports as part of the submission process.
  • Conferences could treat nonrepudiation as a mandatory check comparable to other integrity requirements.
  • The community would need to maintain open, independent standards for generating and verifying the attestations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such attestations could provide persistent records that support long-term reproducibility checks beyond the initial review.
  • The approach might extend to other computational fields where data access or post-hoc result changes are concerns.
  • Widespread use would likely require community-supported tools to avoid disadvantaging smaller research groups.

Load-bearing premise

That a practical, low-overhead protocol can be standardized and adopted by conferences without creating new barriers to publishing or excluding researchers without access to specialized tooling.

What would settle it

A test in which authors attempt to submit altered or fabricated experimental results under the attestation protocol. The proposal fails if the system either accepts the false claims or imposes requirements so burdensome that legitimate papers cannot be published.

Figures

Figures reproduced from arXiv: 2605.08586 by Christopher Homan, Mamadou K. Keita.

Figure 1
Figure 1: K-Veritas verification workflow. The user runs experiments with K-Veritas commands, writes the paper, attaches the signed report, and submits everything for review. K-Veritas is used here as a testbed for a more general protocol.
read the original abstract

This position paper argues that computer science conferences should require tamper-evident, nonrepudiable attestations of experimental results. We name the underlying problem experiment nonrepudiation: a compliant protocol must bind the numbers in a paper to an actual executed computation in a way the author cannot later alter or deny. The current system relies on self-reported checklists, optional code sharing, and author-controlled logging. None of these mechanisms answer the question a reviewer cannot check: did the code the paper describes produce the numbers the paper reports? We define the problem formally, state the security properties any compliant protocol must satisfy, and describe a threat model that includes attacks current approaches do not prevent. To show that the problem is solvable, we built K-Veritas, a reference implementation in Go that produces signed reports without accessing training data. K-Veritas is a testbed, not a finished answer. We call on conferences and the community to treat nonrepudiation as a first-class requirement and to help build an open, independent standard for it.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. This position paper argues that computer science conferences should require tamper-evident, nonrepudiable attestations of experimental results. It formalizes the problem of 'experiment nonrepudiation,' defines the security properties any compliant protocol must satisfy, presents a threat model including attacks not prevented by current self-reported methods, and introduces K-Veritas, a Go reference implementation that produces signed reports without accessing training data, as a testbed to demonstrate solvability.

Significance. If the central proposal holds, it could meaningfully strengthen reproducibility and trust in experimental computer science by binding reported numbers to executed computations in a verifiable way. The formalization of the problem and the construction of a reference implementation that avoids direct data access are strengths that credit the authors for moving beyond checklists toward cryptographic nonrepudiation.

major comments (1)
  1. [K-Veritas implementation] The K-Veritas implementation section states that the tool produces signed reports without accessing training data and positions it as a testbed for standardization, but supplies no overhead measurements, no runtime or storage benchmarks, and no compatibility evaluation with common workflows (Python scripts, Jupyter notebooks, or cloud-based training). This is load-bearing for the claim that a practical, low-overhead protocol can be adopted without creating new barriers to publishing.
minor comments (1)
  1. [Abstract and Introduction] The abstract and introduction could more explicitly cross-reference the exact security properties (e.g., binding, non-repudiation, tamper-evidence) defined later in the formal section to improve readability for readers unfamiliar with the threat model.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and positive remarks on the formalization of experiment nonrepudiation. We respond to the single major comment below.

read point-by-point responses
  1. Referee: [K-Veritas implementation] The K-Veritas implementation section states that the tool produces signed reports without accessing training data and positions it as a testbed for standardization, but supplies no overhead measurements, no runtime or storage benchmarks, and no compatibility evaluation with common workflows (Python scripts, Jupyter notebooks, or cloud-based training). This is load-bearing for the claim that a practical, low-overhead protocol can be adopted without creating new barriers to publishing.

    Authors: We agree that the K-Veritas section contains no overhead measurements, runtime or storage benchmarks, or compatibility evaluations with Python, Jupyter, or cloud workflows. However, we do not view these as load-bearing for the paper's claims. The manuscript explicitly frames K-Veritas as 'a reference implementation in Go' and 'a testbed, not a finished answer' whose sole purpose is to demonstrate solvability: that a protocol can produce signed reports meeting the defined security properties without accessing training data. The position paper argues that conferences should treat nonrepudiation as a requirement and that the problem is technically solvable; it makes no assertion that K-Veritas itself is a practical, low-overhead system ready for adoption or that it avoids new barriers. Performance and integration details are appropriate for subsequent engineering or standardization work, which the paper invites. We therefore see no need to expand the testbed description with benchmarks in this position paper. revision: no

Circularity Check

0 steps flagged

No significant circularity; argument rests on standard cryptographic definitions and a reference implementation

full rationale

The paper is a position paper that formally defines experiment nonrepudiation, states required security properties, and describes a threat model using established cryptographic notions of tamper-evidence and nonrepudiation. It then presents K-Veritas as an independent reference implementation in Go to demonstrate solvability. No derivation chain reduces any claim to its own inputs by construction, no parameters are fitted and relabeled as predictions, and no load-bearing premises depend on self-citations to prior work by the same authors. The central proposal remains independent of the reference implementation's specific details.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The argument relies on standard cryptographic assumptions for signatures and tamper-evidence; no free parameters or new invented entities are introduced.

axioms (1)
  • [standard math] Standard cryptographic assumptions for digital signatures and tamper-evidence hold.
    The protocol definition depends on these established properties to achieve nonrepudiation.

pith-pipeline@v0.9.0 · 5473 in / 1012 out tokens · 47302 ms · 2026-05-12T00:49:13.404076+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages
