Self-Verifying Measurement Records: Hash-Linked Evidence Graphs for Hardware Benchmarking
Pith reviewed 2026-06-29 04:10 UTC · model grok-4.3
The pith
Reported hardware measurements can be made into tamper-evident records that anyone can verify offline.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We make a reported hardware measurement a tamper-evident, independently checkable record. Every quantity in the text, a table, or a figure is bound, by its content hash, to the observation and the verification behind it; the whole is a hash-linked, append-only structure that a verifier audits offline without trusting its producer. Matrix products are verified by a probabilistic identity at O(k n^2) cost under a tolerance derived from floating-point error analysis and calibrated to the device's measured residual floor.
What carries the argument
A hash-linked, append-only evidence graph that binds each reported quantity to its observation and verification via content hashes, augmented by Freivalds' identity for matrix products and algebraic checksums for other quantities.
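The binding described above can be sketched as a small hash-chained log. This is a minimal illustration, not the paper's data structure: the entry layout (`kind`, `payload`, `refs`, `prev`), the canonical JSON encoding, the `EvidenceLog` class, and the `audit` function are all assumptions made for the example, and the recorded numbers are invented.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor for the first entry (assumption)


def content_hash(obj) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no whitespace)."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class EvidenceLog:
    """Append-only list of entries; each entry commits to its predecessor."""

    def __init__(self):
        self.entries = []

    def append(self, kind: str, payload: dict, refs=()) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"kind": kind, "payload": payload, "refs": list(refs), "prev": prev}
        self.entries.append(dict(body, hash=content_hash(body)))
        return self.entries[-1]["hash"]


def audit(entries) -> bool:
    """Offline check: recompute every hash, the chain, and every reference."""
    seen = set()
    prev = GENESIS
    for e in entries:
        body = {key: e[key] for key in ("kind", "payload", "refs", "prev")}
        if e["prev"] != prev or content_hash(body) != e["hash"]:
            return False
        if any(ref not in seen for ref in e["refs"]):
            return False  # a quantity may only point at earlier evidence
        seen.add(e["hash"])
        prev = e["hash"]
    return True


# Hypothetical record: one observation, its verification, one reported quantity.
log = EvidenceLog()
obs = log.append("observation", {"kernel": "gemm", "n": 4096, "seconds": 0.0123})
ver = log.append("verification", {"method": "freivalds", "k": 20, "passed": True}, refs=[obs])
log.append("quantity", {"name": "gemm_time_s", "value": 0.0123}, refs=[obs, ver])
audit_ok = audit(log.entries)

# Editing a recorded value after the fact breaks that entry's own hash.
tampered = copy.deepcopy(log.entries)
tampered[0]["payload"]["seconds"] = 0.0100
tamper_detected = not audit(tampered)
```

The auditor needs only the entries themselves, which is what makes the check offline. A flat chain is the simplest append-only structure; a Merkle-tree log, as used in transparency logs, would add efficient inclusion proofs but is not shown here.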
If this is right
- A verifier can audit the entire record offline without trusting its producer.
- Wrong matrix products are rejected with probability at least 1-2^(-k) once the tolerance is calibrated to the device's residual floor.
- Quantities without a probabilistic identity carry an algebraic checksum and a measured reproducibility class.
- Power or thermal stress applied from unprivileged access neither shifts the calibrated tolerance nor produces accepted silent errors.
- The physical-fault threat model is thereby restricted to rare defective parts or privileged attackers.
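The matrix-product check in the list above can be illustrated with a tolerance-aware Freivalds test. This is a sketch under stated assumptions: the `tolerance` function uses a crude first-order bound (4 · n · eps · ‖A‖∞ · ‖B‖∞ plus a measured floor) chosen for the example rather than the paper's derivation, the floor is set to zero, and `matmul` merely stands in for the device under test.

```python
import random
import sys


def matvec(M, v):
    """Dense matrix-vector product, O(n^2)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Naive O(n^3) product; stands in for the device under test."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def inf_norm(M):
    """Maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in M)


def tolerance(A, B, floor=0.0):
    """Illustrative bound only (an assumption, not the paper's formula)."""
    n = len(B)
    return 4 * n * sys.float_info.epsilon * inf_norm(A) * inf_norm(B) + floor


def freivalds(A, B, C, k, tol, rng):
    """Accept C as A @ B if all k random 0/1 probes agree within tol."""
    n = len(C[0])
    for _ in range(k):
        r = [rng.randint(0, 1) for _ in range(n)]
        lhs = matvec(A, matvec(B, r))  # two mat-vecs instead of a full product
        rhs = matvec(C, r)
        if any(abs(x - y) > tol for x, y in zip(lhs, rhs)):
            return False
    return True


rng = random.Random(0)
n = 8
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
B = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
C = matmul(A, B)
tol = tolerance(A, B)

honest_ok = freivalds(A, B, C, k=30, tol=tol, rng=rng)

bad = [row[:] for row in C]
bad[0][0] += 1.0  # a single corrupted entry, far above the tolerance
corrupt_ok = freivalds(A, B, bad, k=30, tol=tol, rng=rng)
```

Each probe costs two matrix-vector products on one side and one on the other, which gives the O(k n^2) total. A corruption confined to one entry is missed by a single 0/1 probe with probability 1/2, so k independent probes miss it with probability 2^(-k). A corruption smaller than `tol` passes by construction; how that region interacts with the 2^(-k) bound is exactly what the tolerance analysis has to settle.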
Where Pith is reading between the lines
- The same binding technique could be applied to other forms of computational reporting that currently rest on trust.
- Complete protection against privileged attackers would require the record to compose with a hardware root of trust.
- The reported residual-floor and reproducibility maps supply device-specific baselines that later work could use to refine tolerance settings.
Load-bearing premise
The calibrated tolerance, derived from floating-point error analysis and the device's measured residual floor, suffices together with Freivalds' identity to reject wrong matrix products with probability 1-2^(-k), while physical faults remain limited to rare defective parts or privileged attackers.
What would settle it
An experiment in which a deliberately incorrect matrix product passes the verification check at the claimed rate, or a non-privileged physical fault on the device produces a silent error that the record accepts.
Original abstract
Performance numbers reported for hardware are accepted on trust: the reader cannot recompute them, the apparatus is gone, and the silicon itself can be silently wrong, with fleet studies reporting on the order of one core in a thousand returning incorrect arithmetic with no error raised. We make a reported hardware measurement a tamper-evident, independently checkable record. Every quantity in the text, a table, or a figure is bound, by its content hash, to the observation and the verification behind it; the whole is a hash-linked, append-only structure (a transparency log for measurement) that a verifier audits offline without trusting its producer. Matrix products are verified by a probabilistic identity (Freivalds) at O(k n^2) cost under a tolerance we derive from floating-point error analysis and calibrate to the device's own measured residual floor, so a wrong product is rejected with probability 1 - 2^(-k); quantities with no such identity carry an algebraic checksum and a measured reproducibility class. We then treat the check itself as a security object: a probe seed committed for offline reproducibility is an attack surface, and a probe-aware adversary can hide a corruption in the probe's null space, fooling even a quorum of bit-identical witnesses, while a Fiat-Shamir challenge derived from the claimed output closes this. Driving the device from an unprivileged tenant's reach, with a di/dt power virus and a thermal soak, neither moves the calibrated tolerance nor produces a silent error, placing the physical-fault threat at the rare defective part or the privileged attacker and marking the boundary at which the record must compose with a hardware root of trust. We demonstrate the construction across Blackwell and Hopper GPUs and report a residual-floor and reproducibility map by precision, size, and device.
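The probe-seed attack and its Fiat-Shamir fix described in the abstract can be sketched with exact integer arithmetic. Everything named here is hypothetical: `probes_from_seed`, `probes_from_transcript`, `forge`, the SHA-256 counter-mode expansion, and the JSON transcript encoding are assumptions for illustration. The forgery uses a pigeonhole shortcut (two columns on which every committed probe agrees), which only works when n > 2^k; a general attacker would instead solve a linear system for any k < n.

```python
import hashlib
import json
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def check(A, B, C, probes):
    """Exact Freivalds check: A(Br) == Cr for every probe r."""
    return all(matvec(A, matvec(B, r)) == matvec(C, r) for r in probes)


def probes_from_seed(seed, k, n):
    """Probes fixed in advance by a committed seed (the attack surface)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]


def probes_from_transcript(A, B, C, k, n):
    """Fiat-Shamir style: probes derived from a hash of the claimed output."""
    seed = hashlib.sha256(json.dumps([A, B, C]).encode("utf-8")).digest()
    bits, counter = [], 0
    while len(bits) < k * n:
        block = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        bits.extend((byte >> s) & 1 for byte in block for s in range(8))
        counter += 1
    return [bits[p * n:(p + 1) * n] for p in range(k)]


def forge(C, probes):
    """Corrupt C along a direction that every known probe maps to zero."""
    seen = {}
    for j in range(len(C[0])):
        pattern = tuple(r[j] for r in probes)
        if pattern in seen:  # columns i and j look identical to all probes
            i = seen[pattern]
            bad = [row[:] for row in C]
            bad[0][i] += 1
            bad[0][j] -= 1
            return bad
        seen[pattern] = j
    raise ValueError("no colliding columns; this shortcut needs n > 2**k")


rng = random.Random(1)
n, k = 32, 4
A = [[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]
B = [[rng.randint(-9, 9) for _ in range(n)] for _ in range(n)]
C = matmul(A, B)

committed = probes_from_seed(2026, k, n)
bad = forge(C, committed)
fooled = check(A, B, bad, committed)  # every witness reusing the seed agrees

caught = not check(A, B, bad, probes_from_transcript(A, B, bad, 64, n))
honest_ok = check(A, B, C, probes_from_transcript(A, B, C, 64, n))
```

Because every witness that replays the committed seed uses the same probes, they all accept the forged product, so agreement among them adds nothing. Deriving the probes from the claimed output means the adversary must fix the corruption before learning which probes will test it.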
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to construct tamper-evident, independently verifiable records of hardware benchmark measurements via hash-linked evidence graphs (a transparency log for measurements). Matrix products are checked via Freivalds' identity under a tolerance derived from floating-point error analysis plus the device's measured residual floor, yielding a claimed rejection probability of 1-2^{-k} for incorrect results; other quantities receive algebraic checksums and reproducibility classes. A Fiat-Shamir challenge derived from the claimed output closes the probe null-space attack surface. Physical stress tests (di/dt power virus, thermal soak) on Blackwell and Hopper GPUs are reported not to shift the tolerance or induce silent errors, confining the physical-fault threat to rare defective parts or privileged attackers. The construction is demonstrated with a residual-floor and reproducibility map by precision, size, and device.
Significance. If the tolerance derivation and security reduction are sound, the work would provide a practical mechanism for making reported hardware performance numbers independently auditable without trusting the producer or the apparatus. It combines standard hash chaining and Fiat-Shamir with numerical verification and device-specific calibration, and the empirical reproducibility map is a concrete contribution. The explicit treatment of the probe seed as an attack surface and the boundary condition for composition with a hardware root of trust are useful framing.
major comments (2)
- [§4] §4 (Freivalds verification under tolerance): the abstract and threat-model paragraph state that a wrong product is rejected with probability 1-2^{-k} after the tolerance τ is set from ||A||·||B||·ε + measured residual floor. Freivalds' identity is probabilistic only in exact arithmetic; the paper must show that the measure of the false-negative region created by the additive tolerance remains bounded by 2^{-k} (or that any adversarial product passing the check still lies inside the original 2^{-k} failure set). No such derivation or independence assumption between the random probe vector and rounding/residual errors is supplied.
- [§5] §5 (physical-fault experiments): the claim that di/dt power-virus and thermal-soak stress neither move the calibrated tolerance nor produce silent errors is load-bearing for confining the threat model to rare defective parts or privileged attackers. The section must report the number of trials, observed error rates, and statistical bounds that support this boundary; without them the reduction to a hardware root of trust cannot be evaluated.
minor comments (2)
- [Abstract] The reproducibility classes mentioned in the abstract are not defined or enumerated; a short table or paragraph listing the classes and the criteria for assignment would improve clarity.
- Notation for the hash-linked evidence graph (nodes, edges, commitment) is introduced gradually; an early figure or pseudocode listing the structure would aid the reader.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. We address the two major comments below and will revise the manuscript to incorporate the requested clarifications and data.
- Referee: [§4] §4 (Freivalds verification under tolerance): the abstract and threat-model paragraph state that a wrong product is rejected with probability 1-2^{-k} after the tolerance τ is set from ||A||·||B||·ε + measured residual floor. Freivalds' identity is probabilistic only in exact arithmetic; the paper must show that the measure of the false-negative region created by the additive tolerance remains bounded by 2^{-k} (or that any adversarial product passing the check still lies inside the original 2^{-k} failure set). No such derivation or independence assumption between the random probe vector and rounding/residual errors is supplied.
Authors: We agree that an explicit derivation is required. In the revision we will add a subsection proving that, under standard models of floating-point error (bounded by machine epsilon and independent of the random probe in distribution), the measure of the additional false-negative region is at most a small additive term that can be absorbed into the security parameter k without changing the claimed 1-2^{-k} bound. We will also state the precise independence assumption used. revision: yes
- Referee: [§5] §5 (physical-fault experiments): the claim that di/dt power-virus and thermal-soak stress neither move the calibrated tolerance nor produce silent errors is load-bearing for confining the threat model to rare defective parts or privileged attackers. The section must report the number of trials, observed error rates, and statistical bounds that support this boundary; without them the reduction to a hardware root of trust cannot be evaluated.
Authors: We acknowledge that the current text of §5 omits the requested statistical details. The revision will expand the section to report the exact number of stress trials executed on each device, the observed silent-error count (zero), and the derived statistical bounds (e.g., upper confidence limits on the per-trial fault probability). revision: yes
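The kind of statistical bound requested here can be sketched for the zero-failure case. This is a generic calculation, not the authors' analysis: it assumes independent trials with a common per-trial fault probability, and the trial counts are illustrative since the review gives none. With 0 failures in n trials, the exact one-sided upper limit at confidence c solves (1 - p)^n = 1 - c.

```python
import math


def zero_failure_upper_bound(trials: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper limit on p after 0 failures in `trials` trials.

    Solves (1 - p)**trials = 1 - confidence, i.e. the Clopper-Pearson
    bound for zero observed failures.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / trials)


def trials_needed(p_max: float, confidence: float = 0.95) -> int:
    """Smallest number of clean trials that bounds p below p_max."""
    alpha = 1.0 - confidence
    return math.ceil(math.log(alpha) / math.log1p(-p_max))


# Illustrative numbers only: 1000 clean trials, 95% confidence.
bound_1000 = zero_failure_upper_bound(1000)
rule_of_three = 3 / 1000  # common approximation to the same bound

# Clean trials required to bound the per-trial fault rate below 1e-3.
needed = trials_needed(1e-3)
```

For 1000 clean trials the exact limit is close to the "rule of three" value 3/n, and bounding the per-trial rate below one in a thousand takes roughly three thousand clean trials. Zero observed errors therefore supports a bound only as tight as the trial count allows, which is why the referee asks for the counts.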
Circularity Check
No circularity; the claims rest on the standard Freivalds identity and hash properties, without reduction to self-defined inputs.
The abstract derives a tolerance from floating-point error analysis plus measured residual floor, then invokes the established Freivalds probabilistic bound (1-2^(-k)) for rejection of incorrect matrix products. Hash-linking and append-only structure rely on standard cryptographic properties. No equation or claim reduces a reported prediction to a fitted parameter by construction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled via prior work. The chain is externally grounded in Freivalds' algorithm and hash collision resistance.
Axiom & Free-Parameter Ledger
free parameters (1)
- tolerance: device-specific residual floor
axioms (1)
- standard math: Freivalds' probabilistic identity correctly verifies matrix products at the stated cost and error probability
invented entities (1)
- hash-linked evidence graph: no independent evidence