arxiv: 2606.27934 · v1 · pith:BLBRVZSPnew · submitted 2026-06-26 · 💻 cs.CR · cs.AR

Self-Verifying Measurement Records: Hash-Linked Evidence Graphs for Hardware Benchmarking

Pith reviewed 2026-06-29 04:10 UTC · model grok-4.3

classification 💻 cs.CR cs.AR
keywords hardware benchmarking · tamper-evident records · hash-linked evidence graphs · Freivalds verification · measurement transparency · evidence graphs · GPU benchmarks

The pith

Reported hardware measurements can be made into tamper-evident records that anyone can verify offline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to bind every reported hardware measurement into a hash-linked, append-only structure, so that each quantity carries its own observation record and verification proof. A reader can then audit the entire log offline without trusting the device owner or the hardware. For matrix products, verification uses Freivalds' probabilistic identity with a tolerance calibrated to the device's measured floating-point residual floor, so that a wrong result is rejected with probability 1-2^(-k). Other quantities receive an algebraic checksum together with a measured reproducibility class. The method also closes an attack in which an adversary who knows the verification probes in advance hides a corruption in their null space: the probes are instead derived from the claimed output by a Fiat-Shamir challenge.

Core claim

We make a reported hardware measurement a tamper-evident, independently checkable record. Every quantity in the text, a table, or a figure is bound, by its content hash, to the observation and the verification behind it; the whole is a hash-linked, append-only structure that a verifier audits offline without trusting its producer. Matrix products are verified by a probabilistic identity at O(k n^2) cost under a tolerance derived from floating-point error analysis and calibrated to the device's measured residual floor.

What carries the argument

A hash-linked, append-only evidence graph that binds each reported quantity to its observation and verification via content hashes, augmented by Freivalds' identity for matrix products and algebraic checksums for other quantities.
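
A minimal sketch of such a structure, to make the binding concrete. Everything named here is an assumption for illustration and not the paper's format: the `EvidenceLog` class, the `content_hash` and `audit` helpers, the node fields, the choice of SHA-256 over canonical JSON, and the example values.

```python
import hashlib
import json


def content_hash(node: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the node (assumed encoding)."""
    encoded = json.dumps(node, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class EvidenceLog:
    """Append-only store in which each node names its parents by content hash."""

    def __init__(self):
        self.nodes = {}   # content hash -> node
        self.order = []   # hashes in append order

    def append(self, kind: str, payload: dict, parents=()) -> str:
        for parent in parents:
            if parent not in self.nodes:
                raise KeyError(f"unknown parent {parent}")
        node = {"kind": kind, "payload": payload, "parents": sorted(parents)}
        digest = content_hash(node)
        self.nodes[digest] = node
        self.order.append(digest)
        return digest


def audit(nodes: dict) -> bool:
    """Offline check: every key matches its node's hash and every parent exists."""
    for digest, node in nodes.items():
        if content_hash(node) != digest:
            return False
        if any(parent not in nodes for parent in node["parents"]):
            return False
    return True


# Hypothetical values, for illustration only.
log = EvidenceLog()
obs = log.append("observation", {"workload": "gemm", "n": 4096, "seconds": 0.0123})
ver = log.append("verification", {"method": "freivalds", "k": 20, "passed": True}, [obs])
qty = log.append("quantity", {"name": "throughput_tflops", "value": 11.2}, [obs, ver])

assert audit(log.nodes)
log.nodes[obs]["payload"]["seconds"] = 0.0100  # tamper with the observation
assert not audit(log.nodes)
```

Because a node's hash covers the hashes of its parents, changing an observation after the fact invalidates its own key and, if re-keyed, every node that points at it.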

If this is right

  • A verifier can audit the entire record offline without trusting its producer.
  • Wrong matrix products are rejected with probability 1-2^(-k) after tolerance calibration to the device's residual floor.
  • Quantities without a probabilistic identity carry an algebraic checksum and a measured reproducibility class.
  • Power or thermal stress applied from unprivileged access neither shifts the calibrated tolerance nor produces accepted silent errors.
  • The physical-fault threat model is thereby restricted to rare defective parts or privileged attackers.
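
The matrix-product check in the second bullet can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: it uses pure-Python double-precision lists, binary probe vectors, and a placeholder tolerance `tol = 1e-9` rather than a calibrated one; the function names are hypothetical.

```python
import random


def matvec(M, v):
    """Matrix-vector product, O(n^2)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Reference product, O(n^3); the verifier never needs to call this."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def freivalds_accepts(A, B, C, k, tol, rng):
    """Accept C as A @ B if k random probes all satisfy |C r - A (B r)| <= tol."""
    n = len(B[0])
    for _ in range(k):
        r = [rng.randint(0, 1) for _ in range(n)]
        lhs = matvec(C, r)
        rhs = matvec(A, matvec(B, r))  # three mat-vec products per probe
        if max(abs(x - y) for x, y in zip(lhs, rhs)) > tol:
            return False
    return True


rng = random.Random(0)
n = 8
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
B = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
C = matmul(A, B)
tol = 1e-9  # placeholder, not a calibrated device tolerance

assert freivalds_accepts(A, B, C, k=32, tol=tol, rng=rng)

C_bad = [row[:] for row in C]
C_bad[3][5] += 1.0  # a single wrong entry
assert not freivalds_accepts(A, B, C_bad, k=32, tol=tol, rng=rng)
```

Each probe costs three matrix-vector products, which is where the O(k n^2) total comes from. The single wrong entry escapes a probe only when the probe's coordinate for that column is zero, so it survives all 32 probes with probability 2^(-32).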

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same binding technique could be applied to other forms of computational reporting that currently rest on trust.
  • Complete protection against privileged attackers would require the record to compose with a hardware root of trust.
  • The reported residual-floor and reproducibility maps supply device-specific baselines that later work could use to refine tolerance settings.

Load-bearing premise

The calibrated tolerance derived from floating-point error analysis and the device's measured residual floor, together with Freivalds' identity, suffices to reject wrong matrix products with probability 1-2^(-k) while physical faults remain limited to rare defective parts or privileged attackers.

What would settle it

An experiment in which a deliberately incorrect matrix product passes the verification check at the claimed rate, or a non-privileged physical fault on the device produces a silent error that the record accepts.

Figures

Figures reproduced from arXiv: 2606.27934 by Baris Basaran, Faruk Alpay.

Figure 1: The evidence graph for one workload. Each arrow carries the content hash of the node it points at.
Figure 2: Throughput against SM clock for a bounded ( …
Figure 3: The archive as a single sealed record: the document sources compile to the paper, the ancillary …
Original abstract

Performance numbers reported for hardware are accepted on trust: the reader cannot recompute them, the apparatus is gone, and the silicon itself can be silently wrong, with fleet studies reporting on the order of one core in a thousand returning incorrect arithmetic with no error raised. We make a reported hardware measurement a tamper-evident, independently checkable record. Every quantity in the text, a table, or a figure is bound, by its content hash, to the observation and the verification behind it; the whole is a hash-linked, append-only structure (a transparency log for measurement) that a verifier audits offline without trusting its producer. Matrix products are verified by a probabilistic identity (Freivalds) at O(k n^2) cost under a tolerance we derive from floating-point error analysis and calibrate to the device's own measured residual floor, so a wrong product is rejected with probability 1 - 2^(-k); quantities with no such identity carry an algebraic checksum and a measured reproducibility class. We then treat the check itself as a security object: a probe seed committed for offline reproducibility is an attack surface, and a probe-aware adversary can hide a corruption in the probe's null space, fooling even a quorum of bit-identical witnesses, while a Fiat-Shamir challenge derived from the claimed output closes this. Driving the device from an unprivileged tenant's reach, with a di/dt power virus and a thermal soak, neither moves the calibrated tolerance nor produces a silent error, placing the physical-fault threat at the rare defective part or the privileged attacker and marking the boundary at which the record must compose with a hardware root of trust. We demonstrate the construction across Blackwell and Hopper GPUs and report a residual-floor and reproducibility map by precision, size, and device.
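
The probe-aware attack and the Fiat-Shamir fix described in the abstract can be illustrated with a small experiment. This is a sketch under assumptions, not the paper's construction: the helper names, the use of SHAKE-256 over a JSON encoding to derive probe bits, the sizes `n = 36` and `k = 32`, and the placeholder tolerance are all hypothetical.

```python
import hashlib
import json
import random
from fractions import Fraction


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def accepts(A, B, C, probes, tol):
    """Tolerant Freivalds check of C against A @ B for the given probes."""
    for r in probes:
        lhs, rhs = matvec(C, r), matvec(A, matvec(B, r))
        if max(abs(x - y) for x, y in zip(lhs, rhs)) > tol:
            return False
    return True


def seeded_probes(seed, k, n):
    """Probes fixed in advance by a published seed (the attack surface)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]


def fiat_shamir_probes(A, B, C, k, n):
    """Probes derived by hashing the inputs together with the claimed output."""
    data = json.dumps([A, B, C]).encode("utf-8")
    stream = hashlib.shake_256(data).digest((k * n + 7) // 8)
    bits = [(byte >> s) & 1 for byte in stream for s in range(8)]
    return [bits[t * n:(t + 1) * n] for t in range(k)]


def null_vector(probes):
    """Nonzero d with d . r == 0 for every probe r (exact; requires n > k)."""
    rows = [[Fraction(x) for x in r] for r in probes]
    n, pivots = len(rows[0]), []
    for col in range(n):
        rank = len(pivots)
        piv = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        pivot_value = rows[rank][col]
        rows[rank] = [x / pivot_value for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        pivots.append(col)
    free = next(c for c in range(n) if c not in pivots)
    d = [Fraction(0)] * n
    d[free] = Fraction(1)
    for row, pivot_col in zip(rows, pivots):
        d[pivot_col] = -row[free]
    return d


rng = random.Random(1)
n, k, tol = 36, 32, 1e-9  # hypothetical sizes and placeholder tolerance
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
B = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
C = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# The adversary sees the committed probes and hides an error in their null space.
public = seeded_probes(seed=2026, k=k, n=n)
d = null_vector(public)
scale = max(abs(x) for x in d)
C_bad = [row[:] for row in C]
C_bad[0] = [c + float(x / scale) for c, x in zip(C[0], d)]  # largest change is 1.0

assert accepts(A, B, C_bad, public, tol)                        # attack succeeds
assert accepts(A, B, C, fiat_shamir_probes(A, B, C, k, n), tol)  # honest output passes
assert not accepts(A, B, C_bad, fiat_shamir_probes(A, B, C_bad, k, n), tol)
```

Any number of witnesses replaying the same committed seed would reach the same wrong verdict, which matches the abstract's remark about a quorum of bit-identical witnesses. Deriving the probes from the claimed output means the adversary cannot fit the corruption to probes it has not yet seen.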

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to construct tamper-evident, independently verifiable records of hardware benchmark measurements via hash-linked evidence graphs (a transparency log for measurements). Matrix products are checked via Freivalds' identity under a tolerance derived from floating-point error analysis plus the device's measured residual floor, yielding a claimed rejection probability of 1-2^{-k} for incorrect results; other quantities receive algebraic checksums and reproducibility classes. A Fiat-Shamir challenge derived from the claimed output closes the probe null-space attack surface. Physical stress tests (di/dt power virus, thermal soak) on Blackwell and Hopper GPUs are reported not to shift the tolerance or induce silent errors, confining the physical-fault threat to rare defective parts or privileged attackers. The construction is demonstrated with a residual-floor and reproducibility map by precision, size, and device.

Significance. If the tolerance derivation and security reduction are sound, the work would provide a practical mechanism for making reported hardware performance numbers independently auditable without trusting the producer or the apparatus. It combines standard hash chaining and Fiat-Shamir with numerical verification and device-specific calibration, and the empirical reproducibility map is a concrete contribution. The explicit treatment of the probe seed as an attack surface and the boundary condition for composition with a hardware root of trust are useful framing.

major comments (2)
  1. [§4] §4 (Freivalds verification under tolerance): the abstract and threat-model paragraph state that a wrong product is rejected with probability 1-2^{-k} after the tolerance τ is set from ||A||·||B||·ε + measured residual floor. Freivalds' identity is probabilistic only in exact arithmetic; the paper must show that the measure of the false-negative region created by the additive tolerance remains bounded by 2^{-k} (or that any adversarial product passing the check still lies inside the original 2^{-k} failure set). No such derivation or independence assumption between the random probe vector and rounding/residual errors is supplied.
  2. [§5] §5 (physical-fault experiments): the claim that di/dt power-virus and thermal-soak stress neither move the calibrated tolerance nor produce silent errors is load-bearing for confining the threat model to rare defective parts or privileged attackers. The section must report the number of trials, observed error rates, and statistical bounds that support this boundary; without them the reduction to a hardware root of trust cannot be evaluated.
minor comments (2)
  1. [Abstract] The reproducibility classes mentioned in the abstract are not defined or enumerated; a short table or paragraph listing the classes and the criteria for assignment would improve clarity.
  2. Notation for the hash-linked evidence graph (nodes, edges, commitment) is introduced gradually; an early figure or pseudocode listing the structure would aid the reader.
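
The gap raised in the first major comment can be made concrete with an elementary calculation. The following is a sketch and not the paper's derivation; it assumes binary probes, treats the verifier's own arithmetic as exact, and introduces the symbols D and τ for illustration.

```latex
% Assumed notation: D = C - AB, r uniform on {0,1}^n, tolerance \tau \ge 0.
\begin{align*}
\text{Exact test:}\quad
  & D \neq 0 \;\Longrightarrow\; \Pr_r[\,Dr = 0\,] \le \tfrac12 . \\
\text{Tolerant test:}\quad
  & \text{accept iff } \|Dr\|_\infty \le \tau . \\
\text{Always accepted:}\quad
  & \max_i \sum_j |D_{ij}| \le \tau
    \;\Longrightarrow\; \Pr_r[\,\|Dr\|_\infty \le \tau\,] = 1 . \\
\text{Still caught:}\quad
  & \exists\,(i,j):\ |D_{ij}| > 2\tau
    \;\Longrightarrow\; \Pr_r[\,\|Dr\|_\infty \le \tau\,] \le \tfrac12 .
\end{align*}
```

For the last line, fix every coordinate of r except r_j: the two possible values of (Dr)_i differ by |D_ij| > 2τ, so at most one of them lies in [-τ, τ]. With k independent probes the acceptance probability is then at most 2^(-k). Errors small enough to satisfy the third line pass every probe, so a 1-2^(-k) guarantee can only hold for errors above some multiple of the tolerance; this argument does not cover the band in between.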

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. We address the two major comments below and will revise the manuscript to incorporate the requested clarifications and data.

Point-by-point responses
  1. Referee: [§4] §4 (Freivalds verification under tolerance): the abstract and threat-model paragraph state that a wrong product is rejected with probability 1-2^{-k} after the tolerance τ is set from ||A||·||B||·ε + measured residual floor. Freivalds' identity is probabilistic only in exact arithmetic; the paper must show that the measure of the false-negative region created by the additive tolerance remains bounded by 2^{-k} (or that any adversarial product passing the check still lies inside the original 2^{-k} failure set). No such derivation or independence assumption between the random probe vector and rounding/residual errors is supplied.

    Authors: We agree that an explicit derivation is required. In the revision we will add a subsection proving that, under standard models of floating-point error (bounded by machine epsilon and independent of the random probe in distribution), the measure of the additional false-negative region is at most a small additive term that can be absorbed into the security parameter k without changing the claimed 1-2^{-k} bound. We will also state the precise independence assumption used. revision: yes

  2. Referee: [§5] §5 (physical-fault experiments): the claim that di/dt power-virus and thermal-soak stress neither move the calibrated tolerance nor produce silent errors is load-bearing for confining the threat model to rare defective parts or privileged attackers. The section must report the number of trials, observed error rates, and statistical bounds that support this boundary; without them the reduction to a hardware root of trust cannot be evaluated.

    Authors: We acknowledge that the current text of §5 omits the requested statistical details. The revision will expand the section to report the exact number of stress trials executed on each device, the observed silent-error count (zero), and the derived statistical bounds (e.g., upper confidence limits on the per-trial fault probability). revision: yes
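
The upper confidence limit promised in the second response has a closed form when zero errors are observed. A minimal sketch, assuming independent trials with a common per-trial fault probability; the function name and the trial counts are hypothetical, since the page does not give the paper's counts.

```python
import math


def zero_failure_upper_bound(trials: int, confidence: float = 0.95) -> float:
    """One-sided exact (Clopper-Pearson) upper bound on p after 0 failures.

    Solves (1 - p)^trials = 1 - confidence for p.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    alpha = 1.0 - confidence
    return -math.expm1(math.log(alpha) / trials)


# Hypothetical trial counts, compared with the "rule of three" approximation 3/N.
for trials in (100, 10_000, 1_000_000):
    print(trials, zero_failure_upper_bound(trials), 3 / trials)
```

Zero observed silent errors therefore bounds the per-trial fault probability only to roughly 3/N at 95% confidence, so the number of stress trials determines how rare a fault the experiment can rule out.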

Circularity Check

0 steps flagged

No circularity; claims rest on standard Freivalds identity and hash properties without reduction to self-defined inputs

Full rationale

The abstract derives a tolerance from floating-point error analysis plus measured residual floor, then invokes the established Freivalds probabilistic bound (1-2^(-k)) for rejection of incorrect matrix products. Hash-linking and append-only structure rely on standard cryptographic properties. No equation or claim reduces a reported prediction to a fitted parameter by construction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled via prior work. The chain is externally grounded in Freivalds' algorithm and hash collision resistance.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

Abstract-only view limits visibility; the approach rests on standard probabilistic verification and floating-point analysis plus a new evidence structure whose details are not expanded.

free parameters (1)
  • tolerance = device-specific residual floor
    Calibrated to the device's own measured residual floor after derivation from floating-point error analysis.
axioms (1)
  • [standard math] Freivalds' probabilistic identity correctly verifies matrix products at the stated cost and error probability
    Invoked for O(k n^2) verification of matrix products with probability 1-2^(-k)
invented entities (1)
  • hash-linked evidence graph [no independent evidence]
    purpose: Tamper-evident append-only structure binding measurements to verifications
    New structure proposed to make records independently checkable offline
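
How the single free parameter might be set can be sketched as follows. This is an assumption-heavy illustration, not the paper's procedure: the tolerance is taken as an analytic term eps·‖A‖·‖B‖ in the infinity norm plus the largest residual seen on honest runs, following the form the referee report attributes to the paper, and `device_matmul` is a stand-in that mimics run-to-run variation by shuffling accumulation order in double precision.

```python
import random
import sys


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def inf_norm(M):
    """Maximum absolute row sum (assumed norm choice)."""
    return max(sum(abs(x) for x in row) for row in M)


def probe_residual(A, B, C, r):
    """Largest entry of |C r - A (B r)| for one probe."""
    lhs, rhs = matvec(C, r), matvec(A, matvec(B, r))
    return max(abs(x - y) for x, y in zip(lhs, rhs))


def device_matmul(A, B, rng):
    """Stand-in for a device run: each dot product accumulates in a random order."""
    cols = list(zip(*B))
    out = []
    for row in A:
        out_row = []
        for col in cols:
            terms = [a * b for a, b in zip(row, col)]
            rng.shuffle(terms)
            out_row.append(sum(terms))
        out.append(out_row)
    return out


def calibrate_tolerance(A, B, runs, probes, eps=sys.float_info.epsilon):
    """Return (tau, floor): analytic term plus the largest honest residual seen."""
    floor = max(probe_residual(A, B, C, r) for C in runs for r in probes)
    analytic = eps * inf_norm(A) * inf_norm(B)
    return analytic + floor, floor


rng = random.Random(7)
n = 8
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
B = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
runs = [device_matmul(A, B, rng) for _ in range(5)]
probes = [[rng.randint(0, 1) for _ in range(n)] for _ in range(16)]

tau, floor = calibrate_tolerance(A, B, runs, probes)
assert all(probe_residual(A, B, C, r) <= tau for C in runs for r in probes)
```

A tolerance set this way accepts every honest run used in calibration by construction; whether it also accepts future honest runs depends on how representative the calibration set is.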

pith-pipeline@v0.9.1-grok · 5852 in / 1468 out tokens · 41435 ms · 2026-06-29T04:10:54.894251+00:00 · methodology
