Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models
Pith reviewed 2026-06-26 21:36 UTC · model grok-4.3
The pith
Free-energy signatures from attention Laplacians detect LLM hallucinations more accurately than prior spectral summaries without retraining the model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Treating the spectrum of an attention-derived graph Laplacian as the energy levels of a Hamiltonian yields Free-Energy Signatures (partition function, free energy, spectral entropy, heat capacity, and random-matrix spectral form factor) that enrich finite spectral summaries, remain stable under small attention changes, and support both supervised and unsupervised hallucination detectors whose performance exceeds that of earlier eigenvalue-based baselines.
What carries the argument
Free-Energy Signatures (Fes), obtained by interpreting each layer's attention Laplacian as a Hamiltonian and computing its thermodynamic potentials together with the random-matrix-theory spectral form factor.
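To make the construction concrete, here is a minimal sketch of how thermodynamic potentials could be read off one attention matrix. It is an illustration under stated assumptions, not the paper's implementation: the function names `attention_laplacian` and `free_energy_signature`, the symmetrization step, the choice of the combinatorial Laplacian, and the grid of inverse temperatures `betas` are all hypothetical. The potentials themselves use the textbook definitions Z = Σ exp(-βλ), F = -log Z / β, S = β(⟨E⟩ - F), and C = β² Var(E).

```python
import numpy as np

def attention_laplacian(attn):
    """Combinatorial Laplacian L = D - W of a symmetrized attention matrix.

    Assumption: W = (A + A^T) / 2 with the diagonal zeroed, so L is symmetric
    and its eigenvalues are real and non-negative.
    """
    a = np.asarray(attn, dtype=float)
    w = 0.5 * (a + a.T)
    np.fill_diagonal(w, 0.0)
    return np.diag(w.sum(axis=1)) - w

def free_energy_signature(lap, betas):
    """Thermodynamic potentials of the Laplacian spectrum at each beta."""
    lam = np.linalg.eigvalsh(lap)            # eigenvalues act as energy levels
    lam0 = lam.min()
    out = {"log_Z": [], "F": [], "S": [], "C": []}
    for beta in betas:
        w = np.exp(-beta * (lam - lam0))     # shifted so the largest weight is 1
        p = w / w.sum()                      # Boltzmann probabilities
        log_z = np.log(w.sum()) - beta * lam0
        mean_e = float(p @ lam)              # <E>
        var_e = float(p @ (lam - mean_e) ** 2)
        free_energy = -log_z / beta
        out["log_Z"].append(log_z)
        out["F"].append(free_energy)
        out["S"].append(beta * (mean_e - free_energy))  # Gibbs entropy of p
        out["C"].append(beta ** 2 * var_e)              # heat capacity
    return {k: np.array(v) for k, v in out.items()}
```

Concatenating the four arrays across layers would give one fixed-length descriptor per generation; how the paper actually pools over heads and layers is not stated in this summary.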
If this is right
- A lightweight probe using Fes descriptors achieves the highest aggregate AUROC among attention-spectral baselines on six models and six benchmarks.
- An unsupervised RMT-deviation score alone reaches mean AUROC 0.71 without any labeled data.
- Correct generations display more Wigner-Dyson-like spectral statistics, while hallucinations display more Poisson-like statistics.
- Fes descriptors remain Lipschitz-stable under attention perturbations and approximate moment-derived spectral functionals under the stated regularity and grid-resolution conditions.
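One common way to place a spectrum between the Wigner-Dyson and Poisson regimes is the mean ratio of consecutive level spacings, which needs no spectral unfolding. The sketch below is an assumption about how such a label-free score might be built, not the paper's RMT-deviation score: `mean_spacing_ratio` and `rmt_deviation_score` are hypothetical names, and the linear rescaling is an illustrative choice. The reference values are standard: about 0.386 (exactly 2 ln 2 - 1) for Poisson statistics and about 0.531 for the Gaussian orthogonal ensemble.

```python
import numpy as np

POISSON_MEAN_R = 2 * np.log(2) - 1   # ~0.386, uncorrelated levels
GOE_MEAN_R = 0.5307                  # numerical value for the GOE (Wigner-Dyson)

def mean_spacing_ratio(eigenvalues, tol=1e-12):
    """Mean of r_i = min(s_i, s_{i+1}) / max(s_i, s_{i+1}) over sorted levels."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))
    s = np.diff(lam)
    if s.size < 2:
        return float("nan")
    lo = np.minimum(s[:-1], s[1:])
    hi = np.maximum(s[:-1], s[1:])
    keep = hi > tol                  # skip pairs where both spacings vanish
    if not keep.any():
        return float("nan")
    return float((lo[keep] / hi[keep]).mean())

def rmt_deviation_score(eigenvalues):
    """Hypothetical score: ~0 for GOE-like spectra, ~1 for Poisson-like spectra."""
    r = mean_spacing_ratio(eigenvalues)
    return (GOE_MEAN_R - r) / (GOE_MEAN_R - POISSON_MEAN_R)
```

Under the claim in the list above, a higher score would point toward hallucination; short sequences give few spacings, so the estimate is noisy for small attention graphs.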
Where Pith is reading between the lines
- The same Hamiltonian treatment could be applied to other graph constructions inside neural networks to extract thermodynamic diagnostics for tasks beyond hallucination detection.
- The observed shift from Wigner-Dyson to Poisson statistics suggests that loss of spectral chaos may serve as a general marker of failure modes in sequential generation.
- Because the method requires only attention weights, it could be adapted to monitor closed models through API-exposed attention if such access becomes available.
Load-bearing premise
The attention-derived graph Laplacian can be treated as a Hamiltonian whose thermodynamic potentials and random-matrix statistics meaningfully capture differences in reasoning quality.
What would settle it
Run the Fes probe and the RMT-deviation score on a fresh collection of LLM generations whose correctness has been verified by an independent oracle and check whether the reported AUROC margins and the Wigner-Dyson versus Poisson distinction persist.
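The replication check described above reduces to recomputing AUROC on independently labeled generations. A minimal sketch, assuming detector scores where higher means "more likely hallucinated" and binary oracle labels (1 for hallucination); the function name `auroc` and this labeling convention are assumptions, not taken from the paper.

```python
import numpy as np

def auroc(scores, labels):
    """Mann-Whitney AUROC: P(score of a positive > score of a negative).

    Ties count as 1/2. Uses an O(n_pos * n_neg) comparison table, which is
    fine for a sketch but not for very large evaluation sets.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one example of each class")
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())
```

Computing this per model and per benchmark, then averaging, would allow a direct comparison against the reported margins.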
Original abstract
Hallucination detection in large language models (LLMs) is deployment-critical, and recent work shows that the spectrum of attention-derived graph Laplacians carries strong signal about reasoning quality. Prior spectral diagnostics, however, summarize the Laplacian spectrum by a handful of eigenvalues or hand-picked scalars, leaving most of its structure unused. We propose Free-Energy Signatures (Fes), a spectral descriptor that treats each layer's attention Laplacian as a Hamiltonian and extracts its thermodynamic potentials (partition function, free energy, spectral entropy, heat capacity) together with the random-matrix-theory (RMT) spectral form factor. We prove three results: (i) Lipschitz stability of Fes under attention perturbation; (ii) an expressiveness result showing that Fes enriches finite spectral summaries and approximates moment-derived spectral functionals under explicit regularity and grid-resolution assumptions; and (iii) a finite-sample PAC bound on the AUROC of a training-free detector built from Fes. Empirically, across six open-weight LLMs and six benchmarks, a lightweight probe on Fes descriptors achieves the strongest aggregate AUROC among attention-spectral baselines, improving over LapEig by $+6.5$ AUROC points and over GoR-4 by $+2.4$ points on average, while requiring no update to the underlying LLM. In the fully unsupervised setting, an RMT-deviation score achieves mean AUROC $0.71$, providing a label-free but weaker detector. A complementary RMT analysis shows that correct generations exhibit more Wigner-Dyson-like spectral statistics, whereas hallucinations exhibit more Poisson-like statistics. The anonymized code and config are provided in the supplementary material.
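The spectral form factor named in the abstract can be sketched as the squared modulus of the spectrum's Fourier transform. The version below is one common convention, K(τ) = |Σ_j exp(-iτλ_j)|² / N², and is offered only as an assumption: the function name `spectral_form_factor`, the normalization, the grid of times `taus`, and the absence of any unfolding or ensemble averaging are illustrative choices that the abstract does not specify.

```python
import numpy as np

def spectral_form_factor(eigenvalues, taus):
    """K(tau) = |sum_j exp(-i * tau * lambda_j)|^2 / N^2 for each tau."""
    lam = np.asarray(eigenvalues, dtype=float)
    # One row per tau, one column per eigenvalue.
    phases = np.exp(-1j * np.outer(taus, lam))
    return np.abs(phases.sum(axis=1)) ** 2 / lam.size ** 2
```

With this normalization K(0) equals 1 and every value lies in [0, 1]; in random-matrix practice the curve is usually averaged over many spectra before its shape is interpreted.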
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Free-Energy Signatures (Fes), which treat attention-derived graph Laplacians as Hamiltonians and extract thermodynamic potentials (partition function, free energy, spectral entropy, heat capacity) together with the RMT spectral form factor. It claims three theoretical results—Lipschitz stability of Fes, an expressiveness result showing enrichment of finite spectral summaries under regularity and grid-resolution assumptions, and a finite-sample PAC bound on AUROC for a training-free detector—and reports that a lightweight Fes probe achieves the highest aggregate AUROC across six LLMs and six benchmarks (+6.5 over LapEig, +2.4 over GoR-4), with an unsupervised RMT-deviation score reaching mean AUROC 0.71. Code is provided.
Significance. If the regularity and grid-resolution assumptions hold for real attention Laplacians and the empirical gains prove robust, Fes would supply a new, training-free spectral-thermodynamic lens on reasoning quality that goes beyond hand-picked eigenvalue summaries. The provision of anonymized code and config is a clear strength for reproducibility.
major comments (2)
- [Abstract] Abstract (statement of expressiveness result and PAC bound): both results are explicitly conditioned on regularity and grid-resolution assumptions on the Laplacian treated as Hamiltonian, yet the manuscript provides no verification that the attention matrices of the six evaluated LLMs satisfy these assumptions at the layer resolutions used; if the assumptions fail, the claimed enrichment and PAC guarantee do not apply.
- [Empirical results] Empirical section (AUROC results): aggregate improvements are reported without error bars, without details on data exclusion criteria, hyperparameter choices, or statistical significance testing of the +6.5 / +2.4 point gains, leaving the robustness of the headline claim unexamined.
minor comments (1)
- [Methods] Notation for the thermodynamic potentials and the precise definition of the RMT spectral form factor should be stated explicitly in the main text rather than deferred to supplementary material.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below.
- Referee: [Abstract] Abstract (statement of expressiveness result and PAC bound): both results are explicitly conditioned on regularity and grid-resolution assumptions on the Laplacian treated as Hamiltonian, yet the manuscript provides no verification that the attention matrices of the six evaluated LLMs satisfy these assumptions at the layer resolutions used; if the assumptions fail, the claimed enrichment and PAC guarantee do not apply.
  Authors: We acknowledge that the expressiveness result and PAC bound are conditioned on the regularity and grid-resolution assumptions, which are stated explicitly in the theoretical sections. The current manuscript does not include verification that these hold for the attention Laplacians of the six evaluated LLMs. In revision we will add a dedicated subsection reporting empirical checks (e.g., bounded operator norms, spectral smoothness, and effective grid resolution) on the attention matrices from the models and layers used; the checks will either confirm applicability of the guarantees or qualify their scope.
  revision: yes
- Referee: [Empirical results] Empirical section (AUROC results): aggregate improvements are reported without error bars, without details on data exclusion criteria, hyperparameter choices, or statistical significance testing of the +6.5 / +2.4 point gains, leaving the robustness of the headline claim unexamined.
  Authors: We agree that the empirical results require additional detail to substantiate robustness. The revised manuscript will report error bars (standard deviation across seeds or bootstrap estimates), explicit data-exclusion criteria and preprocessing steps, full hyperparameter specifications for the Fes probe and baselines, and results of statistical significance tests (e.g., paired Wilcoxon or t-tests) on the AUROC differences.
  revision: yes
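The bootstrap error bars promised in the response could take the following shape. This is a hedged sketch of a paired bootstrap over examples, not the authors' protocol: the function names, the 95% percentile interval, and the default of 2000 resamples are assumptions. Both detectors are scored on the same resampled examples so that the interval reflects the AUROC gap rather than two independent estimates.

```python
import numpy as np

def _auroc(scores, labels):
    """Mann-Whitney AUROC with ties counted as 1/2 (labels is a bool array)."""
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def paired_bootstrap_auroc_gap(scores_a, scores_b, labels, n_boot=2000, seed=0):
    """Mean and 95% percentile interval of AUROC(a) - AUROC(b)."""
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    if labels.all() or not labels.any():
        raise ValueError("need at least one example of each class")
    rng = np.random.default_rng(seed)
    n = labels.size
    gaps = []
    while len(gaps) < n_boot:
        idx = rng.integers(0, n, size=n)      # resample examples with replacement
        lab = labels[idx]
        if lab.all() or not lab.any():        # skip single-class resamples
            continue
        gaps.append(_auroc(scores_a[idx], lab) - _auroc(scores_b[idx], lab))
    gaps = np.array(gaps)
    lo, hi = np.percentile(gaps, [2.5, 97.5])
    return float(gaps.mean()), float(lo), float(hi)
```

An interval that excludes zero on each benchmark would support the +6.5 and +2.4 point claims more directly than an unqualified average.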
Circularity Check
No circularity; derivation chain is self-contained
The paper defines Fes by treating attention Laplacians as Hamiltonians and extracting thermodynamic quantities plus the RMT spectral form factor. It proves Lipschitz stability, an expressiveness result under explicit regularity/grid-resolution assumptions on the input Laplacian, and a PAC bound on the detector. These are standard conditional proofs, not reductions of the claimed AUROC gains to the definitions themselves. The reported empirical improvements (+6.5 AUROC points over LapEig) are measured against external baselines on six LLMs and six benchmarks; the unsupervised RMT-deviation score uses raw spectral statistics without parameter fitting. No self-citation is load-bearing, no fitted input is relabeled as prediction, and no ansatz is smuggled in. The construction is therefore independent of its outputs.
Axiom & Free-Parameter Ledger
axioms (3)
- standard math: Lipschitz stability of Fes under attention perturbation
- domain assumption: Expressiveness of Fes under explicit regularity and grid-resolution assumptions
- standard math: Finite-sample PAC bound on AUROC of the Fes-based detector
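To illustrate why a stability statement of this kind is plausible, the following short derivation bounds the free energy alone. It is a sketch under assumptions, not the paper's theorem or constants: it takes symmetric Laplacians $L$ and $L'$ with ordered eigenvalues $\lambda_1 \le \dots \le \lambda_n$, a fixed inverse temperature $\beta > 0$, and the textbook definition of the free energy.

```latex
\begin{align*}
F_\beta(L) &= -\frac{1}{\beta}\log\sum_{i=1}^{n} e^{-\beta\lambda_i(L)}, \\
\bigl|F_\beta(L') - F_\beta(L)\bigr|
  &\le \max_i \bigl|\lambda_i(L') - \lambda_i(L)\bigr|
  && \text{(log-sum-exp is 1-Lipschitz in } \ell_\infty\text{)} \\
  &\le \lVert L' - L \rVert_2
  && \text{(Weyl's inequality for symmetric matrices)}.
\end{align*}
```

If the map from attention weights to the Laplacian is itself Lipschitz, as it is for a linear symmetrize-and-subtract construction, the bound carries over to attention perturbations up to a constant.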