pith. machine review for the scientific record.

arxiv: 2605.09031 · v1 · submitted 2026-05-09 · 💻 cs.LG

Recognition: no theorem link

Spherical Boltzmann machines: a solvable theory of learning and generation in energy-based models

Jorge Fernandez-De-Cossio-Diaz, Rémi Monasson, Simona Cocco, Thomas Tulinski

Pith reviewed 2026-05-12 02:00 UTC · model grok-4.3

classification 💻 cs.LG
keywords energy-based models · Boltzmann machines · phase transitions · training dynamics · generative models · random matrix theory · dynamical mean-field theory · Bayesian evidence

The pith

The spherical Boltzmann machine provides an exactly solvable model for the training dynamics and phase transitions in energy-based generative models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a solvable theory for spherical Boltzmann machines, a constrained version of energy-based models. Using random matrix theory and dynamical mean-field theory in the high-dimensional limit, it derives exact equations for how the model learns from data and generates new samples. The work identifies cascades of phase transitions, both during training and as hyperparameters vary, tied to how the top modes of the model's coupling matrix align with the data modes. These transitions explain phenomena such as double descent and sampling-temperature effects, which the authors' numerical checks show also occur in standard non-spherical models. This offers a theoretical window into why and how generative models succeed or fail at learning and producing data.
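To make the object under discussion concrete, here is a minimal sketch of sampling from a spherical Boltzmann machine, assuming the quadratic energy E(x) = −½ xᵀJx and the constraint ∥x∥² = N quoted in the figure captions; the coupling matrix, step size, and planted data mode are illustrative placeholders, not the paper's setup.

```python
# Minimal sketch (not the paper's code): Langevin sampling of a spherical
# Boltzmann machine with energy E(x) = -1/2 x^T J x under ||x||^2 = N,
# enforced here by projecting back onto the sphere after each step.
import numpy as np

def sample_sbm(J, n_steps=5000, dt=1e-2, beta=1.0, seed=0):
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    x = rng.standard_normal(N)
    x *= np.sqrt(N) / np.linalg.norm(x)         # start on the sphere ||x||^2 = N
    for _ in range(n_steps):
        drift = beta * (J @ x)                  # -grad of beta * E(x)
        noise = np.sqrt(2 * dt) * rng.standard_normal(N)
        x = x + dt * drift + noise
        x *= np.sqrt(N) / np.linalg.norm(x)     # re-impose the spherical constraint
    return x

# Toy coupling: symmetric random bulk plus one planted rank-one "data mode".
N = 500
rng = np.random.default_rng(1)
G = rng.standard_normal((N, N)) / np.sqrt(N)
bulk = (G + G.T) / 2
u = rng.standard_normal(N)
u /= np.linalg.norm(u)
J = bulk + 1.5 * np.outer(u, u)                 # rank-one spike along u
x = sample_sbm(J)
print("overlap |u.x|/sqrt(N):", abs(u @ x) / np.sqrt(N))
```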

Core claim

The spherical Boltzmann machine (SBM) allows exact solution of its training dynamics via random matrix and dynamical mean-field methods. The Bayesian evidence, acting as a partition function over parameters, reveals global properties of the trained model. Cascades of phase transitions arise from successive alignment and condensation of the top modes of the coupling matrix to the data, both during training and as hyperparameters change. These transitions connect to generative behaviors such as sampling temperature tuning, double descent with regularization, tempered posteriors, and out-of-equilibrium training biases.
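The alignment-and-condensation picture suggests a simple diagnostic, sketched below under stated assumptions: the coupling matrix W, the data covariance C, and the bulk scale σ are placeholder inputs, with the bulk edge 2σ taken from the description of Figure 2 rather than from the paper's equations.

```python
# Hedged sketch of a condensation diagnostic: has the top mode of the trained
# coupling matrix detached from the bulk, and how aligned is it with the data?
import numpy as np

def condensation_diagnostics(W, C, sigma=1.0):
    evals_W, evecs_W = np.linalg.eigh(W)
    lam1, v1 = evals_W[-1], evecs_W[:, -1]      # top mode of the coupling matrix
    _, evecs_C = np.linalg.eigh(C)
    u1 = evecs_C[:, -1]                         # leading data mode
    return {
        "lambda_1": float(lam1),
        "bulk_edge": 2.0 * sigma,               # edge of a Wigner-like bulk
        "detached": bool(lam1 > 2.0 * sigma),   # outlier vs. edge regime
        "overlap_s1": float(abs(v1 @ u1)),      # alignment / condensation order parameter
    }
```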

What carries the argument

The spherical Boltzmann machine analyzed with random matrix theory and dynamical mean-field theory, which together yield exact training equations and a computable Bayesian evidence that reveal the mode-alignment phase transitions.
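The phrase "Bayesian evidence acting as a partition function over parameters" can be made concrete with a schematic teacher-student form; the expression below is an assumed convention (prior P_prior(W), M data samples, sign of the energy), not a transcription of the paper's equations.

```latex
% Schematic only; the paper's exact conventions (prior strength gamma,
% inverse temperature eta, spherical normalization) may differ.
P_W(\mathbf{x}) = \frac{1}{Z(W)}\,\exp\!\Big(\tfrac{1}{2}\,\mathbf{x}^{\top} W\,\mathbf{x}\Big),
\qquad \|\mathbf{x}\|^{2} = N,
\qquad
\mathcal{Z} = \int \mathrm{d}W\; P_{\mathrm{prior}}(W)\,\prod_{m=1}^{M} P_W\!\big(\mathbf{x}^{(m)}\big).
```

Non-analytic points of the log-evidence as hyperparameters vary are then the natural locations of the phase transitions described above.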

Load-bearing premise

The analysis assumes the high-dimensional limit with spherical constraints, with numerical evidence bridging to finite-dimensional non-spherical cases.

What would settle it

Observing no phase transitions, no double descent, or no tempered effects in a finite non-spherical energy-based model trained similarly would falsify the generality claim.
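As a hedged illustration of the kind of scan such a test would involve (not the paper's protocol), one can sweep a regularization strength γ for a simple non-spherical model and inspect the held-out curve for the presence or absence of a double-descent shape; every modeling choice below is a placeholder.

```python
# Illustration only: fit a ridge-regularized Gaussian (a quadratic energy
# without the spherical constraint) at many regularization strengths gamma
# and inspect the held-out negative log-likelihood for a double-descent shape.
import numpy as np

def heldout_nll(C_train, C_test, gamma):
    N = C_train.shape[0]
    precision = np.linalg.inv(C_train + gamma * np.eye(N))   # regularized Gaussian fit
    _, logdet = np.linalg.slogdet(precision)
    return 0.5 * (np.trace(precision @ C_test) - logdet)     # Gaussian NLL up to a constant

rng = np.random.default_rng(0)
N, M = 100, 120
u = rng.standard_normal(N)
u /= np.linalg.norm(u)
C_true = np.eye(N) + 4.0 * np.outer(u, u)                    # rank-one "teacher" covariance
L = np.linalg.cholesky(C_true)
X_train = (L @ rng.standard_normal((N, M))).T                # M training samples
X_test = (L @ rng.standard_normal((N, 10 * M))).T            # held-out samples
C_train = X_train.T @ X_train / M
C_test = X_test.T @ X_test / (10 * M)

gammas = np.logspace(-3, 1, 25)
curve = [heldout_nll(C_train, C_test, g) for g in gammas]
print("held-out NLL minimized near gamma =", gammas[int(np.argmin(curve))])
# A genuine test would require the paper's models and protocol; this only shows
# the shape of the gamma-scan one would inspect for (absence of) double descent.
```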

Figures

Figures reproduced from arXiv:2605.09031 by Jorge Fernandez-De-Cossio-Diaz, Rémi Monasson, Simona Cocco, Thomas Tulinski.

Figure 1
Figure 1: Equilibrium phase diagram for K = 1. A) Schematic eigenvalue configurations in each phase: bulk semicircle (light blue), top eigenvalue λ1 and the Lagrange multiplier µ enforcing ∥x∥² = N. B) Phases in the (γ, η) plane; the blue dashed line separates outlier from edge regimes. C) λ1 and Lagrange multiplier µ vs. γ in the small-η regime, where λ1 sticks to the edge; phase transitions occur at the vertical … view at source ↗
Figure 2
Figure 2: Training dynamics. A) Top eigenvalue trajectories λi(t) from finite-N simulations (black) vs. the early-time predictions (14) (red dotted). Vertical dashed lines are the detachment (14) and condensation times t∗. Blue horizontal line is the bulk edge (2σ). B) s1(t) from DMFT (red) compared to finite-N simulations (blue). C) Stationary overlap sst as a function of ν for selected values of γ (dashed lines in… view at source ↗
Figure 3
Figure 3: Sampling temperature tuning (TT). A) Mode rescue through temperature tuning, illustrating how the bulk and outlier position change. B) DKL(P∗∥PβW) vs. β. Similarly, the generative performance of the posterior predictive student can be measured by the reverse DKL(Ppp∥P∗) or forward DKL(P∗∥Ppp). The posterior predictive is popular in Bayesian prediction [Aitchison, 1975, Brown et al., 2008], but, wi… view at source ↗
Figure 4
Figure 4: Double descent. η indicated in each curve. Black dots: minima. Orange dashed: thresholds η = ηDD (upper) and η → ∞ (lower). Vertical dashed: boundary between h ≠ 0 and h = 0 phases. The model responds in qualitatively different ways to temperature tuning, depending on the starting phase. Furthermore, changing the value of β can transport the model from one phase to another. If the trained model starts i… view at source ↗
Figure 5
Figure 5: Tempered posterior. A) Phases of the optimal η^rev_pp(γ, ω∗). B) Reverse KL vs. η for selected γ's (dots: minima). Double descent … view at source ↗
Figure 6
Figure 6: Out-of-equilibrium training. A) Reverse DKL(PW(t)∥P∗) during training, B) λ1(t), C) u1(t). Color: ν. Blue dashed: optimal early-stopping time. Orange dashed: equilibrium values (at ν → ∞). An optimal posterior temperature different from 1 (the Bayesian prescription) has been referred to as a cold/warm-posterior effect [Wenzel et al., 2020, Noci et al., 2021, Nabarro et al., 2022, Pitas and Arbel, 2024]… view at source ↗
Figure 7
Figure 7: K = 2 phase diagrams. Phase diagram in the (γ, η) plane for c1 ∈ {1.1, 1.3, 1.5, 1.7} (with c2 = 2 − c1). Black lines are phase boundaries; the teal line separates outlier from edge phases. view at source ↗
Figure 8
Figure 8: Validation of the large-N DMFT against finite-N SBM training. Each panel compares DMFT (solid) with finite-N SGD runs (persistent MCMC, mini-batch size = 1) simulated by Euler integration of the coupled Langevin system (4)–(3) at N = 4000, averaged over five seeds (dashed, with ±SEM bands). Top row: signal overlaps sk(t); bottom row: Lagrange multiplier κ(t). The gray strip marks the ±1/√N finite-N noise… view at source ↗
Figure 9
Figure 9: Dynamical phase structure of the K = 1 DMFT at finite η. Three values η ∈ {1, 3, 10} are shown in columns. Row 1: dynamical phase diagram νc(γ) (solid black), obtained by warm-started path continuation on a fine τ-grid (∆τ ≃ 4 × 10⁻³) with the guard νc∆τ < 0.6; red shading is the condensed branch (sst ≠ 0) and blue shading the uncondensed branch (sst → 0). The dashed vertical line marks γ = 1, the vertica… view at source ↗
Figure 10
Figure 10: Validation of the large-K DMFT equations for nearly finite-dimensional Ising data. A, B) Spectrum of the empirical covariance matrix C for K configurations of the two-dimensional periodic Ising model at fixed linear size L = 32, N = L×L (= 1024). Below the critical temperature, Tlow = 2.2 < Tc ≃ 2.269, the leading eigenvalue is extensive, c1 ≃ K m(Tlow)², while c2/K decreases with K, consistent with effec… view at source ↗
Figure 11
Figure 11: Large-K training dynamics for nearly finite-dimensional Ising data. Comparison between the effective K′ = 1 large-K DMFT prediction and finite-N coupled Langevin simulations for below-critical two-dimensional Ising data at T = 1.8, with K = 512 and N = 2025. Time is shown in the rescaled variable Kt/K′, and colors indicate the rescaled persistent-chain update rate ν. Solid curves show the effective K… view at source ↗
Figure 12
Figure 12: Optimum η^fwd_pp = arg min_{η>0} DKL(P∗∥Ppp) for the rank-one K = 2 teacher. A) Phases in the (ω∗, γ) plane: warm flat (yellow), unique cold (medium blue), MAP (light blue); the hatched sliver is the mixed warm/cold tie. Boundaries: dashed teal γwc (Bayes-crossing, η^fwd_pp = 1), dotted black γflat (upper edge of the warm flat interval), solid black γ∞ (cold/MAP). B) DKL(P∗∥Ppp)/N vs. η at ω∗ = 2.2 for γ … view at source ↗
Figure 13
Figure 13: Comparison of the SBM with the unconstrained Gaussian model. A) Typical reverse KL divergence ⟨DKL(PW∥P∗)⟩/N as a function of the prior strength γ, at η = 5 and ω∗ = 2.5 (K = 2). The SBM curve (solid) is colored by phase: condensed h ≠ 0 (red) and uncondensed h = 0 (blue). The Gaussian equivalent (dashed gray) is the h = 0 formula (ω∗ − ln ω∗ − 1 + 1/(2γη))/2, which coincides with the SBM throughou… view at source ↗
Figure 14
Figure 14: Temperature tuning on Potts BM trained on PF00072. Pairwise Potts models on the Pfam response-regulator family PF00072 (L = 111, q = 21), trained via adabmDCA [Rosset et al., 2026]. A) Pearson correlation between generated and data connected correlations as a function of sampling inverse temperature β, for sixteen L2 regularizations γ (color bar). Red dots mark the peak of each curve, defining βopt(γ).… view at source ↗
Figure 15
Figure 15: Temperature tuning on Potts BM trained on PF00018. Pairwise Potts models on the Pfam SH3 domain PF00018 (L = 48, q = 21), trained via adabmDCA. A) Pearson correlation between generated and data connected correlations as a function of sampling inverse temperature β, for sixteen L2 regularizations (color bar); red dots mark βopt. B) Top six covariance eigenvalues σ²k(β) at γ = 1 (left axis, solid lines);… view at source ↗
Figure 16
Figure 16: Double descent in a tractable flow generative model. Posterior-averaged reverse KL divergence per dimension (top) and h ≠ 0 fraction across training seeds (bottom) as functions of the L2 prior strength γ, for a Householder-and-altitude normalizing flow on S^N [Rezende et al., 2020] trained by reverse-KL variational inference against the same rank-one Bingham teacher as in … view at source ↗
Figure 17
Figure 17: Double descent in a binary FVSBN persists across K on rank-1-dominated teachers. SWAG posterior-averaged reverse KL per spin, DKL(qW∥P∗)/N, vs. the paper-convention regularization γ, for the binary FVSBN trained on A) a rank-1 Curie–Weiss / Hopfield teacher at β = 2.0, N = 16, and B) a 2D Ising teacher on the L = 4 periodic square lattice (N = 16) at β = 0.5, deep in the ferromagnetic phase. Each pane… view at source ↗
Figure 18
Figure 18: Double descent in a Gaussian-visible RBM trained on financial return data. A) Eigenvalue spectrum of the empirical training-set covariance C on a log-λ axis. The bulk (gray histogram) spans λ ∼ 10⁻¹ to λ ∼ 2, dominated by sectoral correlations; a single rank-one outlier sits at λ1 ≈ 25 (blue line), more than an order of magnitude above the bulk maximum λ2 ≈ 1.6, the “market mode” that drives the retarded-… view at source ↗
Figure 19
Figure 19: Tempered Bayesian GAN on a unimodal target: warm-to-cold migration with prior strength. A) Posterior-predictive reverse KL, DKL(Ppp∥P∗), versus η, for four prior widths σp ∈ {0.3, 1, 3, 10}. B) Forward DKL(P∗∥Ppp). Target: single isotropic Gaussian at the origin with σtarget = 0.5. Up to twelve seeds per (σp, η) cell. Bands are ±1 SEM. Vertical dashed line: η = 1 (Bayes).… view at source ↗
Figure 20
Figure 20: Out-of-equilibrium training in a Potts BM. A Potts Boltzmann machine (L = 27, q = 20) is trained by PCD-MAP on lattice-protein sequences (βsel = 1000), with γ = 0.01, ηlr = 0.01, tage = 5000, Nchains = 1. A) Top coupling eigenvalue maxk |λk(J)| vs. sampling rate k/L (site updates per gradient step, normalized by chain length). At small k the weights overshoot the equilibrium value (dashed line) because the single chain… view at source ↗
Figure 21
Figure 21: Out-of-equilibrium training dynamics on real protein data (SH3 domain PF00018). Same protocol as … view at source ↗
read the original abstract

Energy-based models (EBMs) are flexible generative architectures inspired by statistical physics, but their learning and generative properties remain poorly understood. Here, we analyze a solvable EBM in the high-dimensional limit: the spherical Boltzmann machine (SBM). Combining tools from random matrix theory and dynamical mean-field theory, we: solve exact equations describing the training dynamics of the SBM; compute the Bayesian evidence, which acts as a partition function in parameter space and encodes global properties of the trained model; and uncover cascades of phase transitions that occur both during training and as a function of hyperparameters, related to successive alignment and condensation of the top modes of the coupling matrix to the data. We connect these transitions to sampling-time generative phenomena in a teacher-student scenario, including: sampling temperature tuning, double descent as a function of regularization strength, tempered posterior effects, and out-of-equilibrium effects during training that induce biases in the trained model. We provide numerical evidence demonstrating that all these phenomena appear in standard generative architectures, beyond the SBM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper analyzes the spherical Boltzmann machine (SBM) as a solvable high-dimensional energy-based model. Using random matrix theory and dynamical mean-field theory, it derives exact equations for the training dynamics, computes the Bayesian evidence as a partition function over parameters, and identifies cascades of phase transitions tied to successive alignment and condensation of the top modes of the coupling matrix. These transitions are connected to generative phenomena including sampling temperature effects, double descent under regularization, tempered posteriors, and training-induced biases, with numerical evidence that the same phenomena appear in standard (non-spherical, finite-dimensional) EBMs.

Significance. If the exact high-dimensional derivations hold and the numerical mappings are robust, the work supplies a rare solvable limit that explains several otherwise opaque behaviors in EBM training and sampling. The combination of RMT and DMFT to obtain closed equations for dynamics and evidence is a clear technical strength, and the identification of mode-condensation transitions offers a concrete mechanism for double-descent and out-of-equilibrium biases.

major comments (1)
  1. [Numerical experiments (section describing validation on standard architectures)] The central claim that the SBM phase transitions and generative phenomena explain behavior in practical EBMs rests on numerical evidence whose quantitative accuracy is not assessed. No finite-N scaling, deviation-from-sphericity metrics, or error bars on critical hyperparameter values are reported, so it is impossible to judge how faithfully the high-dimensional spherical predictions carry over to finite non-spherical models.
minor comments (1)
  1. [Theory sections] Notation for the spherical constraint and the coupling-matrix eigenvalues should be introduced once with a clear table or glossary, as the same symbols appear in both the RMT and DMFT sections.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We address the single major comment below and outline the revisions we will make to strengthen the numerical validation.

read point-by-point responses
  1. Referee: The central claim that the SBM phase transitions and generative phenomena explain behavior in practical EBMs rests on numerical evidence whose quantitative accuracy is not assessed. No finite-N scaling, deviation-from-sphericity metrics, or error bars on critical hyperparameter values are reported, so it is impossible to judge how faithfully the high-dimensional spherical predictions carry over to finite non-spherical models.

    Authors: We agree that the current numerical section provides primarily qualitative demonstrations that the phenomena appear in standard architectures, without quantitative metrics of agreement or robustness. This limitation weakens the strength of the mapping claim. In the revised manuscript we will add: (i) error bars on all reported critical hyperparameter values obtained from multiple independent runs, (ii) finite-N scaling plots for the non-spherical models to illustrate convergence toward the high-dimensional predictions, and (iii) explicit deviation-from-sphericity metrics (e.g., the Frobenius distance of the trained coupling matrix from its spherical projection) evaluated at the observed transition points. These additions will allow a clearer assessment of how faithfully the spherical limit carries over. revision: yes
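One possible operationalization of items (i) and (iii) is sketched below; the norm-based sphericity summary and the SEM aggregation are assumptions about how such metrics might be computed, not the authors' stated procedure.

```python
# Sketch of possible metrics (assumptions, not the authors' stated procedure).
import numpy as np

def sphericity_deviation(samples):
    # `samples`: placeholder array of shape (n_samples, N) drawn from the trained
    # non-spherical model. In the spherical limit ||x||^2 / N concentrates on 1,
    # so the spread of this ratio is one way to quantify deviation from sphericity.
    r2 = np.sum(samples**2, axis=1) / samples.shape[1]
    return {"mean_r2": float(r2.mean()),
            "std_r2": float(r2.std()),
            "max_abs_dev": float(np.abs(r2 - 1.0).max())}

def critical_value_with_errorbar(critical_estimates):
    # Item (i): aggregate critical-hyperparameter estimates from independent runs
    # into mean +/- standard error of the mean.
    est = np.asarray(critical_estimates, dtype=float)
    return float(est.mean()), float(est.std(ddof=1) / np.sqrt(len(est)))
```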

Circularity Check

0 steps flagged

No circularity: exact solutions derived from external RMT and DMFT frameworks

full rationale

The paper solves exact training dynamics, Bayesian evidence, and phase transitions for the spherical Boltzmann machine by combining random matrix theory and dynamical mean-field theory in the high-dimensional spherical limit. These are independent external mathematical tools applied to the model, not reductions of the model's own fitted parameters, data, or self-citations. The subsequent connections to sampling phenomena and numerical checks on standard EBMs are presented as separate evidence rather than load-bearing derivations. No self-definitional steps, fitted inputs renamed as predictions, or ansatz smuggling via self-citation appear in the derivation chain. The analysis is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the high-dimensional limit and the spherical constraint; these are domain assumptions required for the RMT/DMFT analysis to close. No free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption High-dimensional limit (N→∞) with spherical constraint on weights
    Invoked to enable exact closure of the dynamical equations via RMT and DMFT.

pith-pipeline@v0.9.0 · 5488 in / 1402 out tokens · 54482 ms · 2026-05-12T02:00:07.368031+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

126 extracted references · 126 canonical work pages · 4 internal anchors

  1. [1] Exploring the space of self-reproducing ribozymes using generative models. Nature Communications, 2025.
  2. [2] There Will Be a Scientific Theory of Deep Learning. 2026.
  3. [3] Tulinski, T., Fernandez-De-Cossio-Diaz, J., Cocco, S., Monasson, R. Replica Theory of Spherical Boltzmann Machine Ensembles. arXiv:2604.17936.
  4. [4] Calvanese, F., Lombardi, G., Weigt, M., et al. Steering Sequence Generation in Protein Language Models through Iterative Lookback Monte Carlo Sampling. bioRxiv, 2026. https://www.biorxiv.org/content/early/2026/05/07/2026.05.01.722156.full.pdf
  5. [5] Murray, I., Ghahramani, Z. Bayesian Learning in Undirected Graphical Models: Approximate … arXiv:1207.4134.
  6. [6] Mézard, M., Parisi, G., Virasoro, M. 1986.
  7. [7] The space of interactions in neural network models. J. Phys. A: Math. Gen., 1988.
  8. [8] Sampling the space of solutions of an artificial neural network. Phys. Rev. E, 2025.
  9. [9] Zambon, A., Caruso, F., Zecchina, R., Tiana, G. Controlled … arXiv:2603.15367.
  10. [10] Harish-Chandra. Differential operators on a semisimple … 1957.
  11. [11] Itzykson, C., Zuber, J.-B. The planar approximation … 1980.
  12. [12] Introduction to Random Matrices: Theory and Practice. 2018.
  13. [13] Decoupled weight decay regularization. 2017.
  14. [14] Spherical integrals of sublinear rank. Probability Theory and Related Fields, 2025.
  15. [15] Asymptotics of k dimensional spherical integrals and applications. ALEA: Latin American Journal of Probability and Mathematical Statistics.
  16. [16] Biroli, G., Guionnet, A. Electronic Communications in Probability, 2020.
  17. [17] Principal-component-analysis eigenvalue spectra from data with symmetry-breaking structure. Physical Review E, 2004.
  18. [18] Statistical mechanics of learning multiple orthogonal signals: asymptotic theory and fluctuation effects. Physical Review E, 2007.
  19. [19] Baik, J., et al. The Annals of Probability, 2005.
  20. [20] Matrix inference in growing rank regimes. IEEE Transactions on Information Theory, 2024.
  21. [21] Phase diagram of extensive-rank symmetric matrix denoising beyond rotational invariance. Physical Review X, 2025.
  22. [22] Extreme value statistics of eigenvalues of Gaussian random matrices. Phys. Rev. E, 2008. doi:10.1103/PhysRevE.77.041108.
  23. [23] A first course in random matrix theory: for physicists, engineers and data scientists. 2020.
  24. [24] Tubiana, J. Emergence of compositional representations in restricted … Physical Review Letters, 2017.
  25. [25] Coolen, A.C.C., Penney, R., Sherrington, D. Coupled Dynamics of Fast Neurons and Slow Interactions.
  26. [26] Rigorous Bounds to Retarded Learning. Physical Review Letters, 2002.
  27. [27] Dynamical decoupling of generalization and overfitting in large two-layer networks. arXiv:2502.21269.
  28. [28] Information theory, inference and learning algorithms. 2003.
  29. [29] Understanding temperature tuning in energy-based models. 2025.
  30. [30] Designing molecular … Nature Communications, 2025.
  31. [31] Cugliandolo, L. F. Recent Applications of Dynamical Mean-Field Methods. Annual Review of Condensed Matter Physics, 2024. doi:10.1146/annurev-conmatphys-040721-022848.
  32. [32] Normalizing flows on tori and spheres. International Conference on Machine Learning, 2020.
  33. [33] An antipodally symmetric distribution on the sphere. The Annals of Statistics, 1974.
  34. [34] Hamelryck, T., Mardia, K. V. Unfolding … arXiv:2505.19763.
  35. [35] Detection of a particle shower at the Glashow resonance with IceCube. Nature, 2021.
  36. [36] Sampling realistic protein conformations using local structural bias. PLoS Computational Biology, 2006.
  37. [37] An evolution-based model for designing chorismate mutase enzymes. Science, 2020.
  38. [38] The Fisher-Bingham distribution on the sphere. Journal of the Royal Statistical Society: Series B (Methodological), 1982.
  39. [39] Reasoning with Sampling: Your Base Model is Smarter Than You Think. 2025.
  40. [40] Implicit generation and modeling with energy based models. Advances in Neural Information Processing Systems.
  41. [41] Phase Transitions in the Output Distribution of Large Language Models. 2024.
  42. [42] Spin-glass theory for pedestrians. Journal of Statistical Mechanics: Theory and Experiment, 2005.
  43. [43] Replica method for computational problems with randomness: principles and illustrations. Journal of Statistical Mechanics: Theory and Experiment, 2024.
  44. [44] Spherical model of a spin-glass. Physical Review Letters, 1976.
  45. [45] Information theory and statistical mechanics. Physical Review, 1957.
  46. [46] A new method to simulate the Bingham and related distributions in directional data analysis with applications. 2013.
  47. [47] Baxter, R. J. 1990.
  48. [48] Spin-glass models of neural networks. Physical Review A, 1985.
  49. [49] Ros, V. 2025. doi:10.21468/SciPostPhysLectNotes.102.
  50. [50] The spherical model of a ferromagnet. Physical Review, 1952.
  51. [51] Gaussian-spherical restricted Boltzmann machines. Journal of Physics A: Mathematical and Theoretical, 2020.
  52. [52] Belkin, M., Hsu, D., Ma, S., Mandal, S. Proceedings of the National Academy of Sciences, 2019.
  53. [53] Restricted Boltzmann machine: Recent advances and mean-field theory. Chinese Physics B, 2021.
  54. [54] Nabarro, S., Ganev, S., Garriga-Alonso, A., et al. Data augmentation in … Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence.
  55. [55] Disentangling the roles of curation, data-augmentation and the prior in the cold posterior effect. Advances in Neural Information Processing Systems.
  56. [56] The fine print on tempered posteriors. Proceedings of the 15th Asian Conference on Machine Learning.
  57. [57] Fachechi, A., Agliari, E., Aquaro, M., Coolen, A., Mulder, M. Fundamental operating regimes, hyper-parameter fine-tuning and glassiness: towards an interpretable replica-theory for trained restricted … 2025.
  58. [58] Cascade of phase transitions in the training of Energy-based models. 2024.
  59. [59] A theoretical framework for overfitting in energy-based modeling. 2025.
  60. [60] Cheema, P., Sugiyama, M. The Volume of Non-Restricted … 2020.
  61. [61] Modeling structured data learning with Restricted Boltzmann machines in the teacher-student setting. Neural Networks, 2025.
  62. [62] Wenzel, F., Roth, K., Veeling, B. S., et al. How Good is the … International Conference on Machine Learning, 2020.
  63. [63] The generalization error of random features regression: Precise asymptotics and the double descent curve. Communications on Pure and Applied Mathematics, 2022.
  64. [64] The singular values and vectors of low rank perturbations of large rectangular random matrices. Journal of Multivariate Analysis, 2012.
  65. [65] Overlaps between eigenvectors of spiked, correlated random matrices. Physical Review E, 2023.
  66. [66] High-dimensional dynamics of generalization error in neural networks. Neural Networks, 2020.
  67. [67] Pattern Recognition and Machine Learning. 2006.
  68. [68] The largest eigenvalue of small rank perturbations of … Probability Theory and Related Fields, 2005. doi:10.1007/s00440-005-0466-z.
  69. [69] Capitaine, M., Donati-Martin, C., et al. The largest eigenvalues of finite rank deformation of large … The Annals of Probability. doi:10.1214/08-AOP394.
  70. [70] Zdeborová, L., et al. Statistical physics of inference: thresholds and algorithms. Advances in Physics, 2016. doi:10.1080/00018732.2016.1211393.
  71. [71] Seung, H. S., Sompolinsky, H., Tishby, N. Statistical mechanics of learning from examples. Physical Review A. doi:10.1103/PhysRevA.45.6056.
  72. [72] Engel, A., Van den Broeck, C. Statistical Mechanics of Learning.
  73. [73] Hastie, T., Montanari, A., Rosset, S., Tibshirani, R. J. Surprises in high-dimensional ridgeless least squares interpolation. The Annals of Statistics. doi:10.1214/21-AOS2133.
  74. [74] Crisanti, A., Sommers, H.-J. The spherical p-spin interaction spin glass model: the statics. Zeitschrift f… doi:10.1007/BF01309287.
  75. [75] Cold Posteriors and Aleatoric Uncertainty. 2020.
  76. [76] The Safe … 2012. doi:10.1007/978-3-642-34106-9_16.
  77. [77] de Freitas Pimenta, P. H., Stariolo, D. A. Finite-Size Relaxational Dynamics of a Spike Random Matrix Spherical Model. Entropy. doi:10.3390/e25060957.
  78. [78] Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. International Conference on Learning Representations. arXiv:1312.6120.
  79. [79] Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets. ICLR 2022 MATH-AI Workshop. arXiv:2201.02177.
  80. [80] Progress measures for grokking via mechanistic interpretability. arXiv:2301.05217.

Showing first 80 references.