pith. machine review for the scientific record.

arxiv: 2605.09031 · v1 · submitted 2026-05-09 · 💻 cs.LG

Recognition: no theorem link

Spherical Boltzmann machines: a solvable theory of learning and generation in energy-based models

Jorge Fernandez-De-Cossio-Diaz, Rémi Monasson, Simona Cocco, Thomas Tulinski

Pith reviewed 2026-05-12 02:00 UTC · model grok-4.3

classification 💻 cs.LG
keywords energy-based models · Boltzmann machines · phase transitions · training dynamics · generative models · random matrix theory · dynamical mean-field theory · Bayesian evidence

The pith

The spherical Boltzmann machine provides an exactly solvable model for the training dynamics and phase transitions in energy-based generative models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a solvable theory for spherical Boltzmann machines, a constrained version of energy-based models. Using random matrix theory and dynamical mean-field theory in the high-dimensional limit, it derives exact equations for how the model learns from data and generates new samples. The work identifies cascades of phase transitions, both during training and as hyperparameters vary, tied to how the top modes of the model's coupling matrix align with the data modes. These transitions explain phenomena such as double descent and sampling-temperature effects, which the authors' numerical checks show also occur in standard non-spherical models. This offers a theoretical window into why and how generative models succeed or fail at learning and producing data.
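To make the object under discussion concrete, here is a minimal sketch of sampling from a spherical Boltzmann machine, assuming the quadratic energy E(x) = −½ xᵀJx and the constraint ∥x∥² = N quoted in the figure captions; the coupling matrix, step size, and planted data mode are illustrative placeholders, not the paper's setup.

```python
# Minimal sketch (not the paper's code): Langevin sampling of a spherical
# Boltzmann machine with energy E(x) = -1/2 x^T J x under ||x||^2 = N,
# enforced here by projecting back onto the sphere after each step.
import numpy as np

def sample_sbm(J, n_steps=5000, dt=1e-2, beta=1.0, seed=0):
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    x = rng.standard_normal(N)
    x *= np.sqrt(N) / np.linalg.norm(x)         # start on the sphere ||x||^2 = N
    for _ in range(n_steps):
        drift = beta * (J @ x)                  # -grad of beta * E(x)
        noise = np.sqrt(2 * dt) * rng.standard_normal(N)
        x = x + dt * drift + noise
        x *= np.sqrt(N) / np.linalg.norm(x)     # re-impose the spherical constraint
    return x

# Toy coupling: symmetric random bulk plus one planted rank-one "data mode".
N = 500
rng = np.random.default_rng(1)
G = rng.standard_normal((N, N)) / np.sqrt(N)
bulk = (G + G.T) / 2
u = rng.standard_normal(N)
u /= np.linalg.norm(u)
J = bulk + 1.5 * np.outer(u, u)                 # rank-one spike along u
x = sample_sbm(J)
print("overlap |u.x|/sqrt(N):", abs(u @ x) / np.sqrt(N))
```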

Core claim

The spherical Boltzmann machine (SBM) allows exact solution of its training dynamics via random matrix and dynamical mean-field methods. The Bayesian evidence, acting as a partition function over parameters, reveals global properties of the trained model. Cascades of phase transitions arise from successive alignment and condensation of the top modes of the coupling matrix to the data, both during training and as hyperparameters change. These transitions connect to generative behaviors such as sampling temperature tuning, double descent with regularization, tempered posteriors, and out-of-equilibrium training biases.
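The alignment-and-condensation picture suggests a simple diagnostic, sketched below under stated assumptions: the coupling matrix W, the data covariance C, and the bulk scale σ are placeholder inputs, with the bulk edge 2σ taken from the description of Figure 2 rather than from the paper's equations.

```python
# Hedged sketch of a condensation diagnostic: has the top mode of the trained
# coupling matrix detached from the bulk, and how aligned is it with the data?
import numpy as np

def condensation_diagnostics(W, C, sigma=1.0):
    evals_W, evecs_W = np.linalg.eigh(W)
    lam1, v1 = evals_W[-1], evecs_W[:, -1]      # top mode of the coupling matrix
    _, evecs_C = np.linalg.eigh(C)
    u1 = evecs_C[:, -1]                         # leading data mode
    return {
        "lambda_1": float(lam1),
        "bulk_edge": 2.0 * sigma,               # edge of a Wigner-like bulk
        "detached": bool(lam1 > 2.0 * sigma),   # outlier vs. edge regime
        "overlap_s1": float(abs(v1 @ u1)),      # alignment / condensation order parameter
    }
```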

What carries the argument

The spherical Boltzmann machine analyzed with random matrix theory and dynamical mean-field theory, which together yield exact training equations and a computable Bayesian evidence that reveal the mode-alignment phase transitions.
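The phrase "Bayesian evidence acting as a partition function over parameters" can be made concrete with a schematic teacher-student form; the expression below is an assumed convention (prior P_prior(W), M data samples, sign of the energy), not a transcription of the paper's equations.

```latex
% Schematic only; the paper's exact conventions (prior strength gamma,
% inverse temperature eta, spherical normalization) may differ.
P_W(\mathbf{x}) = \frac{1}{Z(W)}\,\exp\!\Big(\tfrac{1}{2}\,\mathbf{x}^{\top} W\,\mathbf{x}\Big),
\qquad \|\mathbf{x}\|^{2} = N,
\qquad
\mathcal{Z} = \int \mathrm{d}W\; P_{\mathrm{prior}}(W)\,\prod_{m=1}^{M} P_W\!\big(\mathbf{x}^{(m)}\big).
```

Non-analytic points of the log-evidence as hyperparameters vary are then the natural locations of the phase transitions described above.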

Load-bearing premise

The analysis assumes the high-dimensional limit with spherical constraints, with numerical evidence bridging to finite-dimensional non-spherical cases.

What would settle it

Observing no phase transitions, no double descent, or no tempered effects in a finite non-spherical energy-based model trained similarly would falsify the generality claim.
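As a hedged illustration of the kind of scan such a test would involve (not the paper's protocol), one can sweep a regularization strength γ for a simple non-spherical model and inspect the held-out curve for the presence or absence of a double-descent shape; every modeling choice below is a placeholder.

```python
# Illustration only: fit a ridge-regularized Gaussian (a quadratic energy
# without the spherical constraint) at many regularization strengths gamma
# and inspect the held-out negative log-likelihood for a double-descent shape.
import numpy as np

def heldout_nll(C_train, C_test, gamma):
    N = C_train.shape[0]
    precision = np.linalg.inv(C_train + gamma * np.eye(N))   # regularized Gaussian fit
    _, logdet = np.linalg.slogdet(precision)
    return 0.5 * (np.trace(precision @ C_test) - logdet)     # Gaussian NLL up to a constant

rng = np.random.default_rng(0)
N, M = 100, 120
u = rng.standard_normal(N)
u /= np.linalg.norm(u)
C_true = np.eye(N) + 4.0 * np.outer(u, u)                    # rank-one "teacher" covariance
L = np.linalg.cholesky(C_true)
X_train = (L @ rng.standard_normal((N, M))).T                # M training samples
X_test = (L @ rng.standard_normal((N, 10 * M))).T            # held-out samples
C_train = X_train.T @ X_train / M
C_test = X_test.T @ X_test / (10 * M)

gammas = np.logspace(-3, 1, 25)
curve = [heldout_nll(C_train, C_test, g) for g in gammas]
print("held-out NLL minimized near gamma =", gammas[int(np.argmin(curve))])
# A genuine test would require the paper's models and protocol; this only shows
# the shape of the gamma-scan one would inspect for (absence of) double descent.
```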

Figures

Figures reproduced from arXiv:2605.09031 by Jorge Fernandez-De-Cossio-Diaz, Rémi Monasson, Simona Cocco, Thomas Tulinski.

Figure 1
Figure 1: Equilibrium phase diagram for K = 1. A) Schematic eigenvalue configurations in each phase: bulk semicircle (light blue), top eigenvalue λ1 and the Lagrange multiplier µ enforcing ∥x∥² = N. B) Phases in the (γ, η) plane; the blue dashed line separates outlier from edge regimes. C) λ1 and Lagrange multiplier µ vs. γ in the small-η regime, where λ1 sticks to the edge; phase transitions occur at the vertical … view at source ↗
Figure 2
Figure 2: Training dynamics. A) Top eigenvalue trajectories λi(t) from finite-N simulations (black) vs. the early-time predictions (14) (red dotted). Vertical dashed lines are the detachment (14) and condensation times t∗. Blue horizontal line is the bulk edge (2σ). B) s1(t) from DMFT (red) compared to finite-N simulations (blue). C) Stationary overlap sst as a function of ν for selected values of γ (dashed lines in… view at source ↗
Figure 3
Figure 3: Sampling temperature tuning (TT). A) Mode rescue through temperature tuning, illustrating how the bulk and outlier position change. B) DKL(P∗∥PβW) vs. β. Similarly, the generative performance of the posterior predictive student can be measured by the reverse DKL(Ppp∥P∗) or forward DKL(P∗∥Ppp). The posterior predictive is popular in Bayesian prediction [Aitchison, 1975, Brown et al., 2008], but, wi… view at source ↗
Figure 4
Figure 4: Double descent. η indicated in each curve. Black dots: minima. Orange dashed: thresholds η = ηDD (upper) and η → ∞ (lower). Vertical dashed: boundary between h ≠ 0 and h = 0 phases. The model responds in qualitatively different ways to temperature tuning, depending on the starting phase. Furthermore, changing the value of β can transport the model from one phase to another. If the trained model starts i… view at source ↗
Figure 5
Figure 5: Tempered posterior. A) Phases of the optimal η^rev_pp(γ, ω∗). B) Reverse KL vs. η for selected γ's (dots: minima). Double descent … view at source ↗
Figure 6
Figure 6: Out-of-equilibrium training. A) Reverse DKL(PW(t)∥P∗) during training, B) λ1(t), C) u1(t). Color: ν. Blue dashed: optimal early-stopping time. Orange dashed: equilibrium values (at ν → ∞). An optimal posterior temperature different from 1 (the Bayesian prescription) has been referred to as a cold/warm-posterior effect [Wenzel et al., 2020, Noci et al., 2021, Nabarro et al., 2022, Pitas and Arbel, 2024]… view at source ↗
Figure 7
Figure 7: K = 2 phase diagrams. Phase diagram in the (γ, η) plane for c1 ∈ {1.1, 1.3, 1.5, 1.7} (with c2 = 2 − c1). Black lines are phase boundaries; the teal line separates outlier from edge phases. view at source ↗
Figure 8
Figure 8: Validation of the large-N DMFT against finite-N SBM training. Each panel compares DMFT (solid) with finite-N SGD runs (persistent MCMC, mini-batch size = 1) simulated by Euler integration of the coupled Langevin system (4)–(3) at N = 4000, averaged over five seeds (dashed, with ±SEM bands). Top row: signal overlaps sk(t); bottom row: Lagrange multiplier κ(t). The gray strip marks the ±1/√N finite-N noise… view at source ↗
Figure 9
Figure 9: Dynamical phase structure of the K = 1 DMFT at finite η. Three values η ∈ {1, 3, 10} are shown in columns. Row 1: dynamical phase diagram νc(γ) (solid black), obtained by warm-started path continuation on a fine τ-grid (∆τ ≃ 4 × 10⁻³) with the guard νc∆τ < 0.6; red shading is the condensed branch (sst ≠ 0) and blue shading the uncondensed branch (sst → 0). The dashed vertical line marks γ = 1, the vertica… view at source ↗
Figure 10
Figure 10: Validation of the large-K DMFT equations for nearly finite-dimensional Ising data. A, B) Spectrum of the empirical covariance matrix C for K configurations of the two-dimensional periodic Ising model at fixed linear size L = 32, N = L×L (= 1024). Below the critical temperature, Tlow = 2.2 < Tc ≃ 2.269, the leading eigenvalue is extensive, c1 ≃ K m(Tlow)², while c2/K decreases with K, consistent with effec… view at source ↗
Figure 11
Figure 11: Large-K training dynamics for nearly finite-dimensional Ising data. Comparison between the effective K′ = 1 large-K DMFT prediction and finite-N coupled Langevin simulations for below-critical two-dimensional Ising data at T = 1.8, with K = 512 and N = 2025. Time is shown in the rescaled variable Kt/K′, and colors indicate the rescaled persistent-chain update rate ν. Solid curves show the effective K… view at source ↗
Figure 12
Figure 12: Optimum η^fwd_pp = arg min_{η>0} DKL(P∗∥Ppp) for the rank-one K = 2 teacher. A) Phases in the (ω∗, γ) plane: warm flat (yellow), unique cold (medium blue), MAP (light blue); the hatched sliver is the mixed warm/cold tie. Boundaries: dashed teal γwc (Bayes-crossing, η^fwd_pp = 1), dotted black γflat (upper edge of the warm flat interval), solid black γ∞ (cold/MAP). B) DKL(P∗∥Ppp)/N vs. η at ω∗ = 2.2 for γ … view at source ↗
Figure 13
Figure 13: Comparison of the SBM with the unconstrained Gaussian model. A) Typical reverse KL divergence ⟨DKL(PW∥P∗)⟩/N as a function of the prior strength γ, at η = 5 and ω∗ = 2.5 (K = 2). The SBM curve (solid) is colored by phase: condensed h ≠ 0 (red) and uncondensed h = 0 (blue). The Gaussian equivalent (dashed gray) is the h = 0 formula (ω∗ − ln ω∗ − 1 + 1/(2γη))/2, which coincides with the SBM throughou… view at source ↗
Figure 14
Figure 14: Temperature tuning on Potts BM trained on PF00072. Pairwise Potts models on the Pfam response-regulator family PF00072 (L = 111, q = 21), trained via adabmDCA [Rosset et al., 2026]. A) Pearson correlation between generated and data connected correlations as a function of sampling inverse temperature β, for sixteen L2 regularizations γ (color bar). Red dots mark the peak of each curve, defining βopt(γ).… view at source ↗
Figure 15
Figure 15: Temperature tuning on Potts BM trained on PF00018. Pairwise Potts models on the Pfam SH3 domain PF00018 (L = 48, q = 21), trained via adabmDCA. A) Pearson correlation between generated and data connected correlations as a function of sampling inverse temperature β, for sixteen L2 regularizations (color bar); red dots mark βopt. B) Top six covariance eigenvalues σ²k(β) at γ = 1 (left axis, solid lines);… view at source ↗
Figure 16
Figure 16: Double descent in a tractable flow generative model. Posterior-averaged reverse KL divergence per dimension (top) and h ≠ 0 fraction across training seeds (bottom) as functions of the L2 prior strength γ, for a Householder-and-altitude normalizing flow on S^N [Rezende et al., 2020] trained by reverse-KL variational inference against the same rank-one Bingham teacher as in … view at source ↗
Figure 17
Figure 17: Double descent in a binary FVSBN persists across K on rank-1-dominated teachers. SWAG posterior-averaged reverse KL per spin, DKL(qW∥P∗)/N, vs. the paper-convention regularization γ, for the binary FVSBN trained on A) a rank-1 Curie–Weiss / Hopfield teacher at β = 2.0, N = 16, and B) a 2D Ising teacher on the L = 4 periodic square lattice (N = 16) at β = 0.5, deep in the ferromagnetic phase. Each pane… view at source ↗
Figure 18
Figure 18: Double descent in a Gaussian-visible RBM trained on financial return data. A) Eigenvalue spectrum of the empirical training-set covariance C on a log-λ axis. The bulk (gray histogram) spans λ ∼ 10⁻¹ to λ ∼ 2, dominated by sectoral correlations; a single rank-one outlier sits at λ1 ≈ 25 (blue line), more than an order of magnitude above the bulk maximum λ2 ≈ 1.6, the “market mode” that drives the retarded-… view at source ↗
Figure 19
Figure 19: Tempered Bayesian GAN on a unimodal target: warm-to-cold migration with prior strength. A) Posterior-predictive reverse KL, DKL(Ppp∥P∗), versus η, for four prior widths σp ∈ {0.3, 1, 3, 10}. B) Forward DKL(P∗∥Ppp). Target: single isotropic Gaussian at the origin with σtarget = 0.5. Up to twelve seeds per (σp, η) cell. Bands are ±1 SEM. Vertical dashed line: η = 1 (Bayes).… view at source ↗
Figure 20
Figure 20: Out-of-equilibrium training in a Potts BM. A Potts Boltzmann machine (L = 27, q = 20) is trained by PCD-MAP on lattice-protein sequences (βsel = 1000), with γ = 0.01, ηlr = 0.01, tage = 5000, Nchains = 1. A) Top coupling eigenvalue maxk |λk(J)| vs. sampling rate k/L (site updates per gradient step, normalized by chain length). At small k the weights overshoot the equilibrium value (dashed line) because the single chain… view at source ↗
Figure 21
Figure 21: Out-of-equilibrium training dynamics on real protein data (SH3 domain PF00018). Same protocol as … view at source ↗
read the original abstract

Energy-based models (EBMs) are flexible generative architectures inspired by statistical physics, but their learning and generative properties remain poorly understood. Here, we analyze a solvable EBM in the high-dimensional limit: the spherical Boltzmann machine (SBM). Combining tools from random matrix theory and dynamical mean-field theory, we: solve exact equations describing the training dynamics of the SBM; compute the Bayesian evidence, which acts as a partition function in parameter space and encodes global properties of the trained model; and uncover cascades of phase transitions that occur both during training and as a function of hyperparameters, related to successive alignment and condensation of the top modes of the coupling matrix to the data. We connect these transitions to sampling-time generative phenomena in a teacher-student scenario, including: sampling temperature tuning, double descent as a function of regularization strength, tempered posterior effects, and out-of-equilibrium effects during training that induce biases in the trained model. We provide numerical evidence demonstrating that all these phenomena appear in standard generative architectures, beyond the SBM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper analyzes the spherical Boltzmann machine (SBM) as a solvable high-dimensional energy-based model. Using random matrix theory and dynamical mean-field theory, it derives exact equations for the training dynamics, computes the Bayesian evidence as a partition function over parameters, and identifies cascades of phase transitions tied to successive alignment and condensation of the top modes of the coupling matrix. These transitions are connected to generative phenomena including sampling temperature effects, double descent under regularization, tempered posteriors, and training-induced biases, with numerical evidence that the same phenomena appear in standard (non-spherical, finite-dimensional) EBMs.

Significance. If the exact high-dimensional derivations hold and the numerical mappings are robust, the work supplies a rare solvable limit that explains several otherwise opaque behaviors in EBM training and sampling. The combination of RMT and DMFT to obtain closed equations for dynamics and evidence is a clear technical strength, and the identification of mode-condensation transitions offers a concrete mechanism for double-descent and out-of-equilibrium biases.

major comments (1)
  1. [Numerical experiments (section describing validation on standard architectures)] The central claim that the SBM phase transitions and generative phenomena explain behavior in practical EBMs rests on numerical evidence whose quantitative accuracy is not assessed. No finite-N scaling, deviation-from-sphericity metrics, or error bars on critical hyperparameter values are reported, so it is impossible to judge how faithfully the high-dimensional spherical predictions carry over to finite non-spherical models.
minor comments (1)
  1. [Theory sections] Notation for the spherical constraint and the coupling-matrix eigenvalues should be introduced once with a clear table or glossary, as the same symbols appear in both the RMT and DMFT sections.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We address the single major comment below and outline the revisions we will make to strengthen the numerical validation.

read point-by-point responses
  1. Referee: The central claim that the SBM phase transitions and generative phenomena explain behavior in practical EBMs rests on numerical evidence whose quantitative accuracy is not assessed. No finite-N scaling, deviation-from-sphericity metrics, or error bars on critical hyperparameter values are reported, so it is impossible to judge how faithfully the high-dimensional spherical predictions carry over to finite non-spherical models.

    Authors: We agree that the current numerical section provides primarily qualitative demonstrations that the phenomena appear in standard architectures, without quantitative metrics of agreement or robustness. This limitation weakens the strength of the mapping claim. In the revised manuscript we will add: (i) error bars on all reported critical hyperparameter values obtained from multiple independent runs, (ii) finite-N scaling plots for the non-spherical models to illustrate convergence toward the high-dimensional predictions, and (iii) explicit deviation-from-sphericity metrics (e.g., the Frobenius distance of the trained coupling matrix from its spherical projection) evaluated at the observed transition points. These additions will allow a clearer assessment of how faithfully the spherical limit carries over. revision: yes
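One possible operationalization of items (i) and (iii) is sketched below; the norm-based sphericity summary and the SEM aggregation are assumptions about how such metrics might be computed, not the authors' stated procedure.

```python
# Sketch of possible metrics (assumptions, not the authors' stated procedure).
import numpy as np

def sphericity_deviation(samples):
    # `samples`: placeholder array of shape (n_samples, N) drawn from the trained
    # non-spherical model. In the spherical limit ||x||^2 / N concentrates on 1,
    # so the spread of this ratio is one way to quantify deviation from sphericity.
    r2 = np.sum(samples**2, axis=1) / samples.shape[1]
    return {"mean_r2": float(r2.mean()),
            "std_r2": float(r2.std()),
            "max_abs_dev": float(np.abs(r2 - 1.0).max())}

def critical_value_with_errorbar(critical_estimates):
    # Item (i): aggregate critical-hyperparameter estimates from independent runs
    # into mean +/- standard error of the mean.
    est = np.asarray(critical_estimates, dtype=float)
    return float(est.mean()), float(est.std(ddof=1) / np.sqrt(len(est)))
```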

Circularity Check

0 steps flagged

No circularity: exact solutions derived from external RMT and DMFT frameworks

full rationale

The paper solves exact training dynamics, Bayesian evidence, and phase transitions for the spherical Boltzmann machine by combining random matrix theory and dynamical mean-field theory in the high-dimensional spherical limit. These are independent external mathematical tools applied to the model, not reductions of the model's own fitted parameters, data, or self-citations. The subsequent connections to sampling phenomena and numerical checks on standard EBMs are presented as separate evidence rather than load-bearing derivations. No self-definitional steps, fitted inputs renamed as predictions, or ansatz smuggling via self-citation appear in the derivation chain. The analysis is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the high-dimensional limit and the spherical constraint; these are domain assumptions required for the RMT/DMFT analysis to close. No free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption High-dimensional limit (N→∞) with spherical constraint on weights
    Invoked to enable exact closure of the dynamical equations via RMT and DMFT.

pith-pipeline@v0.9.0 · 5488 in / 1402 out tokens · 54482 ms · 2026-05-12T02:00:07.368031+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

126 extracted references · 126 canonical work pages · 4 internal anchors

  1. [1] Exploring the space of self-reproducing ribozymes using generative models. Nature Communications, 2025.
  2. [2] There Will Be a Scientific Theory of Deep Learning. 2026.
  3. [3] Tulinski, T., Fernandez-De-Cossio-Diaz, J., Cocco, S., Monasson, R. Replica Theory of Spherical Boltzmann Machine Ensembles. arXiv:2604.17936.
  4. [4] Calvanese, F., Lombardi, G., Weigt, M., et al. Steering Sequence Generation in Protein Language Models through Iterative Lookback Monte Carlo Sampling. bioRxiv, 2026. https://www.biorxiv.org/content/early/2026/05/07/2026.05.01.722156.full.pdf
  5. [5] Murray, I., Ghahramani, Z. Bayesian Learning in Undirected Graphical Models: Approximate … arXiv:1207.4134.
  6. [6] Mézard, M., Parisi, G., Virasoro, M. 1986.
  7. [7] The space of interactions in neural network models. J. Phys. A: Math. Gen., 1988.
  8. [8] Sampling the space of solutions of an artificial neural network. Phys. Rev. E, 2025.
  9. [9] Zambon, A., Caruso, F., Zecchina, R., Tiana, G. Controlled … arXiv:2603.15367.
  10. [10] Harish-Chandra. Differential operators on a semisimple … 1957.
  11. [11] Itzykson, C., Zuber, J.-B. The planar approximation … 1980.
  12. [12] Introduction to Random Matrices: Theory and Practice. 2018.
  13. [13] Decoupled weight decay regularization. 2017.
  14. [14] Spherical integrals of sublinear rank. Probability Theory and Related Fields, 2025.
  15. [15] Asymptotics of k dimensional spherical integrals and applications. ALEA: Latin American Journal of Probability and Mathematical Statistics.
  16. [16] Biroli, G., Guionnet, A. Electronic Communications in Probability, 2020.
  17. [17] Principal-component-analysis eigenvalue spectra from data with symmetry-breaking structure. Physical Review E, 2004.
  18. [18] Statistical mechanics of learning multiple orthogonal signals: asymptotic theory and fluctuation effects. Physical Review E, 2007.
  19. [19] Baik, J., et al. The Annals of Probability, 2005.
  20. [20] Matrix inference in growing rank regimes. IEEE Transactions on Information Theory, 2024.
  21. [21] Phase diagram of extensive-rank symmetric matrix denoising beyond rotational invariance. Physical Review X, 2025.
  22. [22] Extreme value statistics of eigenvalues of Gaussian random matrices. Phys. Rev. E, 2008. doi:10.1103/PhysRevE.77.041108.
  23. [23] A first course in random matrix theory: for physicists, engineers and data scientists. 2020.
  24. [24] Tubiana, J. Emergence of compositional representations in restricted … Physical Review Letters, 2017.
  25. [25] Coolen, A.C.C., Penney, R., Sherrington, D. Coupled Dynamics of Fast Neurons and Slow Interactions.
  26. [26] Rigorous Bounds to Retarded Learning. Physical Review Letters, 2002.
  27. [27] Dynamical decoupling of generalization and overfitting in large two-layer networks. arXiv:2502.21269.
  28. [28] Information theory, inference and learning algorithms. 2003.
  29. [29] Understanding temperature tuning in energy-based models. 2025.
  30. [30] Designing molecular … Nature Communications, 2025.
  31. [31] Cugliandolo, L. F. Recent Applications of Dynamical Mean-Field Methods. Annual Review of Condensed Matter Physics, 2024. doi:10.1146/annurev-conmatphys-040721-022848.
  32. [32] Normalizing flows on tori and spheres. International Conference on Machine Learning, 2020.
  33. [33] An antipodally symmetric distribution on the sphere. The Annals of Statistics, 1974.
  34. [34] Hamelryck, T., Mardia, K. V. Unfolding … arXiv:2505.19763.
  35. [35] Detection of a particle shower at the Glashow resonance with IceCube. Nature, 2021.
  36. [36] Sampling realistic protein conformations using local structural bias. PLoS Computational Biology, 2006.
  37. [37] An evolution-based model for designing chorismate mutase enzymes. Science, 2020.
  38. [38] The Fisher-Bingham distribution on the sphere. Journal of the Royal Statistical Society: Series B (Methodological), 1982.
  39. [39] Reasoning with Sampling: Your Base Model is Smarter Than You Think. 2025.
  40. [40] Implicit generation and modeling with energy based models. Advances in Neural Information Processing Systems.
  41. [41] Phase Transitions in the Output Distribution of Large Language Models. 2024.
  42. [42] Spin-glass theory for pedestrians. Journal of Statistical Mechanics: Theory and Experiment, 2005.
  43. [43] Replica method for computational problems with randomness: principles and illustrations. Journal of Statistical Mechanics: Theory and Experiment, 2024.
  44. [44] Spherical model of a spin-glass. Physical Review Letters, 1976.
  45. [45] Information theory and statistical mechanics. Physical Review, 1957.
  46. [46] A new method to simulate the Bingham and related distributions in directional data analysis with applications. 2013.
  47. [47] Baxter, R. J. 1990.
  48. [48] Spin-glass models of neural networks. Physical Review A, 1985.
  49. [49] Ros, V. 2025. doi:10.21468/SciPostPhysLectNotes.102.
  50. [50] The spherical model of a ferromagnet. Physical Review, 1952.
  51. [51] Gaussian-spherical restricted Boltzmann machines. Journal of Physics A: Mathematical and Theoretical, 2020.
  52. [52] Belkin, M., Hsu, D., Ma, S., Mandal, S. Proceedings of the National Academy of Sciences, 2019.
  53. [53] Restricted Boltzmann machine: Recent advances and mean-field theory. Chinese Physics B, 2021.
  54. [54] Nabarro, S., Ganev, S., Garriga-Alonso, A., et al. Data augmentation in … Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence.
  55. [55] Disentangling the roles of curation, data-augmentation and the prior in the cold posterior effect. Advances in Neural Information Processing Systems.
  56. [56] The fine print on tempered posteriors. Proceedings of the 15th Asian Conference on Machine Learning.
  57. [57] Fachechi, A., Agliari, E., Aquaro, M., Coolen, A., Mulder, M. Fundamental operating regimes, hyper-parameter fine-tuning and glassiness: towards an interpretable replica-theory for trained restricted … 2025.
  58. [58] Cascade of phase transitions in the training of Energy-based models. 2024.
  59. [59] A theoretical framework for overfitting in energy-based modeling. 2025.
  60. [60] Cheema, P., Sugiyama, M. The Volume of Non-Restricted … 2020.
  61. [61] Modeling structured data learning with Restricted Boltzmann machines in the teacher-student setting. Neural Networks, 2025.
  62. [62] Wenzel, F., Roth, K., Veeling, B. S., et al. How Good is the … International Conference on Machine Learning, 2020.
  63. [63] The generalization error of random features regression: Precise asymptotics and the double descent curve. Communications on Pure and Applied Mathematics, 2022.
  64. [64] The singular values and vectors of low rank perturbations of large rectangular random matrices. Journal of Multivariate Analysis, 2012.
  65. [65] Overlaps between eigenvectors of spiked, correlated random matrices. Physical Review E, 2023.
  66. [66] High-dimensional dynamics of generalization error in neural networks. Neural Networks, 2020.
  67. [67] Pattern Recognition and Machine Learning. 2006.
  68. [68] The largest eigenvalue of small rank perturbations of … Probability Theory and Related Fields, 2005. doi:10.1007/s00440-005-0466-z.
  69. [69] Capitaine, M., Donati-Martin, C., et al. The largest eigenvalues of finite rank deformation of large … The Annals of Probability. doi:10.1214/08-AOP394.
  70. [70] Zdeborová, L., et al. Statistical physics of inference: thresholds and algorithms. Advances in Physics, 2016. doi:10.1080/00018732.2016.1211393.
  71. [71] Seung, H. S., Sompolinsky, H., Tishby, N. Statistical mechanics of learning from examples. Physical Review A. doi:10.1103/PhysRevA.45.6056.
  72. [72] Engel, A., Van den Broeck, C. Statistical Mechanics of Learning.
  73. [73] Hastie, T., Montanari, A., Rosset, S., Tibshirani, R. J. Surprises in high-dimensional ridgeless least squares interpolation. The Annals of Statistics. doi:10.1214/21-AOS2133.
  74. [74] Crisanti, A., Sommers, H.-J. The spherical p-spin interaction spin glass model: the statics. Zeitschrift f… doi:10.1007/BF01309287.
  75. [75] Cold Posteriors and Aleatoric Uncertainty. 2020.
  76. [76] The Safe … 2012. doi:10.1007/978-3-642-34106-9_16.
  77. [77] de Freitas Pimenta, P. H., Stariolo, D. A. Finite-Size Relaxational Dynamics of a Spike Random Matrix Spherical Model. Entropy. doi:10.3390/e25060957.
  78. [78] Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. International Conference on Learning Representations. arXiv:1312.6120.
  79. [79] Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets. ICLR 2022 MATH-AI Workshop. arXiv:2201.02177.
  80. [80] Progress measures for grokking via mechanistic interpretability. arXiv:2301.05217.

Showing first 80 references.