
arxiv: 2606.31110 · v1 · pith:PP7LGLFKnew · submitted 2026-06-30 · 💻 cs.LG · cond-mat.stat-mech

Explaining Machine Learning and Memorization with Statistical Mechanics

Pith reviewed 2026-07-01 06:02 UTC · model grok-4.3

classification 💻 cs.LG cond-mat.stat-mech
keywords statistical mechanics · dense associative memory · restricted Boltzmann machines · adversarial attacks · low-dimensional learning · memorization · neural networks

The pith

Statistical mechanics applied to associative memory models reveals the low-dimensional structure of neural network learning and the basis for adversarial vulnerabilities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies tools from statistical mechanics to dense associative memory and restricted Boltzmann machines to analyze how neural networks fit data through varying mixtures of learning and memorization. It focuses on explaining why training occurs along an implicitly low-dimensional subspace of parameter space and what underlies the susceptibility of trained networks to adversarial attacks. A sympathetic reader would care because clearer theoretical accounts of these phenomena could guide the design of training procedures that better exploit the low-dimensional structure and produce more robust models.

Core claim

By studying connections between different formulations of dense associative memory and restricted Boltzmann machines, statistical mechanics methods can characterize the regimes in which these models learn versus memorize and thereby expose the physical-like mechanisms behind the low-dimensional trajectories taken during training and the origins of adversarial confusion.

What carries the argument

Statistical mechanics analysis of dense associative memory (DAM) and restricted Boltzmann machines (RBM), with emphasis on inter-model connections that simplify analytical calculations of learning versus memorization.
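
To make the object of study concrete, the following is a minimal sketch of a dense associative memory with p-body interactions. It assumes the commonly used energy E(σ) = −Σ_μ (ξ^μ · σ)^p with binary spins and a simple zero-temperature update rule; the normalization, the function names `dam_energy` and `retrieve`, and the sizes N, P and p are illustrative assumptions, not the thesis's own definitions.

```python
import numpy as np

def dam_energy(sigma, xi, p):
    """Energy E = -sum_mu (xi_mu . sigma)^p (illustrative normalization, an assumption)."""
    overlaps = xi @ sigma                     # one overlap per stored pattern
    return -np.sum(overlaps.astype(float) ** p)

def retrieve(sigma, xi, p, sweeps=5):
    """Zero-temperature asynchronous dynamics: accept a spin flip only if it lowers E."""
    sigma = sigma.copy()
    for _ in range(sweeps):
        for i in range(sigma.size):
            flipped = sigma.copy()
            flipped[i] = -flipped[i]
            if dam_energy(flipped, xi, p) < dam_energy(sigma, xi, p):
                sigma = flipped
    return sigma

rng = np.random.default_rng(0)
N, P, p = 64, 5, 4                            # hypothetical sizes, chosen for speed
xi = rng.choice([-1, 1], size=(P, N))         # random binary patterns

# Corrupt the first pattern by flipping 6 of its 64 spins, then let the dynamics relax.
noisy = xi[0].copy()
flip_idx = rng.choice(N, size=6, replace=False)
noisy[flip_idx] *= -1

recovered = retrieve(noisy, xi, p)
print("overlap with stored pattern:", (recovered @ xi[0]) / N)
```

Setting p = 2 in `dam_energy` gives the quadratic energy of the original Hopfield network up to normalization, which is the sense in which dense models generalize it.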

If this is right

  • Improved understanding of low-dimensional learning trajectories can be used to design training algorithms that remain inside the effective subspace and converge faster.
  • Characterizing memorization regimes in DAM and RBM supplies quantitative criteria for when a network has begun to overfit rather than generalize.
  • The same statistical mechanics framework identifies the energy-landscape features responsible for adversarial examples, suggesting concrete modifications to network architecture or loss functions.
  • Analytical connections between DAM and RBM versions reduce the computational cost of studying larger networks that fit data with mixed learning and memorization.
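
One textbook instance of an analytical connection between the two model families is the equivalence between an RBM with Gaussian hidden units and the p = 2 Hopfield network. The derivation below is a sketch under assumed conventions (binary visible units σ_i = ±1, Gaussian hidden units z_μ, weights ξ_i^μ, inverse temperature β); the thesis may use different normalizations or more general priors.

```latex
% Assumed setup (illustrative): N binary visible units \sigma_i, P Gaussian hidden units z_\mu.
\begin{align}
P(\sigma, z) &\propto \exp\!\Big( -\tfrac{1}{2}\sum_{\mu=1}^{P} z_\mu^{2}
  + \sqrt{\tfrac{\beta}{N}} \sum_{\mu=1}^{P}\sum_{i=1}^{N} \xi_i^{\mu}\sigma_i z_\mu \Big), \\
% Gaussian integral: \int dz\, e^{-z^2/2 + a z} = \sqrt{2\pi}\, e^{a^2/2}
P(\sigma) &= \int \prod_{\mu=1}^{P} dz_\mu \, P(\sigma, z)
  \propto \exp\!\Big( \tfrac{\beta}{2N} \sum_{\mu=1}^{P}
  \Big( \sum_{i=1}^{N} \xi_i^{\mu}\sigma_i \Big)^{2} \Big)
  = e^{-\beta H(\sigma)}, \\
H(\sigma) &= -\frac{1}{2N} \sum_{\mu=1}^{P} \Big( \sum_{i=1}^{N} \xi_i^{\mu}\sigma_i \Big)^{2}.
\end{align}
```

Integrating out the hidden layer therefore turns the bipartite RBM into a pairwise associative memory whose stored patterns are the RBM weights, so results obtained in one formulation can be carried over to the other.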

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same low-dimensional subspace description may apply to modern deep networks trained with gradient descent, offering a route to dimension-reduction techniques that do not require explicit regularization.
  • If the statistical mechanics mapping holds, one could test whether adversarial robustness improves when networks are explicitly constrained to the learned subspace during inference.
  • The approach suggests examining whether other generative models exhibit analogous memorization transitions that can be located with partition-function techniques.

Load-bearing premise

The low-dimensional structure observed in neural network training corresponds to a genuine physical-like phenomenon that statistical mechanics can usefully describe.

What would settle it

Empirical measurements of adversarial attack success rates on DAM or RBM networks that systematically deviate from the predictions obtained via the statistical mechanics mapping to low-dimensional subspaces.

Figures

Figures reproduced from arXiv: 2606.31110 by Robin Theriault.

Figure 1.1: In the left panel, a linear function y_model = F_w(x) = wx fit to data y_data = x + ε, where ε is Gaussian noise with variance σ² = 1 (see …
Figure 1.2: In the left panel, the network representation of the linear model …
Figure 1.3: Illustration of the probability density distribution …
Figure 1.4: An adversarial attack making a neural network (NN) trained on the ImageNet dataset [14] …
Figure 1.5: Sketch of two balls sliding down a mountain. At equilibrium, they stand motionless at the bottom …
Figure 1.6: In the left panel, images from the binarized version [48] of the MNIST dataset of handwritten digits …
Figure 1.7: Patterns ξ^μ learned by the dense Hopfield network (HN) studied in [30] when trained on the MNIST dataset of handwritten digits. This plot comes from [30], which contains additional details about the dense HN in question.
  … connections, which are called p-body interactions, reduce to those of the original HN when p = 2. Dense HNs bridge the gap between ML and MM in the sense that they learn prototypes of…
Figure 1.8: The network representation of the restricted Boltzmann machine (RBM) Hamiltonian …
Figure 1.9: Phase diagram of Hopfield networks (HNs) in the teacher-student setting when the student and the …
Figure 2.1: RS phase diagrams of the direct models with …
Figure 2.2: Exact RS phase diagrams of inverse models on the Nishimori line, i.e. …
Figure 2.3: The first row of this diagram sketches how a …
Figure 2.4: Monte Carlo simulations of the p = 3 inverse model compared against RS saddle-point solutions. The lR phase is included on the left and central plots, but not on the right one. The left plot has ε = 0, and the two other ones have a handpicked ε such that the simulations are initialized near the saddle-point solutions. The dots are simulation data at a few values of α, and the lines are slices of the saddl…
Figure 2.5: RS phase diagrams of inverse models with …
Figure 2.6: RS phase diagrams of inverse models with …
Figure 2.7: Monte Carlo simulations (dashed lines) and RS saddle-point solutions (full lines) of the inverse …
Figure 2.8: Monte Carlo simulations of the overlap q* as a function of α and adversarial attack size ε in the inverse model with p* = 2, β* = 1 − √(1/2), p = 4, β = ∞ and N = 1024. The simulation results are averaged over L = 100 student patterns. On the left plot, the inverse model is corrupted by an example σ^a that has a small overlap with ξ* in absolute value. On the right plot, it is corrupted by the exampl…
Figure 2.9: Monte Carlo simulations of the p = 3 inverse model compared against saddle-point solutions for different values of N. The lR phase is not included in these plots. The left plot has N = 128, the center plot has N = 256, and the right plot has N = 512. The dots are simulation data at a few values of α, and the lines are slices of the saddle-point solutions at the same α. There are M = αN^(p−1)/p! examples σ^a…
Figure 3.1: RS phase diagrams of the teacher-student setting with …
Figure 3.2: Permutation Symmetry Breaking (PSB) solution of Eqs. (3.14) for binary student patterns with a …
Figure 3.3: Partial Permutation Symmetry Breaking (partial PSB) solutions of Eqs. (3.14) for binary student …
Figure 3.4: The magnetization m solving Eqs. (3.22), in orange, compared against N = 512 dimensional Monte Carlo simulations, in blue, of the teacher-student problem where the student has P = 2 binary patterns with a uniform prior and the teacher has P* = P = 2 binary patterns with covariance Q = I. The blue dots and error bars represent the means and standard deviations, respectively, of the diagonal of the magnet…
Figure 3.5: Solutions of Eqs. (3.14) shown in Fig. (3.3), in orange, compared against …
Figure 3.6: Solutions of Eqs. (3.14) for real-valued student patterns with a standard Gaussian prior and teacher …
Figure 3.7: Results of the lottery ticket experiment described in Section 3.3.2.3. In the left panel, …
Figure 3.8: Critical load α_crit for β = β* and P = P* as a function of the number of hidden units P, the temperature T and the correlation c. α_crit is obtained from Eq. (3.18). The top row has Q_μν = δ_μν + (1 − δ_μν) c, so the max eigenvalue λ^S_max is that of Eq. (3.24). The bottom row is the arithmetic mean of α_crit over correlation matrices Q sampled from the projected Wishart distribution W(c, P) defined in 3.A.2. 3…
Figure 3.9: Largest eigenvalue λ^S_max of S = QR (see 3.F) for β = β* and P = P* as a function of the number of hidden units P, the temperature T and the correlation c. The top row has Q_μν = δ_μν + (1 − δ_μν) c, so the max eigenvalue λ^S_max is that of Eq. (3.24). The bottom row is the harmonic mean ⟨1/λ^S_max⟩^(−1) over correlation matrices Q sampled from the projected Wishart distribution W(c, P) defined in 3.A.2. I…
Figure 3.10: Mattis magnetization m for β = β* and P = P* as a function of the number of hidden units P, the correlation c, the temperature T and the data load α. m is obtained by solving Eqs. (3.14) numerically for binary student patterns with a uniform prior and binary teacher patterns with covariance Q_μν = δ_μν + (1 − δ_μν) c, where c ∈ [0, 1) (see 3.H). The top and bottom rows feature P = 2 and P = 3, respective…
Figure 3.11: Mattis magnetization m for β* = 0.8 and P = P* as a function of the number of hidden units P, the correlation c, the temperature T and the data load α. m is obtained by solving Eqs. (3.14) numerically for binary student patterns with a uniform prior and binary teacher patterns with covariance Q_μν = δ_μν + (1 − δ_μν) c, where c ∈ [0, 1) (see 3.H). The top and bottom rows feature P = 2 and P = 3, respecti…
Figure 3.12: SG overlap q for β* = 0.8 and P = P* as a function of the number of hidden units P, the correlation c, the temperature T and the data load α. q is obtained by solving Eqs. (3.14) numerically for binary student patterns with a uniform prior and binary teacher patterns with covariance Q_μν = δ_μν + (1 − δ_μν) c, where c ∈ [0, 1) (see 3.H). The top and bottom rows feature P = 2 and P = 3, respectively. The …
Figure 3.13: Mattis magnetization m and SG overlap q solving Eqs. (3.14), in orange and red, as a function of the load α and number of student patterns P for β = β* = 1, P* = 2 and c = 0.3. The top and bottom branches of the plots are respectively the diagonal and off-diagonal coefficients of m and q. In the top-left panel, m is compared against N = 512 dimensional Monte Carlo simulations, in blue. The blue dots a…
Figure 3.14: Mattis magnetization m and SG overlap q solving Eqs. (3.14), in orange and red, as a function of the load α and teacher pattern correlations c for β = β* = 1 and P = P* = 3. The top and bottom branches of the plots are respectively the diagonal and off-diagonal coefficients of m and q. m is compared against N = 512 dimensional Monte Carlo simulations, in blue. The blue dots and error bars represent th…
Figure 3.15: Results of the lottery ticket experiment of Section 3.3.2.3 when the teacher patterns have a …
Figure 3.16: Mattis magnetization m for β = β* and P = P* as a function of the number of hidden units P, the correlation c, the temperature T and the data load α. m is obtained by solving Eqs. (3.14) for binary student patterns with a uniform prior and binary teacher patterns with covariance Q_μν ∼ W(c, P), where c ∈ [0, 1) (see 3.A.2). The top and bottom rows feature P = 2 and P = 3, respectively. The white lines…
Figure 3.17: Free entropy difference of the so-called PSB and partial PSB solutions of Eqs. (3.14) shown in …
Figure 3.18: Permutation Symmetry Breaking (PSB) solution of Eqs. (3.14) for real-valued student patterns …
Figure 3.19: Partial Permutation Symmetry Breaking (partial PSB) solutions of Eqs. (3.14) for real-valued …
Figure 3.20: Mattis magnetization m for β = β* and P = P* as a function of the number of hidden units P, the correlation c, the temperature T and the data load α. m is obtained by solving Eqs. (3.14) numerically for real-valued student patterns with a standard Gaussian prior and teacher pattern covariance Q_μν = δ_μν + (1 − δ_μν) c, where c ∈ [0, 1) (see 3.H). The top and bottom rows feature P = 2 and P = 3, respecti…
Figure 3.21: Mattis magnetization m for β = β* and P = P* as a function of the number of hidden units P, the correlation c, the temperature T and the data load α. m is obtained by solving Eqs. (3.14) for real-valued student patterns with a standard Gaussian prior and teacher pattern covariance Q_μν ∼ W(c, P), where c ∈ [0, 1) (see 3.A.2). The top and bottom rows feature P = 2 and P = 3, respectively. The white lin…
Figure 4.1: All of the P = 25 memories {w^μ}_{μ=1}^{25} learned by an instance of our model with β = 16 when it is trained on the MNIST dataset of handwritten digits [8] using constrained stochastic gradient descent (SGD) of the negative log-likelihood loss (Eq. 4.4). The hidden units are indexed using pairs of letters from A to E.
  … P_β(x|w, p) = Σ_{y=0}^{C} P_β(x, y|w, p) and conditional distributions P_β(x|y; w, p) = P P_β(x,…
Figure 4.2: Illustration of the relationship between …
Figure 4.3: 25 of the P = 1000 memories w^μ learned by two instances of our dense associative memory (DAM) model with different values of β. Both networks are trained on the MNIST dataset of handwritten digits [8] using constrained stochastic gradient descent (SGD) of the negative log-likelihood loss (Eq. 4.4). The left-panel model has β = 18, and the right-panel one β = 6. DAMs with 18 > β > 6 learn memories that in…
Figure 4.4: In the top panel, 25 of the P = 1000 memories w^μ learned by an instance of our dense associative memory (DAM) model trained on the MNIST dataset of handwritten digits [8] using constrained stochastic gradient descent (SGD) of the effective loss (Eq. 4.4) with ς = 0.25. In the bottom panel, the corresponding rescaled class weights p^μ/p_h(μ), where p_h(γ) = 1/(P + 1) for all 0 ≤ γ ≤ P. The hidden units are i…
Figure 4.5: In the top panel, 25 of the P = 100 memories w^μ learned by an instance of our dense associative memory (DAM) model trained in an unsupervised way (Eq. 4.15) on 6 × 6 patches of the MNIST dataset of handwritten digits [8] while assuming C = 10 latent classes and ς = 0.6. In the bottom panel, the corresponding rescaled class weights p^μ/p_h(μ), where p_h(γ) = 1/(P + 1) for all 0 ≤ γ ≤ P. The hidden units are …
Figure 4.6: Overlaps m_{μ*μ}(x*, w) = Σ_{i=1}^{N} x*^{μ*}_i w^μ_i between the first 1000 digits x*^{μ*} = {x*^{μ*}_i}_{i=1}^{N} of the MNIST training set [8] and the memories w^μ = {w^μ_i}_{i=1}^{N} of our dense associative memory (DAM) model while it is learning them. Each point is one of the high-dimensional magnetization vectors m_{·μ}(x*, w) = {m_{μ*μ}(x*, w)}_{μ*=1}^{P*} projected onto a two-dimensional plane using the UMAP algorith…
Figure 4.7: The classification accuracy and training time of dense associative memory (DAM) networks trained …
Figure 4.8: In the top panel, 25 of the P = 100 memories w^μ learned by an instance of our dense associative memory (DAM) model trained in an unsupervised way (Eq. 4.15) on 6 × 6 patches of the MNIST dataset of handwritten digits [8] while assuming C = 10 latent classes and ς = 0.6. In the bottom panel, the corresponding rescaled class weights p^μ/p_h(μ), where p_h(γ) = 1/(P + 1) for all 0 ≤ γ ≤ P. The hidden units are …
Figure 4.9: In the top panel, 25 of the P = 100 memories w^μ learned by an instance of our dense associative memory (DAM) model trained in an unsupervised way (Eq. 4.15) on 6 × 6 patches of the MNIST dataset of handwritten digits [8] while assuming C = 10 latent classes and ς = 0.6. In the bottom panel, the corresponding rescaled class weights p^μ/p_h(μ), where p_h(γ) = 1/(P + 1) for all 0 ≤ γ ≤ P. The hidden units are …
Figure 4.10: In the top panel, 25 of the P = 100 memories w^μ learned by an instance of our dense associative memory (DAM) model trained in an unsupervised way (Eq. 4.15) on 6 × 6 patches of the MNIST dataset of handwritten digits [8] while assuming C = 10 latent classes and ς = 0.6. In the bottom panel, the corresponding rescaled class weights p^μ/p_h(μ), where p_h(γ) = 1/(P + 1) for all 0 ≤ γ ≤ P. The hidden units are…
Figure 4.11: In the top panel, 25 of the P = 100 memories w^μ learned by an instance of our dense associative memory (DAM) model trained in an unsupervised way (Eq. 4.15) on 6 × 6 patches of the MNIST dataset of handwritten digits [8] while assuming C = 10 latent classes and ς = 0.6. In the bottom panel, the corresponding rescaled class weights p^μ/p_h(μ), where p_h(γ) = 1/(P + 1) for all 0 ≤ γ ≤ P. The hidden units are…
Original abstract

Artificial neural networks (NNs) and machine learning (ML) algorithms are poorly understood from a theoretical perspective, which makes it difficult to fully realize their potential and overcome their weaknesses. For instance, ML algorithms train NN weights by moving them along a low-dimensional subspace of their allowed values, but this implicitly low-dimensional learning structure is not properly exploited to improve training because its nature is not well understood. Moreover, trained NNs are easily confused by pervasive adversarial attacks whose theoretical underpinnings are still unclear. This thesis aims to improve our theoretical understanding of NNs and ML, with a particular focus on adversarial attacks and implicitly low-dimensional learning. For this purpose, we use mathematical tools from statistical mechanics to study different types of NNs and ways in which they can fit the data. In particular, we study two classes of models that fit the data with various degrees of learning and memorization: dense associative memory (DAM) and restricted Boltzmann machines (RBM). In the process, we investigate connections between different versions of these models that are useful to make analytical investigations more efficient.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript is a thesis that applies statistical mechanics tools to dense associative memory (DAM) and restricted Boltzmann machines (RBM) to investigate implicitly low-dimensional learning in neural network training and the underpinnings of adversarial attacks, while also examining connections between model variants to facilitate analytical progress.

Significance. If the promised derivations and connections are carried through rigorously, the work could provide a physics-inspired framework for understanding generalization and robustness in ML; however, the abstract frames the contributions as aims rather than completed analyses with explicit results or validations.

major comments (2)
  1. [Abstract] Abstract (third paragraph): the central claims that DAM/RBM analyses 'reveal the nature of implicitly low-dimensional learning' and 'the underpinnings of adversarial attacks' are stated programmatically without any derivations, fitted quantities, or error analysis supplied, leaving the load-bearing assertions unevaluated.
  2. [Abstract] Abstract (second paragraph): the assumption that low-dimensional structure in NN training constitutes an equilibrium-like physical phenomenon amenable to stat-mech analysis of energy-based models is load-bearing for the claimed explanations; if the structure instead arises from SGD geometry or loss-landscape curvature, the DAM/RBM results would not transfer. A concrete test would be to check whether the equilibrium predictions match SGD-trained networks on identical tasks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their review of our thesis manuscript. We address the major comments point by point below, clarifying the scope of the completed analyses while acknowledging where the presentation can be improved.

  1. Referee: [Abstract] Abstract (third paragraph): the central claims that DAM/RBM analyses 'reveal the nature of implicitly low-dimensional learning' and 'the underpinnings of adversarial attacks' are stated programmatically without any derivations, fitted quantities, or error analysis supplied, leaving the load-bearing assertions unevaluated.

    Authors: The abstract provides a high-level overview of the thesis goals and contributions. The full manuscript contains the detailed statistical mechanics derivations for both DAM and RBM models, including explicit calculations of energy functions, phase transitions, and connections between model variants that quantify low-dimensional structure and robustness properties. These are supported by analytical results on memorization capacity and adversarial vulnerability. We agree the abstract could more explicitly reference these completed results rather than framing them only as aims, and will revise it accordingly. revision: yes

  2. Referee: [Abstract] Abstract (second paragraph): the assumption that low-dimensional structure in NN training constitutes an equilibrium-like physical phenomenon amenable to stat-mech analysis of energy-based models is load-bearing for the claimed explanations; if the structure instead arises from SGD geometry or loss-landscape curvature, the DAM/RBM results would not transfer. A concrete test would be to check whether the equilibrium predictions match SGD-trained networks on identical tasks.

    Authors: We acknowledge this is a substantive point about the validity of the equilibrium approximation. The manuscript justifies the stat-mech approach by showing that the energy-based DAM and RBM models capture the effective low-dimensional fitting behavior and robustness characteristics observed in trained networks, with explicit mappings between model parameters and learning outcomes. We argue these models are chosen for their analytical tractability in revealing mechanisms that transfer to more general NNs. A direct head-to-head comparison of equilibrium predictions versus SGD dynamics on identical tasks is not included, as the thesis prioritizes deriving closed-form insights over numerical benchmarking; however, the work discusses the conditions under which the approximation is expected to hold. revision: partial

Circularity Check

0 steps flagged

No circularity detected; derivation self-contained


The provided abstract and context contain no equations, fitted parameters, or self-citations that reduce any claimed prediction or result to its inputs by construction. The work applies standard statistical mechanics tools to established models (DAM, RBM) to analyze low-dimensional learning and adversarial attacks. No load-bearing step is shown to be self-definitional, a renamed fit, or dependent on an unverified author citation chain. The central claims rest on external mathematical tools and model properties rather than tautological re-derivation of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are specified.

pith-pipeline@v0.9.1-grok · 5708 in / 966 out tokens · 36403 ms · 2026-07-01T06:02:16.353853+00:00 · methodology


Reference graph

Works this paper leans on

229 extracted references · 190 canonical work pages · 38 internal anchors

  1. [1]

    A committee of neural networks for traffic sign classification

    Dan Cireşan, Ueli Meier, Jonathan Masci, and Jürgen Schmidhuber. “A committee of neural networks for traffic sign classification”. In: The 2011 International Joint Conference on Neural Networks. 2011, pp. 1918–1921. DOI: 10.1109/IJCNN.2011.6033458

  2. [2]

    OpenAI et al. GPT-4 Technical Report. 2024. arXiv: 2303.08774 [cs.CL]. URL: https://arxiv.org/abs/2303.08774

  3. [3]

    Highly accurate protein structure prediction with AlphaFold

    John Jumper et al. “Highly accurate protein structure prediction with AlphaFold”. In: Nature 596.7873 (Aug. 2021), pp. 583–589. ISSN: 1476-4687. DOI: 10.1038/s41586-021-03819-2. URL: https://doi.org/10.1038/s41586-021-03819-2

  4. [4]

    Accurate structure prediction of biomolecular interactions with AlphaFold 3

    Josh Abramson et al. “Accurate structure prediction of biomolecular interactions with AlphaFold 3”. In: Nature 630.8016 (June 2024), pp. 493–500. ISSN: 1476-4687. DOI: 10.1038/s41586-024-07487-w. URL: https://doi.org/10.1038/s41586-024-07487-w

  5. [5]

    AlphaFold2 and its applications in the fields of biology and medicine

    Zhenyu Yang, Xiaoxi Zeng, Yi Zhao, and Runsheng Chen. “AlphaFold2 and its applications in the fields of biology and medicine”. In: Signal Transduction and Targeted Therapy 8.1 (Mar. 2023), p. 115. ISSN: 2059-3635. DOI: 10.1038/s41392-023-01381-z. URL: https://doi.org/10.1038/s41392-023-01381-z

  6. [6]

    High-Resolution Image Synthesis with Latent Diffusion Models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. “High-Resolution Image Synthesis With Latent Diffusion Models”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 2022, pp. 10684–10695. DOI: 10.48550/arXiv.2112.10752

  7. [8]

    Gradient-based learning applied to document recognition

    Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. “Gradient-based learning applied to document recognition”. In: Proceedings of the IEEE 86.11 (1998), pp. 2278–2324. DOI: 10.1109/5.726791

  8. [9]

    The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

    Jonathan Frankle and Michael Carbin. “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks”. In: International Conference on Learning Representations. 2019. DOI: https://doi.org/10.48550/arXiv.1803.03635. URL: https://openreview.net/forum?id=rJl-b3RcF7

  9. [10]

    The training process of many deep networks explores the same low-dimensional manifold

    Jialin Mao et al. “The training process of many deep networks explores the same low-dimensional manifold”. In: Proceedings of the National Academy of Sciences 121.12 (2024), e2310002121. DOI: 10.1073/pnas.2310002121. eprint: https://www.pnas.org/doi/pdf/10.1073/pnas.2310002121. URL: https://www.pnas.org/doi/abs/10.1073/pnas.2310002121

  10. [11]

    Evasion Attacks against Machine Learning at Test Time

    Battista Biggio et al. “Evasion Attacks against Machine Learning at Test Time”. In: Machine Learning and Knowledge Discovery in Databases. Ed. by Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen, and Filip Železný. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 387–402. ISBN: 978-3-642-40994-3. DOI: https://doi.org/10.1007/978-3-642-40994-3_25

  11. [12]

    Intriguing properties of neural networks

    Christian Szegedy et al. “Intriguing properties of neural networks”. In: arXiv e-prints, arXiv:1312.6199 (Dec. 2013), arXiv:1312.6199. DOI: 10.48550/arXiv.1312.6199. arXiv: 1312.6199 [cs.CV]

  12. [13]

    Adversarial Attacks on Traffic Sign Recognition: A Survey

    Svetlana Pavlitska, Nico Lambing, and J. Marius Zöllner. “Adversarial Attacks on Traffic Sign Recognition: A Survey”. In: 2023 3rd International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME). 2023, pp. 1–6. DOI: 10.1109/ICECCME57830.2023.10252727

  13. [14]

    ImageNet: A large-scale hierarchical image database

    Jia Deng et al. “ImageNet: A large-scale hierarchical image database”. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, pp. 248–255. DOI: 10.1109/CVPR.2009.5206848

  14. [15]

    Explaining and Harnessing Adversarial Examples

    Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. “Explaining and Harnessing Adversarial Examples”. In: stat 1050, arXiv:1412.6572 (2015), p. 20. DOI: 10.48550/arXiv.1412.6572. arXiv: 1412.6572 [stat.ML]. URL: https://doi.org/10.48550/arXiv.1412.6572

  15. [16]

    Modeling the Influence of Data Structure on Learning in Neural Networks: The Hidden Manifold Model

    Sebastian Goldt, Marc Mézard, Florent Krzakala, and Lenka Zdeborová. “Modeling the Influence of Data Structure on Learning in Neural Networks: The Hidden Manifold Model”. In: Phys. Rev. X 10 (4 Dec. 2020), p. 041044. DOI: 10.1103/PhysRevX.10.041044. URL: https://link.aps.org/doi/10.1103/PhysRevX.10.041044

  16. [17]

    Generalisation error in learning with random features and the hidden manifold model*

    Federica Gerace, Bruno Loureiro, Florent Krzakala, Marc Mézard, and Lenka Zdeborová. “Generalisation error in learning with random features and the hidden manifold model*”. In: Journal of Statistical Mechanics: Theory and Experiment 2021.12 (Dec. 2021), p. 124013. DOI: 10.1088/1742-5468/ac3ae6. URL: https://dx.doi.org/10.1088/1742-5468/ac3ae6

  17. [18]

    A High Dimensional Statistical Model for Adversarial Training: Geometry and Trade-Offs

    Kasimir Tanner, Matteo Vilucchio, Bruno Loureiro, and Florent Krzakala. A High Dimensional Statistical Model for Adversarial Training: Geometry and Trade-Offs. 2024. DOI: 10.48550/arXiv.2402.05674. arXiv: 2402.05674 [stat.ML]. URL: https://arxiv.org/abs/2402.05674

  18. [19]

    Matteo Vilucchio, Nikolaos Tsilivis, Bruno Loureiro, and Julia Kempe. On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds. 2024. DOI: 10.48550/arXiv.2410.16073. arXiv: 2410.16073 [stat.ML]. URL: https://arxiv.org/abs/2410.16073

  19. [20]

    On the existence of consistent adversarial attacks in high-dimensional linear classification

    Matteo Vilucchio, Lenka Zdeborová, and Bruno Loureiro. On the existence of consistent adversarial attacks in high-dimensional linear classification. 2025. DOI: 10.48550/arXiv.2506.12454. arXiv: 2506.12454 [stat.ML]. URL: https://arxiv.org/abs/2506.12454

  20. [21]

    Jean Barbier, Francesco Camilli, Minh-Toan Nguyen, Mauro Pastore, and Rudy Skerk. Optimal generalisation and learning transition in extensive-width shallow neural networks near interpolation

  21. [22]

    arXiv: 2501.18530 [stat.ML]. URL: https://arxiv.org/abs/2501.18530

  22. [23]

    Spin Glass Theory and Far Beyond

    Patrick Charbonneau et al. Spin Glass Theory and Far Beyond. World Scientific, 2023. DOI: 10.1142/13341. eprint: https://www.worldscientific.com/doi/pdf/10.1142/13341. URL: https://www.worldscientific.com/doi/abs/10.1142/13341

  23. [24]

    Neural networks and physical systems with emergent collective computational abilities

    J. J. Hopfield. “Neural networks and physical systems with emergent collective computational abilities.” In: Proceedings of the National Academy of Sciences 79.8 (1982), pp. 2554–2558. DOI: 10.1073/pnas.79.8.2554. eprint: https://www.pnas.org/doi/pdf/10.1073/pnas.79.8.2554. URL: https://www.pnas.org/doi/abs/10.1073/pnas.79.8.2554

  24. [25]

    Training Products of Experts by Minimizing Contrastive Divergence

    Geoffrey E. Hinton. “Training Products of Experts by Minimizing Contrastive Divergence”. In: Neural Computation 14.8 (2002), pp. 1771–1800. DOI: 10.1162/089976602760128018

  25. [26]

    High order correlation model for associative memory

    H. H. Chen et al. “High order correlation model for associative memory”. In: AIP Conference Proceedings 151.1 (Aug. 1986), pp. 86–99. ISSN: 0094-243X. DOI: 10.1063/1.36224. eprint: https://pubs.aip.org/aip/acp/article-pdf/151/1/86/12091820/86_1_online.pdf. URL: https://doi.org/10.1063/1.36224

  26. [27]

    Nonlinear discriminant functions and associative memories

    Demetri Psaltis and Cheol Hoon Park. “Nonlinear discriminant functions and associative memories”. In: AIP Conference Proceedings 151.1 (Aug. 1986), pp. 370–375. ISSN: 0094-243X. DOI: 10.1063/1.36241. eprint: https://pubs.aip.org/aip/acp/article-pdf/151/1/370/12091772/370_1_online.pdf. URL: https://doi.org/10.1063/1.36241

  27. [29]

    Multiconnected neural network models

    E Gardner. “Multiconnected neural network models”. In: Journal of Physics A: Mathematical and General 20.11 (Aug. 1987), p. 3453. DOI: 10.1088/0305-4470/20/11/046. URL: https://dx.doi.org/10.1088/0305-4470/20/11/046

  28. [30]

    Capacities of multiconnected memory models

    Horn, D. and Usher, M. “Capacities of multiconnected memory models”. In: J. Phys. France 49.3 (1988), pp. 389–395. DOI: 10.1051/jphys:01988004903038900. URL: https://doi.org/10.1051/jphys:01988004903038900

  29. [31]

    Dense Associative Memory for Pattern Recognition

    Dmitry Krotov and John J. Hopfield. “Dense Associative Memory for Pattern Recognition”. In: Advances in Neural Information Processing Systems. Ed. by D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett. Vol. 29. NIPS’16. Barcelona, Spain: Curran Associates, Inc., 2016, pp. 1180–1188. ISBN: 9781510838819. DOI: 10.48550/arXiv.1606.01164. arXiv...

  30. [32]

    Spectral dynamics of learning in restricted Boltzmann machines

    A. Decelle, G. Fissore, and C. Furtlehner. “Spectral dynamics of learning in restricted Boltzmann machines”. In: Europhysics Letters 119.6 (Nov. 2017), p. 60001. DOI: 10.1209/0295-5075/119/60001. URL: https://dx.doi.org/10.1209/0295-5075/119/60001

  31. [35]

    Waddington landscape for prototype learning in generalized Hopfield networks

    Nacer Eddine Boukacem et al. “Waddington landscape for prototype learning in generalized Hopfield networks”. In: Phys. Rev. Res. 6 (3 July 2024), p. 033098. DOI: 10.1103/PhysRevResearch.6.033098. URL: https://link.aps.org/doi/10.1103/PhysRevResearch.6.033098

  32. [36]

    Fast training and sampling of Restricted Boltzmann Machines

    Nicolas Bereux, Aurélien Decelle, Cyril Furtlehner, Lorenzo Rosset, and Beatriz Seoane. Fast training and sampling of Restricted Boltzmann Machines. 15 figures, 31 pages. Singapore, Singapore, Apr. 2025. DOI: 10.48550/arXiv.2405.15376. URL: https://inria.hal.science/hal-04885777

  33. [37]

    Dense Associative Memory is Robust to Adversarial Inputs

    Dmitry Krotov and John Hopfield. “Dense Associative Memory Is Robust to Adversarial Inputs”. In: Neural Computation 30.12 (Dec. 2018), pp. 3151–3167. ISSN: 0899-7667. DOI: 10.1162/neco_a_01143. arXiv: 1701.00939 [cs.LG]. URL: https://doi.org/10.1162/neco_a_01143

  34. [38]

    Yang Song et al. Score-Based Generative Modeling through Stochastic Differential Equations. 2021. arXiv: 2011.13456 [cs.LG]. URL: https://arxiv.org/abs/2011.13456

  35. [39]

    Hopfield Networks is All You Need

    Hubert Ramsauer et al. “Hopfield Networks is All You Need”. In: International Conference on Learning Representations. 2021. DOI: 10.48550/arXiv.2008.02217. arXiv: 2008.02217 [cs.NE]. URL: https://openreview.net/forum?id=tL89RnzIiCd

  36. [40]

    Phase transitions in Restricted Boltzmann Machines with generic priors

    Adriano Barra, Giuseppe Genovese, Peter Sollich, and Daniele Tantari. “Phase transitions in restricted Boltzmann machines with generic priors”. In: Phys. Rev. E 96 (4 Oct. 2017), p. 042156. DOI: 10.1103/PhysRevE.96.042156. arXiv: 1612.03132 [cond-mat.dis-nn]. URL: https://link.aps.org/doi/10.1103/PhysRevE.96.042156

  37. [41]

    Inverse problems for structured datasets using parallel TAP equations and restricted Boltzmann machines

    Aurélien Decelle, Sungmin Hwang, Jacopo Rocchi, and Daniele Tantari. “Inverse problems for structured datasets using parallel TAP equations and restricted Boltzmann machines”. In: Scientific Reports 11, 19990 (Oct. 2021), p. 19990. DOI: 10.1038/s41598-021-99353-2. arXiv: 1906.11988 [cond-mat.dis-nn]

  38. [42]

    Replica Symmetry Breaking in Dense Hebbian Neural Networks

    Linda Albanese, Francesco Alemanno, Andrea Alessandrelli, and Adriano Barra. “Replica Symmetry Breaking in Dense Hebbian Neural Networks”. In: Journal of Statistical Physics 189.2, 24 (Nov. 2022), p. 24. ISSN: 1572-9613. DOI: 10.1007/s10955-022-02966-8. arXiv: 2111.12997 [cond-mat.dis-nn]. URL: https://doi.org/10.1007/s10955-022-02966-8

  39. [43]

    Minimal model of permutation symmetry in unsupervised learning

    Tianqi Hou, K Y Michael Wong, and Haiping Huang. “Minimal model of permutation symmetry in unsupervised learning”. In: Journal of Physics A: Mathematical and Theoretical 52.41 (Sept. 2019), p. 414001. DOI: 10.1088/1751-8121/ab3f3f. arXiv: 1904.13052 [cond-mat.dis-nn]. URL: https://dx.doi.org/10.1088/1751-8121/ab3f3f

  40. [44]

    Hopfield model with planted patterns: A teacher-student self-supervised learning model

    Francesco Alemanno, Luca Camanzi, Gianluca Manzan, and Daniele Tantari. “Hopfield model with planted patterns: A teacher-student self-supervised learning model”. In: Applied Mathematics and Computation 458 (2023), p. 128253. ISSN: 0096-3003. DOI: https://doi.org/10.1016/j.amc.2023.128253. arXiv: 2304.13710 [cond-mat.dis-nn]. URL: https://www.sciencedirect....

  41. [45]

    The effect of priors on Learning with Restricted Boltzmann Machines

    Gianluca Manzan and Daniele Tantari. “The effect of priors on Learning with Restricted Boltzmann Machines”. In: Physica A: Statistical Mechanics and its Applications 674 (2025), p. 130766. ISSN: 0378-4371. DOI: 10.1016/j.physa.2025.130766. URL: https://www.sciencedirect.com/science/article/pii/S0378437125004182

  42. [46]

    Dense Hopfield networks in the teacher-student setting

    Robin Thériault and Daniele Tantari. “Dense Hopfield networks in the teacher-student setting”. In: SciPost Phys. 17 (2024), p. 040. DOI: 10.21468/SciPostPhys.17.2.040. URL: https://scipost.org/10.21468/SciPostPhys.17.2.040

  43. [47]

    Modeling structured data learning with Restricted Boltzmann machines in the teacher–student setting

    Robin Thériault, Francesco Tosello, and Daniele Tantari. “Modeling structured data learning with Restricted Boltzmann machines in the teacher–student setting”. In: Neural Networks 189 (2025), p. 107542. ISSN: 0893-6080. DOI: https://doi.org/10.1016/j.neunet.2025.107542. URL: https://www.sciencedirect.com/science/article/pii/S0893608025004216

  44. [48]

    Saddle hierarchy in dense associative memory

    Robin Thériault and Daniele Tantari. “Saddle hierarchy in dense associative memory”. In: Machine Learning: Science and Technology 7.1 (Jan. 2026), p. 015001. DOI: 10.1088/2632-2153/ae3051. URL: https://doi.org/10.1088/2632-2153/ae3051

  45. [49]

    On the quantitative analysis of deep belief networks

    Ruslan Salakhutdinov and Iain Murray. “On the quantitative analysis of deep belief networks”. In: Proceedings of the 25th International Conference on Machine Learning. ICML ’08. Helsinki, Finland: Association for Computing Machinery, 2008, pp. 872–879. ISBN: 9781605582054. DOI: 10.1145/1390156.1390266. URL: https://doi.org/10.1145/1390156.1390266

  46. [50]

    Associative recall of memory without errors

    I. Kanter and H. Sompolinsky. “Associative recall of memory without errors”. In: Phys. Rev. A 35 (1 Jan. 1987), pp. 380–392. DOI: 10.1103/PhysRevA.35.380. URL: https://link.aps.org/doi/10.1103/PhysRevA.35.380

  47. [51]

    Increasing the capacity of a hopfield network without sacrificing functionality

    Amos Storkey. “Increasing the capacity of a hopfield network without sacrificing functionality”. In: Artificial Neural Networks — ICANN’97. Ed. by Wulfram Gerstner, Alain Germond, Martin Hasler, and Jean-Daniel Nicoud. Berlin, Heidelberg: Springer Berlin Heidelberg, 1997, pp. 451–456

  48. [52]

    On the equivalence of Hopfield networks and Boltzmann Machines

    Adriano Barra, Alberto Bernacchia, Enrica Santucci, and Pierluigi Contucci. “On the equivalence of Hopfield networks and Boltzmann Machines”. In: Neural Networks 34 (2012), pp. 1–9. ISSN: 0893-6080. DOI: https://doi.org/10.1016/j.neunet.2012.06.003. URL: https://www.sciencedirect.com/science/article/pii/S0893608012001608

  49. [53]

    Daydreaming Hopfield Networks and their surprising effectiveness on correlated data

    Ludovica Serricchio et al. “Daydreaming Hopfield Networks and their surprising effectiveness on correlated data”. In: Neural Networks 186 (2025), p. 107216. ISSN: 0893-6080. DOI: https://doi.org/10.1016/j.neunet.2025.107216. URL: https://www.sciencedirect.com/science/article/pii/S0893608025000954

  50. [54]

    The organization of behavior: A neuropsychological theory

    Donald Olding Hebb. The organization of behavior: A neuropsychological theory. Psychology Press, 2005. ISBN: 9781410612403. DOI: https://doi.org/10.4324/9781410612403

  51. [55]

    Storing Infinite Numbers of Patterns in a Spin-Glass Model of Neural Networks

    Daniel J. Amit, Hanoch Gutfreund, and H. Sompolinsky. “Storing Infinite Numbers of Patterns in a Spin-Glass Model of Neural Networks”. In: Phys. Rev. Lett. 55 (14 Sept. 1985), pp. 1530–1533. DOI: 10.1103/PhysRevLett.55.1530. URL: https://link.aps.org/doi/10.1103/PhysRevLett.55.1530

  52. [56]

    Statistical mechanics of neural networks near saturation

    Daniel J Amit, Hanoch Gutfreund, and H Sompolinsky. “Statistical mechanics of neural networks near saturation”. In: Annals of Physics 173.1 (1987), pp. 30–67. ISSN: 0003-4916. DOI: https://doi.org/10.1016/0003-4916(87)90092-3. URL: https://www.sciencedirect.com/science/article/pii/0003491687900923

  53. [57]

    Information storage in neural networks with low levels of activity

    Daniel J. Amit, Hanoch Gutfreund, and H. Sompolinsky. “Information storage in neural networks with low levels of activity”. In: Phys. Rev. A 35 (5 Mar. 1987), pp. 2293–2303. DOI: 10.1103/PhysRevA.35.2293. URL: https://link.aps.org/doi/10.1103/PhysRevA.35.2293

  54. [58]

    The perceptron: A probabilistic model for information storage and organization in the brain

    F. Rosenblatt. “The perceptron: A probabilistic model for information storage and organization in the brain.” In: Psychological Review 65.6 (1958), pp. 386–408. DOI: 10.1037/h0042519. URL: https://doi.org/10.1037/h0042519

  55. [59]

    Maximum Likelihood from Incomplete Data Via the EM Algorithm

    A. P. Dempster, N. M. Laird, and D. B. Rubin. “Maximum Likelihood from Incomplete Data Via the EM Algorithm”. In: Journal of the Royal Statistical Society: Series B (Methodological) 39.1 (1977), pp. 1–22. DOI: https://doi.org/10.1111/j.2517-6161.1977.tb01600.x. eprint: https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/j.2517-6161.1977.tb01600.x. URL...

  56. [60]

    Sebastian Ruder. An overview of gradient descent optimization algorithms. 2017. arXiv: 1609.04747 [cs.LG]. URL: https://arxiv.org/abs/1609.04747

  57. [61]

    Statistical physics of inference: thresholds and algorithms

    Lenka Zdeborová and Florent Krzakala. “Statistical physics of inference: thresholds and algorithms”. In: Advances in Physics 65.5 (2016), pp. 453–552. DOI: 10.1080/00018732.2016.1211393. arXiv: 1511.02476 [cond-mat.stat-mech]. URL: https://doi.org/10.1080/00018732.2016.1211393

  58. [62]

    Three unfinished works on the optimal storage capacity of networks

    E Gardner and B Derrida. “Three unfinished works on the optimal storage capacity of networks”. In: Journal of Physics A: Mathematical and General 22.12 (June 1989), p. 1983. DOI: 10.1088/0305-4470/22/12/004. URL: https://dx.doi.org/10.1088/0305-4470/22/12/004

  59. [63]

    First-order transition to perfect generalization in a neural network with binary synapses

    Géza Györgyi. “First-order transition to perfect generalization in a neural network with binary synapses”. In: Phys. Rev. A 41 (12 June 1990), pp. 7097–7100. DOI: 10.1103/PhysRevA.41.7097. URL: https://link.aps.org/doi/10.1103/PhysRevA.41.7097

  60. [64]

    Large Associative Memory Problem in Neurobiology and Machine Learning

    Dmitry Krotov and John J. Hopfield. “Large Associative Memory Problem in Neurobiology and Machine Learning”. In: International Conference on Learning Representations. 2021. DOI: 10.48550/arXiv.2008.06996. arXiv: 2008.06996 [q-bio.NC]. URL: https://openreview.net/forum?id=X4y_10OX-hX

  61. [65]

    Dmitry Krotov, Benjamin Hoover, Parikshit Ram, and Bao Pham. Modern Methods in Associative Memory. 2025. arXiv: 2507.06211 [cs.LG]. URL: https://arxiv.org/abs/2507.06211

  62. [66]

    Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories

    Benjamin Hoover et al. “Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories”. In: arXiv e-prints, arXiv:2309.16750 (Sept. 2023), arXiv:2309.16750. DOI: 10.48550/arXiv.2309.16750. arXiv: 2309.16750 [cs.LG]

  63. [67]

    Attention in a Family of Boltzmann Machines Emerging From Modern Hopfield Networks

    Toshihiro Ota and Ryo Karakida. “Attention in a Family of Boltzmann Machines Emerging From Modern Hopfield Networks”. In: Neural Computation 35.8 (July 2023), pp. 1463–1480. ISSN: 0899-7667. DOI: 10.1162/neco_a_01597. eprint: https://direct.mit.edu/neco/article-pdf/35/8/1463/2143211/neco_a_01597.pdf. URL: https://doi.org/10.1162/neco%...

  64. [68]

    In Search of Dispersed Memories: Generative Diffusion Models Are Associative Memory Networks

    Luca Ambrogioni. “In Search of Dispersed Memories: Generative Diffusion Models Are Associative Memory Networks”. In: Entropy 26.5 (2024). ISSN: 1099-4300. DOI: 10.3390/e26050381. URL: https://www.mdpi.com/1099-4300/26/5/381

  65. [69]

    Ryo Karakida, Toshihiro Ota, and Masato Taki. Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking. 2024. arXiv: 2406.12220 [cs.LG]. URL: https://arxiv.org/abs/2406.12220

  66. [70]

    Restricted Boltzmann machines for collaborative filtering

    Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton. “Restricted Boltzmann machines for collaborative filtering”. In: Proceedings of the 24th International Conference on Machine Learning. ICML ’07. Corvallis, Oregon, USA: Association for Computing Machinery, 2007, pp. 791–798. ISBN: 9781595937933. DOI: 10.1145/1273496.1273596. URL: https://doi.org/10.1145/ ...

  67. [71]

    Some generalized order-disorder transformations

    Renfrey Burnard Potts. “Some generalized order-disorder transformations”. In: Mathematical Proceedings of the Cambridge Philosophical Society. Vol. 48. 1. Cambridge University Press. 1952, pp. 106–109. DOI: 10.1017/S0305004100027419

  68. [72]

    The potts model

    Fa-Yueh Wu. “The potts model”. In: Reviews of modern physics 54.1 (1982), p. 235. DOI: 10.1103/RevModPhys.54.235

  69. [73]

    Restricted Boltzmann machine: Recent advances and mean-field theory*

    Aurélien Decelle and Cyril Furtlehner. “Restricted Boltzmann machine: Recent advances and mean-field theory*”. In: Chinese Physics B 30.4 (Apr. 2021), p. 040202. DOI: 10.1088/1674-1056/abd160. URL: https://dx.doi.org/10.1088/1674-1056/abd160

  70. [74]

    Exact results and critical properties of the Ising model with competing interactions

    H Nishimori. “Exact results and critical properties of the Ising model with competing interactions”. In: Journal of Physics C: Solid State Physics 13.21 (July 1980), p. 4071. DOI: 10.1088/0022-3719/13/21/012. URL: https://dx.doi.org/10.1088/0022-3719/13/21/012

  71. [75]

    Statistical Physics of Spin Glasses and Information Processing: An Introduction

    Hidetoshi Nishimori. Statistical Physics of Spin Glasses and Information Processing: An Introduction. Oxford University Press, July 2001. ISBN: 9780198509417. DOI: 10.1093/acprof:oso/9780198509417.001.0001. eprint: https://academic.oup.com/book/5185/book-pdf/54038185/acprof-9780198509400.pdf. URL: https://doi.org/10.1093/acprof:oso/9780198509417.001.0001

  72. [76]

    Spin Glass Identities and the Nishimori Line

    Pierluigi Contucci, Cristian Giardinà, and Hidetoshi Nishimori. “Spin Glass Identities and the Nishimori Line”. In: Spin Glasses: Statics and Dynamics. Ed. by Anne Boutet de Monvel and Anton Bovier. Basel: Birkhäuser Basel, 2009, pp. 103–121. DOI: https://doi.org/10.1007/978-3-7643-9891-0_4. arXiv: 0805.0754 [cond-mat.dis-nn]

  73. [77]

    The Nishimori line and Bayesian Statistics

    Yukito Iba. “The Nishimori line and Bayesian statistics”. In: Journal of Physics A Mathematical General 32.21 (May 1999), pp. 3875–3888. DOI: 10.1088/0305-4470/32/21/302. arXiv: cond-mat/9809190 [cond-mat.dis-nn]

  74. [78]

    Algorithmic barriers from phase transitions

    Dimitris Achlioptas and Amin Coja-Oghlan. “Algorithmic Barriers from Phase Transitions”. In: 2008 49th Annual IEEE Symposium on Foundations of Computer Science. 2008, pp. 793–802. DOI: 10.1109/FOCS.2008.11. arXiv: 0803.2122 [math.CO]

  75. [80]

    Quiet Planting in the Locked Constraint Satisfaction Problems

    Lenka Zdeborová and Florent Krzakala. “Quiet Planting in the Locked Constraint Satisfaction Problems”. In: SIAM Journal on Discrete Mathematics 25.2 (2011), pp. 750–770. DOI: 10.1137/090750755. arXiv: 0902.4185 [cond-mat.stat-mech]. URL: https://doi.org/10.1137/090750755

  76. [81]

    Exponential Capacity of Dense Associative Memories

    Carlo Lucibello and Marc Mézard. “Exponential Capacity of Dense Associative Memories”. In: Phys. Rev. Lett. 132 (7 Feb. 2024), p. 077301. DOI: 10.1103/PhysRevLett.132.077301. URL: https://link.aps.org/doi/10.1103/PhysRevLett.132.077301

  77. [82]

    Using Boltzmann Machines for probability estimation

    Bert Kappen. “Using Boltzmann Machines for probability estimation”. In: ICANN ’93. Ed. by Stan Gielen and Bert Kappen. London: Springer London, 1993, pp. 521–526. ISBN: 978-1-4471-2063-6

  78. [83]

    Deterministic learning rules for boltzmann machines

    Hilbert J. Kappen. “Deterministic learning rules for boltzmann machines”. In: Neural Networks 8.4 (1995), pp. 537–548. ISSN: 0893-6080. DOI: https://doi.org/10.1016/0893-6080(94)00112-Y. URL: https://www.sciencedirect.com/science/article/pii/089360809400112Y

  79. [84]

    Symmetry Breaking and Training from Incomplete Data with Radial Basis Boltzmann Machines

    Marcel J. Nijman and Hilbert J. Kappen. “Symmetry Breaking and Training from Incomplete Data with Radial Basis Boltzmann Machines”. In: International Journal of Neural Systems 08.03 (1997), pp. 301–315. DOI: 10.1142/S0129065797000318. eprint: https://doi.org/10.1142/S0129065797000318. URL: https://doi.org/10.1142/S0129065797000318

  80. [86]

    Deterministic annealing for density estimation by multivariate normal mixtures

    Martin Kloppenburg and Paul Tavan. “Deterministic annealing for density estimation by multivariate normal mixtures”. In: Phys. Rev. E 55 (3 Mar. 1997), R2089–R2092. DOI: 10.1103/PhysRevE.55.R2089. URL: https://link.aps.org/doi/10.1103/PhysRevE.55.R2089

Showing first 80 references.