Discovering and decoding latent mean-field structure with variational autoencoders

Marco Biroli; Max Welling; Vincenzo Vitelli

arXiv: 2606.08694 · v1 · pith:EFXHWSDXnew · submitted 2026-06-07 · cond-mat.soft · cond-mat.stat-mech · cs.LG

Pith reviewed 2026-06-27 17:49 UTC · model grok-4.3

classification: cond-mat.soft · cond-mat.stat-mech · cs.LG

keywords: variational autoencoders · mean-field theory · many-body systems · latent variables · Hopfield model · mutual information · neural population data

The pith

A VAE with bounded capacity and a conditionally independent decoder can only reconstruct the data through a latent mean-field factorization, whose parameters can be read off the trained network.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a capacity bound for variational autoencoders by comparing the latent channel rate to the bipartite mutual information of many-body data. When this bound is satisfied, the conditionally independent decoder becomes structurally identical to a finite-size mean-field factorization. Successful reconstruction therefore supplies direct evidence of an underlying mean-field theory, with all microscopic parameters extractable from the trained decoder. The claim is checked on Curie-Weiss, Hopfield and Maier-Saupe models and on retinal recordings, where two latent variables suffice to recover stored patterns and to write a matching generalized Hopfield model.

Core claim

A bound on VAE capacity is obtained by comparing the rate of the latent channel to the bipartite mutual information of the data. Using this bound, the conditionally independent decoder of any successful VAE is structurally identical to a finite-size mean-field factorization. Hence a successful reconstruction is direct evidence for a latent mean-field theory and the microscopic parameters of that theory can be read off the trained decoder.

What carries the argument

The capacity bound obtained by comparing latent-channel rate to bipartite mutual information, which enforces structural identity between the decoder and a finite-size mean-field factorization.
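
One way to spell out this argument in symbols, as a sketch rather than the paper's own derivation: split the degrees of freedom into two halves $X_L$ and $X_R$, write $z$ for the latent variables and $R$ for the rate of the latent channel. The symbols $p_\theta$, $q_\phi$ and $R$, and the last inequality (the usual rate bound from the VAE literature), are assumptions introduced here. The chain also mixes the generative and the inference joint distributions, which agree only for a well-trained model.

```latex
% Conditionally independent (product) decoder: a mixture of factorized distributions
p_\theta(x) = \int \mathrm{d}z \, p(z) \prod_{i=1}^{N} p_\theta(x_i \mid z)

% Given z the two halves are independent, so X_L - z - X_R is a Markov chain and
I(X_L; X_R) \;\le\; I(X_L; z) \;\le\; I(X; z) \;\le\; R,
\qquad
R = \mathbb{E}_{x}\!\left[ D_{\mathrm{KL}}\big( q_\phi(z \mid x) \,\|\, p(z) \big) \right]
```

Read this way, a VAE whose rate falls below the bipartite mutual information cannot reproduce the correlations between the two halves of the system.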

If this is right

The full Hopfield pattern matrix is recovered from equilibrium samples alone.
A two-latent VAE reproduces salamander retinal population statistics using only two effective collective variables.
The trained decoder yields a generalized Hopfield model that correctly describes the experimental retinal data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same decoder readout could be applied to other many-body datasets to discover effective mean-field theories without prior specification of order parameters.
If the capacity bound is respected, the method supplies a systematic route from raw samples to the microscopic couplings of the latent theory.
Extension to time-series data might allow extraction of dynamical mean-field equations from non-equilibrium trajectories.

Load-bearing premise

The VAE decoder is conditionally independent, so that its factorization matches a finite-size mean-field form once the capacity bound is satisfied.
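
To make the premise concrete, here is a minimal sketch for the Curie-Weiss model, the scalar case named in the abstract. It assumes the standard Hubbard-Stratonovich rewriting of the Curie-Weiss weight (not taken from the paper's text) and checks numerically that the exact distribution of a few spins equals a one-latent mixture of independent spins. The function names, the system size and the coupling `beta_j` are illustrative choices.

```python
import itertools
import math


def curie_weiss_exact(n, beta_j):
    """Exact Curie-Weiss law p(x) ~ exp(beta_j / (2 n) * (sum_i x_i)^2), by enumeration."""
    k = beta_j / n
    states = list(itertools.product((-1, 1), repeat=n))
    weights = [math.exp(0.5 * k * sum(s) ** 2) for s in states]
    total = sum(weights)
    return {s: w / total for s, w in zip(states, weights)}


def curie_weiss_mixture(n, beta_j, m_max=12.0, n_grid=4001):
    """Same law written as a one-latent mixture of independent spins (assumed HS form)."""
    k = beta_j / n
    step = 2.0 * m_max / (n_grid - 1)
    grid = [-m_max + i * step for i in range(n_grid)]
    # Latent weight w(m) ~ exp(-m^2 / (2k)) * (2 cosh m)^n, normalized on the grid.
    w = [math.exp(-m * m / (2.0 * k)) * (2.0 * math.cosh(m)) ** n for m in grid]
    w_total = sum(w)
    probs = {}
    for s in itertools.product((-1, 1), repeat=n):
        mag = sum(s)
        # Product decoder: p(x | m) = prod_i exp(m x_i) / (2 cosh m).
        probs[s] = sum(
            wm / w_total * math.exp(m * mag) / (2.0 * math.cosh(m)) ** n
            for m, wm in zip(grid, w)
        )
    return probs


exact = curie_weiss_exact(n=6, beta_j=1.2)
mixture = curie_weiss_mixture(n=6, beta_j=1.2)
max_gap = max(abs(exact[s] - mixture[s]) for s in exact)
```

Here `max_gap` measures the largest disagreement between the two descriptions over all spin configurations. The latent `m` plays the role of the order parameter, and each spin sees it only through its own independent factor, which is the product form a conditionally independent decoder can represent.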

What would settle it

A dataset whose joint distribution cannot be written as a mean-field factorization yet is still accurately reconstructed by a VAE whose latent rate lies below the measured bipartite mutual information.

Figures

Figures reproduced from arXiv: 2606.08694 by Marco Biroli, Max Welling, Vincenzo Vitelli.

**Figure 2.** [figure not reproduced]

**Figure 3.** [figure not reproduced]

**Figure 4.** [figure not reproduced]

**Figure 5.** [figure not reproduced]

Original abstract

Generative models are increasingly used to capture correlations in many-body systems, but the representations they learn remain largely opaque to physical interpretation. Here, we establish an intuitive criterion that quantifies the capacity of a variational autoencoder (VAE) to faithfully reconstruct the joint probability distribution of a many-body system. In a nutshell, a bound on the VAE capacity is obtained by comparing the rate of the latent channel to the bipartite mutual information of the data. Using this bound, we show that the conditionally independent decoder of any successful VAE is structurally identical to a finite-size mean-field factorization. Hence, a successful reconstruction is direct evidence for a latent mean-field theory and the microscopic parameters of that theory can be read off the trained decoder. We validate these conclusions on a hierarchy of solvable models with scalar (Curie-Weiss), vector (Hopfield) and tensor (Maier-Saupe) order parameters, recovering the full Hopfield pattern matrix from equilibrium samples alone. We find that, when applied to salamander retinal recordings, a two-latent VAE reproduces the population statistics with only two effective collective variables, allowing us to recover the 'stored patterns' of the neural population and write a generalized Hopfield model which correctly models the experimental data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper ties VAE reconstruction success under a bipartite-MI capacity bound to mean-field factorization and shows parameter recovery on models plus retinal data, but latent symmetries leave uniqueness of the extracted couplings unproven.

The main point is that a VAE with a product decoder, once it clears a capacity bound set by bipartite mutual information, has a structure identical to a finite-size mean-field theory, so the trained decoder weights give the microscopic parameters. They demonstrate this by recovering the full Hopfield pattern matrix from equilibrium samples on solvable models and by fitting a two-latent generalized Hopfield model to salamander retinal recordings that matches the observed population statistics.

What is new is the explicit capacity criterion that turns VAE success into evidence for an underlying mean-field description, plus the concrete extraction step on both synthetic and experimental data. The validation on the Curie-Weiss, Hopfield, and Maier-Saupe cases is clean and shows the method works when the ground truth is known.

The soft spot is uniqueness. The conditionally independent decoder always matches the mean-field product form, and the data-processing inequality correctly caps the latent rate. But vector or tensor order parameters admit continuous latent symmetries (orthogonal transformations) that leave p(x) invariant while changing the apparent coupling matrix read from the decoder. The abstract reports recovery of the pattern matrix, yet supplies no argument or numerical test that the extracted parameters are the only ones consistent with the data up to the usual discrete symmetries. That gap matters if the goal is unambiguous decoding of microscopic parameters.
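
The symmetry at stake can be illustrated with a small numerical sketch. It assumes the standard Hebbian convention J = ξξᵀ/N for the Hopfield couplings and takes the quantity read from the decoder to be the pattern matrix ξ; both are assumptions made here, and the sizes and random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spins, n_patterns = 40, 3

# Hypothetical ground-truth pattern matrix xi (one column per stored pattern).
xi = rng.choice([-1.0, 1.0], size=(n_spins, n_patterns))

# A random orthogonal matrix Q obtained from a QR decomposition.
q, _ = np.linalg.qr(rng.normal(size=(n_patterns, n_patterns)))
xi_rotated = xi @ q

# The Hebbian couplings J = xi xi^T / N are blind to the rotation ...
j_original = xi @ xi.T / n_spins
j_rotated = xi_rotated @ xi_rotated.T / n_spins
coupling_gap = np.max(np.abs(j_original - j_rotated))

# ... while the pattern matrix itself changes.
pattern_gap = np.max(np.abs(xi - xi_rotated))
```

Because the spin-spin couplings depend on ξ only through ξξᵀ, equilibrium samples alone cannot distinguish ξ from ξQ. Any claim of pattern recovery therefore needs either an extra constraint, such as binary entries, or an explicit alignment step.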

The bound derivation itself is not visible in the abstract, so the full text needs to confirm it holds without extra assumptions. Minor issues aside, the work is aimed at people who want data-driven routes to effective theories in many-body or neural systems. A reader already working on VAEs for physics or on extracting collective variables from recordings will get direct value.

It deserves a serious referee because the central claim is testable, the examples are reproducible in principle, and the concern about uniqueness is fixable rather than fatal.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that a bound on VAE capacity, obtained by comparing the latent channel rate to the bipartite mutual information of the data, implies that any successful VAE with a conditionally independent decoder is structurally identical to a finite-size mean-field factorization. Consequently, successful reconstruction constitutes direct evidence for a latent mean-field theory whose microscopic parameters can be read off the trained decoder. The conclusions are validated on a hierarchy of solvable models (Curie-Weiss scalar, Hopfield vector, Maier-Saupe tensor order parameters), with explicit recovery of the full Hopfield pattern matrix from equilibrium samples, and applied to salamander retinal recordings to recover two effective collective variables and a generalized Hopfield model.

Significance. If the central claims hold, the work supplies a principled, interpretable link between VAE representations and mean-field theories in many-body systems, enabling parameter extraction from data alone. The validation on exactly solvable models and the concrete application to experimental neural data constitute clear strengths.

major comments (2)

[Validation on Hopfield model (and abstract claim)] The central claim that microscopic parameters can be read off the trained decoder once the capacity bound is satisfied is undermined by the absence of any argument or numerical check establishing uniqueness of the decoder-to-parameter mapping under latent symmetries. For the Hopfield case, orthogonal transformations in the latent space leave p(x) invariant while altering the apparent coupling matrix; the reported recovery of the pattern matrix therefore does not yet demonstrate that the extracted parameters are unique up to the known discrete symmetries (sign flips, permutations).
[Capacity bound derivation] The derivation of the capacity bound via the data-processing inequality I(X_L; X_R) ≤ I(X_L; z) is presented as non-circular, yet the manuscript does not explicitly verify that the bound remains tight and non-vacuous once the decoder is restricted to the conditionally independent (product) form required by the mean-field identification. This equivalence is load-bearing for the assertion that success implies a latent mean-field theory.

minor comments (2)

[Validation sections] Quantitative metrics (e.g., parameter recovery error, reconstruction fidelity relative to the bound) for the solvable-model validations are not summarized in the abstract or main text; adding a table of these values would strengthen the validation section.
[Introduction / Methods] Notation for the bipartite mutual information and the latent rate should be introduced with an explicit equation reference at first use to improve readability.
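
A parameter recovery error of the kind requested above could be computed as in the following sketch, which matches extracted patterns to reference patterns up to the discrete symmetries named in the report (permutations and sign flips). The function `match_patterns` and the toy data are hypothetical. The search is brute force over permutations, so for more than a handful of patterns an assignment solver would be the natural replacement.

```python
import itertools


def match_patterns(extracted, reference):
    """Align columns of `extracted` to `reference` up to permutation and sign.

    Both arguments are lists of columns, each column a list of numbers.
    Returns the best permutation, the signs, and the relative RMS error.
    """
    best = None
    for perm in itertools.permutations(range(len(reference))):
        signs, cost = [], 0.0
        for ref_idx, ext_idx in enumerate(perm):
            ref, ext = reference[ref_idx], extracted[ext_idx]
            overlap = sum(r * e for r, e in zip(ref, ext))
            sign = 1.0 if overlap >= 0 else -1.0  # pick the sign with positive overlap
            signs.append(sign)
            cost += sum((r - sign * e) ** 2 for r, e in zip(ref, ext))
        if best is None or cost < best[0]:
            best = (cost, perm, signs)
    cost, perm, signs = best
    norm = sum(r * r for col in reference for r in col)
    return perm, signs, (cost / norm) ** 0.5


reference = [[1, 1, -1, -1, 1, -1], [1, -1, 1, -1, 1, 1]]
# Hypothetical decoder readout: columns swapped, one sign-flipped, small noise added.
extracted = [
    [1.02, -0.97, 1.01, -1.0, 0.99, 1.03],
    [-1.0, -0.98, 1.01, 1.02, -0.97, 1.0],
]
perm, signs, rel_error = match_patterns(extracted, reference)
```

Reporting `rel_error` after matching separates genuine recovery error from the harmless relabeling and sign ambiguity of the patterns.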

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment of significance, and constructive major comments. We address each point below with clarifications and planned revisions.

Referee: [Validation on Hopfield model (and abstract claim)] The central claim that microscopic parameters can be read off the trained decoder once the capacity bound is satisfied is undermined by the absence of any argument or numerical check establishing uniqueness of the decoder-to-parameter mapping under latent symmetries. For the Hopfield case, orthogonal transformations in the latent space leave p(x) invariant while altering the apparent coupling matrix; the reported recovery of the pattern matrix therefore does not yet demonstrate that the extracted parameters are unique up to the known discrete symmetries (sign flips, permutations).

Authors: We agree that explicit treatment of latent symmetries is needed to fully substantiate uniqueness. Orthogonal transformations of the latent variables preserve p(x) and rotate the decoder weights, while sign flips and permutations are discrete symmetries of the Hopfield model. In the revised manuscript we will add a dedicated paragraph (and supplementary numerical check) clarifying that parameters are recovered uniquely up to these symmetries. We will align the extracted pattern matrix to the ground truth via orthogonal Procrustes analysis and report the residual error after alignment, confirming that the mapping is unique modulo the known gauge freedom. This refines but does not alter the central claim that successful VAEs yield readable mean-field parameters.

revision: yes

Referee: [Capacity bound derivation] The derivation of the capacity bound via the data-processing inequality I(X_L; X_R) ≤ I(X_L; z) is presented as non-circular, yet the manuscript does not explicitly verify that the bound remains tight and non-vacuous once the decoder is restricted to the conditionally independent (product) form required by the mean-field identification. This equivalence is load-bearing for the assertion that success implies a latent mean-field theory.

Authors: The data-processing inequality follows directly from the Markov chain X_L – z – X_R and holds for any decoder. To address the request for explicit verification under the product-decoder restriction, the revision will include a short calculation (main text or SI) for the Curie-Weiss and Hopfield cases showing that, whenever reconstruction error is low, I(X_L; z) saturates I(X_L; X_R) and the trained product decoder exactly reproduces the mean-field factorization. This confirms the bound remains tight and non-vacuous precisely when the mean-field identification applies.

revision: yes
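
The orthogonal Procrustes alignment mentioned in the first response can be sketched as follows. This is a generic illustration on synthetic data, not the authors' code: the helper `procrustes_align`, the sizes, the noise level and the seed are all assumptions.

```python
import numpy as np


def procrustes_align(extracted, reference):
    """Return the orthogonal Q minimizing ||extracted @ Q - reference||_F, and extracted @ Q."""
    u, _, vt = np.linalg.svd(extracted.T @ reference)
    q = u @ vt
    return q, extracted @ q


rng = np.random.default_rng(1)
n_spins, n_patterns = 50, 3

# Hypothetical ground truth and a rotated, slightly noisy "decoder readout".
xi_true = rng.choice([-1.0, 1.0], size=(n_spins, n_patterns))
q_hidden, _ = np.linalg.qr(rng.normal(size=(n_patterns, n_patterns)))
xi_extracted = xi_true @ q_hidden + 0.01 * rng.normal(size=(n_spins, n_patterns))

q_fit, xi_aligned = procrustes_align(xi_extracted, xi_true)

# Relative residual after alignment: the number the response proposes to report.
residual = np.linalg.norm(xi_aligned - xi_true) / np.linalg.norm(xi_true)
```

A small `residual` shows that the extracted matrix agrees with the ground truth up to an orthogonal transformation. The alignment needs the ground truth, so it applies to the solvable models and not directly to experimental recordings.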

Circularity Check

0 steps flagged

No significant circularity; bound and structural claim are independent of fitted outputs

The capacity bound follows from the data-processing inequality I(X_L; X_R) ≤ I(X_L; z) applied to the latent channel rate, a general information-theoretic fact independent of the paper's model or data. The structural identity between a conditionally independent decoder and finite-size mean-field factorization is a direct consequence of the standard VAE architecture (product decoder), but the central inference—that successful reconstruction under the bound evidences a latent mean-field structure in the data—is a non-tautological claim about the data distribution. No load-bearing self-citation, fitted parameter renamed as prediction, or uniqueness theorem imported from prior author work appears in the derivation chain. Parameter recovery on solvable models is presented as empirical validation rather than a constructed identity. The noted latent symmetries affect uniqueness of extracted couplings but do not reduce any claimed result to its inputs by construction.
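
The data-processing inequality invoked here can be checked on a toy model. The sketch below uses a binary latent and two spins that are independent given that latent; the probabilities are made-up illustrative numbers, and mutual information is measured in nats.

```python
import itertools
import math


def mutual_information(joint):
    """Mutual information in nats for a joint table {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(
        p * math.log(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0
    )


# Toy latent-variable model: binary z, two spins independent given z.
p_z = {0: 0.5, 1: 0.5}
p_up_given_z = {0: 0.1, 1: 0.9}  # P(x = +1 | z), shared by both spins


def p_spin(x, z):
    return p_up_given_z[z] if x == 1 else 1.0 - p_up_given_z[z]


joint_lr, joint_lz = {}, {}
for z, x_l, x_r in itertools.product(p_z, (-1, 1), (-1, 1)):
    p = p_z[z] * p_spin(x_l, z) * p_spin(x_r, z)
    joint_lr[(x_l, x_r)] = joint_lr.get((x_l, x_r), 0.0) + p
    joint_lz[(x_l, z)] = joint_lz.get((x_l, z), 0.0) + p

i_lr = mutual_information(joint_lr)  # I(X_L; X_R)
i_lz = mutual_information(joint_lz)  # I(X_L; z)
```

In this toy case all correlation between the two spins is mediated by the latent, so `i_lr` cannot exceed `i_lz`, and `i_lz` in turn cannot exceed the entropy of the binary latent, log 2.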

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The key addition is the capacity bound based on bipartite mutual information, which is not a standard axiom but a derived criterion.

free parameters (1)

latent dimension = 2
Selected for the retinal data application to capture the effective collective variables.

axioms (1)

domain assumption: the VAE decoder assumes conditional independence of observations given the latents
This is central to equating the decoder to mean-field factorization.

