Discovering and decoding latent mean-field structure with variational autoencoders
Pith reviewed 2026-06-27 17:49 UTC · model grok-4.3
The pith
A VAE with bounded capacity and independent decoder must reconstruct data via a latent mean-field factorization whose parameters are readable from the network.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A bound on VAE capacity is obtained by comparing the rate of the latent channel to the bipartite mutual information of the data. Using this bound, the conditionally independent decoder of any successful VAE is structurally identical to a finite-size mean-field factorization. Hence a successful reconstruction is direct evidence for a latent mean-field theory and the microscopic parameters of that theory can be read off the trained decoder.
What carries the argument
The capacity bound obtained by comparing latent-channel rate to bipartite mutual information, which enforces structural identity between the decoder and a finite-size mean-field factorization.
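The right-hand side of this comparison is the bipartite mutual information, and for small systems it can be computed exactly by enumeration. The sketch below makes three illustrative assumptions that are not taken from the paper: a Curie-Weiss model p(x) ∝ exp(β M²/(2N)) with M = Σ_i x_i, a cut into the first n_left spins versus the rest, and the helper names themselves.

```python
import itertools

import numpy as np


def curie_weiss_probs(n_spins, beta):
    """Exact Boltzmann probabilities p(x) ∝ exp(beta * M^2 / (2N)) over all 2^N spin configurations."""
    configs = np.array(list(itertools.product([-1, 1], repeat=n_spins)))
    m = configs.sum(axis=1).astype(float)
    log_w = beta * m**2 / (2.0 * n_spins)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return configs, w / w.sum()


def bipartite_mutual_information(configs, probs, n_left):
    """I(X_L; X_R) in nats for the cut (first n_left spins | remaining spins)."""
    n = configs.shape[1]
    bits = (configs + 1) // 2  # map ±1 spins to 0/1
    idx_l = bits[:, :n_left] @ (2 ** np.arange(n_left))  # integer label of the left half
    idx_r = bits[:, n_left:] @ (2 ** np.arange(n - n_left))  # integer label of the right half
    joint = np.zeros((2**n_left, 2 ** (n - n_left)))
    np.add.at(joint, (idx_l, idx_r), probs)  # joint table p(x_L, x_R)
    p_l = joint.sum(axis=1, keepdims=True)
    p_r = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    ratio = joint[mask] / (p_l * p_r)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))
```

Usage: `configs, probs = curie_weiss_probs(8, 1.5)`, followed by `bipartite_mutual_information(configs, probs, 4) / np.log(2)`, gives the value in bits. At β = 0 the two halves are independent and the result is zero. Deep in the ordered phase the result approaches ln 2, the single bit both halves share about which ordered state the system occupies. Enumeration scales as 2^N, so this is only a small-N check.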
If this is right
- The full Hopfield pattern matrix is recovered from equilibrium samples alone.
- A two-latent VAE reproduces salamander retinal population statistics using only two effective collective variables.
- The trained decoder yields a generalized Hopfield model that correctly describes the experimental retinal data.
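To claim that a VAE "reproduces population statistics," one has to fix which statistics are compared, and the summary does not list them. The sketch below assumes the choices common in maximum-entropy analyses of retinal data: per-neuron firing probabilities, pairwise covariances, and the distribution P(K) of the number of simultaneously active neurons. These are computed from a binarized raster, and the function name is hypothetical.

```python
import numpy as np


def population_statistics(spikes):
    """Summary statistics of binary population activity.

    spikes: (n_bins, n_neurons) array of 0/1 entries (rows = time bins).
    Returns firing probabilities, upper-triangle pairwise covariances, and P(K).
    """
    spikes = np.asarray(spikes, dtype=float)
    n_bins, n_neurons = spikes.shape
    rates = spikes.mean(axis=0)  # probability that each neuron fires in a bin
    cov = np.cov(spikes, rowvar=False, bias=True)  # population covariance matrix
    iu = np.triu_indices(n_neurons, k=1)  # distinct pairs (i < j)
    counts = spikes.sum(axis=1).astype(int)  # K = number of active neurons per bin
    p_k = np.bincount(counts, minlength=n_neurons + 1) / n_bins
    return rates, cov[iu], p_k
```

To compare a recorded raster with samples drawn from the trained VAE, apply the function to both and contrast the outputs, for example with scatter plots of rates and covariances and with P(K) on a log scale.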
Where Pith is reading between the lines
- The same decoder readout could be applied to other many-body datasets to discover effective mean-field theories without prior specification of order parameters.
- If the capacity bound is respected, the method supplies a systematic route from raw samples to the microscopic couplings of the latent theory.
- Extension to time-series data might allow extraction of dynamical mean-field equations from non-equilibrium trajectories.
Load-bearing premise
The VAE decoder is conditionally independent, so that its factorization matches a finite-size mean-field form once the capacity bound is satisfied.
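In the scalar (Curie-Weiss) case, the correspondence between a conditionally independent decoder and a finite-size mean-field form can be checked directly. A Hubbard-Stratonovich transformation writes exp(β M²/(2N)) as a Gaussian average over a latent z of Π_i exp(x_i h), with h = √(β/N) z. Normalizing each spin factor then gives two pieces:

- a product decoder p(x_i | z) = (1 + x_i tanh h)/2;
- a non-Gaussian latent prior ∝ N(z; 0, 1)(2 cosh h)^N.

The sketch below is our own illustration under these assumptions, not the paper's code. It evaluates the latent integral with Gauss-Hermite quadrature and compares the result to the exact Boltzmann distribution for a small system.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def boltzmann_curie_weiss(n_spins, beta):
    """Exact p(x) ∝ exp(beta * M^2 / (2N)), M = sum_i x_i, over all 2^N configurations."""
    configs = np.array(list(itertools.product([-1, 1], repeat=n_spins)))
    m = configs.sum(axis=1).astype(float)
    w = np.exp(beta * m**2 / (2.0 * n_spins))
    return configs, w / w.sum()


def mean_field_mixture(configs, beta, n_nodes=60):
    """p(x) = sum_k w_k prod_i p(x_i | z_k): a latent mixture with a factorized decoder."""
    n = configs.shape[1]
    z, gh_w = hermegauss(n_nodes)  # nodes/weights for the weight function exp(-z^2 / 2)
    h = np.sqrt(beta / n) * z  # local field seen by every spin at latent value z
    # Non-Gaussian latent prior: w(z) ∝ N(z; 0, 1) * (2 cosh h)^N, normalized on the nodes.
    prior = gh_w * (2.0 * np.cosh(h)) ** n
    prior /= prior.sum()
    # Conditionally independent decoder: p(x_i | z) = (1 + x_i tanh h) / 2 for each spin.
    p_site = (1.0 + configs[:, :, None] * np.tanh(h)[None, None, :]) / 2.0
    decoder = p_site.prod(axis=1)  # shape (n_configs, n_nodes)
    return decoder @ prior
```

For N = 8 and β = 1.5, the mixture reproduces the Boltzmann distribution to within quadrature and round-off error. The mixing weights are not Gaussian, which is one reason a learned prior is natural in this setting.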
What would settle it
A dataset whose joint distribution cannot be written as a mean-field factorization yet is still accurately reconstructed by a VAE whose latent rate lies below the measured bipartite mutual information.
Original abstract
Generative models are increasingly used to capture correlations in many-body systems, but the representations they learn remain largely opaque to physical interpretation. Here, we establish an intuitive criterion that quantifies the capacity of a variational autoencoder (VAE) to faithfully reconstruct the joint probability distribution of a many-body system. In a nutshell, a bound on the VAE capacity is obtained by comparing the rate of the latent channel to the bipartite mutual information of the data. Using this bound, we show that the conditionally independent decoder of any successful VAE is structurally identical to a finite-size mean-field factorization. Hence, a successful reconstruction is direct evidence for a latent mean-field theory, and the microscopic parameters of that theory can be read off the trained decoder. We validate these conclusions on a hierarchy of solvable models with scalar (Curie-Weiss), vector (Hopfield) and tensor (Maier-Saupe) order parameters, recovering the full Hopfield pattern matrix from equilibrium samples alone. When applied to salamander retinal recordings, a two-latent VAE reproduces the population statistics with only two effective collective variables. This allows us to recover the 'stored patterns' of the neural population and to write a generalized Hopfield model that correctly describes the experimental data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that a bound on VAE capacity, obtained by comparing the latent channel rate to the bipartite mutual information of the data, implies that any successful VAE with a conditionally independent decoder is structurally identical to a finite-size mean-field factorization. Consequently, successful reconstruction constitutes direct evidence for a latent mean-field theory whose microscopic parameters can be read off the trained decoder. The conclusions are validated on a hierarchy of solvable models (Curie-Weiss scalar, Hopfield vector, Maier-Saupe tensor order parameters), with explicit recovery of the full Hopfield pattern matrix from equilibrium samples, and applied to salamander retinal recordings to recover two effective collective variables and a generalized Hopfield model.
Significance. If the central claims hold, the work supplies a principled, interpretable link between VAE representations and mean-field theories in many-body systems, enabling parameter extraction from data alone. The validation on exactly solvable models and the concrete application to experimental neural data constitute clear strengths.
major comments (2)
- [Validation on Hopfield model (and abstract claim)] The central claim that microscopic parameters can be read off the trained decoder once the capacity bound is satisfied is undermined by the absence of any argument or numerical check establishing uniqueness of the decoder-to-parameter mapping under latent symmetries. For the Hopfield case, orthogonal transformations in the latent space leave p(x) invariant while altering the apparent coupling matrix; the reported recovery of the pattern matrix therefore does not yet demonstrate that the extracted parameters are unique up to the known discrete symmetries (sign flips, permutations).
- [Capacity bound derivation] The derivation of the capacity bound via the data-processing inequality I(X_L; X_R) ≤ I(X_L; z) is presented as non-circular, yet the manuscript does not explicitly verify that the bound remains tight and non-vacuous once the decoder is restricted to the conditionally independent (product) form required by the mean-field identification. This equivalence is load-bearing for the assertion that success implies a latent mean-field theory.
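The first major comment raises two discrete symmetries, permutations and sign flips. These can be factored out by matching each recovered pattern to a true one. The sketch below assumes the recovered patterns are the columns of the decoder weight matrix, which is an illustrative assumption. It uses cosine overlaps so that an overall scale does not matter. It brute-forces permutations and is therefore only meant for a handful of patterns; for larger p, the Hungarian algorithm solves the same assignment problem.

```python
import itertools

import numpy as np


def match_patterns(xi_true, xi_hat):
    """Match recovered patterns to true ones up to permutation and sign.

    xi_true, xi_hat: (N, p) arrays, one pattern per column.
    Returns (perm, signs, score): true pattern mu is matched to column perm[mu]
    of xi_hat with sign signs[mu]; score is the mean absolute cosine overlap.
    """
    a = xi_true / np.linalg.norm(xi_true, axis=0)
    b = xi_hat / np.linalg.norm(xi_hat, axis=0)
    overlap = a.T @ b  # cosine overlaps, shape (p, p)
    p = overlap.shape[0]
    rows = np.arange(p)
    # Exhaustive search over assignments (fine for small p only).
    best = max(
        itertools.permutations(range(p)),
        key=lambda perm: np.abs(overlap[rows, list(perm)]).sum(),
    )
    perm = np.array(best)
    signs = np.sign(overlap[rows, perm])
    score = np.abs(overlap[rows, perm]).mean()
    return perm, signs, float(score)
```

A score close to 1 means the recovered patterns coincide with the true ones up to the discrete symmetries. This check does not address continuous rotations of the latent space.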
minor comments (2)
- [Validation sections] Quantitative metrics (e.g., parameter recovery error, reconstruction fidelity relative to the bound) for the solvable-model validations are not summarized in the abstract or main text; adding a table of these values would strengthen the validation section.
- [Introduction / Methods] Notation for the bipartite mutual information and the latent rate should be introduced with an explicit equation reference at first use to improve readability.
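The notation requested in the minor comment could take the following form. This is one way to write the definitions and the chain of inequalities behind the capacity criterion. It assumes two things:

- the "rate" is the averaged KL term R of the standard VAE objective;
- the VAE is exact, so that the encoder joint p_data(x) q_ϕ(z|x) and the generative joint coincide.

```latex
\begin{aligned}
I(X_L; X_R) &= \sum_{x_L, x_R} p(x_L, x_R)\,\log\frac{p(x_L, x_R)}{p(x_L)\,p(x_R)},\\
R &= \mathbb{E}_{x \sim p_{\mathrm{data}}}\Big[D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p_\psi(z)\big)\Big],\\
I(X_L; X_R) &\le I(X_L; z) \le I(X; z) \le R .
\end{aligned}
```

Each inequality has a separate justification:

- The first is the data-processing inequality for the chain X_L – z – X_R.
- The second follows from the chain rule I(X; z) = I(X_L; z) + I(X_R; z | X_L), where x = (x_L, x_R).
- The third holds because R − I(X; z) = D_KL(q_ϕ(z) ‖ p_ψ(z)) ≥ 0, where q_ϕ(z) is the aggregate posterior.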
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment of significance, and constructive major comments. We address each point below with clarifications and planned revisions.
Point-by-point responses
Referee: [Validation on Hopfield model (and abstract claim)] The central claim that microscopic parameters can be read off the trained decoder once the capacity bound is satisfied is undermined by the absence of any argument or numerical check establishing uniqueness of the decoder-to-parameter mapping under latent symmetries. For the Hopfield case, orthogonal transformations in the latent space leave p(x) invariant while altering the apparent coupling matrix; the reported recovery of the pattern matrix therefore does not yet demonstrate that the extracted parameters are unique up to the known discrete symmetries (sign flips, permutations).
Authors: We agree that explicit treatment of latent symmetries is needed to fully substantiate uniqueness. Orthogonal transformations of the latent variables preserve p(x) and rotate the decoder weights, while sign flips and permutations are discrete symmetries of the Hopfield model. In the revised manuscript we will add a dedicated paragraph (and supplementary numerical check) clarifying that parameters are recovered uniquely up to these symmetries. We will align the extracted pattern matrix to the ground truth via orthogonal Procrustes analysis and report the residual error after alignment, confirming that the mapping is unique modulo the known gauge freedom. This refines but does not alter the central claim that successful VAEs yield readable mean-field parameters. revision: yes
Referee: [Capacity bound derivation] The derivation of the capacity bound via the data-processing inequality I(X_L; X_R) ≤ I(X_L; z) is presented as non-circular, yet the manuscript does not explicitly verify that the bound remains tight and non-vacuous once the decoder is restricted to the conditionally independent (product) form required by the mean-field identification. This equivalence is load-bearing for the assertion that success implies a latent mean-field theory.
Authors: The data-processing inequality follows directly from the Markov chain X_L – z – X_R and holds for any decoder. To address the request for explicit verification under the product-decoder restriction, the revision will include a short calculation (main text or SI) for the Curie-Weiss and Hopfield cases showing that, whenever reconstruction error is low, I(X_L; z) saturates I(X_L; X_R) and the trained product decoder exactly reproduces the mean-field factorization. This confirms the bound remains tight and non-vacuous precisely when the mean-field identification applies. revision: yes
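The first response promises an orthogonal Procrustes alignment, and that alignment has a closed-form SVD solution. The sketch below rests on our own assumptions:

- both pattern matrices are stored as N × p arrays with one pattern per column;
- the residual is reported as a relative Frobenius norm.

If the decoder weights carry an unknown overall scale, both matrices should be normalized before aligning.

```python
import numpy as np


def procrustes_align(xi_hat, xi_true):
    """Orthogonal R minimizing ||xi_hat @ R - xi_true||_F, plus the relative residual.

    xi_hat, xi_true: (N, p) arrays, one pattern per column.
    """
    # Orthogonal Procrustes: with xi_hat^T xi_true = U S V^T, the optimum is R = U V^T.
    u, _, vt = np.linalg.svd(xi_hat.T @ xi_true)
    r = u @ vt
    residual = np.linalg.norm(xi_hat @ r - xi_true) / np.linalg.norm(xi_true)
    return r, residual
```

Permutations and sign flips are themselves orthogonal matrices, so they are absorbed into R. A small residual after alignment therefore indicates recovery up to the full O(p) gauge. Inspecting R separately shows whether the residual freedom is discrete or continuous.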
Circularity Check
No significant circularity; bound and structural claim are independent of fitted outputs
Full rationale
The capacity bound follows from the data-processing inequality I(X_L; X_R) ≤ I(X_L; z) applied to the latent channel rate, a general information-theoretic fact independent of the paper's model or data. The structural identity between a conditionally independent decoder and finite-size mean-field factorization is a direct consequence of the standard VAE architecture (product decoder), but the central inference—that successful reconstruction under the bound evidences a latent mean-field structure in the data—is a non-tautological claim about the data distribution. No load-bearing self-citation, fitted parameter renamed as prediction, or uniqueness theorem imported from prior author work appears in the derivation chain. Parameter recovery on solvable models is presented as empirical validation rather than a constructed identity. The noted latent symmetries affect uniqueness of extracted couplings but do not reduce any claimed result to its inputs by construction.
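Whenever X_L and X_R are conditionally independent given z, the data-processing step holds regardless of the specific decoder. The toy check below illustrates this with a discrete latent variable and random conditional tables, both of which are illustrative assumptions rather than the paper's setup. It compares I(X_L; X_R) with I(X_L; z).

```python
import numpy as np


def mutual_information(joint):
    """I(A; B) in nats from a 2D joint probability table."""
    joint = joint / joint.sum()
    p_a = joint.sum(axis=1, keepdims=True)
    p_b = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_a * p_b)[mask])))


def dpi_check(prior_z, p_left_given_z, p_right_given_z):
    """Return (I(X_L; X_R), I(X_L; z)) for the Markov chain X_L - z - X_R.

    prior_z: (K,); p_left_given_z: (K, n_L); p_right_given_z: (K, n_R); rows sum to 1.
    """
    # X_L and X_R are conditionally independent given z (product decoder across the cut).
    joint_lr = np.einsum("k,ka,kb->ab", prior_z, p_left_given_z, p_right_given_z)
    joint_lz = (prior_z[:, None] * p_left_given_z).T  # p(x_L, z), shape (n_L, K)
    return mutual_information(joint_lr), mutual_information(joint_lz)
```

Random tables always satisfy the inequality. Equality is reached, for example, when both halves are noiseless copies of z.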
Axiom & Free-Parameter Ledger
free parameters (1)
- latent dimension = 2
axioms (1)
- domain assumption: the VAE decoder assumes conditional independence of observations given the latents
IX. METHODS
A. VAE architecture
Encoder (CNN). The temperature is concatenated as an extra channel to every single spin. The network then consists of two convolutional blocks with hidden channels [32, 64], kernel size 3, stride 2, and padding 1. Each block applies ...
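The encoder description above fixes the input layout and the sizes of the feature maps. The sketch below covers only input preparation and output-size arithmetic, and it makes several assumptions that the excerpt does not state:

- a square L × L lattice;
- spins as channel 0 and temperature as channel 1;
- a hypothetical L = 32.

NumPy stands in for whatever deep-learning framework the authors use, and the helper names are hypothetical.

```python
import numpy as np


def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def add_temperature_channel(spins, temperature):
    """Stack spins and temperature as channels.

    spins: (batch, L, L) array of ±1; temperature: scalar or (batch,) array.
    Returns a (batch, 2, L, L) float array; channel 1 holds T at every site.
    """
    t = np.broadcast_to(np.asarray(temperature, dtype=float).reshape(-1, 1, 1), spins.shape)
    return np.stack([spins.astype(float), t], axis=1)
```

With these settings, each block halves the lattice side. For the hypothetical L = 32, the two blocks produce 32 × 16 × 16 and 64 × 8 × 8 feature maps (channels × height × width).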
Going beyond the Gaussian regime
The above-mentioned capacity criterion is restricted to a Gaussian prior p_ψ(z) = N(0, I_d) with a Gaussian posterior q_ϕ(z|x) = N(μ(x), σ(x)² I_d) in the near-deterministic regime σ ≪ 1. Here we aim to relax these conditions. First, we address the learned prior p_ψ(z), which is generally non-Gaussian. Concretely, the architecture...
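When both the prior and the posterior are Gaussian, the rate term has a closed form. The sketch below assumes that "rate" means the batch-averaged KL divergence from the posterior to the standard normal prior, which is the usual rate in the rate-distortion view of VAEs. It accepts a diagonal σ, so the isotropic case σ(x)² I_d above is a special case. The function name is hypothetical.

```python
import numpy as np


def gaussian_rate(mu, sigma):
    """Mean over the batch of KL(N(mu, diag sigma^2) || N(0, I)), in nats.

    mu, sigma: arrays of shape (batch, d) with sigma > 0.
    """
    kl = 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma), axis=1)
    return float(kl.mean())
```

In the near-deterministic regime, the −d ln σ term dominates, so the rate grows without bound as σ → 0. Dividing by np.log(2) converts the result to bits for comparison with a bipartite mutual information in bits.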