Recognition: unknown
Linear equivalence of nonlinear recurrent neural networks
Pith reviewed 2026-05-08 05:00 UTC · model grok-4.3
The pith
At large scale, nonlinear recurrent neural networks have the same activity covariance matrix as equivalent linear networks driven by independent noise.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
At large N, the covariance matrix for a typical quenched realization takes the same form as that of a linear network with the same couplings, driven by independent noise, with DMFT order parameters setting the transfer function and the noise spectrum. This is shown using the two-site cavity method, which demonstrates suppressed cross-covariances between nonlinear residuals and recovers an emergent external drive from non-Gaussian terms.
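In schematic form (notation ours, assumed rather than quoted from the paper): for rate dynamics $\dot{x}_i = -x_i + \sum_j J_{ij}\,\phi(x_j)$, the ansatz states that the cross-spectral density takes the linear-network form

$$C(\omega) = \big[(1+i\omega)I - g_{\mathrm{eff}}\,J\big]^{-1}\,\sigma^2(\omega)\,\big[(1-i\omega)I - g_{\mathrm{eff}}\,J^{\top}\big]^{-1},$$

i.e., the covariance of a linear network with couplings $g_{\mathrm{eff}} J$ driven by independent noise with spectrum $\sigma^2(\omega)$. Both the effective gain $g_{\mathrm{eff}}$ and the noise spectrum $\sigma^2(\omega)$ are single-site DMFT quantities, so the only $N \times N$ object entering the right-hand side is $J$ itself.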
What carries the argument
Decomposition of each unit's activity into a linear response to its local field plus a nonlinear residual, with the two-site cavity method showing that the residuals act as independent noise at large N.
Load-bearing premise
Cross-covariances between nonlinear residuals at distinct sites are strongly suppressed at large N so that the residuals behave as independent noise.
What would settle it
Compute the average cross-covariance between nonlinear residuals for two different neurons in a large simulated network and verify that it approaches zero.
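A minimal numerical sketch of that test (our construction, not the paper's code: the tanh dynamics, the static gain estimate, and all parameter values are assumptions):

import numpy as np

rng = np.random.default_rng(0)

def cross_residual_ratio(N, g=1.5, T=200.0, dt=0.05, burn=50.0):
    # Simulate dx/dt = -x + J*tanh(x) with J_ij ~ N(0, g^2/N) by Euler steps.
    J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
    x = rng.normal(0.0, 1.0, size=N)
    samples = []
    for step in range(int(T / dt)):
        x = x + dt * (-x + J @ np.tanh(x))
        if step * dt > burn:                   # discard transient
            samples.append(x.copy())
    X = np.asarray(samples)                    # shape (time, N)
    phi = np.tanh(X)
    a = np.mean(1.0 - phi**2)                  # Stein-lemma gain E[phi'(x)]
    R = phi - a * X                            # zero-lag nonlinear residuals
    C = np.cov(R.T)                            # N x N residual covariance
    off_diag = np.abs(C[~np.eye(N, dtype=bool)]).mean()
    return off_diag / np.diag(C).mean()        # cross- relative to auto-covariance

for N in (200, 400, 800):
    print(N, cross_residual_ratio(N))          # ratio should shrink toward 0 with N

This is a zero-lag simplification of the paper's decomposition, whose linear response involves a temporal kernel; it tests the premise only at the level of equal-time covariances.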
Original abstract
Large nonlinear recurrent neural networks with random couplings generate high-dimensional, potentially chaotic activity whose structure is of interest in neuroscience and other fields. A fundamental object encoding the collective structure of this activity is the $N \times N$ covariance matrix. Prior analytical work on the covariance matrix has been limited to low-dimensional summary statistics. Recent work proposed an ansatz in which, at large $N$, the covariance matrix for a typical quenched realization takes the same form as that of a linear network with the same couplings, driven by independent noise, with DMFT order parameters setting the transfer function and the noise spectrum. Here, we derive this ansatz using the two-site cavity method, providing two derivations with complementary perspectives. The first decomposes each unit's activity into a linear response to its local field and a nonlinear residual, and shows that cross-covariances between residuals at distinct sites are strongly suppressed, so the residuals act as independent noise driving a linear network. The second derives a self-consistent matrix equation for the covariance matrix. A naive Gaussian closure for the joint statistics of local fields at distinct sites misses cross terms that, in a linear network, would be generated by an external drive. The cavity method recovers these terms from non-Gaussian contributions, revealing an emergent external drive. Higher-order cross-site moments follow a Wick-like decomposition into products of pairwise covariances at leading order, reducing them to the linear-equivalent form. We verify the predictions in simulations. These results extend linear equivalence from feedforward high-dimensional nonlinear systems, where the activations are independent of the weights, to recurrent networks, where the activations are correlated with the couplings that generate them.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that at large N, the N×N covariance matrix of activity in a typical quenched realization of a nonlinear RNN with random couplings is identical to the covariance of the corresponding linear network with the same couplings, driven by independent noise whose spectrum and effective transfer function are set by DMFT order parameters. This linear equivalence is derived via the two-site cavity method in two complementary ways: (i) decomposing each unit's activity into linear response plus nonlinear residual and showing that cross-residual covariances are O(1/N)-suppressed so the residuals act as independent noise, and (ii) obtaining a self-consistent matrix equation for the covariance in which non-Gaussian contributions supply the emergent drive terms missed by a naive Gaussian closure. Higher-order cross-site moments reduce to Wick products of pairwise covariances at leading order. The predictions are checked in simulations.
Significance. If the equivalence holds, the result supplies an analytical route to the full covariance structure of high-dimensional recurrent nonlinear dynamics, extending prior linear-equivalence results from feedforward networks (where activations are independent of weights) to the recurrent setting where activations correlate with the couplings. This is relevant for neuroscience models of collective activity and provides falsifiable predictions for covariance matrices. Credit is due for the two complementary cavity derivations and the simulation verification.
major comments (2)
- [First derivation (decomposition into linear response plus nonlinear residual)] In the first (decomposition) derivation: the central claim that residuals at distinct sites act as independent noise rests on cross-residual covariances being strongly suppressed at large N. An explicit scaling argument or bound on these cross terms (e.g., showing they are O(1/N) with a remainder that does not affect the leading covariance) is needed to make the error controlled and the equivalence rigorous.
- [Second derivation (self-consistent matrix equation)] In the second (self-consistent matrix) derivation: the recovery of the linear-network form from non-Gaussian contributions is load-bearing. It should be shown explicitly how the emergent drive terms generated by the cavity method are parameterized exactly by the DMFT order parameters, without residual corrections that would distinguish the nonlinear covariance from the linear one.
minor comments (2)
- [Simulations] The simulation section should report quantitative metrics (e.g., average Frobenius distance or element-wise correlation between the nonlinear and linear-equivalent covariance matrices) together with the values of N, number of realizations, and integration time used, to allow direct assessment of the agreement (a sketch of such metrics follows this list).
- Notation for the DMFT order parameters (e.g., how they enter the effective transfer function and noise spectrum) should be introduced once with a clear mapping to the linear-network parameters.
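For concreteness, a minimal sketch of the kind of agreement metrics meant here (C_nl and C_lin stand for the empirical nonlinear and linear-equivalent covariance matrices; the function name is ours):

import numpy as np

def agreement_metrics(C_nl, C_lin):
    # Relative Frobenius distance and element-wise Pearson correlation
    # between the nonlinear and linear-equivalent covariance matrices.
    rel_frob = np.linalg.norm(C_nl - C_lin) / np.linalg.norm(C_nl)
    corr = np.corrcoef(C_nl.ravel(), C_lin.ravel())[0, 1]
    return rel_frob, corr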
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and the constructive comments that help strengthen the rigor of our derivations. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
Referee: In the first (decomposition) derivation: the central claim that residuals at distinct sites act as independent noise rests on cross-residual covariances being strongly suppressed at large N. An explicit scaling argument or bound on these cross terms (e.g., showing they are O(1/N) with a remainder that does not affect the leading covariance) is needed to make the error controlled and the equivalence rigorous.
Authors: We agree that an explicit bound would make the argument more rigorous. In the revised manuscript we will add a dedicated paragraph deriving the scaling of the cross-residual covariance via the two-site cavity expansion. We show that Cov(r_i, r_j) for i ≠ j is O(1/N) by expanding the joint moment-generating function to leading order in 1/N, using the weak correlation of local fields at distinct sites; the remainder is O(1/N^{3/2}) and vanishes upon summation over the covariance matrix, leaving the leading-order equivalence unaffected. revision: yes
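Stated as an equation, the claimed scaling (our paraphrase of the response, with $c_{ij}$ an order-one coefficient) is

$$\operatorname{Cov}(r_i, r_j) = \frac{c_{ij}}{N} + O\big(N^{-3/2}\big), \qquad i \neq j,$$

so the remainder is subleading element-wise and drops out of the leading-order covariance matrix.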
Referee: In the second (self-consistent matrix) derivation: the recovery of the linear-network form from non-Gaussian contributions is load-bearing. It should be shown explicitly how the emergent drive terms generated by the cavity method are parameterized exactly by the DMFT order parameters, without residual corrections that would distinguish the nonlinear covariance from the linear one.
Authors: We thank the referee for this observation. In the revision we will insert an explicit matching step after Eq. (12). We demonstrate that the non-Gaussian cavity corrections to the self-consistent covariance equation are identical to the effective external drive whose variance and spectrum are fixed by the DMFT order parameters (q, χ, and the effective gain). Because the cavity equations close exactly onto the DMFT self-consistency relations at leading order, no distinguishing residual terms remain in the thermodynamic limit. revision: yes
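For orientation, one standard way such order parameters enter a single-site picture (a sketch under assumed conventions, for odd $\phi$ such as $\tanh$ and a Gaussian local field $x \sim \mathcal{N}(0, \Delta)$; the paper's definitions may differ in detail):

$$g_{\mathrm{eff}} = \mathbb{E}\big[\phi'(x)\big], \qquad q = \mathbb{E}\big[\phi(x)^2\big], \qquad \sigma_r^2 = q - g_{\mathrm{eff}}^2\,\Delta.$$

By Stein's lemma, $g_{\mathrm{eff}}$ is the best linear gain relating $\phi(x)$ to $x$, and $\sigma_r^2$ is then the variance of the nonlinear residual, i.e., the strength of the effective independent noise in the linear-equivalent network.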
Circularity Check
No significant circularity; derivation self-contained
full rationale
The paper derives the claimed linear equivalence of the N × N covariance matrix for nonlinear RNNs at large N using the two-site cavity method. It decomposes each unit into linear response plus nonlinear residual, demonstrates O(1/N) suppression of cross-residual covariances (allowing residuals to act as independent noise), and recovers the emergent drive terms in the self-consistent matrix equation from non-Gaussian contributions that a Gaussian closure misses. Higher-order moments reduce to Wick products of pairwise covariances at leading order. These steps are explicit large-N calculations rather than definitions, fits, or self-citations. DMFT order parameters enter as standard self-consistent inputs to set the effective transfer function and noise spectrum, not as post-hoc adjustments to the covariance itself. The central result is therefore independent of its inputs and not forced by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- DMFT order parameters
axioms (2)
- domain assumption: Large-N limit with random couplings of finite variance
- domain assumption: Two-site cavity approximation
Reference graph
Works this paper leans on
- [1] Amos Arieli, Alexander Sterkin, Amiram Grinvald, and Ad Aertsen. Dynamics of ongoing activity: explanation of the large variability in evoked cortical responses. Science, 273(5283):1868–1871, 1996.
- [2] Herbert Jaeger and Harald Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667):78–80, 2004.
- [3] David G Clark, Blake Bordelon, Jacob A Zavatone-Veth, and Cengiz Pehlevan. Structure, disorder, and dynamics in task-trained recurrent neural circuits. bioRxiv, pages 2026–03, 2026.
- [4] Eric M Trautmann, Janis K Hesse, Gabriel M Stine, Ruobing Xia, Shude Zhu, Daniel J O'Shea, Bill Karsh, Jennifer Colonell, Frank F Lanfranchi, Saurabh Vyas, et al. Large-scale high-density brain-wide neural recording in nonhuman primates. Nature Neuroscience, 28(7):1562–1575, 2025.
- [5] Jason Manley, Sihao Lu, Kevin Barber, Jeffrey Demas, Hyewon Kim, David Meyer, Francisca Martínez Traub, and Alipasha Vaziri. Simultaneous, cortex-wide dynamics of up to 1 million neurons reveal unbounded scaling of dimensionality with neuron number. Neuron, 112(10):1694–1709, 2024.
- [6] SueYeon Chung and Larry F Abbott. Neural population geometry: An approach for understanding biological and artificial neural networks. Current Opinion in Neurobiology, 70:137–144, 2021.
- [7] John P Cunningham and Byron M Yu. Dimensionality reduction for large-scale neural recordings. Nature Neuroscience, 17(11):1500–1509, 2014.
- [8] Peiran Gao and Surya Ganguli. On simplicity and complexity in the brave new world of large-scale neuroscience. Current Opinion in Neurobiology, 32:148–155, 2015.
- [9] Carsen Stringer, Marius Pachitariu, Nicholas Steinmetz, Matteo Carandini, and Kenneth D Harris. High-dimensional geometry of population responses in visual cortex. Nature, 571(7765):361–365, 2019.
- [10] Dean A Pospisil and Jonathan W Pillow. Revisiting the high-dimensional geometry of population responses in the visual cortex. Proceedings of the National Academy of Sciences, 122(45):e2506535122, 2025.
- [11] Yu Hu and Haim Sompolinsky. The spectrum of covariance matrices of randomly connected recurrent neuronal networks with linear dynamics. PLoS Computational Biology, 18(7):e1010327, 2022.
- [12] Haim Sompolinsky, Andrea Crisanti, and Hans-Jürgen Sommers. Chaos in random neural networks. Physical Review Letters, 61(3):259, 1988.
- [13] David G Clark, LF Abbott, and Ashok Litwin-Kumar. Dimension of activity in random neural networks. Physical Review Letters, 131(11):118401, 2023.
- [14] David G Clark, Owen Marschall, Alexander van Meegen, and Ashok Litwin-Kumar. Connectivity structure and dynamics of nonlinear recurrent neural networks. Physical Review X, 15(4):041019, 2025.
- [15] Xuanyu Shen and Yu Hu. Covariance spectrum in nonlinear recurrent neural networks. arXiv preprint arXiv:2508.05288, 2025.
- [16] Merav Stern, Haim Sompolinsky, and Laurence F Abbott. Dynamics of random neural networks with bistable units. Physical Review E, 90(6):062710, 2014.
- [17] David G Clark and Larry F Abbott. Theory of coupled neuronal-synaptic dynamics. Physical Review X, 14(2):021001, 2024.
- [18] Felix Roy, Giulio Biroli, Guy Bunin, and Chiara Cammarota. Numerical implementation of dynamical mean field theory for disordered systems: Application to the Lotka–Volterra model of ecosystems. Journal of Physics A: Mathematical and Theoretical, 52(48):484001, 2019.
- [19] Haim Sompolinsky and Annette Zippelius. Dynamic theory of the spin-glass phase. Physical Review Letters, 47(5):359, 1981.
- [20] Leticia F Cugliandolo and Jorge Kurchan. Analytical solution of the off-equilibrium dynamics of a long-range spin-glass model. Physical Review Letters, 71(1):173, 1993.
- [21] Francesca Mignacco, Florent Krzakala, Pierfrancesco Urbani, and Lenka Zdeborová. Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification. Advances in Neural Information Processing Systems, 33:9540–9550, 2020.
- [22] Jerome Garnier-Brun, Michael Benzaquen, and Jean-Philippe Bouchaud. Unlearnable games and "satisficing" decisions: a simple model for a complex world. Physical Review X, 14(2):021039, 2024.
- [23] Jannis Schuecker, Sven Goedeke, and Moritz Helias. Optimal sequence memory in driven random networks. Physical Review X, 8(4):041029, 2018.
- [24] A Crisanti and H Sompolinsky. Path integral approach to random neural networks. Physical Review E, 98(6):062120, 2018.
- [25] Daniel Martí, Nicolas Brunel, and Srdjan Ostojic. Correlations between synapses in pairs of neurons slow down dynamics in randomly connected neural networks. Physical Review E, 97(6):062314, 2018.
- [26] Peiran Gao, Eric Trautmann, Byron Yu, Gopal Santhanam, Stephen Ryu, Krishna Shenoy, and Surya Ganguli. A theory of multineuronal dimensionality, dynamics and measurement. bioRxiv, page 214262, 2017.
- [27] Ashok Litwin-Kumar, Kameron Decker Harris, Richard Axel, Haim Sompolinsky, and LF Abbott. Optimal degrees of synaptic connectivity. Neuron, 93(5):1153–1164, 2017.
- [28] Blake Bordelon and Cengiz Pehlevan. Disordered dynamics in high dimensions: Connections to random matrices and machine learning. arXiv preprint arXiv:2601.01010, 2026.
- [29] David J Thouless, Philip W Anderson, and Robert G Palmer. Solution of 'solvable model of a spin glass'. Philosophical Magazine, 35(3):593–601, 1977.
- [30] Timm Plefka. Convergence condition of the TAP equation for the infinite-ranged Ising spin glass model. Journal of Physics A: Mathematical and General, 15(6):1971–1978, 1982.
- [31] Nikita Alexeev, Friedrich Götze, and Alexander Tikhomirov. Asymptotic distribution of singular values of powers of random matrices. Lithuanian Mathematical Journal, 50(2):121–132, 2010.
- [32] Jacob A Zavatone-Veth and Cengiz Pehlevan. Replica method for eigenvalues of real Wishart product matrices. SciPost Physics Core, 6(2):026, 2023.
- [33] Albert J Wakhloo. Moments and response functions of large nonlinear recurrent neural networks at fixed connectivity. arXiv preprint, 2026.
- [34] Noureddine El Karoui. The spectrum of kernel random matrices. The Annals of Statistics, 38(1):1–50, February 2010.
- [35] Jeffrey Pennington and Pratik Worah. Nonlinear random matrix theory for deep learning. Advances in Neural Information Processing Systems, 30, 2017.
- [36] Cosme Louart, Zhenyu Liao, and Romain Couillet. A random matrix approach to neural networks. The Annals of Applied Probability, 28(2):1190–1248, 2018.
- [37] Sebastian Goldt, Bruno Loureiro, Galen Reeves, Florent Krzakala, Marc Mézard, and Lenka Zdeborová. The Gaussian equivalence of generative models for learning with shallow neural networks. In Mathematical and Scientific Machine Learning, pages 426–471. PMLR, 2022.
- [38] Hong Hu and Yue M Lu. Universality laws for high-dimensional learning with random features. IEEE Transactions on Information Theory, 69(3):1932–1964, 2022.
- [39] Shotaro Takasu and Toshio Aoyagi. Neuronal correlations shape the scaling behavior of memory capacity and nonlinear computational capability of reservoir recurrent neural networks. Physical Review Research, 7(4), 2025.