pith. machine review for the scientific record.

arxiv: 2604.23489 · v2 · submitted 2026-04-26 · ❄️ cond-mat.dis-nn · q-bio.NC


Linear equivalence of nonlinear recurrent neural networks

David G. Clark


Pith reviewed 2026-05-08 05:00 UTC · model grok-4.3

classification ❄️ cond-mat.dis-nn q-bio.NC
keywords recurrent neural networks · covariance matrix · linear equivalence · cavity method · dynamical mean field theory · random couplings · chaotic dynamics

The pith

At large network size, nonlinear recurrent neural networks have the same activity covariance matrix as linear networks with the same couplings driven by independent noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives that in large nonlinear RNNs with random couplings, the $N \times N$ covariance matrix of network activity takes the same form as that of a linear network with the same couplings driven by independent noise. Dynamical mean-field theory (DMFT) order parameters determine the effective transfer function and the noise spectrum. A sympathetic reader would care because this provides an analytical handle on the collective structure of high-dimensional chaotic dynamics relevant to brain-inspired computing and neuroscience. The derivation relies on the two-site cavity method, which shows that the nonlinear parts of the activity act as uncorrelated drives. This extends prior results from feedforward architectures, where activations are independent of the weights, to recurrent ones, where activities correlate with the couplings that generate them.
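For orientation, here is a minimal simulation of the model class at issue. The rate dynamics $\dot{x} = -x + J\phi(x)$ with $\phi = \tanh$ and i.i.d. Gaussian couplings of variance $g^2/N$ are the standard setup in this literature; the Euler integrator, burn-in, and parameter values below are illustrative choices, not the paper's.

```python
import numpy as np

def simulate_rnn(N=500, g=2.5, T=200.0, dt=0.05, burn=50.0, seed=0):
    """Euler-integrate dx/dt = -x + J @ tanh(x); return phi(x) after burn-in."""
    rng = np.random.default_rng(seed)
    J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))  # i.i.d. couplings, variance g^2/N
    x = rng.normal(size=N)
    phis = []
    for step in range(int(T / dt)):
        phi = np.tanh(x)
        x += dt * (-x + J @ phi)
        if step * dt >= burn:
            phis.append(phi.copy())
    return np.array(phis)  # shape (time, N)

phi = simulate_rnn()
C_phi = np.cov(phi.T)  # empirical N x N equal-time covariance of the activity
print(C_phi.shape, C_phi.diagonal().mean())
```

The object of the paper is this full matrix, not just the low-dimensional summary statistics that prior analytical work was limited to.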

Core claim

At large N, the covariance matrix for a typical quenched realization takes the same form as that of a linear network with the same couplings, driven by independent noise, with DMFT order parameters setting the transfer function and the noise spectrum. This is shown using the two-site cavity method, which demonstrates suppressed cross-covariances between nonlinear residuals and recovers an emergent external drive from non-Gaussian terms.
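The ansatz can be written schematically. Assuming standard rate dynamics and frequency-domain notation (these conventions are this review's, not necessarily the paper's): a linear network $\dot{x} = -x + aJx + \xi$ with effective gain $a$ and independent noise of spectrum $\sigma^2(\omega)$ has cross-spectral matrix

$$
C_x(\omega) \;=\; \sigma^2(\omega)\,\bigl[(1+i\omega)I - aJ\bigr]^{-1}\bigl[(1-i\omega)I - aJ^{\top}\bigr]^{-1},
$$

and the claim is that the nonlinear network's covariance takes this form at large $N$, with $a$ and $\sigma^2(\omega)$ fixed by the DMFT order parameters.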

What carries the argument

Decomposition of unit activity into linear response to local field plus nonlinear residual, with the two-site cavity method proving that residuals act as independent noise at large N.
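One illustrative way to write the decomposition (the paper's exact definitions of the gain and residual may differ):

$$
\phi(x_i(t)) \;=\; a\,x_i(t) + r_i(t), \qquad a \;=\; \frac{\langle x\,\phi(x)\rangle_\star}{\langle x^2\rangle_\star},
$$

where $\langle\cdot\rangle_\star$ is an average over the single-site DMFT Gaussian measure, so that $r_i$ is the part of the activity that the cavity argument shows behaves as independent noise.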

Load-bearing premise

Cross-covariances between nonlinear residuals at distinct sites are strongly suppressed at large N so that the residuals behave as independent noise.
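Written schematically, the premise is that for distinct sites $i \neq j$,

$$
\bigl|\operatorname{Cov}\bigl(r_i(t),\, r_j(t+\tau)\bigr)\bigr| \;=\; \mathcal{O}(1/N),
$$

small enough that the residuals can be replaced by independent noise at the order that fixes the covariance matrix.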

What would settle it

Compute the average cross-covariance between nonlinear residuals for two different neurons in a large simulated network and verify that it approaches zero.
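A sketch of that test, under stated assumptions: tanh rate dynamics, a least-squares gain $a$ as a proxy for the paper's decomposition, and small sizes for speed; all parameter values here are illustrative.

```python
import numpy as np

def residual_cross_cov_rms(N, g=2.5, T=400.0, dt=0.05, burn=100.0, seed=0):
    """RMS equal-time cross-covariance of residuals r_i = phi(x_i) - a * x_i."""
    rng = np.random.default_rng(seed)
    J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
    x = rng.normal(size=N)
    xs, phis = [], []
    for step in range(int(T / dt)):
        phi = np.tanh(x)
        x += dt * (-x + J @ phi)
        if step * dt >= burn:
            xs.append(x.copy())
            phis.append(phi.copy())
    X, P = np.array(xs), np.array(phis)
    X -= X.mean(axis=0)
    P -= P.mean(axis=0)
    a = (X * P).sum() / (X * X).sum()  # least-squares effective gain
    R = P - a * X                      # nonlinear residuals
    C = R.T @ R / R.shape[0]           # equal-time residual covariance
    off = ~np.eye(N, dtype=bool)
    return np.sqrt((C[off] ** 2).mean())

for N in (100, 200, 400):
    print(N, residual_cross_cov_rms(N))  # off-diagonal RMS should shrink with N
```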

Figures

Figures reproduced from arXiv: 2604.23489 by David G. Clark.

Figure 1. Time-lagged cross-covariances $\sqrt{N}\,C^\phi_{ij}(\tau)$ for five randomly chosen off-diagonal pairs $(i, j)$ at each network size $N$, comparing direct simulation (solid) with the ansatz $\bar{C}^\phi_{ij}(\tau)$ (dashed), at $g = 2.5$ and sampling ratio $\alpha = 800$. As $N$ increases, the agreement between simulation and theory improves, consistent with the $\mathcal{O}$…
Figure 2. Scaling with network size $N$ at $g = 2.5$, evaluated at $\tau = 0$. Lines show medians over 10 independent realizations of $J$; shaded regions indicate interquartile ranges. Colors correspond to sampling ratios $\alpha$. Reference power laws $1/N^{1/2}$, $1/N$, and $1/N^{3/2}$ are shown in gray. (a) Off-diagonal RMS of the prediction error $\bar{C}^\phi_{ij}(0) - C^\phi_{ij}(0)$, scaling as $1/N$. (b) Relative off-diagonal RMS (normalized by the off-di…
Figure 3. Autocovariance $C^\phi_{ii}(\tau)$ compared with the DMFT prediction $C^\phi_\star(\tau)$ (solid black) for a subset of network sizes at $g = 2.5$ and $\alpha = 800$. Dashed lines show the empirical mean across the $B = \min(N, 1000)$ diagonal elements; shaded regions indicate $\pm 1$ standard deviation. The spread across neurons shrinks with $N$, consistent with the $\mathcal{O}(1/\sqrt{N})$ diagonal concentration of Eq. (15).
original abstract

Large nonlinear recurrent neural networks with random couplings generate high-dimensional, potentially chaotic activity whose structure is of interest in neuroscience and other fields. A fundamental object encoding the collective structure of this activity is the $N \times N$ covariance matrix. Prior analytical work on the covariance matrix has been limited to low-dimensional summary statistics. Recent work proposed an ansatz in which, at large $N$, the covariance matrix for a typical quenched realization takes the same form as that of a linear network with the same couplings, driven by independent noise, with DMFT order parameters setting the transfer function and the noise spectrum. Here, we derive this ansatz using the two-site cavity method, providing two derivations with complementary perspectives. The first decomposes each unit's activity into a linear response to its local field and a nonlinear residual, and shows that cross-covariances between residuals at distinct sites are strongly suppressed, so the residuals act as independent noise driving a linear network. The second derives a self-consistent matrix equation for the covariance matrix. A naive Gaussian closure for the joint statistics of local fields at distinct sites misses cross terms that, in a linear network, would be generated by an external drive. The cavity method recovers these terms from non-Gaussian contributions, revealing an emergent external drive. Higher-order cross-site moments follow a Wick-like decomposition into products of pairwise covariances at leading order, reducing them to the linear-equivalent form. We verify the predictions in simulations. These results extend linear equivalence from feedforward high-dimensional nonlinear systems, where the activations are independent of the weights, to recurrent networks, where the activations are correlated with the couplings that generate them.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that at large N, the N×N covariance matrix of activity in a typical quenched realization of a nonlinear RNN with random couplings is identical to the covariance of the corresponding linear network with the same couplings, driven by independent noise whose spectrum and effective transfer function are set by DMFT order parameters. This linear equivalence is derived via the two-site cavity method in two complementary ways: (i) decomposing each unit's activity into linear response plus nonlinear residual and showing that cross-residual covariances are O(1/N)-suppressed so the residuals act as independent noise, and (ii) obtaining a self-consistent matrix equation for the covariance in which non-Gaussian contributions supply the emergent drive terms missed by a naive Gaussian closure. Higher-order cross-site moments reduce to Wick products of pairwise covariances at leading order. The predictions are checked in simulations.

Significance. If the equivalence holds, the result supplies an analytical route to the full covariance structure of high-dimensional recurrent nonlinear dynamics, extending prior linear-equivalence results from feedforward networks (where activations are independent of weights) to the recurrent setting where activations correlate with the couplings. This is relevant for neuroscience models of collective activity and provides falsifiable predictions for covariance matrices. Credit is due for the two complementary cavity derivations and the simulation verification.

major comments (2)
  1. [First derivation (decomposition into linear response plus nonlinear residual)] In the first (decomposition) derivation: the central claim that residuals at distinct sites act as independent noise rests on cross-residual covariances being strongly suppressed at large N. An explicit scaling argument or bound on these cross terms (e.g., showing they are O(1/N) with a remainder that does not affect the leading covariance) is needed to make the error controlled and the equivalence rigorous.
  2. [Second derivation (self-consistent matrix equation)] In the second (self-consistent matrix) derivation: the recovery of the linear-network form from non-Gaussian contributions is load-bearing. It should be shown explicitly how the emergent drive terms generated by the cavity method are parameterized exactly by the DMFT order parameters, without residual corrections that would distinguish the nonlinear covariance from the linear one.
minor comments (2)
  1. [Simulations] The simulation section should report quantitative metrics (e.g., average Frobenius distance or element-wise correlation between the nonlinear and linear-equivalent covariance matrices), together with the values of N, the number of realizations, and the integration time used, to allow direct assessment of the agreement; a minimal sketch of such metrics follows after this list.
  2. Notation for the DMFT order parameters (e.g., how they enter the effective transfer function and noise spectrum) should be introduced once with a clear mapping to the linear-network parameters.
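The metrics requested in minor comment 1 are simple to pin down; a minimal sketch (function and variable names are this review's, not the paper's):

```python
import numpy as np

def covariance_agreement(C_nl, C_lin):
    """Relative Frobenius error and off-diagonal element-wise correlation
    between a measured covariance and its linear-equivalent prediction."""
    rel_frob = np.linalg.norm(C_nl - C_lin, "fro") / np.linalg.norm(C_nl, "fro")
    off = ~np.eye(C_nl.shape[0], dtype=bool)
    corr = np.corrcoef(C_nl[off], C_lin[off])[0, 1]
    return rel_frob, corr
```

Reported alongside N, the number of realizations, and the integration time, these two numbers would make the agreement directly assessable.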

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the constructive comments that help strengthen the rigor of our derivations. We address each major comment below and will revise the manuscript accordingly.

point-by-point responses
  1. Referee: In the first (decomposition) derivation: the central claim that residuals at distinct sites act as independent noise rests on cross-residual covariances being strongly suppressed at large N. An explicit scaling argument or bound on these cross terms (e.g., showing they are O(1/N) with a remainder that does not affect the leading covariance) is needed to make the error controlled and the equivalence rigorous.

    Authors: We agree that an explicit bound would make the argument more rigorous. In the revised manuscript we will add a dedicated paragraph deriving the scaling of the cross-residual covariance via the two-site cavity expansion. We show that Cov(r_i, r_j) for i ≠ j is O(1/N) by expanding the joint moment-generating function to leading order in 1/N, using the weak correlation of local fields at distinct sites; the remainder is O(1/N^{3/2}) and vanishes upon summation over the covariance matrix, leaving the leading-order equivalence unaffected (this claimed scaling is written out after this list). revision: yes

  2. Referee: In the second (self-consistent matrix) derivation: the recovery of the linear-network form from non-Gaussian contributions is load-bearing. It should be shown explicitly how the emergent drive terms generated by the cavity method are parameterized exactly by the DMFT order parameters, without residual corrections that would distinguish the nonlinear covariance from the linear one.

    Authors: We thank the referee for this observation. In the revision we will insert an explicit matching step after Eq. (12). We demonstrate that the non-Gaussian cavity corrections to the self-consistent covariance equation are identical to the effective external drive whose variance and spectrum are fixed by the DMFT order parameters (q, χ, and the effective gain). Because the cavity equations close exactly onto the DMFT self-consistency relations at leading order, no distinguishing residual terms remain in the thermodynamic limit. revision: yes
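The scaling claimed in response 1, written out (a schematic transcription of the response, not an equation from the paper):

$$
\operatorname{Cov}(r_i, r_j) \;=\; \frac{c_{ij}}{N} + \mathcal{O}\bigl(N^{-3/2}\bigr), \qquad i \neq j,
$$

with $c_{ij} = \mathcal{O}(1)$ coefficients; summed over the $\mathcal{O}(N)$ terms entering each element of the covariance equation, the remainder contributes $\mathcal{O}(N^{-1/2})$ and vanishes in the thermodynamic limit.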

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

full rationale

The paper derives the claimed linear equivalence of the N × N covariance matrix for nonlinear RNNs at large N using the two-site cavity method. It decomposes each unit into linear response plus nonlinear residual, demonstrates O(1/N) suppression of cross-residual covariances (allowing residuals to act as independent noise), and recovers the emergent drive terms in the self-consistent matrix equation from non-Gaussian contributions that a Gaussian closure misses. Higher-order moments reduce to Wick products of pairwise covariances at leading order. These steps are explicit large-N calculations rather than definitions, fits, or self-citations. DMFT order parameters enter as standard self-consistent inputs that set the effective transfer function and noise spectrum, not as post-hoc adjustments to the covariance itself. The central result is therefore not presupposed by its inputs and is not forced by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The claim rests on the large-N limit suppressing cross-residual covariances and on standard assumptions of random quenched couplings with finite moments; DMFT supplies self-consistent effective parameters but is not invented here.

free parameters (1)
  • DMFT order parameters
    Self-consistent quantities that set the effective transfer function and noise spectrum; determined by solving DMFT equations rather than fitted directly to covariance data.
axioms (2)
  • domain assumption Large N limit with random couplings of finite variance
    Invoked to suppress cross-covariances between residuals at distinct sites and to justify the cavity decomposition.
  • domain assumption Two-site cavity approximation
    Used to derive the self-consistent matrix equation and the Wick-like decomposition of higher moments.
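For readers new to these order parameters: in the classic single-site picture (Sompolinsky–Crisanti–Sommers conventions; the paper's may differ), the autocovariance $\Delta(\tau) = \langle x(t)\,x(t+\tau)\rangle$ and $C_\phi(\tau) = \langle \phi(x(t))\,\phi(x(t+\tau))\rangle$, evaluated under the effective Gaussian process, obey the self-consistency

$$
\bigl(1 - \partial_\tau^2\bigr)\,\Delta(\tau) \;=\; g^2\, C_\phi(\tau),
$$

whose solution fixes the effective transfer function and noise spectrum that the ledger lists as DMFT inputs.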

pith-pipeline@v0.9.0 · 5591 in / 1433 out tokens · 56781 ms · 2026-05-08T05:00:51.124349+00:00 · methodology

