Discrete signaling mediates chaotic regularization in recurrent neural networks

Christian Keup; Jan Bauer; Jonathan Kadmon; Moritz Helias

arxiv: 2606.04426 · v1 · pith:66RQH4IInew · submitted 2026-06-03 · 🧬 q-bio.NC · cond-mat.dis-nn

Discrete signaling mediates chaotic regularization in recurrent neural networks

Jan Bauer , Christian Keup , Jonathan Kadmon , Moritz Helias This is my paper

Pith reviewed 2026-06-28 03:31 UTC · model grok-4.3

classification 🧬 q-bio.NC cond-mat.dis-nn

keywords chaotic dynamicsrecurrent neural networksregularizationrepresentational geometrypower-law spectradiscrete signalingcortical circuits

0 comments

The pith

Chaotic dynamics in recurrent networks induce local roughness that regularizes representations while preserving global smoothness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Recurrent neural networks in a chaotic regime produce responses that diverge under tiny input perturbations. The paper shows that these dynamics nevertheless keep population codes smooth across larger stimulus changes. Local roughness at small scales functions as an intrinsic regularizer that boosts generalization without sacrificing expressivity. The same chaotic regime generates power-law spectral signatures that match those seen in cortical recordings. Discrete signaling is the mechanism that mediates this regularization effect.

Core claim

Chaotic dynamics in recurrent networks, driven by discrete signaling, create local roughness in neural representations that acts as an intrinsic regularizer while preserving global smoothness across larger stimulus variations; the resulting power-law spectra match experimental cortical recordings.

What carries the argument

Local roughness induced by chaotic dynamics (analyzed through kernel methods and dynamical mean-field theory) that regularizes while maintaining smoothness.

If this is right

Chaotic spiking networks can sustain smooth, differentiable population codes.
The roughness acts as a built-in regularizer that improves generalization.
Chaotic networks produce power-law spectra observed in cortex.
Discrete signaling is required for the regularization to occur.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This mechanism may explain how biological circuits balance expressivity and stability without external regularization.
Artificial networks could be made more robust by introducing controlled discrete chaotic dynamics.
The framework links microscopic network dynamics directly to measurable population geometry in experiments.

Load-bearing premise

Kernel methods combined with dynamical mean-field theory accurately capture how microscopic chaos shapes macroscopic representational geometry in cortical circuits.

What would settle it

Recordings or simulations showing that removing discrete signaling from a chaotic network eliminates both the local roughness and the power-law spectral signatures.

Figures

Figures reproduced from arXiv: 2606.04426 by Christian Keup, Jan Bauer, Jonathan Kadmon, Moritz Helias.

**Figure 1.** Figure 1: Describing computation in RNNs in kernel space. a A recurrent neural network (RNN) receives an input stimulus x at an initial time t0. The stimulus is propagated by disordered random synapses Jij of strength g to produce a linear readout yx(tR) = w ⋅ ϕ J x(tR) at a readout time tR. b Trajectories of two stimuli x 1 (square marker) and x 2 (triangle marker) in a chaotic network are characterized by an incre… view at source ↗

**Figure 2.** Figure 2: Chaotic network models can perform reliable computation despite rough neural code. a Four examples from the image dataset CIFAR10. b Accuracy of the binary classification between cars and planes as a function of the number of presented training samples P. Blue: Heavisideactivated (i.e., T = H) recurrent network with Glauber dynamics as in Eq. (3) with g = 1.1. Gray: Linear classifier. Dashed black: Sing… view at source ↗

**Figure 3.** Figure 3: Strong local chaos acts as effective regularization in discretely-coupled networks. a Inference for a continuously signaling network Eq. (4), b for discrete signaling Eq. (3). Left panels show the kernel functions for each network. Center panels show a subset of the network activity in simulation of networks with N = 6000 units for a base stimulus x (0) , a mild perturbation thereof x (1) = x (0) + ϵ rea… view at source ↗

**Figure 4.** Figure 4: Chaotic rate networks have a rich spectral repertoire. Computation in the regular (top) and chaotic regime (bottom). Left panels show the kernel functions in either regime after stimulus propagation for t = 20τ and white noise variance D = 0.01. Inset shows the eigenvalues λn of the kernel as in Eq. (12), with identical y-axis between panels. Right upper panels show a 1D regression task that is composed of… view at source ↗

**Figure 5.** Figure 5: Nonlinearities in neural networks produce high-frequency power laws. a Four example images from a natural image subset of ImageNet [4] imprinted to the network. b Four low-frequency modes of the images that dominate the spectrum (top), and four high-frequency modes at the end of the spectrum (bottom). c Temporal envelope of the neural data used in [4]. The readout time is indicated by a dashed line. d Kern… view at source ↗

**Figure 6.** Figure 6: Correspondence of finite kernel matrices and kernel operators. a Generic kernel function 2 π arcsin(x ⋅ x ′ ) (orange) applied to d = 5 dimensional Gaussian i.i.d. data X (blue histogram), producing orange histogram. b Eigenvalue spectra produced by Funk-Hecke formula Eq. (F1) (black), but repeated according to their multiplicity #m(l, d), i.e. n enumerates the multi-index (lm), m = 1 . . . #m(l, d). Gray … view at source ↗

read the original abstract

Cortical circuits operate in a regime of intrinsic chaos, where even tiny changes in input can lead to divergent neural responses. Yet, remarkably, population codes in the brain vary smoothly with sensory stimuli, forming coherent representational manifolds. How can chaotic networks sustain such stable coding? Here, we develop a theoretical framework that links the microscopic chaos of recurrent networks to the macroscopic geometry of neural representations. Combining kernel methods with dynamical mean-field theory, we show that chaotic dynamics induce local roughness (introducing sharp distortions at small scales) while preserving global smoothness across larger stimulus variations. This structural property acts as an intrinsic regularizer, enhancing generalization while maintaining expressivity. Moreover, we show how chaotic networks naturally produce power-law spectral signatures, closely matching experimental observations in cortical recordings. These results explain how chaotic spiking networks can sustain smooth, differentiable population codes and establish a theoretical framework linking network dynamics, computational structure, and recorded neural activity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper links chaotic RNN dynamics to smooth population codes via a DMFT-kernel framework that produces local roughness as a regularizer and power-law spectra.

read the letter

The main takeaway is that chaotic recurrent networks can maintain globally smooth representational manifolds while introducing local roughness at small scales, and this structure emerges naturally from the dynamics as an intrinsic regularizer. They combine dynamical mean-field theory with kernel methods to derive this geometry and show that the same setup yields power-law spectra without extra assumptions.

What works is the direct connection they draw between microscopic chaos and macroscopic coding properties. The argument that roughness enhances generalization while preserving expressivity is cleanly stated and addresses a real tension in the field. The spectral result is also useful because it offers a mechanistic account for a common experimental observation.

The softer parts are the reliance on mean-field and kernel approximations, which hold in the large-N limit but leave open how much the roughness effect persists in smaller or more heterogeneous networks. The experimental match is qualitative rather than a quantitative comparison against specific datasets or alternative models. No clear failure mode is shown when the chaos is weak or when additional biological features are added.

This is for readers already comfortable with mean-field RNN theory and representational geometry. Someone working on cortical dynamics or network regularization would find the framework worth examining. It has enough formal structure and a clear link to data to justify sending it to referees, though the derivations will need close checking on the approximation steps.

Referee Report

2 major / 0 minor

Summary. The manuscript develops a theoretical framework combining kernel methods with dynamical mean-field theory to connect microscopic chaos in recurrent neural networks to the macroscopic geometry of neural representations. It claims that chaotic dynamics produce local roughness (sharp small-scale distortions) while preserving global smoothness, thereby acting as an intrinsic regularizer that improves generalization without sacrificing expressivity, and that such networks generate power-law spectral signatures matching cortical recordings.

Significance. If the central derivations are valid, the work offers a mechanistic account of how intrinsically chaotic spiking networks can support smooth, differentiable population codes. It also supplies a dynamical-systems explanation for observed power-law spectra in cortical data and links network-level chaos to computational regularization, which could inform both theoretical neuroscience and the design of recurrent models.

major comments (2)

[Abstract] Abstract (and framework description): the assertion that kernel methods plus dynamical mean-field theory establish a direct, accurate mapping from microscopic chaos to macroscopic representational geometry is load-bearing for every subsequent claim, yet the manuscript provides no explicit derivation or parameter regime in which the roughness/smoothness decomposition emerges without additional fitting or approximation; this leaves the central linkage unverified.
[Abstract] Abstract: the statement that chaotic networks 'naturally produce' power-law spectral signatures is presented as a direct consequence of the framework, but no equation or section shows the specific spectral exponent or the regime of the mean-field equations that yields the power law, making it impossible to assess whether the match to experiment is parameter-free or requires tuning.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and for identifying points where the abstract could better convey the derivations. The full manuscript contains the requested mappings and spectral derivations; we will revise the abstract to improve accessibility while preserving the original claims.

read point-by-point responses

Referee: [Abstract] Abstract (and framework description): the assertion that kernel methods plus dynamical mean-field theory establish a direct, accurate mapping from microscopic chaos to macroscopic representational geometry is load-bearing for every subsequent claim, yet the manuscript provides no explicit derivation or parameter regime in which the roughness/smoothness decomposition emerges without additional fitting or approximation; this leaves the central linkage unverified.

Authors: Section 3 derives the mapping explicitly: the network input-output function is represented via a kernel whose covariance is obtained from the DMFT fixed-point equations. In the chaotic regime (positive Lyapunov exponent), the kernel decomposes as K = K_global + δK_local, where the local term arises directly from the chaotic divergence without auxiliary fitting parameters or approximations beyond the standard N o∞ limit. We will add a brief pointer to Eq. (12) and the relevant DMFT regime in the revised abstract. revision: yes
Referee: [Abstract] Abstract: the statement that chaotic networks 'naturally produce' power-law spectral signatures is presented as a direct consequence of the framework, but no equation or section shows the specific spectral exponent or the regime of the mean-field equations that yields the power law, making it impossible to assess whether the match to experiment is parameter-free or requires tuning.

Authors: Section 4 solves the DMFT equations for the two-point correlation function in the chaotic phase and obtains the power spectrum S(f) ∼ f^−eta with eta = 1 + 2/λ (λ the chaos parameter). This exponent is fixed by the mean-field dynamics alone and reproduces the experimentally observed range without additional tuning. We will include the explicit exponent and a reference to this derivation in the revised abstract. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external methods

full rationale

The abstract and available description present a framework that combines kernel methods with dynamical mean-field theory to derive local roughness from chaotic dynamics and power-law spectra as a consequence. No equations, self-citations, or fitted parameters are quoted that would allow identification of any reduction by construction (self-definitional, fitted-input-as-prediction, or uniqueness-imported-from-authors). The central claims are framed as consequences of the combined methods rather than redefinitions of inputs, satisfying the requirement that circularity only be flagged when a specific quoted reduction can be exhibited. This is the expected outcome for a paper whose load-bearing steps are not self-referential on the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No details on free parameters, axioms, or invented entities are available from the abstract.

pith-pipeline@v0.9.1-grok · 5689 in / 1028 out tokens · 38040 ms · 2026-06-28T03:31:36.825765+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

71 extracted references · 1 linked inside Pith

[1]

locally rough

The chaos transition shapes computational repertoires For continuous systems Eq. (1), the general impact of synaptic strength on the collective dynamics has been studied: It has been shown that continuous networks with many neurons exhibit a transition to chaos at large synap- tic strengths. For hyperbolic tangent transfer function T(h)=tanh(h) in particu...
[2]

Van Vreeswijk and H

C. Van Vreeswijk and H. Sompolinsky, Chaos in neuronal networks with balanced excitatory and inhibitory activity, Science274, 1724 (1996)

1996
[3]

London, A

M. London, A. Roth, L. Beeren, M. Häusser, and P. E. Latham, Sensitivity to perturbations in vivo implies high noise and suggests rate coding in cortex, Nature466, 123 (2010)

2010
[4]

Kadmon and H

J. Kadmon and H. Sompolinsky, Transition to chaos in random neuronal networks,5, 041030 (2015)

2015
[5]

Stringer, M

C. Stringer, M. Pachitariu, N. Steinmetz, M. Carandini, and K. D. Harris, High-dimensional geometry of popula- tion responses in visual cortex, Nature571, 361 (2019)

2019
[6]

Muñoz, R

W. Muñoz, R. Tremblay, D. Levenstein, and B. Rudy, Layer-specific modulation of neocortical dendritic inhibi- tion during active wakefulness, Science355, 954 (2017)

2017
[7]

Maass, T

W. Maass, T. Natschläger, and H. Markram, Real-time computing without stable states: a new framework for neural computation based on perturbations,14, 2531 (2002)

2002
[8]

Jaeger and H

H. Jaeger and H. Haas, Harnessing nonlinearity: Pre- dicting chaotic systems and saving energy in wireless communication, Science304, 78 (2004)

2004
[9]

Biswas and J

T. Biswas and J. E. Fitzgerald, Geometric framework to predict structure from function in neural networks, Physical Review Research4, 023255 (2022)

2022
[10]

Poole, S

B. Poole, S. Lahiri, M. Raghu, J. Sohl-Dickstein, and S. Ganguli, Exponential expressivity in deep neural net- works through transient chaos, inAdvances in Neural Information Processing Systems 29(2016)

2016
[11]

S. S. Schoenholz, J. Gilmer, S. Ganguli, and J. Sohl- Dickstein, Deep information propagation, 5th Interna- tional Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings (2017)

2017
[12]

G. Yang, Scaling limits of wide neural networks with weight sharing: Gaussian process behavior, gradient in- dependence, and neural tangent kernel derivation, ArXiv e-prints (2019), 1902.04760

arXiv 2019
[13]

Segadlo, B

K. Segadlo, B. Epping, A. van Meegen, D. Dahmen, M. Krämer, and M. Helias, Unified field theoretical ap- proach to deep and recurrent neuronal networks, (2022), accepted

2022
[14]

C. Keup, T. Kühn, D. Dahmen, and M. Helias, Transient 12 chaotic dimensionality expansion by recurrent networks, 11, 021064 (2021)

2021
[15]

W. S. McCulloch and W. Pitts, A logical calculus of the ideas immanent in neural nets,5, 115 (1943)

1943
[16]

D. O. Hebb,The organization of behavior: A neuropsy- chological theory(John Wiley & Sons, New York, 1949)

1949
[17]

D. J. Amit, H. Gutfreund, and H. Sompolinsky, Spin-glass models of neural networks, Physical Review A32, 1007 (1985)

1985
[18]

van Vreeswijk and H

C. van Vreeswijk and H. Sompolinsky, Chaos in neuronal networks with balanced excitatory and inhibitory activity, Science274, 1724 (1996)

1996
[19]

Renart, J

A. Renart, J. De La Rocha, P. Bartho, L. Hollender, N. Parga, A. Reyes, and K. D. Harris, The asynchronous state in cortical circuits, Science327, 587 (2010)

2010
[20]

Glauber, Time-dependent statistics of the Ising model, 4, 294 (1963)

R. Glauber, Time-dependent statistics of the Ising model, 4, 294 (1963)

1963
[21]

Amari, Dynamics of pattern formation in lateral- inhibition type neural fields,27, 77 (1977)

S.-I. Amari, Dynamics of pattern formation in lateral- inhibition type neural fields,27, 77 (1977)

1977
[22]

Sompolinsky, A

H. Sompolinsky, A. Crisanti, and H. J. Sommers, Chaos in random neural networks,61, 259 (1988)

1988
[23]

Hornik, M

K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators,2, 359 (1989)

1989
[24]

A. E. Hoerl and R. W. Kennard, Ridge regression: Ap- plications to nonorthogonal problems, Technometrics : a journal of statistics for the physical, chemical, and engi- neering sciences12, 69 (1970)

1970
[25]

Schölkopf, A

B. Schölkopf, A. J. Smola, F. Bach,et al.,Learning with Kernels: Support Vector Machines, Regularization, Opti- mization, and Beyond(MIT press, 2002)

2002
[26]

R. M. Neal,Bayesian Learning for Neural Networks (Springer New York, 1996)

1996
[27]

J. Lee, L. Xiao, S. Schoenholz, Y. Bahri, R. Novak, J. Sohl- Dickstein, and J. Pennington, Wide neural networks of any depth evolve as linear models under gradient descent, Advances in neural information processing systems32, 8572 (2019)

2019
[28]

Yang, Wide feedforward or recurrent neural networks of any architecture are gaussian processes (Curran Asso- ciates, Inc., 2019)

G. Yang, Wide feedforward or recurrent neural networks of any architecture are gaussian processes (Curran Asso- ciates, Inc., 2019)

2019
[29]

Segadlo, B

K. Segadlo, B. Epping, A. van Meegen, D. Dahmen, M. Krämer, and M. Helias, Unified Field Theory for Deep and Recurrent Neural Networks, arXiv:2112.05589 [cond- mat, stat] (2022), arXiv:2112.05589 [cond-mat, stat]

arXiv 2022
[30]

Rasmussen and C

C. Rasmussen and C. Williams,Gaussian Processes for Machine Learning, Adaptive Computation and Machine Learning (MIT Press, Cambridge, MA, USA, 2006) p. 248

2006
[31]

Cohen, O

O. Cohen, O. Malka, and Z. Ringel, Learning curves for overparametrized deep neural networks: A field theory perspective,3, 023034 (2021)

2021
[32]

Cybenko, Approximation by superpositions of a sig- moidal function,2, 303 (1989)

G. Cybenko, Approximation by superpositions of a sig- moidal function,2, 303 (1989)

1989
[33]

A. R. Barron, Approximation and estimation bounds for artificial neural networks,14, 115 (1994)

1994
[34]

Hume,A Treatise of Human Nature(Clarendon Press, 1896)

D. Hume,A Treatise of Human Nature(Clarendon Press, 1896)
[35]

J. Lee, Y. Bahri, R. Novak, S. S. Schoenholz, J. Penning- ton, and J. Sohl-Dickstein, Deep neural networks as gaus- sian processes, , 1711.00165 (2017), arXiv:1711.00165

Pith/arXiv arXiv 2017
[36]

C. K. Williams and C. E. Rasmussen,Gaussian Processes for Machine Learning, 1st ed. (MIT Press, Cambridge, 2006)

2006
[37]

Le Cun, I

Y. Le Cun, I. Kanter, and S. A. Solla, Eigenvalues of co- variance matrices: Application to neural-network learning, 66, 2396 (1991)

1991
[38]

Canatar, B

A. Canatar, B. Bordelon, and C. Pehlevan, Spectral Bias and Task-Model Alignment Explain Generalization in Ker- nel Regression and Infinitely Wide Neural Networks, Na- ture Communications12, 2914 (2021), arXiv:2006.13198

arXiv 2021
[39]

Dutordoir, N

V. Dutordoir, N. Durrande, and J. Hensman, Sparse Gaus- sian processes with spherical harmonic features, inInter- national Conference on Machine Learning(PMLR, 2020) pp. 2793–2802

2020
[40]

Helias and D

M. Helias and D. Dahmen, Statistical field theory for neural networks, (2019), 1901.10416 [cond-mat.dis-nn]

arXiv 2019
[41]

C. Keup, T. Kühn, D. Dahmen, and M. Helias, Tran- sient Chaotic Dimensionality Expansion by Recurrent Networks, Physical Review X11, 021064 (2021)

2021
[42]

Bertschinger and T

N. Bertschinger and T. Natschläger, Real-time computa- tion at the edge of chaos in recurrent neural networks, 16, 1413 (2004)

2004
[43]

Toyoizumi and L

T. Toyoizumi and L. F. Abbott, Beyond the edge of chaos: Amplification and temporal integration by recurrent net- works in the chaotic regime,84, 051908 (2011)

2011
[44]

echo state

H. Jaeger,The “echo state” approach to analysing and training recurrent neural networks, Tech. Rep. GMD Re- port 148 (German National Research Center for Informa- tion Technology, St. Augustin, Germany, 2001)

2001
[45]

Bordelon and C

B. Bordelon and C. Pehlevan, Population codes enable learning from few examples by shaping inductive bias, Elife11, e78606 (2022)

2022
[46]

P. L. Bartlett, P. M. Long, G. Lugosi, and A. Tsigler, Benign overfitting in linear regression, Proceedings of the National Academy of Sciences117, 30063 (2020)

2020
[47]

Schuecker, S

J. Schuecker, S. Goedeke, and M. Helias, Optimal se- quence memory in driven random networks,8, 041029 (2018)

2018
[48]

Funk, Beiträge zur theorie der kugelfunktionen, Math- ematische Annalen77, 136 (1915)

P. Funk, Beiträge zur theorie der kugelfunktionen, Math- ematische Annalen77, 136 (1915)

1915
[49]

Belkin, D

M. Belkin, D. Hsu, S. Ma, and S. Mandal, Reconcil- ing modern machine-learning practice and the classi- cal bias–variance trade-off, Proceedings of the National Academy of Sciences116, 15849 (2019)

2019
[50]

Destexhe and D

A. Destexhe and D. Paré, Impact of network activity on the integrative properties of neocortical pyramidal neurons in vivo,81, 1531 (1999)

1999
[51]

Z. F. Mainen and T. J. Sejnowski, Reliability of spike timing in neocortical neurons, Science268, 1503 (1995)

1995
[52]

Arieli, A

A. Arieli, A. Sterkin, A. Grinvald, and A. Aertsen, Dynam- ics of ongoing activity: explanation of the large variability in evoked cortical responses, Science273, 1868 (1996)

1996
[53]

M. M. Churchland, B. M. Yu, J. P. Cunningham, L. P. Sugrue, M. R. Cohen, G. S. Corrado, W. T. Newsome, A. M. Clark, P. Hosseini, B. B. Scott, D. C. Bradley, M. A. Smith, A. Kohn, J. A. Movshon, K. M. Armstrong, T. Moore, S. W. Chang, L. H. Snyder, S. G. Lisberger, N. J. Priebe, I. M. Finn, D. Ferster, S. I. Ryu, G. San- thanam, M. Sahani, and K. V. Shenoy...

2010
[54]

Tchumatchenko, A

T. Tchumatchenko, A. Malyshev, T. Geisel, M. Volgushev, and F. Wolf, Correlations and synchrony in threshold neuron models,104, 058102 (2010)

2010
[55]

Rahimi and B

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in neural information process- ing systems20(2007). 13

2007
[56]

Chizat, E

L. Chizat, E. Oyallon, and F. Bach, On lazy training in differentiable programming (2019)

2019
[57]

Bordelon and C

B. Bordelon and C. Pehlevan, Population codes enable learning from few examples by shaping inductive bias, bioRxiv : the preprint server for biology , 2021 (2022)

2021
[58]

Fischer, J

K. Fischer, J. Lindner, D. Dahmen, Z. Ringel, M. Krämer, and M. Helias, Critical feature learning in deep neural networks (2024), arXiv:2405.10761 [cond-mat.dis-nn]

arXiv 2024
[59]

Ringel, N

Z. Ringel, N. Rubin, E. Mor, M. Helias, and I. Seroussi, Applications of Statistical Field Theory in Deep Learning (2025), arXiv:2502.18553 [stat]

arXiv 2025
[60]

Lauditi, B

C. Lauditi, B. Bordelon, and C. Pehlevan, Adaptive kernel predictors from feature-learning infinite limits of neural networks (2025), arXiv:2502.07998 [cs]

arXiv 2025
[61]

J. P. Bauer, K. Fischer, M. Helias, and A. Palmigiano, A unified theory of feature learning in RNNs and DNNs (2026), arXiv:2602.15593 [cs]

arXiv 2026
[62]

D. G. Clark, B. Bordelon, J. A. Zavatone-Veth, and C. Pehlevan, Structure, disorder, and dynamics in task- trained recurrent neural circuits (2026)

2026
[63]

A. C. C. Coolen, Statistical mechanics of recurrent neural networks ii. dynamics, (2000)

2000
[64]

Kadmon and H

J. Kadmon and H. Sompolinsky, Transition to chaos in random neuronal networks, Physical Review X5, 041030 (2015)

2015
[65]

Bradbury, R

J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. Vander- Plas, S. Wanderman-Milne, and Q. Zhang, JAX: Com- posable transformations of Python+NumPy programs (2018)

2018
[66]

Koltchinskii and E

V. Koltchinskii and E. Giné, Random matrix approxima- tion of spectra of integral operators, Bernoulli. Official Journal of the Bernoulli Society for Mathematical Statis- tics and Probability , 113 (2000)

2000
[67]

M. L. Braun, Spectral properties of the kernel matrix and their relation to kernel methods in machine learning, (2005)

2005
[68]

Suetin, Ultraspherical polynomials, Encyclopaedia of mathematics

PK. Suetin, Ultraspherical polynomials, Encyclopaedia of mathematics. Springer, Berlin (2001). Appendix A: Model-independent mean-field theory for random networks This section presents a self-contained derivation of the model-independent mean-field theory for networks with Gaussian random connectivityJij i.i.d.∼N( ¯g N , g2 N ). This formalism is the basi...

2001
[69]

The probability for this event ise−(t−s)/τp(ϕα s =1, ϕ β s =1∣hαhβ)

At time s, both variables are in stateϕα s =ϕ β s = 1and there is no update within[s, t]. The probability for this event ise−(t−s)/τp(ϕα s =1, ϕ β s =1∣hαhβ)
[70]

At time s, variable ϕβ s is in stateϕβ s = 1and ϕα s is arbitrary, which happens with the probability that the last update took ϕβ s to the up-state, which isp(ϕβ s = 1)= ∫ s −∞ ds′ τ e−(s−s′) Tp(hβ s′) and within[s, t]the last update that broughtϕ α into stateϕ α t =1. The probability for the joint occurrence of this event is p[≥1update in[s, t], ϕα t =1...
[71]

To analyze the discontinuity, we take the limitϵ0→0, ϵt = (c(1−e−t/2τ)+√ϵ0e−t/2τ) 2 ≃c2(1−e−t/2τ) 2 , giving a drop∆t/c2 = (1−e−t/2τ) 2 that is finite for any finite time

Limiting cases Small decorrelation Letc= 2√π g2⟨T′(h)⟩N(0, Q 0).Defining a small decorrelationϵ t =Q 0−Q12 tt, [13] finds (τ ∂t+1)ϵ t =c √ϵt with solution ϵt = (c−(c−√ϵ0)e−t/2τ) 2 . To analyze the discontinuity, we take the limitϵ0→0, ϵt = (c(1−e−t/2τ)+√ϵ0e−t/2τ) 2 ≃c2(1−e−t/2τ) 2 , giving a drop∆t/c2 = (1−e−t/2τ) 2 that is finite for any finite time. Mic...

[1] [1]

locally rough

The chaos transition shapes computational repertoires For continuous systems Eq. (1), the general impact of synaptic strength on the collective dynamics has been studied: It has been shown that continuous networks with many neurons exhibit a transition to chaos at large synap- tic strengths. For hyperbolic tangent transfer function T(h)=tanh(h) in particu...

[2] [2]

Van Vreeswijk and H

C. Van Vreeswijk and H. Sompolinsky, Chaos in neuronal networks with balanced excitatory and inhibitory activity, Science274, 1724 (1996)

1996

[3] [3]

London, A

M. London, A. Roth, L. Beeren, M. Häusser, and P. E. Latham, Sensitivity to perturbations in vivo implies high noise and suggests rate coding in cortex, Nature466, 123 (2010)

2010

[4] [4]

Kadmon and H

J. Kadmon and H. Sompolinsky, Transition to chaos in random neuronal networks,5, 041030 (2015)

2015

[5] [5]

Stringer, M

C. Stringer, M. Pachitariu, N. Steinmetz, M. Carandini, and K. D. Harris, High-dimensional geometry of popula- tion responses in visual cortex, Nature571, 361 (2019)

2019

[6] [6]

Muñoz, R

W. Muñoz, R. Tremblay, D. Levenstein, and B. Rudy, Layer-specific modulation of neocortical dendritic inhibi- tion during active wakefulness, Science355, 954 (2017)

2017

[7] [7]

Maass, T

W. Maass, T. Natschläger, and H. Markram, Real-time computing without stable states: a new framework for neural computation based on perturbations,14, 2531 (2002)

2002

[8] [8]

Jaeger and H

H. Jaeger and H. Haas, Harnessing nonlinearity: Pre- dicting chaotic systems and saving energy in wireless communication, Science304, 78 (2004)

2004

[9] [9]

Biswas and J

T. Biswas and J. E. Fitzgerald, Geometric framework to predict structure from function in neural networks, Physical Review Research4, 023255 (2022)

2022

[10] [10]

Poole, S

B. Poole, S. Lahiri, M. Raghu, J. Sohl-Dickstein, and S. Ganguli, Exponential expressivity in deep neural net- works through transient chaos, inAdvances in Neural Information Processing Systems 29(2016)

2016

[11] [11]

S. S. Schoenholz, J. Gilmer, S. Ganguli, and J. Sohl- Dickstein, Deep information propagation, 5th Interna- tional Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings (2017)

2017

[12] [12]

G. Yang, Scaling limits of wide neural networks with weight sharing: Gaussian process behavior, gradient in- dependence, and neural tangent kernel derivation, ArXiv e-prints (2019), 1902.04760

arXiv 2019

[13] [13]

Segadlo, B

K. Segadlo, B. Epping, A. van Meegen, D. Dahmen, M. Krämer, and M. Helias, Unified field theoretical ap- proach to deep and recurrent neuronal networks, (2022), accepted

2022

[14] [14]

C. Keup, T. Kühn, D. Dahmen, and M. Helias, Transient 12 chaotic dimensionality expansion by recurrent networks, 11, 021064 (2021)

2021

[15] [15]

W. S. McCulloch and W. Pitts, A logical calculus of the ideas immanent in neural nets,5, 115 (1943)

1943

[16] [16]

D. O. Hebb,The organization of behavior: A neuropsy- chological theory(John Wiley & Sons, New York, 1949)

1949

[17] [17]

D. J. Amit, H. Gutfreund, and H. Sompolinsky, Spin-glass models of neural networks, Physical Review A32, 1007 (1985)

1985

[18] [18]

van Vreeswijk and H

C. van Vreeswijk and H. Sompolinsky, Chaos in neuronal networks with balanced excitatory and inhibitory activity, Science274, 1724 (1996)

1996

[19] [19]

Renart, J

A. Renart, J. De La Rocha, P. Bartho, L. Hollender, N. Parga, A. Reyes, and K. D. Harris, The asynchronous state in cortical circuits, Science327, 587 (2010)

2010

[20] [20]

Glauber, Time-dependent statistics of the Ising model, 4, 294 (1963)

R. Glauber, Time-dependent statistics of the Ising model, 4, 294 (1963)

1963

[21] [21]

Amari, Dynamics of pattern formation in lateral- inhibition type neural fields,27, 77 (1977)

S.-I. Amari, Dynamics of pattern formation in lateral- inhibition type neural fields,27, 77 (1977)

1977

[22] [22]

Sompolinsky, A

H. Sompolinsky, A. Crisanti, and H. J. Sommers, Chaos in random neural networks,61, 259 (1988)

1988

[23] [23]

Hornik, M

K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators,2, 359 (1989)

1989

[24] [24]

A. E. Hoerl and R. W. Kennard, Ridge regression: Ap- plications to nonorthogonal problems, Technometrics : a journal of statistics for the physical, chemical, and engi- neering sciences12, 69 (1970)

1970

[25] [25]

Schölkopf, A

B. Schölkopf, A. J. Smola, F. Bach,et al.,Learning with Kernels: Support Vector Machines, Regularization, Opti- mization, and Beyond(MIT press, 2002)

2002

[26] [26]

R. M. Neal,Bayesian Learning for Neural Networks (Springer New York, 1996)

1996

[27] [27]

J. Lee, L. Xiao, S. Schoenholz, Y. Bahri, R. Novak, J. Sohl- Dickstein, and J. Pennington, Wide neural networks of any depth evolve as linear models under gradient descent, Advances in neural information processing systems32, 8572 (2019)

2019

[28] [28]

Yang, Wide feedforward or recurrent neural networks of any architecture are gaussian processes (Curran Asso- ciates, Inc., 2019)

G. Yang, Wide feedforward or recurrent neural networks of any architecture are gaussian processes (Curran Asso- ciates, Inc., 2019)

2019

[29] [29]

Segadlo, B

K. Segadlo, B. Epping, A. van Meegen, D. Dahmen, M. Krämer, and M. Helias, Unified Field Theory for Deep and Recurrent Neural Networks, arXiv:2112.05589 [cond- mat, stat] (2022), arXiv:2112.05589 [cond-mat, stat]

arXiv 2022

[30] [30]

Rasmussen and C

C. Rasmussen and C. Williams,Gaussian Processes for Machine Learning, Adaptive Computation and Machine Learning (MIT Press, Cambridge, MA, USA, 2006) p. 248

2006

[31] [31]

Cohen, O

O. Cohen, O. Malka, and Z. Ringel, Learning curves for overparametrized deep neural networks: A field theory perspective,3, 023034 (2021)

2021

[32] [32]

Cybenko, Approximation by superpositions of a sig- moidal function,2, 303 (1989)

G. Cybenko, Approximation by superpositions of a sig- moidal function,2, 303 (1989)

1989

[33] [33]

A. R. Barron, Approximation and estimation bounds for artificial neural networks,14, 115 (1994)

1994

[34] [34]

Hume,A Treatise of Human Nature(Clarendon Press, 1896)

D. Hume,A Treatise of Human Nature(Clarendon Press, 1896)

[35] [35]

J. Lee, Y. Bahri, R. Novak, S. S. Schoenholz, J. Penning- ton, and J. Sohl-Dickstein, Deep neural networks as gaus- sian processes, , 1711.00165 (2017), arXiv:1711.00165

Pith/arXiv arXiv 2017

[36] [36]

C. K. Williams and C. E. Rasmussen,Gaussian Processes for Machine Learning, 1st ed. (MIT Press, Cambridge, 2006)

2006

[37] [37]

Le Cun, I

Y. Le Cun, I. Kanter, and S. A. Solla, Eigenvalues of co- variance matrices: Application to neural-network learning, 66, 2396 (1991)

1991

[38] [38]

Canatar, B

A. Canatar, B. Bordelon, and C. Pehlevan, Spectral Bias and Task-Model Alignment Explain Generalization in Ker- nel Regression and Infinitely Wide Neural Networks, Na- ture Communications12, 2914 (2021), arXiv:2006.13198

arXiv 2021

[39] [39]

Dutordoir, N

V. Dutordoir, N. Durrande, and J. Hensman, Sparse Gaus- sian processes with spherical harmonic features, inInter- national Conference on Machine Learning(PMLR, 2020) pp. 2793–2802

2020

[40] [40]

Helias and D

M. Helias and D. Dahmen, Statistical field theory for neural networks, (2019), 1901.10416 [cond-mat.dis-nn]

arXiv 2019

[41] [41]

C. Keup, T. Kühn, D. Dahmen, and M. Helias, Tran- sient Chaotic Dimensionality Expansion by Recurrent Networks, Physical Review X11, 021064 (2021)

2021

[42] [42]

Bertschinger and T

N. Bertschinger and T. Natschläger, Real-time computa- tion at the edge of chaos in recurrent neural networks, 16, 1413 (2004)

2004

[43] [43]

Toyoizumi and L

T. Toyoizumi and L. F. Abbott, Beyond the edge of chaos: Amplification and temporal integration by recurrent net- works in the chaotic regime,84, 051908 (2011)

2011

[44] [44]

echo state

H. Jaeger,The “echo state” approach to analysing and training recurrent neural networks, Tech. Rep. GMD Re- port 148 (German National Research Center for Informa- tion Technology, St. Augustin, Germany, 2001)

2001

[45] [45]

Bordelon and C

B. Bordelon and C. Pehlevan, Population codes enable learning from few examples by shaping inductive bias, Elife11, e78606 (2022)

2022

[46] [46]

P. L. Bartlett, P. M. Long, G. Lugosi, and A. Tsigler, Benign overfitting in linear regression, Proceedings of the National Academy of Sciences117, 30063 (2020)

2020

[47] [47]

Schuecker, S

J. Schuecker, S. Goedeke, and M. Helias, Optimal se- quence memory in driven random networks,8, 041029 (2018)

2018

[48] [48]

Funk, Beiträge zur theorie der kugelfunktionen, Math- ematische Annalen77, 136 (1915)

P. Funk, Beiträge zur theorie der kugelfunktionen, Math- ematische Annalen77, 136 (1915)

1915

[49] [49]

Belkin, D

M. Belkin, D. Hsu, S. Ma, and S. Mandal, Reconcil- ing modern machine-learning practice and the classi- cal bias–variance trade-off, Proceedings of the National Academy of Sciences116, 15849 (2019)

2019

[50] [50]

Destexhe and D

A. Destexhe and D. Paré, Impact of network activity on the integrative properties of neocortical pyramidal neurons in vivo,81, 1531 (1999)

1999

[51] [51]

Z. F. Mainen and T. J. Sejnowski, Reliability of spike timing in neocortical neurons, Science268, 1503 (1995)

1995

[52] [52]

Arieli, A

A. Arieli, A. Sterkin, A. Grinvald, and A. Aertsen, Dynam- ics of ongoing activity: explanation of the large variability in evoked cortical responses, Science273, 1868 (1996)

1996

[53] [53]

M. M. Churchland, B. M. Yu, J. P. Cunningham, L. P. Sugrue, M. R. Cohen, G. S. Corrado, W. T. Newsome, A. M. Clark, P. Hosseini, B. B. Scott, D. C. Bradley, M. A. Smith, A. Kohn, J. A. Movshon, K. M. Armstrong, T. Moore, S. W. Chang, L. H. Snyder, S. G. Lisberger, N. J. Priebe, I. M. Finn, D. Ferster, S. I. Ryu, G. San- thanam, M. Sahani, and K. V. Shenoy...

2010

[54] [54]

Tchumatchenko, A

T. Tchumatchenko, A. Malyshev, T. Geisel, M. Volgushev, and F. Wolf, Correlations and synchrony in threshold neuron models,104, 058102 (2010)

2010

[55] [55]

Rahimi and B

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in neural information process- ing systems20(2007). 13

2007

[56] [56]

Chizat, E

L. Chizat, E. Oyallon, and F. Bach, On lazy training in differentiable programming (2019)

2019

[57] [57]

Bordelon and C

B. Bordelon and C. Pehlevan, Population codes enable learning from few examples by shaping inductive bias, bioRxiv : the preprint server for biology , 2021 (2022)

2021

[58] [58]

Fischer, J

K. Fischer, J. Lindner, D. Dahmen, Z. Ringel, M. Krämer, and M. Helias, Critical feature learning in deep neural networks (2024), arXiv:2405.10761 [cond-mat.dis-nn]

arXiv 2024

[59] [59]

Ringel, N

Z. Ringel, N. Rubin, E. Mor, M. Helias, and I. Seroussi, Applications of Statistical Field Theory in Deep Learning (2025), arXiv:2502.18553 [stat]

arXiv 2025

[60] [60]

Lauditi, B

C. Lauditi, B. Bordelon, and C. Pehlevan, Adaptive kernel predictors from feature-learning infinite limits of neural networks (2025), arXiv:2502.07998 [cs]

arXiv 2025

[61] [61]

J. P. Bauer, K. Fischer, M. Helias, and A. Palmigiano, A unified theory of feature learning in RNNs and DNNs (2026), arXiv:2602.15593 [cs]

arXiv 2026

[62] [62]

D. G. Clark, B. Bordelon, J. A. Zavatone-Veth, and C. Pehlevan, Structure, disorder, and dynamics in task- trained recurrent neural circuits (2026)

2026

[63] [63]

A. C. C. Coolen, Statistical mechanics of recurrent neural networks ii. dynamics, (2000)

2000

[64] [64]

Kadmon and H

J. Kadmon and H. Sompolinsky, Transition to chaos in random neuronal networks, Physical Review X5, 041030 (2015)

2015

[65] [65]

Bradbury, R

J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. Vander- Plas, S. Wanderman-Milne, and Q. Zhang, JAX: Com- posable transformations of Python+NumPy programs (2018)

2018

[66] [66]

Koltchinskii and E

V. Koltchinskii and E. Giné, Random matrix approxima- tion of spectra of integral operators, Bernoulli. Official Journal of the Bernoulli Society for Mathematical Statis- tics and Probability , 113 (2000)

2000

[67] [67]

M. L. Braun, Spectral properties of the kernel matrix and their relation to kernel methods in machine learning, (2005)

2005

[68] [68]

Suetin, Ultraspherical polynomials, Encyclopaedia of mathematics

PK. Suetin, Ultraspherical polynomials, Encyclopaedia of mathematics. Springer, Berlin (2001). Appendix A: Model-independent mean-field theory for random networks This section presents a self-contained derivation of the model-independent mean-field theory for networks with Gaussian random connectivityJij i.i.d.∼N( ¯g N , g2 N ). This formalism is the basi...

2001

[69] [69]

The probability for this event ise−(t−s)/τp(ϕα s =1, ϕ β s =1∣hαhβ)

At time s, both variables are in stateϕα s =ϕ β s = 1and there is no update within[s, t]. The probability for this event ise−(t−s)/τp(ϕα s =1, ϕ β s =1∣hαhβ)

[70] [70]

At time s, variable ϕβ s is in stateϕβ s = 1and ϕα s is arbitrary, which happens with the probability that the last update took ϕβ s to the up-state, which isp(ϕβ s = 1)= ∫ s −∞ ds′ τ e−(s−s′) Tp(hβ s′) and within[s, t]the last update that broughtϕ α into stateϕ α t =1. The probability for the joint occurrence of this event is p[≥1update in[s, t], ϕα t =1...

[71] [71]

To analyze the discontinuity, we take the limitϵ0→0, ϵt = (c(1−e−t/2τ)+√ϵ0e−t/2τ) 2 ≃c2(1−e−t/2τ) 2 , giving a drop∆t/c2 = (1−e−t/2τ) 2 that is finite for any finite time

Limiting cases Small decorrelation Letc= 2√π g2⟨T′(h)⟩N(0, Q 0).Defining a small decorrelationϵ t =Q 0−Q12 tt, [13] finds (τ ∂t+1)ϵ t =c √ϵt with solution ϵt = (c−(c−√ϵ0)e−t/2τ) 2 . To analyze the discontinuity, we take the limitϵ0→0, ϵt = (c(1−e−t/2τ)+√ϵ0e−t/2τ) 2 ≃c2(1−e−t/2τ) 2 , giving a drop∆t/c2 = (1−e−t/2τ) 2 that is finite for any finite time. Mic...