Theory of learning of high-dimensional controlled non-linear dynamical systems (I): models and methods

Pierfrancesco Urbani

arXiv: 2606.07247 · v2 · pith:N5P752ZZnew · submitted 2026-06-05 · cond-mat.dis-nn · cond-mat.stat-mech · stat.ML

Pith reviewed 2026-06-27 20:22 UTC · model grok-4.3

keywords: neural ODEs · dynamical mean field theory · high-dimensional limit · online stochastic gradient descent · learning curves · training dynamics · controlled dynamical systems · generalization

The pith

A class of controlled non-linear dynamical systems allows exact solution of neural ODE training dynamics in the high-dimensional limit via dynamical mean field theory.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a theoretically grounded class of controlled non-linear dynamical systems to model neural ODEs trained by online stochastic gradient descent. It applies dynamical mean field theory to close the equations for the dual inference and training dynamics. This yields explicit learning curves in the high-dimensional limit. The framework covers settings such as ResNets, autoregressive models, generative models, and recurrent networks in neuroscience. A sympathetic reader would care because it turns the training of continuous-time neural networks into a solvable problem rather than a black-box optimization task.

Core claim

We introduce a theoretically grounded class of models for studying neural ODEs trained via online stochastic gradient descent. We solve the training dynamics of these models via dynamical mean field theory and derive learning curves in the high-dimensional limit.

What carries the argument

Dynamical mean field theory closure on the coupled inference and training dynamics of the controlled non-linear systems.

Load-bearing premise

The introduced class of controlled non-linear dynamical systems is representative enough that the mean-field equations capture the essential training behavior of the wider family of neural ODEs.

What would settle it

A direct numerical simulation of a neural ODE outside the controlled class whose measured training trajectory deviates from the predicted learning curve in the high-dimensional regime.

Figures

Figures reproduced from arXiv: 2606.07247 by Pierfrancesco Urbani.

**Figure 1.** The pipeline of training dynamics. … condition. The algorithm requires storing the initial and final states of the Teacher process, x(0, α) and x(Ts, α), which constitute the α-th training point, as well as the entire trajectory of the Student process, y(t, α), for t ∈ [0, Ts]. Following this, the dynamics of the adjoint fields π are computed backward in time. Once the adjoint trajectory is obtained, the gra…
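The pipeline in the Figure 1 caption (forward Student trajectory, backward adjoint pass, then gradient) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's model: the vector field `-y + W tanh(y)/sqrt(N)`, the squared-error loss on the final state, the forward-Euler discretization, and all function and variable names are hypothetical choices made here for illustration.

```python
import numpy as np

def vector_field(y, W):
    # Hypothetical Student vector field (an assumption, not the paper's model).
    return -y + W @ np.tanh(y) / np.sqrt(y.shape[0])

def forward(y0, W, dt, steps):
    # Forward-Euler integration; the whole trajectory is stored
    # because the backward pass needs y at every time step.
    traj = np.empty((steps + 1, y0.shape[0]))
    traj[0] = y0
    for k in range(steps):
        traj[k + 1] = traj[k] + dt * vector_field(traj[k], W)
    return traj

def loss(W, y0, x_final, dt, steps):
    # Assumed loss: squared distance between Student and Teacher final states.
    y_final = forward(y0, W, dt, steps)[-1]
    return 0.5 * np.sum((y_final - x_final) ** 2)

def adjoint_gradient(W, y0, x_final, dt, steps):
    # Discrete adjoint of the Euler scheme: pi is propagated backward in time
    # and the gradient with respect to W is accumulated along the way.
    n = y0.shape[0]
    traj = forward(y0, W, dt, steps)
    pi = traj[-1] - x_final              # terminal condition dL/dy at the final time
    grad = np.zeros_like(W)
    for k in range(steps - 1, -1, -1):
        phi = np.tanh(traj[k])
        grad += dt * np.outer(pi, phi) / np.sqrt(n)
        # pi_k = (I + dt * J_k)^T pi_{k+1}, with J_k the Jacobian of the field at y_k
        pi = pi + dt * (-pi + (1.0 - phi ** 2) * (W.T @ pi) / np.sqrt(n))
    return grad

# One training point: Teacher initial and final states (toy sizes).
rng = np.random.default_rng(0)
N, dt, steps = 8, 0.05, 20
W_teacher = rng.normal(size=(N, N))
W_student = rng.normal(size=(N, N))
x0 = rng.normal(size=N)
x_final = forward(x0, W_teacher, dt, steps)[-1]

# The Student is assumed to start from the Teacher's initial state.
grad = adjoint_gradient(W_student, x0, x_final, dt, steps)
```

A finite-difference check on single entries of `W_student` is a cheap way to validate the backward pass. An online SGD step would then subtract a learning rate times `grad` from `W_student`, drawing a fresh training point for every update.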

**Figure 2.** Comparison between DMFT and numerical simulations. We plot three observables …

Abstract

Neural ordinary differential equations (neural ODEs) have rapidly gained prominence as a powerful and unifying framework for conceptualizing artificial neural networks, elegantly connecting the continuous-time modeling of dynamical systems with the discrete, data-driven paradigm of modern deep learning. Beyond their practical advantages they offer fresh theoretical insights into the training and generalization properties of neural networks. The distinctive feature of this framework is its dual dynamical nature: inference dynamics, which govern the ODE evolution during forward computation, and training dynamics, which control the optimization of model parameters. This makes neural ODEs a particularly well-suited theoretical framework for studying a large variety of settings such as multi-layer neural networks (ResNets for example), autoregressive models (with next-token generation dynamics), generative models, and recurrent neural networks in theoretical neuroscience. In this work, we introduce a theoretically grounded class of models for studying neural ODEs trained via online stochastic gradient descent. We solve the training dynamics of these models via dynamical mean field theory and derive learning curves in the high-dimensional limit.
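The dual dynamics described in the abstract can be written schematically as below. This is the generic textbook form, not the paper's specific model class: the vector field $f$, parameters $\theta$, step size $h$, learning rate $\eta$, and per-sample loss $\mathcal{L}$ are notational assumptions introduced here for illustration; only the training point $(x(0,\alpha), x(T_s,\alpha))$ follows the Figure 1 caption.

```latex
% Inference dynamics: a residual network with step h is the
% forward-Euler discretization of an ODE in the depth/time variable t.
x_{k+1} = x_k + h\, f(x_k;\theta)
\quad\xrightarrow{\;h \to 0\;}\quad
\frac{\mathrm{d}x(t)}{\mathrm{d}t} = f\big(x(t);\theta\big)

% Training dynamics: online SGD uses a fresh sample \alpha at every update.
\theta^{(\alpha+1)} = \theta^{(\alpha)}
  - \eta\, \nabla_\theta\, \mathcal{L}\big(\theta^{(\alpha)};\, x(0,\alpha),\, x(T_s,\alpha)\big)
```

The first line evolves the state at fixed parameters, while the second evolves the parameters across samples; a closure of the training dynamics has to track both at once.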

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper claims a DMFT closure for online training of a new class of controlled neural ODEs that covers ResNets, RNNs and similar models, but the abstract supplies no equations, no order-parameter definitions, and no finite-size checks.

The central claim is that a specific family of controlled non-linear dynamical systems admits an exact dynamical mean-field treatment whose high-dimensional limit gives closed learning curves under online SGD. That is the one concrete thing the abstract puts forward.

What is new is the particular combination: continuous-time controlled ODEs with online gradient updates, framed so that DMFT can be applied directly. Prior DMFT work has handled discrete-layer networks and some recurrent cases; moving the same machinery to this controlled continuous setting is a straightforward but legitimate extension. If the closure actually works, it would give an analytic route to the entire training trajectory rather than just equilibria, which is rarer in this literature.

The obvious soft spot is the complete absence of any supporting material in the abstract. No equations for the order parameters, no statement of the self-consistent equations, and no mention of numerical checks against finite networks. The stress-test note correctly flags that we still need to see whether the chosen control and non-linearity structure is generic enough that the derived curves actually describe standard neural-ODE instantiations without hidden assumptions. Until the full derivation is examined, that step remains unverified.

This is for people already working inside the DMFT-for-learning program who want to know whether the method extends to continuous controlled systems. A reader outside that niche will get little from it. The work is coherent on its own terms and engages the right literature, so it deserves a serious referee even if the eventual verdict depends on whether the closure holds and whether the model class is representative.

Referee Report

2 major / 0 minor

Summary. The paper introduces a class of controlled non-linear dynamical systems as a theoretically grounded model for neural ODEs (including ResNets, autoregressive models, RNNs, and generative models) trained by online SGD. It claims to solve the training dynamics exactly via dynamical mean-field theory (DMFT) and to derive closed learning curves in the high-dimensional limit.

Significance. If the DMFT closure is exact for the proposed class and the class is representative of standard neural-ODE architectures, the work would supply the first parameter-free, high-dimensional learning curves for a broad family of continuous-depth models, a substantial advance over existing heuristic or simulation-based analyses.

major comments (2)

[Abstract and §1] The central claim that the introduced controlled non-linear class is sufficiently representative for the DMFT closure to capture essential forward and training dynamics of the broader family of neural ODEs (ResNets, autoregressive models, RNNs) is asserted without an explicit mapping, specialization check, or numerical comparison showing that the closure survives when the model is reduced to standard neural-ODE forms. This representativeness step is load-bearing for the applicability statements.

[Abstract] The assertion that DMFT 'yields closed learning curves' is stated without any displayed order-parameter equations, closure ansatz, or finite-N validation; the absence of these elements in the provided text prevents assessment of whether the claimed closure is actually achieved or merely conjectured.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We respond to each major comment below.

Referee: [Abstract and §1] The central claim that the introduced controlled non-linear class is sufficiently representative for the DMFT closure to capture essential forward and training dynamics of the broader family of neural ODEs (ResNets, autoregressive models, RNNs) is asserted without an explicit mapping, specialization check, or numerical comparison showing that the closure survives when the model is reduced to standard neural-ODE forms. This representativeness step is load-bearing for the applicability statements.

Authors: We agree that explicit mappings would strengthen the claims. In the revised manuscript we will add a subsection in §1 providing explicit reductions of the controlled non-linear class to ResNets, RNNs and autoregressive models, together with the corresponding specializations of the DMFT equations and numerical checks confirming that the closure is preserved under these reductions. (revision: yes)

Referee: [Abstract] The assertion that DMFT 'yields closed learning curves' is stated without any displayed order-parameter equations, closure ansatz, or finite-N validation; the absence of these elements in the provided text prevents assessment of whether the claimed closure is actually achieved or merely conjectured.

Authors: The order-parameter equations, closure ansatz and finite-N validations appear in Sections 3–5. The abstract summarizes the result at a high level, which is standard. To improve clarity we will insert a concise outline of the key DMFT equations and ansatz into the introduction of the revised version. (revision: partial)

Circularity Check

0 steps flagged

No circularity: DMFT applied to newly introduced model class

The paper introduces a class of controlled non-linear dynamical systems and applies dynamical mean-field theory to derive its training dynamics and learning curves in the high-dimensional limit. No quoted step reduces a result to a fitted parameter, self-definition, or self-citation chain; the derivation is presented as a standard closure on the defined model, with no evidence that its predictions are tautological with its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the applicability of dynamical mean-field theory to the newly defined models; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

Domain assumption: Dynamical mean field theory provides an exact closure for the training dynamics of the introduced high-dimensional models.
The abstract states that the training dynamics are solved via DMFT, which presupposes that the mean-field approximation becomes exact in the high-dimensional limit.
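The idea that a mean-field description sharpens as the dimension grows can be illustrated numerically through self-averaging: a macroscopic observable fluctuates less from one random instance to the next as N increases. The sketch below is an illustration under stated assumptions, not the paper's model or its DMFT equations: the random recurrent vector field, the gain `g`, the observable (mean squared state), and the name `order_parameter` are all hypothetical choices.

```python
import numpy as np

def order_parameter(n, seed, g=1.5, dt=0.05, steps=40):
    # Hypothetical random recurrent dynamics dx/dt = -x + J tanh(x),
    # integrated with forward Euler (an assumed toy model).
    rng = np.random.default_rng(seed)
    J = rng.normal(size=(n, n)) * g / np.sqrt(n)
    x = rng.normal(size=n)
    for _ in range(steps):
        x = x + dt * (-x + J @ np.tanh(x))
    # Macroscopic observable: mean squared state at the final time.
    return np.mean(x ** 2)

# Instance-to-instance spread of the observable for increasing system size.
sizes = (25, 100, 400)
spread = [np.std([order_parameter(n, seed) for seed in range(20)]) for n in sizes]

for n, s in zip(sizes, spread):
    print(f"N={n:4d}  std across instances = {s:.4f}")
```

If the observable self-averages, the printed spread should shrink as N grows; a finite-N validation of a DMFT prediction would additionally compare the mean of such observables with the solution of the mean-field equations.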

pith-pipeline@v0.9.1-grok · 5714 in / 1230 out tokens · 21578 ms · 2026-06-27T20:22:46.574908+00:00 · methodology


[1] [1]

Learning representations by back-propagating errors.Nature, 323(6088):533–536, 1986

David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors.Nature, 323(6088):533–536, 1986. 2

1986

[2] [2]

Multilayer feedforward networks are universal approximators.Neural networks, 2(5):359–366, 1989

Kurt Hornik, Maxwell B Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators.Neural networks, 2(5):359–366, 1989. 2 24

1989

[3] [3]

MIT Press, Cambridge, MA, USA, 2021

Yoshua Bengio, Ian Goodfellow, and Aaron Courville.Deep Learning. MIT Press, Cambridge, MA, USA, 2021. 2

2021

[4] [4]

Deep convolutional neural networks for image classification: A comprehensive review.Neural Computation, 29(9):2352–2449, 2017

Wang Rawat and Zan Wang. Deep convolutional neural networks for image classification: A comprehensive review.Neural Computation, 29(9):2352–2449, 2017. 2

2017

[5] [5]

Springer science & business media,

Vladimir Vapnik.The nature of statistical learning theory. Springer science & business media,

[6] [6]

Understand- ing deep learning requires rethinking generalization.arXiv preprint arXiv:1611.03530, 2016

Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understand- ing deep learning requires rethinking generalization.arXiv preprint arXiv:1611.03530, 2016. 3

Pith/arXiv arXiv 2016

[7] [7]

Under- standing deep learning (still) requires rethinking generalization.Communications of the ACM, 64(3):107–115, 2021

Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Under- standing deep learning (still) requires rethinking generalization.Communications of the ACM, 64(3):107–115, 2021. 3

2021

[8] [8]

Neural tangent kernel: Convergence and generalization in neural networks.arXiv preprint arXiv:1806.07572, 2018

Arthur Jacot, Fran¸ cois Gabriel, and Cl´ ement Hongler. Neural tangent kernel: Convergence and generalization in neural networks.arXiv preprint arXiv:1806.07572, 2018. 3

arXiv 2018

[9] [9]

On the global convergence of gradient descent for over- parameterized models with smooth activations.arXiv preprint arXiv:1806.02629, 2018

Lenaic Chizat and Francis Bach. On the global convergence of gradient descent for over- parameterized models with smooth activations.arXiv preprint arXiv:1806.02629, 2018. 3

arXiv 2018

[10] [10]

Six lectures on linearized models.arXiv preprint, 2023

Andrea Montanari. Six lectures on linearized models.arXiv preprint, 2023. 3

2023

[11] [11]

A mean field view of the landscape of two-layer neural networks.Proceedings of the National Academy of Sciences, 115(33):E7665– E7671, 2018

Song Mei, Andrea Montanari, and Phan-Minh Nguyen. A mean field view of the landscape of two-layer neural networks.Proceedings of the National Academy of Sciences, 115(33):E7665– E7671, 2018. 3

2018

[12] [12]

Trainability and accuracy of artificial neural net- works: An interacting particle system approach.Communications on Pure and Applied Math- ematics, 75(9):1889–1935, 2022

Grant Rotskoff and Eric Vanden-Eijnden. Trainability and accuracy of artificial neural net- works: An interacting particle system approach.Communications on Pure and Applied Math- ematics, 75(9):1889–1935, 2022. 3

1935

[13] [13]

Dynamical decoupling of generalization and overfitting in large two-layer networks.arXiv preprint arXiv:2502.21269, 2025

Andrea Montanari and Pierfrancesco Urbani. Dynamical decoupling of generalization and overfitting in large two-layer networks.arXiv preprint arXiv:2502.21269, 2025. 3, 4, 11, 12

arXiv 2025

[14] [14]

Neural ordinary differential equations

Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary differential equations. InAdvances in Neural Information Processing Systems, pages 6571– 6583, 2018. 3

2018

[15] Patrick Kidger. On neural differential equations. arXiv preprint arXiv:2202.02435, 2022. 3

[16] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 2017. 3

[17] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. OpenAI Blog, 2019. 3

[18] Albert Gu, Karan Goel, and Christopher Ré. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396, 2021. 3

[19] Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, and Christopher Ré. HiPPO: Recurrent memory with optimal polynomial projections. Advances in Neural Information Processing Systems, 2020. 3

[20] Will Grathwohl, Ricky TQ Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. FFJORD: Free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367, 2018. 3

[21] Herbert Jaeger and Harald Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667):78–80, 2004. 3

[22] Wolfgang Maass, Thomas Natschläger, and Henry Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11):2531–2560, 2002. 3

[23] David Sussillo and Larry F Abbott. Generating coherent patterns of activity from chaotic neural networks. Neuron, 63(4):544–557, 2009. 3, 4

[24] Elizabeth Gardner and Bernard Derrida. Three unfinished works on the optimal storage capacity of networks. Journal of Physics A: Mathematical and General, 22(12):1983–1994,

[25] Marc Mézard, Giorgio Parisi, and Miguel Angel Virasoro. Spin glass theory and beyond: An Introduction to the Replica Method and Its Applications, volume 9. World Scientific Publishing Company, 1987. 4, 15, 16

[26] Stefano Sarao Mannelli, Florent Krzakala, Pierfrancesco Urbani, and Lenka Zdeborová. Passed & spurious: Descent algorithms and local minima in spiked matrix-tensor models. In International Conference on Machine Learning, pages 4333–4342. PMLR, 2019. 4

[27] Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, and Lenka Zdeborová. Marvels and pitfalls of the Langevin algorithm in noisy high-dimensional inference. Physical Review X, 10(1):011057, 2020. 4

[28] Stefano Sarao Mannelli and Pierfrancesco Urbani. Analytical study of momentum-based acceleration methods in paradigmatic high-dimensional non-convex problems. Advances in Neural Information Processing Systems, 34:187–199, 2021. 4

[29] Francesca Mignacco, Pierfrancesco Urbani, and Lenka Zdeborová. Stochasticity helps to navigate rough landscapes: comparing gradient-descent-based algorithms in the phase retrieval problem. Machine Learning: Science and Technology, 2(3):035029, 2021. 4

[30] Persia Jana Kamali and Pierfrancesco Urbani. Stochastic gradient descent outperforms gradient descent in recovering a high-dimensional signal in a glassy energy landscape. arXiv preprint arXiv:2309.04788, 2023. 4, 11, 12

[31] Elisabeth Agoritsas, Giulio Biroli, Pierfrancesco Urbani, and Francesco Zamponi. Out-of-equilibrium dynamical mean-field equations for the perceptron model. Journal of Physics A: Mathematical and Theoretical, 51(8):085002, 2018. 4, 12

[32] Francesca Mignacco, Florent Krzakala, Pierfrancesco Urbani, and Lenka Zdeborová. Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification. Advances in Neural Information Processing Systems, 33:9540–9550, 2020. 4, 12

[33] Michael Celentano, Andrea Montanari, and Yuchen Wu. The estimation error of general first order methods. In Conference on Learning Theory, pages 1078–1141. PMLR, 2020. 4

[34] Francesca Mignacco and Pierfrancesco Urbani. The effective noise of stochastic gradient descent. Journal of Statistical Mechanics: Theory and Experiment, 2022(8):083405, 2022. 4, 12

[35] Blake Bordelon, Alexander Atanasov, and Cengiz Pehlevan. A dynamical model of neural scaling laws. arXiv preprint arXiv:2402.01092, 2024. 4

[36] Zhou Fan, Justin Ko, Bruno Loureiro, Yue M Lu, and Yandi Shen. Dynamical mean-field analysis of adaptive Langevin diffusions: Propagation-of-chaos and convergence of the linear response. arXiv preprint arXiv:2504.15556, 2025. 4

[37] Samantha J Fournier and Pierfrancesco Urbani. Statistical physics of learning in high-dimensional chaotic systems. Journal of Statistical Mechanics: Theory and Experiment, 2023(11):113301, 2023. 4, 5, 6

[38] David G Clark, Blake Bordelon, Jacob A Zavatone-Veth, and Cengiz Pehlevan. Structure, disorder, and dynamics in task-trained recurrent neural circuits. bioRxiv, pages 2026–03,

[39] Samantha Fournier and Pierfrancesco Urbani. To appear. 2026. 4

[40] Pierfrancesco Urbani. Disordered high-dimensional optimal control. Journal of Physics A: Mathematical and Theoretical, 54(32):324001, 2021. 4

[41] Yuri Lombardo and Pierfrancesco Urbani. Unpublished. See: Yuri Lombardo, Optimization of the gradient descent dynamics in simple mean field spin glasses. 2021. 4

[42] Francesco Mori, Stefano Sarao Mannelli, and Francesca Mignacco. Optimal protocols for continual learning via statistical physics and control theory. Journal of Statistical Mechanics: Theory and Experiment, 2025(8):084004, 2025. 4

[43] Lorenzo Bertini, Alberto De Sole, Davide Gabrielli, Giovanni Jona-Lasinio, and Claudio Landim. Macroscopic fluctuation theory. Reviews of Modern Physics, 87(2):593–636, 2015. 5

[44] J Werschnik and EKU Gross. Quantum optimal control theory. Journal of Physics B: Atomic, Molecular and Optical Physics, 40(18):R175–R211, 2007. 5

[45] Samantha J Fournier, Alessandro Pacco, Valentina Ros, and Pierfrancesco Urbani. Non-reciprocal interactions and high-dimensional chaos: comparing dynamics and statistics of equilibria in a solvable model. 2025. 5, 6

[46] Samantha J Fournier and Pierfrancesco Urbani. High-dimensional dynamical systems: coexistence of attractors, phase transitions, maximal Lyapunov exponent and response to periodic drive. arXiv preprint arXiv:2511.09679, 2025. 5, 6

[47] Samantha Fournier and Pierfrancesco Urbani. Chaos in high-dimensional dynamical systems with tunable non-reciprocity. arXiv preprint arXiv:2601.04702, 2026. 6

[48] Timothy P Lillicrap, Adam Santoro, Luke Marris, Colin J Akerman, and Geoffrey Hinton. Backpropagation and the brain. Nature Reviews Neuroscience, 21(6):335–346, 2020. 8

[49] E Weinan, Jiequn Han, and Qianxiao Li. A mean-field optimal control formulation of deep learning. arXiv preprint arXiv:1807.01083, 2018. 10

[50] Persia Jana Kamali and Pierfrancesco Urbani. Dynamical mean field theory for models of confluent tissues and beyond. SciPost Physics, 15(5):219, 2023. 11

[51] Yunwei Ren, Eshaan Nichani, Denny Wu, and Jason Lee. Emergence and scaling laws in SGD learning of shallow neural networks. Advances in Neural Information Processing Systems, 38:38227–38309, 2026. 11

[52] Paul Cecil Martin, Eric D Siggia, and Harvey A Rose. Statistical dynamics of classical systems. Physical Review A, 8(1):423, 1973. 12

[53] Hans-Karl Janssen. On a lagrangean for classical field dynamics and renormalization group calculations of dynamical critical properties. Zeitschrift für Physik B Condensed Matter, 23(4):377–380, 1976. 12

[54] C De Dominicis. Techniques de renormalisation de la théorie des champs et dynamique des phénomènes critiques. J. Phys. Colloques, 37, 1976. 12

[55] C De Dominicis. Dynamics as a substitute for replicas in systems with quenched random impurities. Physical Review B, 18(9):4913, 1978. 12, 13

[56] Jean Zinn-Justin. Quantum field theory and critical phenomena, volume 171. Oxford University Press, 2021. 12

[57] Martin R Evans and Satya N Majumdar. Diffusion with stochastic resetting. Physical Review Letters, 106(16):160601, 2011. 19