pith. machine review for the scientific record.

arxiv: 2604.16801 · v1 · submitted 2026-04-18 · 💻 cs.LG

Continuous Limits of Coupled Flows in Representation Learning

Weiwei Xu, Xuanbo Lu, Xuanqi Zhao, Xuchun Tong, Zilin Li

Pith reviewed 2026-05-10 07:02 UTC · model grok-4.3

classification 💻 cs.LG
keywords decentralized learning · representation learning · Riemannian manifolds · stochastic differential equations · global dissipativity · disentangled features · coupled flows · Langevin dynamics

The pith

Discrete decentralized learning on manifolds converges to orthogonally disentangled features.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models decentralized representation learning as coupled slow-fast dynamical systems on Riemannian manifolds. Discrete local updates are shown to converge uniformly to an overdamped Langevin stochastic differential equation through measure-theoretic limits. Stochastic analysis then establishes global dissipativity of the joint system, proving that weights align with the principal eigenspace and that linearly separable, orthogonally disentangled features arise spontaneously at equilibrium.
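
Read against the Figure 1 schematic below, the coupled system plausibly takes the following slow-fast form (a sketch in standard Langevin notation; the Hebbian term H, inverse temperature \beta, and timescale ratio \varepsilon are illustrative labels, not necessarily the paper's):

  dX_\tau = -\nabla_X E(X_\tau, W_\tau)\, d\tau + \sqrt{2\beta^{-1}}\, dB_\tau^{\mathcal{M}}    (fast: Langevin diffusion on the manifold \mathcal{M})
  dW_\tau = \varepsilon \,[\, H(X_\tau, W_\tau) - \nabla_W E(X_\tau, W_\tau) \,]\, d\tau    (slow: local Hebbian plasticity, 0 < \varepsilon \ll 1)

E(X, W) is the global objective named in Figure 1; the small parameter \varepsilon separates the fast spatial kinematics from the slow parametric plasticity.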

Core claim

When decentralized learning is formalized as a coupled slow-fast dynamical system on Riemannian manifolds, its discrete spatial transitions converge uniformly to an overdamped Langevin stochastic differential equation. Via the Itô-Poisson resolvent and a stochastic extension of LaSalle's Invariance Principle, the representation weights unconditionally avoid divergence and align strictly with the principal eigenspace of the spatial measure. A joint Lyapunov functional for the fully coupled spatial-parametric flow proves global dissipativity, with orthogonally disentangled, linearly separable features emerging at the stationary limit.

What carries the argument

The joint Lyapunov functional for the fully coupled spatial-parametric flow, which establishes global dissipativity via the Itô-Poisson resolvent and stochastic LaSalle invariance.
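
In stochastic-stability terms, global dissipativity via such a functional V amounts to a drift condition on the generator \mathcal{L} of the coupled diffusion. A textbook-style sketch of what the claim requires (the paper's actual functional and constants are not reproduced in this review):

  \mathcal{L} V(x, w) \le -\alpha V(x, w) + \beta    for some \alpha, \beta > 0,

which by Dynkin's formula and Grönwall's inequality gives

  \mathbb{E}[V(X_\tau, W_\tau)] \le e^{-\alpha \tau}\, V(x_0, w_0) + \beta / \alpha,

pulling trajectories into a bounded sublevel set, which is precisely the setting in which a stochastic LaSalle argument can localize the invariant limit set.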

Load-bearing premise

The discrete spatial transitions converge uniformly to an overdamped Langevin stochastic differential equation via measure-theoretic limits on the Riemannian manifold.
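
Spelled out, the premise asserts a uniform limit of roughly the following shape (schematic; \hat{X}^{\Delta t, h} denotes the interpolated discrete chain with time step \Delta t and spatial parameter h, and d_g the geodesic distance; all notation is introduced here for illustration):

  \lim_{\Delta t,\, h \to 0} \; \sup_{0 \le \tau \le T} \mathbb{E}\!\left[ d_g\!\left( \hat{X}^{\Delta t, h}_\tau, X_\tau \right)^{2} \right] = 0,

where X_\tau solves the overdamped Langevin SDE. Every downstream result (eigenspace alignment, dissipativity, disentanglement) is proved for X_\tau and must be transported back through this limit.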

What would settle it

A simulation or analytic counterexample in which the discrete coupled flow diverges or fails to align weights with the principal eigenspace of the spatial measure.
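
The alignment half of this test is cheap to probe numerically. The following is a minimal sketch in Python/NumPy, using Oja's rule as a stand-in for the paper's unspecified local update (the actual discrete flow, step size, and manifold structure are assumptions here). The claim predicts |w| stays bounded and w aligns with the principal eigenvector; divergence or persistent misalignment under the genuine update rule would be the counterexample sought.

  import numpy as np

  rng = np.random.default_rng(0)
  d, n, eta, steps = 20, 5000, 1e-3, 20000

  # Synthetic data with one dominant principal direction (eigenvalue 5 vs. 1).
  cov = np.diag(np.concatenate(([5.0], np.ones(d - 1))))
  X = rng.multivariate_normal(np.zeros(d), cov, size=n)

  # Principal eigenvector of the empirical covariance (eigh sorts ascending).
  v_top = np.linalg.eigh(X.T @ X / n)[1][:, -1]

  w = rng.normal(size=d)
  w /= np.linalg.norm(w)
  for _ in range(steps):
      x = X[rng.integers(n)]
      y = w @ x
      w += eta * y * (x - y * w)  # Oja's rule: Hebbian term with self-normalizing decay

  # Divergence would show as |w| exploding; misalignment as |cos| bounded away from 1.
  cos = abs(w @ v_top) / np.linalg.norm(w)
  print(f"|w| = {np.linalg.norm(w):.3f}, |cos(w, v_top)| = {cos:.4f}")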

Figures

Figures reproduced from arXiv: 2604.16801 by Weiwei Xu, Xuanbo Lu, Xuanqi Zhao, Xuchun Tong, Zilin Li.

Figure 1
Figure 1: Coupled Slow-Fast Architecture. The framework operationalizes decentralized learning as intertwined streams regulated by a Global Objective E(X,W). It coordinates system convergence via a joint dissipative mechanism. Top (Fast Kinematics): Xτ undergoes Langevin diffusion. Bottom (Slow Plasticity): Wτ evolves via local Hebbian plasticity. Red dotted lines denote gradient guidance −∇E, enforcing a joint d… view at source ↗
Figure 2
Figure 2: Empirical Validation of the Macroscopic Limit. (Left) The fundamental bias-variance tradeoff in local graph operators. While the deterministic geometric bias decays as O(ε_N²), the stochastic variance diverges for infinitesimally small neighborhoods. (Middle) The fundamental topological necessity of the asymptotic scaling law (Eq. 1). Violating the spectral regime triggers severe numerical divergence of … view at source ↗
Figure 3
Figure 3: Macroscopic Spectral Evolution and Phase Transitions. Temporal trajectories of the latent eigenspectrum over 1.5 × 10⁴ steps. Red markers denote principal structures isolated by the Algebraic Sieve; gray lines indicate suppressed noise. In heavily over-parameterized spaces (e.g., VGG-4096, BERT-768), the flow collapses uninformative dimensions into the null space (y = 0), providing empirical evidence for t… view at source ↗
Original abstract

While modern representation learning relies heavily on global error signals, decentralized algorithms driven by local interactions offer a fundamental distributed alternative. However, the macroscopic convergence properties of these discrete dynamics on continuous data manifolds remain theoretically unresolved, notoriously suffering from parameter explosion. We bridge this gap by formalizing decentralized learning as a coupled slow-fast dynamical system on Riemannian manifolds. First, using measure-theoretic limits, we prove that the discrete spatial transitions converge uniformly to an overdamped Langevin stochastic differential equation. Second, via the It\^o-Poisson resolvent and a stochastic extension of LaSalle's Invariance Principle, we establish that the representation weights unconditionally avoid divergence and align strictly with the principal eigenspace of the spatial measure. Finally, we construct a joint Lyapunov functional for the fully coupled spatial-parametric flow. This proves global dissipativity and demonstrates that orthogonally disentangled, linearly separable features emerge spontaneously at the stationary limit. Our framework bridges discrete algorithms with continuous stochastic analysis, providing a formal theoretical baseline for decentralized representation learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript formalizes decentralized representation learning as a coupled slow-fast dynamical system on Riemannian manifolds. It claims three main results: (1) using measure-theoretic limits, discrete spatial transitions converge uniformly to an overdamped Langevin SDE; (2) via the Itô-Poisson resolvent and stochastic LaSalle invariance, representation weights unconditionally avoid divergence and align with the principal eigenspace of the spatial measure; (3) a joint Lyapunov functional for the fully coupled flow proves global dissipativity and shows that orthogonally disentangled, linearly separable features emerge spontaneously at the stationary limit.

Significance. If the central derivations hold with the required technical conditions, the work would provide a valuable theoretical bridge between discrete decentralized algorithms and continuous stochastic analysis on manifolds. It offers a formal baseline for understanding convergence and the spontaneous emergence of disentangled representations without global error signals, potentially informing the design of local-interaction methods that avoid parameter explosion.

major comments (2)
  1. [Abstract] Abstract, first proof step: the claim that discrete spatial transitions converge uniformly to the overdamped Langevin SDE via measure-theoretic limits is load-bearing for all subsequent results, yet the abstract invokes only “measure-theoretic limits” without stating the requisite Lipschitz or moment conditions on the coupled interaction kernels, tightness of transition measures, or equicontinuity in local charts with respect to the manifold metric. No error bounds or assumption checks are supplied.
  2. [Abstract] Abstract, second and third steps: the application of the Itô-Poisson resolvent plus stochastic LaSalle to establish unconditional alignment, and the construction of the joint Lyapunov functional for global dissipativity, are derived inside the continuous limit; any gap in the discrete-to-continuous passage therefore propagates directly to the headline claims of spontaneous orthogonal disentanglement and linear separability.
minor comments (1)
  1. [Abstract] The LaTeX fragment “It^o-Poisson” should be rendered as “Itô-Poisson” for typographic correctness.
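
For orientation, the conditions major comment 1 enumerates typically take the following textbook form in Kushner-style diffusion-approximation arguments (a generic sketch of standard sufficient conditions, not the paper's stated assumptions):

  Lipschitz drift in local charts: |b(x) - b(y)| \le L\, d_g(x, y);
  uniform moment bound on increments: \sup_n \mathbb{E}\, |\xi_n|^{2 + \delta} < \infty for some \delta > 0;
  tightness of the laws of the interpolated chains on the path space C([0, T]; \mathcal{M});
  equicontinuity of the transition kernels in local charts with respect to the manifold metric.

Absent some version of these, uniform convergence to the Langevin SDE, and everything built on it, is unsupported.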

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. We address each major comment point by point below, indicating where revisions will be incorporated.

Point-by-point responses
  1. Referee: [Abstract] Abstract, first proof step: the claim that discrete spatial transitions converge uniformly to the overdamped Langevin SDE via measure-theoretic limits is load-bearing for all subsequent results, yet the abstract invokes only “measure-theoretic limits” without stating the requisite Lipschitz or moment conditions on the coupled interaction kernels, tightness of transition measures, or equicontinuity in local charts with respect to the manifold metric. No error bounds or assumption checks are supplied.

    Authors: The abstract is a high-level summary. The full technical conditions—Lipschitz continuity and moment bounds on the interaction kernels (Assumption 3.1), tightness of transition measures, and equicontinuity in local charts—are stated explicitly in Section 3. Theorem 3.2 proves uniform convergence to the overdamped Langevin SDE with explicit error bounds of order O(Δt + h), where Δt is the time step and h the spatial discretization parameter. We will revise the abstract to reference these conditions and the error bound for clarity. revision: yes

  2. Referee: [Abstract] Abstract, second and third steps: the application of the Itô-Poisson resolvent plus stochastic LaSalle to establish unconditional alignment, and the construction of the joint Lyapunov functional for global dissipativity, are derived inside the continuous limit; any gap in the discrete-to-continuous passage therefore propagates directly to the headline claims of spontaneous orthogonal disentanglement and linear separability.

    Authors: The second and third results are indeed derived for the limiting SDE. However, Section 3 establishes the uniform convergence under the stated assumptions, and the appendix verifies that the discrete system satisfies the prerequisites for the Itô-Poisson resolvent and stochastic LaSalle invariance to carry over. The joint Lyapunov functional is constructed to be preserved in the limit, so the claims on alignment and disentanglement hold for the original discrete dynamics via the convergence theorem. We see no unaddressed gap. revision: no
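
Rendered explicitly, the bound the rebuttal attributes to Theorem 3.2 would read as follows (our notation: \hat{X}^{\Delta t, h} is the interpolated discrete flow, d_g the geodesic distance, C(T) a horizon-dependent constant; the paper's exact statement is not reproduced here):

  \sup_{0 \le \tau \le T} \left( \mathbb{E}\!\left[ d_g\!\left( \hat{X}^{\Delta t, h}_\tau, X_\tau \right)^{2} \right] \right)^{1/2} \le C(T)\, (\Delta t + h).

This is the bridge through which the continuous-limit conclusions on alignment and disentanglement are transported back to the discrete dynamics.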

Circularity Check

0 steps flagged

No circularity: derivations invoke external theorems without self-referential reduction.

full rationale

The paper's chain proceeds by (1) invoking measure-theoretic limits to obtain uniform convergence of discrete transitions to an overdamped Langevin SDE on a Riemannian manifold, (2) applying the Itô-Poisson resolvent together with a stochastic extension of LaSalle's invariance principle to obtain unconditional alignment with the principal eigenspace, and (3) constructing a joint Lyapunov functional to prove global dissipativity and spontaneous orthogonal disentanglement. Each step relies on standard, externally established mathematical machinery (SDE convergence theory, stochastic LaSalle) whose statements and proofs are independent of the present paper's fitted quantities or definitions. No equation is shown to equal its own input by construction, no parameter is fitted on a subset and then relabeled a prediction, and no load-bearing premise reduces to a self-citation whose own justification is internal. The framework therefore remains non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard stochastic analysis plus two modeling choices: data on a Riemannian manifold and the slow-fast coupling structure. No free parameters or new entities are introduced in the abstract.

axioms (2)
  • domain assumption: Data points lie on a Riemannian manifold.
    Invoked to define the continuous spatial transitions and the principal eigenspace alignment.
  • ad hoc to paper: Measure-theoretic limits of discrete transitions exist and are uniform.
    Central step used to obtain the overdamped Langevin SDE.

pith-pipeline@v0.9.0 · 5477 in / 1278 out tokens · 35008 ms · 2026-05-10T07:02:51.252386+00:00 · methodology

Reference graph

Works this paper leans on

136 extracted references · 6 canonical work pages · 1 internal anchor
