pith. machine review for the scientific record.

arxiv: 2604.00333 · v2 · submitted 2026-04-01 · 🧮 math.NA · cs.LG · cs.NA · physics.comp-ph

Recognition: 2 theorem links · Lean Theorem

MVNN: A Measure-Valued Neural Network for Learning McKean-Vlasov Dynamics from Particle Data

Hayden Schaeffer, Liyao Lyu, Xinyue Yu

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 22:43 UTC · model grok-4.3

classification 🧮 math.NA · cs.LG · cs.NA · physics.comp-ph
keywords measure-valued neural network · McKean-Vlasov dynamics · particle trajectory learning · propagation of chaos · universal approximation · collective behavior modeling · mean-field limits

The pith

A measure-valued neural network infers interaction drifts in McKean-Vlasov systems directly from observed particle trajectories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an architecture that extends ordinary neural networks to probability measures so that drift terms depending on the full empirical distribution can be learned from particle paths. It proves that the resulting measure-dependent ODEs are well-posed and that the associated particle system satisfies propagation of chaos. Under an explicit low-dimensional dependence assumption on the measure, the network is shown to enjoy universal approximation together with quantitative rates. Numerical tests on first- and second-order aggregation, alignment, and multi-group models demonstrate accurate in- and out-of-distribution prediction. The approach therefore supplies a data-driven route to closing the mean-field limit without hand-crafted kernels.
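
For orientation, the dynamics the summary describes can be written in the standard McKean-Vlasov form (a reconstruction from the summary; the paper's exact regularity assumptions may differ). The mean-field equation couples the state to its own law, and the finite system replaces the law with the empirical measure:

    \mathrm{d}X_t = b(X_t, \mu_t)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t,
    \qquad \mu_t = \operatorname{Law}(X_t),

    \mathrm{d}X^{i,N}_t = b\bigl(X^{i,N}_t, \mu^N_t\bigr)\,\mathrm{d}t + \sigma\,\mathrm{d}W^i_t,
    \qquad \mu^N_t = \frac{1}{N} \sum_{j=1}^{N} \delta_{X^{j,N}_t}.

Propagation of chaos is the statement that \mu^N_t converges to \mu_t as N grows, which is what licenses learning b from finite-particle trajectories in the first place.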

Core claim

We introduce a measure-valued neural network that infers measure-dependent interaction (drift) terms directly from particle-trajectory observations. The architecture learns cylindrical features via an embedding network that maps distributions to vectors. We establish well-posedness of the resulting dynamics, prove propagation-of-chaos for the interacting-particle system, and obtain universal approximation together with quantitative rates under a low-dimensional measure-dependence assumption.

What carries the argument

The measure-valued neural network (MVNN), which operates on probability measures by learning cylindrical features through a distribution-to-vector embedding network.
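
A minimal sketch of what such an architecture could look like in PyTorch, assuming a first-order system; the layer sizes, activations, and mean-pooling readout are illustrative choices, not the paper's hyperparameters:

    import torch
    import torch.nn as nn


    class MVNNDrift(nn.Module):
        """b_theta(x, mu_N) = phi_int(x, (1/N) * sum_j phi_emb(x_j))."""

        def __init__(self, dim: int, emb_dim: int = 16, width: int = 64):
            super().__init__()
            # Embedding network: maps each particle to a feature vector.
            # Averaging these features yields a cylindrical feature of the
            # empirical measure that is invariant to particle ordering.
            self.phi_emb = nn.Sequential(
                nn.Linear(dim, width), nn.Tanh(), nn.Linear(width, emb_dim)
            )
            # Interaction network: consumes state plus pooled measure feature.
            self.phi_int = nn.Sequential(
                nn.Linear(dim + emb_dim, width), nn.Tanh(), nn.Linear(width, dim)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (N, dim) particle positions; the empirical measure is
            # represented by the particle cloud itself.
            measure_feature = self.phi_emb(x).mean(dim=0)              # (emb_dim,)
            z = torch.cat([x, measure_feature.expand(x.size(0), -1)], dim=1)
            return self.phi_int(z)                                     # (N, dim)

The pooled average is what makes the drift a function of the empirical measure rather than of particle identities, matching the cylindrical form b_theta(X, mu_N) ≈ phi_int(X, (1/N) Σ_j phi_emb(X^j)) quoted from the paper, at O(N) cost per evaluation.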

If this is right

  • The learned dynamics remain well-posed for any initial measure.
  • Finite-particle simulations converge to the mean-field limit as the number of particles grows (a toy convergence check is sketched just after this list).
  • The network recovers both deterministic and stochastic versions of the Motsch-Tadmor, Cucker-Smale, and attraction-repulsion models from data.
  • Prediction accuracy persists on out-of-distribution initial configurations and parameter regimes.
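
A rough empirical check of the convergence bullet above: simulate the interacting-particle system at increasing N and watch summary statistics stabilise. The toy attraction drift below stands in for a trained MVNN, and the step size and noise level are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)


    def drift(x: np.ndarray) -> np.ndarray:
        # Toy measure-dependent drift: attraction toward the empirical mean.
        return -(x - x.mean(axis=0))


    def simulate(n_particles: int, t_final=2.0, dt=0.01, sigma=0.1) -> np.ndarray:
        # Euler-Maruyama for the interacting-particle SDE.
        x = rng.normal(size=(n_particles, 1))
        for _ in range(int(t_final / dt)):
            noise = rng.normal(size=x.shape)
            x = x + drift(x) * dt + sigma * np.sqrt(dt) * noise
        return x


    for n in (100, 1_000, 10_000):
        x = simulate(n)
        # Propagation of chaos predicts these moments settle as N grows.
        print(f"N={n:6d}  mean={x.mean():+.4f}  var={x.var():.4f}")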

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same embedding idea could be applied to learn nonlocal kernels in other nonlocal PDEs such as aggregation-diffusion equations.
  • Quantitative rates under the low-dimensional assumption suggest that the method may scale to moderately high-dimensional state spaces provided the measure dependence stays low-dimensional.
  • If the embedding network is replaced by a learned graph neural network, the architecture might capture heterogeneous interaction rules without explicit low-dimensional reduction.

Load-bearing premise

The interaction drift depends on the measure through a low-dimensional feature map.
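
One natural formalization of this premise (a hedged reading; the paper's precise statement may differ) is that the drift sees the measure only through finitely many integral features:

    b(x, \mu) = g\Bigl(x, \int_{\mathbb{R}^d} \varphi(y)\, \mu(\mathrm{d}y)\Bigr),
    \qquad \varphi : \mathbb{R}^d \to \mathbb{R}^m, \quad m \text{ small}.

Under this reading the embedding network plays the role of \varphi and the interaction network the role of g, so the architecture is matched exactly to the assumption that powers the quantitative rates.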

What would settle it

A concrete particle trajectory dataset whose learned drift fails to reproduce the observed collective motion once the embedding dimension is fixed below the true intrinsic dimension of the measure dependence.
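
The shape of such an experiment, in a self-contained toy (entirely hypothetical: the ground-truth drift below genuinely depends on two measure statistics, the mean and the second moment, so an embedding width of one sits below the intrinsic dimension; MVNNDrift is the sketch from earlier on this page, and whether a clean error cliff appears is an empirical question):

    import torch

    torch.manual_seed(0)
    N, dim = 256, 1


    def true_drift(x: torch.Tensor) -> torch.Tensor:
        # Depends on two independent measure features: mean and second moment.
        m1, m2 = x.mean(), (x ** 2).mean()
        return -(x - m1) * (1.0 + m2)


    x_train = torch.randn(64, N, dim)                   # 64 particle clouds
    y_train = torch.stack([true_drift(x) for x in x_train])

    for emb_dim in (1, 4):                              # below vs. above intrinsic dim
        model = MVNNDrift(dim, emb_dim=emb_dim)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(500):
            opt.zero_grad()
            pred = torch.stack([model(x) for x in x_train])
            loss = ((pred - y_train) ** 2).mean()
            loss.backward()
            opt.step()
        print(f"emb_dim={emb_dim}: final training MSE {loss.item():.5f}")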

Figures

Figures reproduced from arXiv: 2604.00333 by Hayden Schaeffer, Liyao Lyu, Xinyue Yu.

Figure 1
Figure 1. 1D Motsch-Tadmor dynamics: empirical density ρ(x, t), comparing the reference N-particle simulation (orange) with the MVNN-learned mean-field model (blue). Columns show t = 0, 1, 2; rows correspond to three unseen initial distributions. Densities are estimated with Gaussian kernel density estimation, and the L² error is computed between the KDE-smoothed densities.
Figure 2
Figure 2. Comparisons on 1D Motsch-Tadmor dynamics: empirical density ρ(x, t), comparing the reference N-particle simulation (green), the MVNN-learned mean-field model (orange), and the prediction from the Gaussian process model [83] (blue). Columns show t = 0, 0.5, 1; rows correspond to three unseen initial distributions. Densities are estimated with Gaussian kernel density estimation.
Figure 3
Figure 3. L² error for the 1D Motsch-Tadmor dynamics: comparison of the Gaussian process model (blue), the MVNN model trained on 16 agents, 9 trajectories, and 20 timesteps (orange), and the MVNN model trained on 16,000 agents, 100 trajectories, and 200 timesteps (green). The L² error is computed between the KDE-smoothed densities.
Figure 4
Figure 4. Simulation time comparison: average simulation times (seconds) against number of agents N for the MVNN and Gaussian process models over 10 trials.
Figure 5
Figure 5. Stochastic Motsch-Tadmor dynamics (σ = 0.1): density evolution. Empirical density ρ(x, t) from the reference interacting-particle simulation (orange) and from the MVNN-learned McKean-Vlasov model (blue), shown at t = 0, 1, 2 for an unseen initial distribution. Densities are estimated via Gaussian KDE, and the reported L² error is computed between the kernel-smoothed densities.
Figure 6
Figure 6. 2D aggregation model with ring-shaped initialization. The upper row displays the ground-truth particle trajectories; the lower row shows the evolution predicted by the learned mean-field dynamics. The model accurately preserves the topological structure of the ring over time.
Figure 7
Figure 7. 2D aggregation model with double-ring initialization. Comparison between the ground-truth particle system (upper row) and the learned mean-field dynamics (lower row). The model correctly reproduces the contraction of both rings despite never seeing this topology during training.
Figure 8
Figure 8. 2D aggregation model with disk-shaped initialization. The learned dynamics (lower row) accurately capture the collapse of the uniform disk, matching the ground truth (upper row).
Figure 9
Figure 9. 2D aggregation model with binary asymmetric initialization. Evolution of a system initialized with heterogeneous densities (low density left, high density right). The learned model (lower row) preserves the density gradient and correctly predicts the asymmetric aggregation process.
Figure 10
Figure 10. Hierarchical dynamics, initial condition 1. Evolution of the multi-group system initialized with spatially separated populations; rows correspond to Groups 1, 2, and 3. The learned model (blue) faithfully reproduces the reference particle dynamics (orange), capturing the directional information flow from Group 3 down to Group 1.
Figure 11
Figure 11. Hierarchical dynamics, initial condition 2. Evolution of the multi-group system initialized with spatially separated populations; rows correspond to Groups 1, 2, and 3. The learned model (blue) accurately reproduces the reference particle dynamics (orange), capturing the directional information flow from Group 3 down to Group 1.
Figure 12
Figure 12. Second-order attraction-repulsion: evolution of the model initialized with ring-shaped initial positions and Gaussian initial velocities. The upper rows display ground-truth particle positions and velocities; the lower rows show the positions and velocities predicted by the learned MVNN.
Figure 13
Figure 13. Second-order attraction-repulsion: evolution of the model initialized with double-ring initial positions and Gaussian initial velocities. The upper rows display ground-truth particle positions and velocities; the lower rows show the MVNN predictions.
Figure 14
Figure 14. Second-order attraction-repulsion: evolution of the model initialized with disk-shaped initial positions and Gaussian initial velocities. The upper rows display ground-truth particle positions and velocities; the lower rows show the MVNN predictions.
Figure 15
Figure 15. Second-order attraction-repulsion: evolution of the model initialized with binary asymmetric initial positions (low density left, high density right) and Gaussian initial velocities. The upper rows display ground-truth particle positions and velocities; the lower rows show the MVNN predictions.
Figure 16
Figure 16. Second-order Cucker-Smale: evolution of the model initialized with Gaussian initial positions and Gaussian initial velocities. The upper rows display ground-truth particle positions and velocities; the lower rows show the MVNN predictions.
Figure 17
Figure 17. Second-order Cucker-Smale: evolution of the model initialized with two-component Gaussian-mixture initial positions and Gaussian initial velocities. The upper rows display ground-truth particle positions and velocities; the lower rows show the MVNN predictions.
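
The captions above repeatedly report an "L² error between the KDE-smoothed densities". A minimal reconstruction of that metric for 1D particle sets, assuming scipy's Gaussian KDE with its default bandwidth rule (the paper may use a different bandwidth):

    import numpy as np
    from scipy.stats import gaussian_kde


    def kde_l2_error(ref: np.ndarray, pred: np.ndarray, grid: np.ndarray) -> float:
        """L2 distance between Gaussian-KDE densities of two 1D samples."""
        rho_ref = gaussian_kde(ref)(grid)
        rho_pred = gaussian_kde(pred)(grid)
        dx = grid[1] - grid[0]
        return float(np.sqrt(np.sum((rho_ref - rho_pred) ** 2) * dx))


    rng = np.random.default_rng(1)
    grid = np.linspace(-4.0, 4.0, 801)
    print(kde_l2_error(rng.normal(0.0, 1.0, 2000), rng.normal(0.1, 1.0, 2000), grid))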
read the original abstract

Collective behaviors that emerge from interactions are fundamental to numerous biological systems. To learn such interacting forces from observations, we introduce a measure-valued neural network that infers measure-dependent interaction (drift) terms directly from particle-trajectory observations. The proposed architecture generalizes standard neural networks to operate on probability measures by learning cylindrical features, using an embedding network that produces scalable distribution-to-vector representations. On the theory side, we establish well-posedness of the resulting dynamics and prove propagation-of-chaos for the associated interacting-particle system. We further show universal approximation and quantitative approximation rates under a low-dimensional measure-dependence assumption. Numerical experiments on first and second order systems, including deterministic and stochastic Motsch-Tadmor dynamics, two-dimensional attraction-repulsion aggregation, Cucker-Smale dynamics, and a hierarchical multi-group system, demonstrate accurate prediction and strong out-of-distribution generalization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces a measure-valued neural network (MVNN) that learns measure-dependent drift terms in McKean-Vlasov equations directly from particle trajectory data. The architecture employs cylindrical features and an embedding network to map probability measures to vectors. Theoretical results include well-posedness of the learned dynamics, propagation of chaos for the associated particle system, and universal approximation with quantitative rates under an explicit low-dimensional measure-dependence assumption on the interaction kernel. Numerical experiments on first- and second-order systems (Motsch-Tadmor, Cucker-Smale, attraction-repulsion aggregation, and hierarchical multi-group models) demonstrate accurate prediction and out-of-distribution generalization.

Significance. If the approximation guarantees and numerical performance hold, the work would supply a theoretically grounded data-driven method for inferring interaction kernels in mean-field limits, with potential applications to collective behavior modeling in biology and physics. The combination of architecture design, well-posedness/propagation-of-chaos results, and reported numerical success on multiple systems constitutes a substantive contribution to learning McKean-Vlasov dynamics.

major comments (1)
  1. [Abstract and theory section] The universal approximation and quantitative approximation rates are proved only under an explicit low-dimensional measure-dependence assumption on the interaction kernel. The numerical examples (Motsch-Tadmor, Cucker-Smale, aggregation) are all constructed to satisfy this assumption by design; no controlled experiment is reported in which the assumption is violated to quantify performance degradation or to test the necessity of the structural restriction.
minor comments (1)
  1. Notation for the embedding network and cylindrical features should be introduced with explicit dimension tracking to clarify how the map from measures to vectors scales with particle number.
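
As an illustration of the bookkeeping this asks for (our notation; p is a hypothetical embedding width, not a symbol from the paper):

    \varphi_{\mathrm{emb}} : \mathbb{R}^d \to \mathbb{R}^p,
    \qquad
    \mu^N_t \mapsto \frac{1}{N} \sum_{j=1}^{N} \varphi_{\mathrm{emb}}\bigl(X^{j,N}_t\bigr) \in \mathbb{R}^p,
    \qquad
    \varphi_{\mathrm{int}} : \mathbb{R}^d \times \mathbb{R}^p \to \mathbb{R}^d.

The pooled feature lives in R^p for every N, so the measure-to-vector map costs O(Np) per evaluation and its output dimension does not grow with the particle number.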

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. We respond to the major comment below.

read point-by-point responses
  1. Referee: [Abstract and theory section] The universal approximation and quantitative approximation rates are proved only under an explicit low-dimensional measure-dependence assumption on the interaction kernel. The numerical examples (Motsch-Tadmor, Cucker-Smale, aggregation) are all constructed to satisfy this assumption by design; no controlled experiment is reported in which the assumption is violated to quantify performance degradation or to test the necessity of the structural restriction.

    Authors: We thank the referee for highlighting this point. It is correct that the universal approximation result with quantitative rates requires the explicit low-dimensional measure-dependence assumption on the interaction kernel, as stated in the abstract and theory sections. The numerical examples are constructed to satisfy the assumption by design so that the learned dynamics can be validated directly against the theoretical setting. The MVNN architecture itself is general and does not require the assumption for implementation or training; the restriction is used only to derive the quantitative rates. We agree that testing performance when the assumption is violated would be informative. In the revised manuscript we will add a clarifying paragraph in the theory section and a remark in the numerical experiments section that explicitly notes the role of the assumption and discusses its implications for approximation quality outside this regime.

    revision: partial

Circularity Check

0 steps flagged

No significant circularity; central claims rest on independent architecture and explicit assumptions

full rationale

The derivation introduces a cylindrical-feature embedding network for measures and establishes well-posedness, propagation-of-chaos, and universal approximation under an explicitly stated low-dimensional measure-dependence assumption on the interaction kernel. This assumption is an external structural restriction on the target dynamics, not a quantity fitted or defined by the network itself. No equation reduces a prediction to a fitted parameter by construction, no uniqueness theorem is imported solely via self-citation, and no ansatz is smuggled through prior work. Numerical examples validate the method but do not define the theoretical guarantees. The chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; the primary unverified premise is the low-dimensional measure-dependence assumption invoked for quantitative rates. No explicit free parameters or invented entities are named.

axioms (1)
  • domain assumption: low-dimensional measure-dependence assumption
    Invoked to obtain quantitative approximation rates and universal approximation.

pith-pipeline@v0.9.0 · 5467 in / 1173 out tokens · 31927 ms · 2026-05-13T22:43:27.024495+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. One Operator for Many Densities: Amortized Approximation of Conditioning by Neural Operators

    stat.ML 2026-05 unverdicted novelty 7.0

    A single neural operator can approximate the map from arbitrary joint densities to their conditionals, backed by new continuity results and illustrated on Gaussian mixtures.

  2. One Operator for Many Densities: Amortized Approximation of Conditioning by Neural Operators

    stat.ML 2026-05 unverdicted novelty 6.0

    A single neural operator can approximate the map from joint densities to conditional densities to arbitrary accuracy, with a proof based on continuity of the conditioning operator and a demonstration on Gaussian mixtures.

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    A review on attractive–repulsive hydrodynamics for consensus in collective behavior.Active Particles, Volume 1: Advances in Theory, Models, and Applications, pages 259–298, 2017

    José A Carrillo, Young-Pil Choi, and Sergio P Perez. A review on attractive–repulsive hydrodynamics for consensus in collective behavior.Active Particles, Volume 1: Advances in Theory, Models, and Applications, pages 259–298, 2017

  2. [2]

    Giacomo Albi, Nicola Bellomo, Luisa Fermo, S-Y Ha, Jeongho Kim, Lorenzo Pareschi, David Poyato, and Juan Soler. Vehicular traffic, crowds, and swarms: From kinetic theory and multiscale methods to applications and research perspectives. Mathematical Models and Methods in Applied Sciences, 29(10):1901–2005, 2019

  3. [3]

    Stability of ring patterns arising from two-dimensional particle interactions.Physical Review E—Statistical, Nonlinear, and Soft Matter Physics, 84(1):015203, 2011

    Theodore Kolokolnikov, Hui Sun, David Uminsky, and Andrea L Bertozzi. Stability of ring patterns arising from two-dimensional particle interactions.Physical Review E—Statistical, Nonlinear, and Soft Matter Physics, 84(1):015203, 2011

  4. [4]

    Collective motion.Physics reports, 517(3-4):71–140, 2012

    Tamás Vicsek and Anna Zafeiris. Collective motion.Physics reports, 517(3-4):71–140, 2012

  5. [5]

    Cambridge University Press, 2008

    Yoav Shoham and Kevin Leyton-Brown.Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, 2008

  6. [6]

    Nonparametric inference of interaction laws in systems of agents from trajectory data.Proceedings of the National Academy of Sciences, 116(29):14424–14433, 2019

    Fei Lu, Ming Zhong, Sui Tang, and Mauro Maggioni. Nonparametric inference of interaction laws in systems of agents from trajectory data.Proceedings of the National Academy of Sciences, 116(29):14424–14433, 2019

  7. [7]

    Random feature models for learning interacting dynamical systems.Proceedings of the Royal Society A, 479(2275):20220835, 2023

    Yuxuan Liu, Scott G McCalla, and Hayden Schaeffer. Random feature models for learning interacting dynamical systems. Proceedings of the Royal Society A, 479(2275):20220835, 2023

  8. [8]

    Learning interaction kernels in mean-field equations of first-order systems of interacting particles.SIAM Journal on Scientific Computing, 44(1):A260–A285, 2022

    Quanjun Lang and Fei Lu. Learning interaction kernels in mean-field equations of first-order systems of interacting particles.SIAM Journal on Scientific Computing, 44(1):A260–A285, 2022

  9. [9]

    Data-driven model construction for anisotropic dynamics of active matter.PRX Life, 1(1):013009, 2023

    Mengyang Gu, Xinyi Fang, and Yimin Luo. Data-driven model construction for anisotropic dynamics of active matter.PRX Life, 1(1):013009, 2023

  10. [10]

    Inference of interaction kernels in mean-field models of opinion dynamics

    Weiqi Chu, Qin Li, and Mason A Porter. Inference of interaction kernels in mean-field models of opinion dynamics. SIAM Journal on Applied Mathematics, 84(3):1096–1115, 2024

  11. [11]

    Learning interaction kernels for agent systems on riemannian manifolds

    Mauro Maggioni, Jason J Miller, Hongda Qiu, and Ming Zhong. Learning interaction kernels for agent systems on riemannian manifolds. InInternational Conference on Machine Learning, pages 7290–7300. PMLR, 2021

  12. [12]

    A macroscopic crowd motion model of gradient flow type.Mathematical Models and Methods in Applied Sciences, 20(10):1787–1821, 2010

    Bertrand Maury, Aude Roudneff-Chupin, and Filippo Santambrogio. A macroscopic crowd motion model of gradient flow type.Mathematical Models and Methods in Applied Sciences, 20(10):1787–1821, 2010

  13. [13]

    On kinematic waves ii

    Michael James Lighthill and Gerald Beresford Whitham. On kinematic waves II. A theory of traffic flow on long crowded roads. Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, 229(1178):317–345, 1955

  14. [14]

    Initiation of slime mold aggregation viewed as an instability.Journal of theoretical biology, 26(3):399–415, 1970

    Evelyn F Keller and Lee A Segel. Initiation of slime mold aggregation viewed as an instability.Journal of theoretical biology, 26(3):399–415, 1970

  15. [15]

    Random batch methods (rbm) for interacting particle systems.Journal of Computational Physics, 400:108877, 2020

    Shi Jin, Lei Li, and Jian-Guo Liu. Random batch methods (RBM) for interacting particle systems. Journal of Computational Physics, 400:108877, 2020

  16. [16]

    Random batch ewald method for dielectrically confined coulomb systems.SIAM Journal on Scientific Computing, 47(4):B846–B874, 2025

    Zecheng Gan, Xuanzhao Gao, Jiuyang Liang, and Zhenli Xu. Random batch ewald method for dielectrically confined coulomb systems.SIAM Journal on Scientific Computing, 47(4):B846–B874, 2025

  17. [17]

    More is the same; phase transitions and mean field theories.Journal of Statistical Physics, 137(5):777–797, 2009

    Leo P Kadanoff. More is the same; phase transitions and mean field theories.Journal of Statistical Physics, 137(5):777–797, 2009

  18. [18]

    The mean-field limit for the dynamics of large particle systems.Journées équations aux dérivées partielles, pages 1–47, 2003

    François Golse. The mean-field limit for the dynamics of large particle systems.Journées équations aux dérivées partielles, pages 1–47, 2003

  19. [19]

    Springer Science & Business Media, 2012

    Herbert Spohn.Large scale dynamics of interacting particles. Springer Science & Business Media, 2012

  20. [20]

    Mean field limit and quantitative estimates with singular attractive kernels.Duke Mathematical Journal, 172(13):2591–2641, 2023

    Didier Bresch, Pierre-Emmanuel Jabin, and Zhenfu Wang. Mean field limit and quantitative estimates with singular attractive kernels.Duke Mathematical Journal, 172(13):2591–2641, 2023

  21. [21]

    Particle, kinetic, and hydrodynamic models of swarming

    José A Carrillo, Massimo Fornasier, Giuseppe Toscani, and Francesco Vecil. Particle, kinetic, and hydrodynamic models of swarming. InMathematical modeling of collective behavior in socio-economic and life sciences, pages 297–336. Springer, 2010

  22. [22]

    Asymptotic flocking dynamics for the kinetic cucker–smale model.SIAM Journal on Mathematical Analysis, 42(1):218–236, 2010

    José A Carrillo, Massimo Fornasier, Jesús Rosado, and Giuseppe Toscani. Asymptotic flocking dynamics for the kinetic cucker–smale model.SIAM Journal on Mathematical Analysis, 42(1):218–236, 2010

  23. [23]

    Giacomo Albi, Lorenzo Pareschi, and Mattia Zanella. Boltzmann-type control of opinion consensus through leaders. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 372(2028):20140138, 2014

  24. [24]

    Heterophilious dynamics enhances consensus.SIAM review, 56(4):577–621, 2014

    Sebastien Motsch and Eitan Tadmor. Heterophilious dynamics enhances consensus.SIAM review, 56(4):577–621, 2014

  25. [25]

    An analytical framework for consensus-based global optimization method.Mathematical Models and Methods in Applied Sciences, 28(06):1037–1066, 2018

    José A Carrillo, Young-Pil Choi, Claudia Totzeck, and Oliver Tse. An analytical framework for consensus-based global optimization method.Mathematical Models and Methods in Applied Sciences, 28(06):1037–1066, 2018

  26. [26]

    A consensus-based global optimization method for high dimensional machine learning problems.ESAIM: Control, Optimisation and Calculus of Variations, 27:S5, 2021

    José A Carrillo, Shi Jin, Lei Li, and Yuhua Zhu. A consensus-based global optimization method for high dimensional machine learning problems.ESAIM: Control, Optimisation and Calculus of Variations, 27:S5, 2021

  27. [27]

    On the mean-field limit for the consensus-based optimization.Mathematical Methods in the Applied Sciences, 45(12):7814–7831, 2022

    Hui Huang and Jinniao Qiu. On the mean-field limit for the consensus-based optimization.Mathematical Methods in the Applied Sciences, 45(12):7814–7831, 2022

  28. [28]

    Consensus based stochastic optimal control

    Liyao Lyu and Jingrun Chen. Consensus based stochastic optimal control. InForty-second International Conference on Machine Learning, 2025

  29. [29]

    Distilling free-form natural laws from experimental data.science, 324(5923):81– 85, 2009

    Michael Schmidt and Hod Lipson. Distilling free-form natural laws from experimental data.science, 324(5923):81– 85, 2009

  30. [30]

    Neural ordinary differential equations.Advances in neural information processing systems, 31, 2018

    Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations.Advances in neural information processing systems, 31, 2018

  31. [31]

    Discovering governing equations from data by sparse identification of nonlinear dynamical systems.Proceedings of the national academy of sciences, 113(15):3932– 3937, 2016

    Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems.Proceedings of the national academy of sciences, 113(15):3932– 3937, 2016

  32. [32]

    Sparse model selection via integral terms.Physical Review E, 96(2):023302, 2017

    Hayden Schaeffer and Scott G McCalla. Sparse model selection via integral terms.Physical Review E, 96(2):023302, 2017

  33. [33]

    Extracting sparse high-dimensional dynamics from limited data

    Hayden Schaeffer, Giang Tran, and Rachel Ward. Extracting sparse high-dimensional dynamics from limited data. SIAM Journal on Applied Mathematics, 78(6):3279–3295, 2018

  34. [34]

    Ab initio generalized langevin equation.Proceedings of the National Academy of Sciences, 121(14):e2308668121, 2024

    Pinchen Xie, Roberto Car, and Weinan E. Ab initio generalized langevin equation.Proceedings of the National Academy of Sciences, 121(14):e2308668121, 2024

  35. [35]

    Construction of coarse-grained molecular dynamics with many-body non-markovian memory.Physical Review Letters, 131(17):177301, 2023

    Liyao Lyu and Huan Lei. Construction of coarse-grained molecular dynamics with many-body non-markovian memory.Physical Review Letters, 131(17):177301, 2023

  36. [36]

    Data-driven learning of the generalized langevin equation with state-dependent memory.Physical Review Letters, 133(7):077301, 2024

    Pei Ge, Zhongqiang Zhang, and Huan Lei. Data-driven learning of the generalized langevin equation with state-dependent memory.Physical Review Letters, 133(7):077301, 2024

  37. [37]

    Learning stochastic dynamical system via flow map operator.Journal of Computa- tional Physics, 508:112984, 2024

    Yuan Chen and Dongbin Xiu. Learning stochastic dynamical system via flow map operator.Journal of Computa- tional Physics, 508:112984, 2024

  38. [38]

    A training-free conditional diffusion model for learning stochastic dynamical systems.SIAM Journal on Scientific Computing, 47(5):C1144–C1171, 2025

    Yanfang Liu, Yuan Chen, Dongbin Xiu, and Guannan Zhang. A training-free conditional diffusion model for learning stochastic dynamical systems.SIAM Journal on Scientific Computing, 47(5):C1144–C1171, 2025

  39. [39]

    Model-free learning of random dynamical system from noisy observations.Journal of Computational Physics, page 114474, 2025

    Kyongmin Yeo, Hyomin Shin, Heechang Kim, and Minseok Choi. Model-free learning of random dynamical system from noisy observations.Journal of Computational Physics, page 114474, 2025

  40. [40]

    Les-sindy: Laplace-enhanced sparse identification of nonlinear dynamical systems.Journal of Computational Physics, page 114443, 2025

    Haoyang Zheng and Guang Lin. Les-sindy: Laplace-enhanced sparse identification of nonlinear dynamical systems.Journal of Computational Physics, page 114443, 2025

  41. [41]

    Learning nonlinear operators via deeponet based on the universal approximation theorem of operators.Nature machine intelligence, 3(3):218–229, 2021

    Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators.Nature machine intelligence, 3(3):218–229, 2021

  42. [42]

    Fourier neural operator for parametric partial differential equations

    Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, Anima Anandkumar, et al. Fourier neural operator for parametric partial differential equations. InInternational Conference on Learning Representations

  43. [43]

    Sparse dynamics for partial differential equations.Proceedings of the National Academy of Sciences, 110(17):6634–6639, 2013

    Hayden Schaeffer, Russel Caflisch, Cory D Hauck, and Stanley Osher. Sparse dynamics for partial differential equations.Proceedings of the National Academy of Sciences, 110(17):6634–6639, 2013

  44. [44]

    Pde-net: Learning pdes from data

    Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. Pde-net: Learning pdes from data. InInternational conference on machine learning, pages 3208–3216. PMLR, 2018

  45. [45]

    Dnn modeling of partial differential equations with incomplete data.Journal of Computational Physics, 493:112502, 2023

    Victor Churchill, Yuan Chen, Zhongshu Xu, and Dongbin Xiu. Dnn modeling of partial differential equations with incomplete data.Journal of Computational Physics, 493:112502, 2023

  46. [46]

    Neupde: Neural network based ordinary and partial differential equations for modeling time-dependent data

    Yifan Sun, Linan Zhang, and Hayden Schaeffer. NeuPDE: Neural network based ordinary and partial differential equations for modeling time-dependent data. In Mathematical and Scientific Machine Learning, pages 352–372. PMLR, 2020

  47. [47]

    Hayden Schaeffer. Learning partial differential equations via data discovery and sparse optimization.Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 473(2197), 2017

  48. [48]

    Extracting structured dynamical systems using sparse optimization with very few samples.Multiscale Modeling & Simulation, 18(4):1435–1461, 2020

    Hayden Schaeffer, Giang Tran, Rachel Ward, and Linan Zhang. Extracting structured dynamical systems using sparse optimization with very few samples.Multiscale Modeling & Simulation, 18(4):1435–1461, 2020

  49. [49]

    Weak sindy for partial differential equations.Journal of Computational Physics, 443:110525, 2021

    Daniel A Messenger and David M Bortz. Weak sindy for partial differential equations.Journal of Computational Physics, 443:110525, 2021

  50. [50]

    Due: A deep learning framework and library for modeling unknown equations.SIAM Review, 67(4):873–902, 2025

    Junfeng Chen, Kailiang Wu, and Dongbin Xiu. Due: A deep learning framework and library for modeling unknown equations.SIAM Review, 67(4):873–902, 2025

  51. [51]

    Multipole graph neural operator for parametric partial differential equations.Advances in Neural Information Processing Systems, 33:6755–6766, 2020

    Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Andrew Stuart, Kaushik Bhattacharya, and Anima Anandkumar. Multipole graph neural operator for parametric partial differential equations.Advances in Neural Information Processing Systems, 33:6755–6766, 2020

  52. [52]

    Kernel methods are competitive for operator learning.Journal of Computational Physics, 496:112549, 2024

    Pau Batlle, Matthieu Darcy, Bamdad Hosseini, and Houman Owhadi. Kernel methods are competitive for operator learning.Journal of Computational Physics, 496:112549, 2024

  53. [53]

    Regularized random fourier features and finite element reconstruction for operator learning in sobolev space.arXiv preprint arXiv:2512.17884, 2025

    Xinyue Yu and Hayden Schaeffer. Regularized random fourier features and finite element reconstruction for operator learning in sobolev space.arXiv preprint arXiv:2512.17884, 2025

  54. [54]

    Belnet: Basis enhanced learning, a mesh-free neural operator.Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 479(2276), 2023

    Zecheng Zhang, Leung Wing Tat, and Hayden Schaeffer. Belnet: Basis enhanced learning, a mesh-free neural operator.Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 479(2276), 2023

  55. [55]

    Towards a foundation model for partial differential equations: Multioperator learning and extrapolation.Physical Review E, 111(3):035304, 2025

    Jingmin Sun, Yuxuan Liu, Zecheng Zhang, and Hayden Schaeffer. Towards a foundation model for partial differential equations: Multioperator learning and extrapolation.Physical Review E, 111(3):035304, 2025

  56. [56]

    Pi-mfm: Physics-informed multimodal foundation model for solving partial differential equations.arXiv preprint arXiv:2512.23056, 2025

    Min Zhu, Jingmin Sun, Zecheng Zhang, Hayden Schaeffer, and Lu Lu. Pi-mfm: Physics-informed multimodal foundation model for solving partial differential equations.arXiv preprint arXiv:2512.23056, 2025

  57. [57]

    A deep learning framework for multi-operator learning: Architectures and approximation theory.arXiv preprint arXiv:2510.25379, 2025

    Adrien Weihs, Jingmin Sun, Zecheng Zhang, and Hayden Schaeffer. A deep learning framework for multi-operator learning: Architectures and approximation theory.arXiv preprint arXiv:2510.25379, 2025

  58. [58]

    Compno: A novel foundation model approach for solving partial differential equations.Applied Sciences, 16(2):972, 2026

    Hamda Hmida, Hsiu-Wen Chang Joly, and Youssef Mesri. Compno: A novel foundation model approach for solving partial differential equations.Applied Sciences, 16(2):972, 2026

  59. [59]

    A multimodal pde foundation model for prediction and scientific text descriptions.arXiv preprint arXiv:2502.06026, 2025

    Elisa Negrini, Yuxuan Liu, Liu Yang, Stanley J Osher, and Hayden Schaeffer. A multimodal pde foundation model for prediction and scientific text descriptions.arXiv preprint arXiv:2502.06026, 2025

  60. [60]

    Pdeformer: Towards a foundation model for one-dimensional partial differential equations.arXiv preprint arXiv:2402.12652, 2024

    Zhanhong Ye, Xiang Huang, Leheng Chen, Hongsheng Liu, Zidong Wang, and Bin Dong. Pdeformer: Towards a foundation model for one-dimensional partial differential equations.arXiv preprint arXiv:2402.12652, 2024

  61. [61]

    Prose: Predicting multiple operators and symbolic expressions using multimodal transformers.Neural Networks, 180:106707, 2024

    Yuxuan Liu, Zecheng Zhang, and Hayden Schaeffer. Prose: Predicting multiple operators and symbolic expressions using multimodal transformers.Neural Networks, 180:106707, 2024

  62. [62]

    Vicon: Vision in-context operator networks for multi-physics fluid dynamics prediction.arXiv preprint arXiv:2411.16063, 2024

    Yadi Cao, Yuxuan Liu, Liu Yang, Rose Yu, Hayden Schaeffer, and Stanley Osher. Vicon: Vision in-context operator networks for multi-physics fluid dynamics prediction.arXiv preprint arXiv:2411.16063, 2024

  63. [63]

    Disco: learning to discover an evolution operator for multi- physics-agnostic prediction.arXiv preprint arXiv:2504.19496, 2025

    Rudy Morel, Jiequn Han, and Edouard Oyallon. Disco: learning to discover an evolution operator for multi- physics-agnostic prediction.arXiv preprint arXiv:2504.19496, 2025

  64. [64]

    In-context operator learning with data prompts for differential equation problems.Proceedings of the National Academy of Sciences, 120(39):e2310142120, 2023

    Liu Yang, Siting Liu, Tingwei Meng, and Stanley J Osher. In-context operator learning with data prompts for differential equation problems.Proceedings of the National Academy of Sciences, 120(39):e2310142120, 2023

  65. [65]

    Time-series forecasting, knowledge distillation, and refinement within a multimodal pde foundation model.arXiv preprint arXiv:2409.11609, 2024

    Derek Jollie, Jingmin Sun, Zecheng Zhang, and Hayden Schaeffer. Time-series forecasting, knowledge distillation, and refinement within a multimodal pde foundation model.arXiv preprint arXiv:2409.11609, 2024

  66. [66]

    Prose-fd: A multimodal pde foundation model for learning multiple operators for forecasting fluid dynamics.arXiv preprint arXiv:2409.09811, 2024

    Yuxuan Liu, Jingmin Sun, Xinjie He, Griffin Pinney, Zecheng Zhang, and Hayden Schaeffer. Prose-fd: A multimodal pde foundation model for learning multiple operators for forecasting fluid dynamics.arXiv preprint arXiv:2409.09811, 2024

  67. [67]

    Unisolver: Pde-conditional trans- formers towards universal neural pde solvers.arXiv preprint arXiv:2405.17527, 2024

    Hang Zhou, Yuezhou Ma, Haixu Wu, Haowen Wang, and Mingsheng Long. Unisolver: Pde-conditional trans- formers towards universal neural pde solvers.arXiv preprint arXiv:2405.17527, 2024

  68. [68]

    Pdeformer-2: A versatile foundation model for two-dimensional partial differential equations.arXiv preprint arXiv:2507.15409, 2025

    Zhanhong Ye, Zining Liu, Bingyang Wu, Hongjie Jiang, Leheng Chen, Minyan Zhang, Xiang Huang, Qinghe Meng Zou, Hongsheng Liu, Bin Dong, et al. Pdeformer-2: A versatile foundation model for two-dimensional partial differential equations.arXiv preprint arXiv:2507.15409, 2025

  69. [69]

    Itô’s formula for flows of measures on semimartingales.Stochastic Processes and their applications, 159:350–390, 2023

    Xin Guo, Huyên Pham, and Xiaoli Wei. Itô's formula for flows of measures on semimartingales. Stochastic Processes and their Applications, 159:350–390, 2023

  70. [70]

    Protter.Stochastic Differential Equations, pages 249–361

    Philip E. Protter. Stochastic Differential Equations, pages 249–361. Springer Berlin Heidelberg, Berlin, Heidelberg, 2005

  71. [71]

    Topics in propagation of chaos

    Alain-Sol Sznitman. Topics in propagation of chaos. InEcole d’été de probabilités de Saint-Flour XIX—1989, pages 165–251. Springer, 2006

  72. [72]

    Propagation of chaos: a review of models, methods and applications

    Louis-Pierre Chaintron and Antoine Diez. Propagation of chaos: a review of models, methods and applications. ii. applications.arXiv preprint arXiv:2106.14812, 2021

  73. [73]

    Henry P McKean et al. Propagation of chaos for a class of non-linear parabolic equations.Stochastic Differential Equations (Lecture Series in Differential Equations, Session 7, Catholic Univ., 1967), pages 41–57, 1967

  74. [74]

    Learning mean-field equations from particle data using wsindy.Physica D: Nonlinear Phenomena, 439:133406, 2022

    Daniel A Messenger and David M Bortz. Learning mean-field equations from particle data using wsindy.Physica D: Nonlinear Phenomena, 439:133406, 2022

  75. [75]

    Physics-informed genetic programming for discovery of partial differential equations from scarce and noisy data.Journal of Computational Physics, 514:113261, 2024

    Benjamin G Cohen, Burcu Beykal, and George M Bollas. Physics-informed genetic programming for discovery of partial differential equations from scarce and noisy data.Journal of Computational Physics, 514:113261, 2024

  76. [76]

    Mean-field neural networks: learning mappings on wasserstein space.Neural Networks, 168:380–393, 2023

    Huyên Pham and Xavier Warin. Mean-field neural networks: learning mappings on wasserstein space.Neural Networks, 168:380–393, 2023

  77. [77]

    Neural scaling laws of deep relu and deep operator network: A theoretical study.arXiv preprint arXiv:2410.00357, 2024

    Hao Liu, Zecheng Zhang, Wenjing Liao, and Hayden Schaeffer. Neural scaling laws of deep relu and deep operator network: A theoretical study.arXiv preprint arXiv:2410.00357, 2024

  78. [78]

    Maximum likelihood estimation of mckean– vlasov stochastic differential equation and its application.Applied Mathematics and Computation, 274:237–246, 2016

    Jianghui Wen, Xiangjun Wang, Shuhua Mao, and Xinping Xiao. Maximum likelihood estimation of mckean– vlasov stochastic differential equation and its application.Applied Mathematics and Computation, 274:237–246, 2016

  79. [79]

    Parameter estimation for the mckean- vlasov stochastic differential equation.arXiv preprint arXiv:2106.13751, 2021

    Louis Sharrock, Nikolas Kantas, Panos Parpas, and Grigorios A Pavliotis. Parameter estimation for the mckean- vlasov stochastic differential equation.arXiv preprint arXiv:2106.13751, 2021

  80. [80]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014

Showing first 80 references.