pith. machine review for the scientific record.

arxiv: 2604.00333 · v2 · submitted 2026-04-01 · 🧮 math.NA · cs.LG · cs.NA · physics.comp-ph

Recognition: 2 theorem links · Lean Theorem

MVNN: A Measure-Valued Neural Network for Learning McKean-Vlasov Dynamics from Particle Data

Hayden Schaeffer, Liyao Lyu, Xinyue Yu

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 22:43 UTC · model grok-4.3

classification 🧮 math.NA · cs.LG · cs.NA · physics.comp-ph
keywords measure-valued neural network · McKean-Vlasov dynamics · particle trajectory learning · propagation of chaos · universal approximation · collective behavior modeling · mean-field limits

The pith

A measure-valued neural network infers interaction drifts in McKean-Vlasov systems directly from observed particle trajectories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an architecture that extends ordinary neural networks to probability measures so that drift terms depending on the full empirical distribution can be learned from particle paths. It proves that the resulting measure-dependent ODEs are well-posed and that the associated particle system satisfies propagation of chaos. Under an explicit low-dimensional dependence assumption on the measure, the network is shown to enjoy universal approximation together with quantitative rates. Numerical tests on first- and second-order aggregation, alignment, and multi-group models demonstrate accurate in- and out-of-distribution prediction. The approach therefore supplies a data-driven route to closing the mean-field limit without hand-crafted kernels.
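
For orientation, the dynamics the summary describes can be written in the standard McKean-Vlasov form (a reconstruction from the summary; the paper's exact regularity assumptions may differ). The mean-field equation couples the state to its own law, and the finite system replaces the law with the empirical measure:

    \mathrm{d}X_t = b(X_t, \mu_t)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t,
    \qquad \mu_t = \operatorname{Law}(X_t),

    \mathrm{d}X^{i,N}_t = b\bigl(X^{i,N}_t, \mu^N_t\bigr)\,\mathrm{d}t + \sigma\,\mathrm{d}W^i_t,
    \qquad \mu^N_t = \frac{1}{N} \sum_{j=1}^{N} \delta_{X^{j,N}_t}.

Propagation of chaos is the statement that \mu^N_t converges to \mu_t as N grows, which is what licenses learning b from finite-particle trajectories in the first place.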

Core claim

We introduce a measure-valued neural network that infers measure-dependent interaction (drift) terms directly from particle-trajectory observations. The architecture learns cylindrical features via an embedding network that maps distributions to vectors. We establish well-posedness of the resulting dynamics, prove propagation-of-chaos for the interacting-particle system, and obtain universal approximation together with quantitative rates under a low-dimensional measure-dependence assumption.

What carries the argument

The measure-valued neural network (MVNN), which operates on probability measures by learning cylindrical features through a distribution-to-vector embedding network.
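
A minimal sketch of what such an architecture could look like in PyTorch, assuming a first-order system; the layer sizes, activations, and mean-pooling readout are illustrative choices, not the paper's hyperparameters:

    import torch
    import torch.nn as nn


    class MVNNDrift(nn.Module):
        """b_theta(x, mu_N) = phi_int(x, (1/N) * sum_j phi_emb(x_j))."""

        def __init__(self, dim: int, emb_dim: int = 16, width: int = 64):
            super().__init__()
            # Embedding network: maps each particle to a feature vector.
            # Averaging these features yields a cylindrical feature of the
            # empirical measure that is invariant to particle ordering.
            self.phi_emb = nn.Sequential(
                nn.Linear(dim, width), nn.Tanh(), nn.Linear(width, emb_dim)
            )
            # Interaction network: consumes state plus pooled measure feature.
            self.phi_int = nn.Sequential(
                nn.Linear(dim + emb_dim, width), nn.Tanh(), nn.Linear(width, dim)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (N, dim) particle positions; the empirical measure is
            # represented by the particle cloud itself.
            measure_feature = self.phi_emb(x).mean(dim=0)              # (emb_dim,)
            z = torch.cat([x, measure_feature.expand(x.size(0), -1)], dim=1)
            return self.phi_int(z)                                     # (N, dim)

The pooled average is what makes the drift a function of the empirical measure rather than of particle identities, matching the cylindrical form b_theta(X, mu_N) ≈ phi_int(X, (1/N) Σ_j phi_emb(X^j)) quoted from the paper, at O(N) cost per evaluation.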

If this is right

  • The learned dynamics remain well-posed for any initial measure.
  • Finite-particle simulations converge to the mean-field limit as the number of particles grows (a toy convergence check is sketched just after this list).
  • The network recovers both deterministic and stochastic versions of the Motsch-Tadmor, Cucker-Smale, and attraction-repulsion models from data.
  • Prediction accuracy persists on out-of-distribution initial configurations and parameter regimes.
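
A rough empirical check of the convergence bullet above: simulate the interacting-particle system at increasing N and watch summary statistics stabilise. The toy attraction drift below stands in for a trained MVNN, and the step size and noise level are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)


    def drift(x: np.ndarray) -> np.ndarray:
        # Toy measure-dependent drift: attraction toward the empirical mean.
        return -(x - x.mean(axis=0))


    def simulate(n_particles: int, t_final=2.0, dt=0.01, sigma=0.1) -> np.ndarray:
        # Euler-Maruyama for the interacting-particle SDE.
        x = rng.normal(size=(n_particles, 1))
        for _ in range(int(t_final / dt)):
            noise = rng.normal(size=x.shape)
            x = x + drift(x) * dt + sigma * np.sqrt(dt) * noise
        return x


    for n in (100, 1_000, 10_000):
        x = simulate(n)
        # Propagation of chaos predicts these moments settle as N grows.
        print(f"N={n:6d}  mean={x.mean():+.4f}  var={x.var():.4f}")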

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same embedding idea could be applied to learn nonlocal kernels in other nonlocal PDEs such as aggregation-diffusion equations.
  • Quantitative rates under the low-dimensional assumption suggest that the method may scale to moderately high-dimensional state spaces provided the measure dependence stays low-dimensional.
  • If the embedding network is replaced by a learned graph neural network, the architecture might capture heterogeneous interaction rules without explicit low-dimensional reduction.

Load-bearing premise

The interaction drift depends on the measure through a low-dimensional feature map.
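
One natural formalization of this premise (a hedged reading; the paper's precise statement may differ) is that the drift sees the measure only through finitely many integral features:

    b(x, \mu) = g\Bigl(x, \int_{\mathbb{R}^d} \varphi(y)\, \mu(\mathrm{d}y)\Bigr),
    \qquad \varphi : \mathbb{R}^d \to \mathbb{R}^m, \quad m \text{ small}.

Under this reading the embedding network plays the role of \varphi and the interaction network the role of g, so the architecture is matched exactly to the assumption that powers the quantitative rates.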

What would settle it

A concrete particle trajectory dataset whose learned drift fails to reproduce the observed collective motion once the embedding dimension is fixed below the true intrinsic dimension of the measure dependence.
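
The shape of such an experiment, in a self-contained toy (entirely hypothetical: the ground-truth drift below genuinely depends on two measure statistics, the mean and the second moment, so an embedding width of one sits below the intrinsic dimension; MVNNDrift is the sketch from earlier on this page, and whether a clean error cliff appears is an empirical question):

    import torch

    torch.manual_seed(0)
    N, dim = 256, 1


    def true_drift(x: torch.Tensor) -> torch.Tensor:
        # Depends on two independent measure features: mean and second moment.
        m1, m2 = x.mean(), (x ** 2).mean()
        return -(x - m1) * (1.0 + m2)


    x_train = torch.randn(64, N, dim)                   # 64 particle clouds
    y_train = torch.stack([true_drift(x) for x in x_train])

    for emb_dim in (1, 4):                              # below vs. above intrinsic dim
        model = MVNNDrift(dim, emb_dim=emb_dim)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(500):
            opt.zero_grad()
            pred = torch.stack([model(x) for x in x_train])
            loss = ((pred - y_train) ** 2).mean()
            loss.backward()
            opt.step()
        print(f"emb_dim={emb_dim}: final training MSE {loss.item():.5f}")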

Figures

Figures reproduced from arXiv: 2604.00333 by Hayden Schaeffer, Liyao Lyu, Xinyue Yu.

Figure 1
Figure 1. 1D Motsch-Tadmor dynamics: empirical density ρ(x, t), comparing the reference N-particle simulation (orange) with the MVNN-learned mean-field model (blue). Columns show t = 0, 1, 2; rows correspond to three unseen initial distributions. Densities are estimated with Gaussian kernel density estimation, and the L² error is computed between the KDE-smoothed densities.
Figure 2
Figure 2. Comparisons on 1D Motsch-Tadmor dynamics: empirical density ρ(x, t), comparing the reference N-particle simulation (green), the MVNN-learned mean-field model (orange), and the prediction from the Gaussian process model [83] (blue). Columns show t = 0, 0.5, 1; rows correspond to three unseen initial distributions. Densities are estimated with Gaussian kernel density estimation.
Figure 3
Figure 3. L² error for the 1D Motsch-Tadmor dynamics: comparison of the Gaussian process model (blue), the MVNN model trained on 16 agents, 9 trajectories, and 20 timesteps (orange), and the MVNN model trained on 16,000 agents, 100 trajectories, and 200 timesteps (green). The L² error is computed between the KDE-smoothed densities.
Figure 4
Figure 4. Simulation time comparison: average simulation times (seconds) against number of agents N for the MVNN and Gaussian process models over 10 trials.
Figure 5
Figure 5. Stochastic Motsch-Tadmor dynamics (σ = 0.1): density evolution. Empirical density ρ(x, t) from the reference interacting-particle simulation (orange) and from the MVNN-learned McKean-Vlasov model (blue), shown at t = 0, 1, 2 for an unseen initial distribution. Densities are estimated via Gaussian KDE, and the reported L² error is computed between the kernel-smoothed densities.
Figure 6
Figure 6. 2D aggregation model with ring-shaped initialization. The upper row displays the ground-truth particle trajectories; the lower row shows the evolution predicted by the learned mean-field dynamics. The model accurately preserves the topological structure of the ring over time.
Figure 7
Figure 7. 2D aggregation model with double-ring initialization. Comparison between the ground-truth particle system (upper row) and the learned mean-field dynamics (lower row). The model correctly reproduces the contraction of both rings despite never seeing this topology during training.
Figure 8
Figure 8. 2D aggregation model with disk-shaped initialization. The learned dynamics (lower row) accurately capture the collapse of the uniform disk, matching the ground truth (upper row).
Figure 9
Figure 9. 2D aggregation model with binary asymmetric initialization. Evolution of a system initialized with heterogeneous densities (low density left, high density right). The learned model (lower row) preserves the density gradient and correctly predicts the asymmetric aggregation process.
Figure 10
Figure 10. Hierarchical dynamics, initial condition 1. Evolution of the multi-group system initialized with spatially separated populations; rows correspond to Groups 1, 2, and 3. The learned model (blue) faithfully reproduces the reference particle dynamics (orange), capturing the directional information flow from Group 3 down to Group 1.
Figure 11
Figure 11. Hierarchical dynamics, initial condition 2. Evolution of the multi-group system initialized with spatially separated populations; rows correspond to Groups 1, 2, and 3. The learned model (blue) accurately reproduces the reference particle dynamics (orange), capturing the directional information flow from Group 3 down to Group 1.
Figure 12
Figure 12. Second-order attraction-repulsion: evolution of the model initialized with ring-shaped initial positions and Gaussian initial velocities. The upper rows display ground-truth particle positions and velocities; the lower rows show the positions and velocities predicted by the learned MVNN.
Figure 13
Figure 13. Second-order attraction-repulsion: evolution of the model initialized with double-ring initial positions and Gaussian initial velocities. The upper rows display ground-truth particle positions and velocities; the lower rows show the MVNN predictions.
Figure 14
Figure 14. Second-order attraction-repulsion: evolution of the model initialized with disk-shaped initial positions and Gaussian initial velocities. The upper rows display ground-truth particle positions and velocities; the lower rows show the MVNN predictions.
Figure 15
Figure 15. Second-order attraction-repulsion: evolution of the model initialized with binary asymmetric initial positions (low density left, high density right) and Gaussian initial velocities. The upper rows display ground-truth particle positions and velocities; the lower rows show the MVNN predictions.
Figure 16
Figure 16. Second-order Cucker-Smale: evolution of the model initialized with Gaussian initial positions and Gaussian initial velocities. The upper rows display ground-truth particle positions and velocities; the lower rows show the MVNN predictions.
Figure 17
Figure 17. Second-order Cucker-Smale: evolution of the model initialized with two-component Gaussian-mixture initial positions and Gaussian initial velocities. The upper rows display ground-truth particle positions and velocities; the lower rows show the MVNN predictions.
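
The captions above repeatedly report an "L² error between the KDE-smoothed densities". A minimal reconstruction of that metric for 1D particle sets, assuming scipy's Gaussian KDE with its default bandwidth rule (the paper may use a different bandwidth):

    import numpy as np
    from scipy.stats import gaussian_kde


    def kde_l2_error(ref: np.ndarray, pred: np.ndarray, grid: np.ndarray) -> float:
        """L2 distance between Gaussian-KDE densities of two 1D samples."""
        rho_ref = gaussian_kde(ref)(grid)
        rho_pred = gaussian_kde(pred)(grid)
        dx = grid[1] - grid[0]
        return float(np.sqrt(np.sum((rho_ref - rho_pred) ** 2) * dx))


    rng = np.random.default_rng(1)
    grid = np.linspace(-4.0, 4.0, 801)
    print(kde_l2_error(rng.normal(0.0, 1.0, 2000), rng.normal(0.1, 1.0, 2000), grid))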
read the original abstract

Collective behaviors that emerge from interactions are fundamental to numerous biological systems. To learn such interacting forces from observations, we introduce a measure-valued neural network that infers measure-dependent interaction (drift) terms directly from particle-trajectory observations. The proposed architecture generalizes standard neural networks to operate on probability measures by learning cylindrical features, using an embedding network that produces scalable distribution-to-vector representations. On the theory side, we establish well-posedness of the resulting dynamics and prove propagation-of-chaos for the associated interacting-particle system. We further show universal approximation and quantitative approximation rates under a low-dimensional measure-dependence assumption. Numerical experiments on first and second order systems, including deterministic and stochastic Motsch-Tadmor dynamics, two-dimensional attraction-repulsion aggregation, Cucker-Smale dynamics, and a hierarchical multi-group system, demonstrate accurate prediction and strong out-of-distribution generalization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces a measure-valued neural network (MVNN) that learns measure-dependent drift terms in McKean-Vlasov equations directly from particle trajectory data. The architecture employs cylindrical features and an embedding network to map probability measures to vectors. Theoretical results include well-posedness of the learned dynamics, propagation of chaos for the associated particle system, and universal approximation with quantitative rates under an explicit low-dimensional measure-dependence assumption on the interaction kernel. Numerical experiments on first- and second-order systems (Motsch-Tadmor, Cucker-Smale, attraction-repulsion aggregation, and hierarchical multi-group models) demonstrate accurate prediction and out-of-distribution generalization.

Significance. If the approximation guarantees and numerical performance hold, the work would supply a theoretically grounded data-driven method for inferring interaction kernels in mean-field limits, with potential applications to collective behavior modeling in biology and physics. The combination of architecture design, well-posedness/propagation-of-chaos results, and reported numerical success on multiple systems constitutes a substantive contribution to learning McKean-Vlasov dynamics.

major comments (1)
  1. [Abstract and theory section] The universal approximation and quantitative approximation rates are proved only under an explicit low-dimensional measure-dependence assumption on the interaction kernel. The numerical examples (Motsch-Tadmor, Cucker-Smale, aggregation) are all constructed to satisfy this assumption by design; no controlled experiment is reported in which the assumption is violated to quantify performance degradation or to test the necessity of the structural restriction.
minor comments (1)
  1. Notation for the embedding network and cylindrical features should be introduced with explicit dimension tracking to clarify how the map from measures to vectors scales with particle number.
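
As an illustration of the bookkeeping this asks for (our notation; p is a hypothetical embedding width, not a symbol from the paper):

    \varphi_{\mathrm{emb}} : \mathbb{R}^d \to \mathbb{R}^p,
    \qquad
    \mu^N_t \mapsto \frac{1}{N} \sum_{j=1}^{N} \varphi_{\mathrm{emb}}\bigl(X^{j,N}_t\bigr) \in \mathbb{R}^p,
    \qquad
    \varphi_{\mathrm{int}} : \mathbb{R}^d \times \mathbb{R}^p \to \mathbb{R}^d.

The pooled feature lives in R^p for every N, so the measure-to-vector map costs O(Np) per evaluation and its output dimension does not grow with the particle number.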

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. We respond to the major comment below.

read point-by-point responses
  1. Referee: [Abstract and theory section] The universal approximation and quantitative approximation rates are proved only under an explicit low-dimensional measure-dependence assumption on the interaction kernel. The numerical examples (Motsch-Tadmor, Cucker-Smale, aggregation) are all constructed to satisfy this assumption by design; no controlled experiment is reported in which the assumption is violated to quantify performance degradation or to test the necessity of the structural restriction.

    Authors: We thank the referee for highlighting this point. It is correct that the universal approximation result with quantitative rates requires the explicit low-dimensional measure-dependence assumption on the interaction kernel, as stated in the abstract and theory sections. The numerical examples are constructed to satisfy the assumption by design so that the learned dynamics can be validated directly against the theoretical setting. The MVNN architecture itself is general and does not require the assumption for implementation or training; the restriction is used only to derive the quantitative rates. We agree that testing performance when the assumption is violated would be informative. In the revised manuscript we will add a clarifying paragraph in the theory section and a remark in the numerical experiments section that explicitly notes the role of the assumption and discusses its implications for approximation quality outside this regime.

    revision: partial

Circularity Check

0 steps flagged

No significant circularity; central claims rest on independent architecture and explicit assumptions

full rationale

The derivation introduces a cylindrical-feature embedding network for measures and establishes well-posedness, propagation-of-chaos, and universal approximation under an explicitly stated low-dimensional measure-dependence assumption on the interaction kernel. This assumption is an external structural restriction on the target dynamics, not a quantity fitted or defined by the network itself. No equation reduces a prediction to a fitted parameter by construction, no uniqueness theorem is imported solely via self-citation, and no ansatz is smuggled through prior work. Numerical examples validate the method but do not define the theoretical guarantees. The chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; the primary unverified premise is the low-dimensional measure-dependence assumption invoked for quantitative rates. No explicit free parameters or invented entities are named.

axioms (1)
  • domain assumption: low-dimensional measure-dependence assumption
    Invoked to obtain quantitative approximation rates and universal approximation.

pith-pipeline@v0.9.0 · 5467 in / 1173 out tokens · 31927 ms · 2026-05-13T22:43:27.024495+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. One Operator for Many Densities: Amortized Approximation of Conditioning by Neural Operators

    stat.ML 2026-05 unverdicted novelty 7.0

    A single neural operator can approximate the map from arbitrary joint densities to their conditionals, backed by new continuity results and illustrated on Gaussian mixtures.

  2. One Operator for Many Densities: Amortized Approximation of Conditioning by Neural Operators

    stat.ML 2026-05 unverdicted novelty 6.0

    A single neural operator can approximate the map from joint densities to conditional densities to arbitrary accuracy, with a proof based on continuity of the conditioning operator and a demonstration on Gaussian mixtures.

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    A review on attractive–repulsive hydrodynamics for consensus in collective behavior.Active Particles, Volume 1: Advances in Theory, Models, and Applications, pages 259–298, 2017

    José A Carrillo, Young-Pil Choi, and Sergio P Perez. A review on attractive–repulsive hydrodynamics for consensus in collective behavior.Active Particles, Volume 1: Advances in Theory, Models, and Applications, pages 259–298, 2017

  2. [2]

    Giacomo Albi, Nicola Bellomo, Luisa Fermo, S-Y Ha, Jeongho Kim, Lorenzo Pareschi, David Poyato, and Juan Soler. Vehicular traffic, crowds, and swarms: From kinetic theory and multiscale methods to applications and research perspectives. Mathematical Models and Methods in Applied Sciences, 29(10):1901–2005, 2019

  3. [3]

    Stability of ring patterns arising from two-dimensional particle interactions.Physical Review E—Statistical, Nonlinear, and Soft Matter Physics, 84(1):015203, 2011

    Theodore Kolokolnikov, Hui Sun, David Uminsky, and Andrea L Bertozzi. Stability of ring patterns arising from two-dimensional particle interactions.Physical Review E—Statistical, Nonlinear, and Soft Matter Physics, 84(1):015203, 2011

  4. [4]

    Collective motion.Physics reports, 517(3-4):71–140, 2012

    Tamás Vicsek and Anna Zafeiris. Collective motion.Physics reports, 517(3-4):71–140, 2012

  5. [5]

    Cambridge University Press, 2008

    Yoav Shoham and Kevin Leyton-Brown.Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, 2008

  6. [6]

    Nonparametric inference of interaction laws in systems of agents from trajectory data.Proceedings of the National Academy of Sciences, 116(29):14424–14433, 2019

    Fei Lu, Ming Zhong, Sui Tang, and Mauro Maggioni. Nonparametric inference of interaction laws in systems of agents from trajectory data.Proceedings of the National Academy of Sciences, 116(29):14424–14433, 2019

  7. [7]

    Random feature models for learning interacting dynamical systems.Proceedings of the Royal Society A, 479(2275):20220835, 2023

    Yuxuan Liu, Scott G McCalla, and Hayden Schaeffer. Random feature models for learning interacting dynamical systems. Proceedings of the Royal Society A, 479(2275):20220835, 2023

  8. [8]

    Learning interaction kernels in mean-field equations of first-order systems of interacting particles.SIAM Journal on Scientific Computing, 44(1):A260–A285, 2022

    Quanjun Lang and Fei Lu. Learning interaction kernels in mean-field equations of first-order systems of interacting particles.SIAM Journal on Scientific Computing, 44(1):A260–A285, 2022

  9. [9]

    Data-driven model construction for anisotropic dynamics of active matter.PRX Life, 1(1):013009, 2023

    Mengyang Gu, Xinyi Fang, and Yimin Luo. Data-driven model construction for anisotropic dynamics of active matter.PRX Life, 1(1):013009, 2023

  10. [10]

    Inference of interaction kernels in mean-field models of opinion dynamics

    Weiqi Chu, Qin Li, and Mason A Porter. Inference of interaction kernels in mean-field models of opinion dynamics. SIAM Journal on Applied Mathematics, 84(3):1096–1115, 2024

  11. [11]

    Learning interaction kernels for agent systems on riemannian manifolds

    Mauro Maggioni, Jason J Miller, Hongda Qiu, and Ming Zhong. Learning interaction kernels for agent systems on riemannian manifolds. InInternational Conference on Machine Learning, pages 7290–7300. PMLR, 2021

  12. [12]

    A macroscopic crowd motion model of gradient flow type.Mathematical Models and Methods in Applied Sciences, 20(10):1787–1821, 2010

    Bertrand Maury, Aude Roudneff-Chupin, and Filippo Santambrogio. A macroscopic crowd motion model of gradient flow type.Mathematical Models and Methods in Applied Sciences, 20(10):1787–1821, 2010

  13. [13]

    On kinematic waves ii

    Michael James Lighthill and Gerald Beresford Whitham. On kinematic waves II. A theory of traffic flow on long crowded roads. Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, 229(1178):317–345, 1955

  14. [14]

    Initiation of slime mold aggregation viewed as an instability.Journal of theoretical biology, 26(3):399–415, 1970

    Evelyn F Keller and Lee A Segel. Initiation of slime mold aggregation viewed as an instability.Journal of theoretical biology, 26(3):399–415, 1970

  15. [15]

    Random batch methods (rbm) for interacting particle systems.Journal of Computational Physics, 400:108877, 2020

    Shi Jin, Lei Li, and Jian-Guo Liu. Random batch methods (RBM) for interacting particle systems. Journal of Computational Physics, 400:108877, 2020

  16. [16]

    Random batch ewald method for dielectrically confined coulomb systems.SIAM Journal on Scientific Computing, 47(4):B846–B874, 2025

    Zecheng Gan, Xuanzhao Gao, Jiuyang Liang, and Zhenli Xu. Random batch ewald method for dielectrically confined coulomb systems.SIAM Journal on Scientific Computing, 47(4):B846–B874, 2025

  17. [17]

    More is the same; phase transitions and mean field theories.Journal of Statistical Physics, 137(5):777–797, 2009

    Leo P Kadanoff. More is the same; phase transitions and mean field theories.Journal of Statistical Physics, 137(5):777–797, 2009

  18. [18]

    The mean-field limit for the dynamics of large particle systems.Journées équations aux dérivées partielles, pages 1–47, 2003

    François Golse. The mean-field limit for the dynamics of large particle systems.Journées équations aux dérivées partielles, pages 1–47, 2003

  19. [19]

    Springer Science & Business Media, 2012

    Herbert Spohn.Large scale dynamics of interacting particles. Springer Science & Business Media, 2012

  20. [20]

    Mean field limit and quantitative estimates with singular attractive kernels.Duke Mathematical Journal, 172(13):2591–2641, 2023

    Didier Bresch, Pierre-Emmanuel Jabin, and Zhenfu Wang. Mean field limit and quantitative estimates with singular attractive kernels.Duke Mathematical Journal, 172(13):2591–2641, 2023

  21. [21]

    Particle, kinetic, and hydrodynamic models of swarming

    José A Carrillo, Massimo Fornasier, Giuseppe Toscani, and Francesco Vecil. Particle, kinetic, and hydrodynamic models of swarming. InMathematical modeling of collective behavior in socio-economic and life sciences, pages 297–336. Springer, 2010

  22. [22]

    Asymptotic flocking dynamics for the kinetic cucker–smale model.SIAM Journal on Mathematical Analysis, 42(1):218–236, 2010

    José A Carrillo, Massimo Fornasier, Jesús Rosado, and Giuseppe Toscani. Asymptotic flocking dynamics for the kinetic cucker–smale model.SIAM Journal on Mathematical Analysis, 42(1):218–236, 2010

  23. [23]

    Giacomo Albi, Lorenzo Pareschi, and Mattia Zanella. Boltzmann-type control of opinion consensus through leaders. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 372(2028):20140138, 2014

  24. [24]

    Heterophilious dynamics enhances consensus.SIAM review, 56(4):577–621, 2014

    Sebastien Motsch and Eitan Tadmor. Heterophilious dynamics enhances consensus.SIAM review, 56(4):577–621, 2014

  25. [25]

    An analytical framework for consensus-based global optimization method.Mathematical Models and Methods in Applied Sciences, 28(06):1037–1066, 2018

    José A Carrillo, Young-Pil Choi, Claudia Totzeck, and Oliver Tse. An analytical framework for consensus-based global optimization method.Mathematical Models and Methods in Applied Sciences, 28(06):1037–1066, 2018

  26. [26]

    A consensus-based global optimization method for high dimensional machine learning problems.ESAIM: Control, Optimisation and Calculus of Variations, 27:S5, 2021

    José A Carrillo, Shi Jin, Lei Li, and Yuhua Zhu. A consensus-based global optimization method for high dimensional machine learning problems.ESAIM: Control, Optimisation and Calculus of Variations, 27:S5, 2021

  27. [27]

    On the mean-field limit for the consensus-based optimization.Mathematical Methods in the Applied Sciences, 45(12):7814–7831, 2022

    Hui Huang and Jinniao Qiu. On the mean-field limit for the consensus-based optimization.Mathematical Methods in the Applied Sciences, 45(12):7814–7831, 2022

  28. [28]

    Consensus based stochastic optimal control

    Liyao Lyu and Jingrun Chen. Consensus based stochastic optimal control. InForty-second International Conference on Machine Learning, 2025

  29. [29]

    Distilling free-form natural laws from experimental data.science, 324(5923):81– 85, 2009

    Michael Schmidt and Hod Lipson. Distilling free-form natural laws from experimental data.science, 324(5923):81– 85, 2009

  30. [30]

    Neural ordinary differential equations.Advances in neural information processing systems, 31, 2018

    Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations.Advances in neural information processing systems, 31, 2018

  31. [31]

    Discovering governing equations from data by sparse identification of nonlinear dynamical systems.Proceedings of the national academy of sciences, 113(15):3932– 3937, 2016

    Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems.Proceedings of the national academy of sciences, 113(15):3932– 3937, 2016

  32. [32]

    Sparse model selection via integral terms.Physical Review E, 96(2):023302, 2017

    Hayden Schaeffer and Scott G McCalla. Sparse model selection via integral terms.Physical Review E, 96(2):023302, 2017

  33. [33]

    Extracting sparse high-dimensional dynamics from limited data

    Hayden Schaeffer, Giang Tran, and Rachel Ward. Extracting sparse high-dimensional dynamics from limited data. SIAM Journal on Applied Mathematics, 78(6):3279–3295, 2018

  34. [34]

    Ab initio generalized langevin equation.Proceedings of the National Academy of Sciences, 121(14):e2308668121, 2024

    Pinchen Xie, Roberto Car, and Weinan E. Ab initio generalized langevin equation.Proceedings of the National Academy of Sciences, 121(14):e2308668121, 2024

  35. [35]

    Construction of coarse-grained molecular dynamics with many-body non-markovian memory.Physical Review Letters, 131(17):177301, 2023

    Liyao Lyu and Huan Lei. Construction of coarse-grained molecular dynamics with many-body non-markovian memory.Physical Review Letters, 131(17):177301, 2023

  36. [36]

    Data-driven learning of the generalized langevin equation with state-dependent memory.Physical Review Letters, 133(7):077301, 2024

    Pei Ge, Zhongqiang Zhang, and Huan Lei. Data-driven learning of the generalized langevin equation with state-dependent memory.Physical Review Letters, 133(7):077301, 2024

  37. [37]

    Learning stochastic dynamical system via flow map operator.Journal of Computa- tional Physics, 508:112984, 2024

    Yuan Chen and Dongbin Xiu. Learning stochastic dynamical system via flow map operator.Journal of Computa- tional Physics, 508:112984, 2024

  38. [38]

    A training-free conditional diffusion model for learning stochastic dynamical systems.SIAM Journal on Scientific Computing, 47(5):C1144–C1171, 2025

    Yanfang Liu, Yuan Chen, Dongbin Xiu, and Guannan Zhang. A training-free conditional diffusion model for learning stochastic dynamical systems.SIAM Journal on Scientific Computing, 47(5):C1144–C1171, 2025

  39. [39]

    Model-free learning of random dynamical system from noisy observations.Journal of Computational Physics, page 114474, 2025

    Kyongmin Yeo, Hyomin Shin, Heechang Kim, and Minseok Choi. Model-free learning of random dynamical system from noisy observations.Journal of Computational Physics, page 114474, 2025

  40. [40]

    Les-sindy: Laplace-enhanced sparse identification of nonlinear dynamical systems.Journal of Computational Physics, page 114443, 2025

    Haoyang Zheng and Guang Lin. Les-sindy: Laplace-enhanced sparse identification of nonlinear dynamical systems.Journal of Computational Physics, page 114443, 2025

  41. [41]

    Learning nonlinear operators via deeponet based on the universal approximation theorem of operators.Nature machine intelligence, 3(3):218–229, 2021

    Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators.Nature machine intelligence, 3(3):218–229, 2021

  42. [42]

    Fourier neural operator for parametric partial differential equations

    Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, Anima Anandkumar, et al. Fourier neural operator for parametric partial differential equations. InInternational Conference on Learning Representations

  43. [43]

    Sparse dynamics for partial differential equations.Proceedings of the National Academy of Sciences, 110(17):6634–6639, 2013

    Hayden Schaeffer, Russel Caflisch, Cory D Hauck, and Stanley Osher. Sparse dynamics for partial differential equations.Proceedings of the National Academy of Sciences, 110(17):6634–6639, 2013

  44. [44]

    Pde-net: Learning pdes from data

    Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. Pde-net: Learning pdes from data. InInternational conference on machine learning, pages 3208–3216. PMLR, 2018

  45. [45]

    Dnn modeling of partial differential equations with incomplete data.Journal of Computational Physics, 493:112502, 2023

    Victor Churchill, Yuan Chen, Zhongshu Xu, and Dongbin Xiu. Dnn modeling of partial differential equations with incomplete data.Journal of Computational Physics, 493:112502, 2023

  46. [46]

    Neupde: Neural network based ordinary and partial differential equations for modeling time-dependent data

    Yifan Sun, Linan Zhang, and Hayden Schaeffer. NeuPDE: Neural network based ordinary and partial differential equations for modeling time-dependent data. In Mathematical and Scientific Machine Learning, pages 352–372. PMLR, 2020

  47. [47]

    Hayden Schaeffer. Learning partial differential equations via data discovery and sparse optimization.Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 473(2197), 2017

  48. [48]

    Extracting structured dynamical systems using sparse optimization with very few samples.Multiscale Modeling & Simulation, 18(4):1435–1461, 2020

    Hayden Schaeffer, Giang Tran, Rachel Ward, and Linan Zhang. Extracting structured dynamical systems using sparse optimization with very few samples.Multiscale Modeling & Simulation, 18(4):1435–1461, 2020

  49. [49]

    Weak sindy for partial differential equations.Journal of Computational Physics, 443:110525, 2021

    Daniel A Messenger and David M Bortz. Weak sindy for partial differential equations.Journal of Computational Physics, 443:110525, 2021

  50. [50]

    Due: A deep learning framework and library for modeling unknown equations.SIAM Review, 67(4):873–902, 2025

    Junfeng Chen, Kailiang Wu, and Dongbin Xiu. Due: A deep learning framework and library for modeling unknown equations.SIAM Review, 67(4):873–902, 2025

  51. [51]

    Multipole graph neural operator for parametric partial differential equations.Advances in Neural Information Processing Systems, 33:6755–6766, 2020

    Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Andrew Stuart, Kaushik Bhattacharya, and Anima Anandkumar. Multipole graph neural operator for parametric partial differential equations.Advances in Neural Information Processing Systems, 33:6755–6766, 2020

  52. [52]

    Kernel methods are competitive for operator learning.Journal of Computational Physics, 496:112549, 2024

    Pau Batlle, Matthieu Darcy, Bamdad Hosseini, and Houman Owhadi. Kernel methods are competitive for operator learning.Journal of Computational Physics, 496:112549, 2024

  53. [53]

    Regularized random fourier features and finite element reconstruction for operator learning in sobolev space.arXiv preprint arXiv:2512.17884, 2025

    Xinyue Yu and Hayden Schaeffer. Regularized random fourier features and finite element reconstruction for operator learning in sobolev space.arXiv preprint arXiv:2512.17884, 2025

  54. [54]

    Belnet: Basis enhanced learning, a mesh-free neural operator.Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 479(2276), 2023

    Zecheng Zhang, Leung Wing Tat, and Hayden Schaeffer. Belnet: Basis enhanced learning, a mesh-free neural operator.Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 479(2276), 2023

  55. [55]

    Towards a foundation model for partial differential equations: Multioperator learning and extrapolation.Physical Review E, 111(3):035304, 2025

    Jingmin Sun, Yuxuan Liu, Zecheng Zhang, and Hayden Schaeffer. Towards a foundation model for partial differential equations: Multioperator learning and extrapolation.Physical Review E, 111(3):035304, 2025

  56. [56]

    Pi-mfm: Physics-informed multimodal foundation model for solving partial differential equations.arXiv preprint arXiv:2512.23056, 2025

    Min Zhu, Jingmin Sun, Zecheng Zhang, Hayden Schaeffer, and Lu Lu. Pi-mfm: Physics-informed multimodal foundation model for solving partial differential equations.arXiv preprint arXiv:2512.23056, 2025

  57. [57]

    A deep learning framework for multi-operator learning: Architectures and approximation theory.arXiv preprint arXiv:2510.25379, 2025

    Adrien Weihs, Jingmin Sun, Zecheng Zhang, and Hayden Schaeffer. A deep learning framework for multi-operator learning: Architectures and approximation theory.arXiv preprint arXiv:2510.25379, 2025

  58. [58]

    Compno: A novel foundation model approach for solving partial differential equations.Applied Sciences, 16(2):972, 2026

    Hamda Hmida, Hsiu-Wen Chang Joly, and Youssef Mesri. Compno: A novel foundation model approach for solving partial differential equations.Applied Sciences, 16(2):972, 2026

  59. [59]

    A multimodal pde foundation model for prediction and scientific text descriptions.arXiv preprint arXiv:2502.06026, 2025

    Elisa Negrini, Yuxuan Liu, Liu Yang, Stanley J Osher, and Hayden Schaeffer. A multimodal pde foundation model for prediction and scientific text descriptions.arXiv preprint arXiv:2502.06026, 2025

  60. [60]

    Pdeformer: Towards a foundation model for one-dimensional partial differential equations.arXiv preprint arXiv:2402.12652, 2024

    Zhanhong Ye, Xiang Huang, Leheng Chen, Hongsheng Liu, Zidong Wang, and Bin Dong. Pdeformer: Towards a foundation model for one-dimensional partial differential equations.arXiv preprint arXiv:2402.12652, 2024

  61. [61]

    Prose: Predicting multiple operators and symbolic expressions using multimodal transformers.Neural Networks, 180:106707, 2024

    Yuxuan Liu, Zecheng Zhang, and Hayden Schaeffer. Prose: Predicting multiple operators and symbolic expressions using multimodal transformers.Neural Networks, 180:106707, 2024

  62. [62]

    Vicon: Vision in-context operator networks for multi-physics fluid dynamics prediction.arXiv preprint arXiv:2411.16063, 2024

    Yadi Cao, Yuxuan Liu, Liu Yang, Rose Yu, Hayden Schaeffer, and Stanley Osher. Vicon: Vision in-context operator networks for multi-physics fluid dynamics prediction.arXiv preprint arXiv:2411.16063, 2024

  63. [63]

    Disco: learning to discover an evolution operator for multi- physics-agnostic prediction.arXiv preprint arXiv:2504.19496, 2025

    Rudy Morel, Jiequn Han, and Edouard Oyallon. Disco: learning to discover an evolution operator for multi- physics-agnostic prediction.arXiv preprint arXiv:2504.19496, 2025

  64. [64]

    In-context operator learning with data prompts for differential equation problems.Proceedings of the National Academy of Sciences, 120(39):e2310142120, 2023

    Liu Yang, Siting Liu, Tingwei Meng, and Stanley J Osher. In-context operator learning with data prompts for differential equation problems.Proceedings of the National Academy of Sciences, 120(39):e2310142120, 2023

  65. [65]

    Time-series forecasting, knowledge distillation, and refinement within a multimodal pde foundation model.arXiv preprint arXiv:2409.11609, 2024

    Derek Jollie, Jingmin Sun, Zecheng Zhang, and Hayden Schaeffer. Time-series forecasting, knowledge distillation, and refinement within a multimodal pde foundation model.arXiv preprint arXiv:2409.11609, 2024

  66. [66]

    Prose-fd: A multimodal pde foundation model for learning multiple operators for forecasting fluid dynamics.arXiv preprint arXiv:2409.09811, 2024

    Yuxuan Liu, Jingmin Sun, Xinjie He, Griffin Pinney, Zecheng Zhang, and Hayden Schaeffer. Prose-fd: A multimodal pde foundation model for learning multiple operators for forecasting fluid dynamics.arXiv preprint arXiv:2409.09811, 2024

  67. [67]

    Unisolver: Pde-conditional trans- formers towards universal neural pde solvers.arXiv preprint arXiv:2405.17527, 2024

    Hang Zhou, Yuezhou Ma, Haixu Wu, Haowen Wang, and Mingsheng Long. Unisolver: Pde-conditional trans- formers towards universal neural pde solvers.arXiv preprint arXiv:2405.17527, 2024

  68. [68]

    Pdeformer-2: A versatile foundation model for two-dimensional partial differential equations.arXiv preprint arXiv:2507.15409, 2025

    Zhanhong Ye, Zining Liu, Bingyang Wu, Hongjie Jiang, Leheng Chen, Minyan Zhang, Xiang Huang, Qinghe Meng Zou, Hongsheng Liu, Bin Dong, et al. Pdeformer-2: A versatile foundation model for two-dimensional partial differential equations.arXiv preprint arXiv:2507.15409, 2025

  69. [69]

    Itô’s formula for flows of measures on semimartingales.Stochastic Processes and their applications, 159:350–390, 2023

    Xin Guo, Huyên Pham, and Xiaoli Wei. Itô's formula for flows of measures on semimartingales. Stochastic Processes and their Applications, 159:350–390, 2023

  70. [70]

    Protter.Stochastic Differential Equations, pages 249–361

    Philip E. Protter. Stochastic Differential Equations, pages 249–361. Springer Berlin Heidelberg, Berlin, Heidelberg, 2005

  71. [71]

    Topics in propagation of chaos

    Alain-Sol Sznitman. Topics in propagation of chaos. InEcole d’été de probabilités de Saint-Flour XIX—1989, pages 165–251. Springer, 2006

  72. [72]

    Propagation of chaos: a review of models, methods and applications

    Louis-Pierre Chaintron and Antoine Diez. Propagation of chaos: a review of models, methods and applications. ii. applications.arXiv preprint arXiv:2106.14812, 2021

  73. [73]

    Henry P McKean et al. Propagation of chaos for a class of non-linear parabolic equations.Stochastic Differential Equations (Lecture Series in Differential Equations, Session 7, Catholic Univ., 1967), pages 41–57, 1967

  74. [74]

    Learning mean-field equations from particle data using wsindy.Physica D: Nonlinear Phenomena, 439:133406, 2022

    Daniel A Messenger and David M Bortz. Learning mean-field equations from particle data using wsindy.Physica D: Nonlinear Phenomena, 439:133406, 2022

  75. [75]

    Physics-informed genetic programming for discovery of partial differential equations from scarce and noisy data.Journal of Computational Physics, 514:113261, 2024

    Benjamin G Cohen, Burcu Beykal, and George M Bollas. Physics-informed genetic programming for discovery of partial differential equations from scarce and noisy data.Journal of Computational Physics, 514:113261, 2024

  76. [76]

    Mean-field neural networks: learning mappings on wasserstein space.Neural Networks, 168:380–393, 2023

    Huyên Pham and Xavier Warin. Mean-field neural networks: learning mappings on wasserstein space.Neural Networks, 168:380–393, 2023

  77. [77]

    Neural scaling laws of deep relu and deep operator network: A theoretical study.arXiv preprint arXiv:2410.00357, 2024

    Hao Liu, Zecheng Zhang, Wenjing Liao, and Hayden Schaeffer. Neural scaling laws of deep relu and deep operator network: A theoretical study.arXiv preprint arXiv:2410.00357, 2024

  78. [78]

    Maximum likelihood estimation of mckean– vlasov stochastic differential equation and its application.Applied Mathematics and Computation, 274:237–246, 2016

    Jianghui Wen, Xiangjun Wang, Shuhua Mao, and Xinping Xiao. Maximum likelihood estimation of mckean– vlasov stochastic differential equation and its application.Applied Mathematics and Computation, 274:237–246, 2016

  79. [79]

    Parameter estimation for the mckean- vlasov stochastic differential equation.arXiv preprint arXiv:2106.13751, 2021

    Louis Sharrock, Nikolas Kantas, Panos Parpas, and Grigorios A Pavliotis. Parameter estimation for the mckean- vlasov stochastic differential equation.arXiv preprint arXiv:2106.13751, 2021

  80. [80]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014

Showing first 80 references.