pith. machine review for the scientific record.

arxiv: 2604.21097 · v1 · submitted 2026-04-22 · 📊 stat.ML · cs.LG


Learning to Emulate Chaos: Adversarial Optimal Transport Regularization


Pith reviewed 2026-05-09 22:45 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords chaos emulation · optimal transport · adversarial regularization · neural operators · dynamical systems · statistical fidelity · attractors

The pith

Adversarial optimal transport regularization trains neural emulators to match chaotic attractor statistics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Chaotic dynamical systems are sensitive to initial conditions, so exact long-term forecasts are impossible and squared-error losses fail when training data-driven emulators on noisy data. The paper introduces a family of adversarial optimal transport objectives to regularize training so that emulators reproduce the statistical properties of the chaotic attractor rather than pointwise trajectories. It analyzes and tests a Sinkhorn divergence formulation based on the 2-Wasserstein distance together with a WGAN-style dual formulation based on the 1-Wasserstein distance; both jointly learn the emulator and high-quality summary statistics. Experiments on multiple chaotic systems, including those with high-dimensional attractors, show that the resulting emulators achieve significantly better long-term statistical fidelity than methods relying on handcrafted features or fixed summary statistics.

Core claim

A family of adversarial optimal transport objectives, including Sinkhorn divergence for 2-Wasserstein matching and a WGAN-style dual for 1-Wasserstein matching, jointly learns summary statistics and a physically consistent emulator that reproduces the statistical properties of chaotic attractors.
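
For orientation, these are the standard objectives the claim refers to, written for an emulator distribution $\mu$ and a reference (data) distribution $\nu$; the paper's exact cost and parameterization may differ. The entropic OT cost and its debiased Sinkhorn divergence are

$$W_\varepsilon(\mu,\nu)=\min_{\pi\in\Pi(\mu,\nu)}\int \|x-y\|^2\,d\pi(x,y)+\varepsilon\,\mathrm{KL}(\pi\,\|\,\mu\otimes\nu),\qquad S_\varepsilon(\mu,\nu)=W_\varepsilon(\mu,\nu)-\tfrac12 W_\varepsilon(\mu,\mu)-\tfrac12 W_\varepsilon(\nu,\nu),$$

which recovers the squared 2-Wasserstein distance as $\varepsilon \to 0$. The WGAN-style objective is the Kantorovich–Rubinstein dual of the 1-Wasserstein distance,

$$W_1(\mu,\nu)=\sup_{\mathrm{Lip}(f)\le 1}\ \mathbb{E}_{x\sim\mu}[f(x)]-\mathbb{E}_{y\sim\nu}[f(y)],$$

where the critic $f$ plays the role of the learned summary statistic (compare the summary map $f_\varphi$ in Figures 4 and 5).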

What carries the argument

Adversarial optimal transport objectives that enforce distributional matching between emulator trajectories and the true chaotic attractor while learning summary statistics.
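
A minimal sketch of how such a joint objective can be wired up, mirroring the two-part scheme described in Figure 1 below: a one-step prediction loss plus an OT regularizer for the emulator, and an adversarial update for the summary map. All names and hyperparameters (emulator, f, lam, n_roll) are illustrative assumptions, not the paper's implementation.

```python
import torch

# Hedged sketch of adversarial OT-regularized emulator training.
# `emulator` is a one-step map u_t -> u_{t+1}; `f` is the learnable
# summary map / WGAN critic.

def train_step(emulator, f, opt_em, opt_f, u, u_next, lam=0.1, n_roll=32):
    # (a) One-step prediction loss on observed transitions.
    mse = torch.mean((emulator(u) - u_next) ** 2)

    # Roll the emulator forward so `fake` samples its long-run behavior.
    x, traj = u, []
    for _ in range(n_roll):
        x = emulator(x)
        traj.append(x)
    fake = torch.cat(traj, dim=0)
    real = torch.cat([u, u_next], dim=0)  # stand-in for attractor samples

    # (b) 1-Wasserstein surrogate: critic separates real from generated.
    discrepancy = f(real).mean() - f(fake).mean()

    # Emulator minimizes prediction error plus the distributional gap.
    loss_em = mse + lam * discrepancy
    opt_em.zero_grad()
    loss_em.backward()
    opt_em.step()

    # Summary map maximizes the same gap (gradient ascent via negation);
    # the Lipschitz constraint (spectral norm or gradient penalty) that
    # WGAN training requires is omitted here for brevity.
    loss_f = -(f(real.detach()).mean() - f(fake.detach()).mean())
    opt_f.zero_grad()
    loss_f.backward()
    opt_f.step()
    return mse.item(), discrepancy.item()
```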

If this is right

  • Emulators exhibit significantly improved long-term statistical fidelity across a variety of chaotic systems.
  • The method succeeds even for systems with high-dimensional chaotic attractors.
  • Joint learning of summary statistics and the emulator removes the need for handcrafted local features.
  • Both the Sinkhorn divergence and WGAN-style formulations are theoretically analyzed and experimentally validated for this task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The regularization may allow neural operator architectures to handle a wider range of complex dynamical systems where only statistical behavior is observable.
  • Applications such as weather or power-grid modeling could use these emulators for ensemble forecasting without pointwise accuracy.
  • The approach could be combined with other regularization terms that encode known physical invariants.

Load-bearing premise

The adversarial optimal transport regularization produces physically consistent emulators without introducing artifacts, instabilities, or distribution mismatches that affect downstream use.

What would settle it

Train an emulator on a chaotic system using the proposed regularization, then generate long trajectories and measure whether their statistical properties (for example, state distributions or attractor dimensions) match those of the true system or whether unphysical artifacts appear.
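
A minimal version of that test, using 1D Wasserstein distances between long-run marginal state distributions as the fidelity metric; the function names, array shapes, and burn-in length are assumptions for illustration, not the paper's protocol.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def statistical_fidelity(true_traj, emu_traj, burn_in=1000):
    """Mean per-dimension W1 distance between long-run state marginals.

    Both inputs are arrays of shape (T, d); initial transients are
    discarded before comparing empirical distributions coordinate-wise.
    """
    t, e = true_traj[burn_in:], emu_traj[burn_in:]
    per_dim = [wasserstein_distance(t[:, i], e[:, i])
               for i in range(t.shape[1])]
    return float(np.mean(per_dim))
```

Collapse to a limit cycle (as in Figure 3) or underestimation of the attractor's extent would show up here as large per-dimension distances, while a faithful emulator tracks the visitation histograms of Figure 6.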

Figures

Figures reproduced from arXiv: 2604.21097 by Gabriel Melo, Leonardo Santiago, Peter Y. Lu.

Figure 1. Adversarial optimal transport regularization for emulating chaotic dynamics. (a) Emulator training via one-step prediction loss with OT regularization. (b) Adversarial learning of summary statistics that maximize the discrepancy between real and generated trajectory distributions while minimizing the full loss.
Figure 2. KS full roll-out evaluation (clean data). Our WGAN-style emulator most faithfully replicates the diagonal wave patterns and spatial structure of the ground truth (numerical simulation) across the full evaluation rollout.
Figure 3. L63 emulator geometry at increasing noise level σ. At σ = 0.10, the MSE baseline (No OT) underestimates the spatial extent of the attractor; at σ = 0.15, it collapses to a limit cycle, losing the bilobal structure entirely. WGAN maintains coverage of both lobes at both noise levels, directly illustrating how distributional regularization prevents attractor collapse under noise.
Figure 4. Lipschitz bounds during training for the WGAN summary map f on L96, under three regularization settings. Upper bounds (dashed) are computed as ∏_ℓ ‖W_ℓ‖₂; lower bounds (solid) are estimated from the mean Jacobian spectral norm over a batch subset. Prescribed thresholds L_max = 4 and L_max = 10 are shown as horizontal dash-dotted lines.
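
Both curves in this figure are cheap to reproduce for a feed-forward summary map; below is a sketch assuming f is a stack of torch.nn.Linear layers with 1-Lipschitz activations (the paper's batch-subset estimator may differ in detail).

```python
import torch

def lipschitz_upper_bound(f):
    # Dashed curves: Lip(f) <= prod_l ||W_l||_2 for linear layers
    # composed with 1-Lipschitz activations (ReLU, tanh, ...).
    bound = 1.0
    for m in f.modules():
        if isinstance(m, torch.nn.Linear):
            bound *= torch.linalg.matrix_norm(m.weight, ord=2).item()
    return bound

def lipschitz_lower_bound(f, batch):
    # Solid curves: mean Jacobian spectral norm over a batch subset.
    # Each observed Jacobian norm (hence their mean) lower-bounds Lip(f).
    norms = []
    for x in batch:
        g = lambda z: f(z.unsqueeze(0)).squeeze(0)    # single-sample wrapper
        J = torch.autograd.functional.jacobian(g, x)  # shape (out_dim, d)
        J = J.reshape(-1, x.numel())
        norms.append(torch.linalg.matrix_norm(J, ord=2).item())
    return sum(norms) / len(norms)
```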
Figure 5. The Lorenz–63 attractor colored by the learned summary value s = f_φ(u) across the three canonical projections (x, y), (x, z), and (y, z), alongside the projection induced by the dominant eigenvector of the displacement covariance C_T (Proposition A.1).
Figure 6. Long-run state-space visitation histograms, comparing the ground truth system against the emulator trained with the adversarial OT objective. Both histograms are computed from a single long trajectory after transient removal.
Figure 7. Distribution of the learned summary statistic s extracted from ground-truth trajectory segments versus emulator-generated segments.
Figure 8. Space-time plots of u(x, t) for the Lorenz–96 system (d = 60), comparing ground truth numerical simulations (left) against emulator rollouts (right) over 1,500 timesteps. Each row corresponds to a different method. As expected for chaotic systems, pointwise trajectory agreement is not maintained beyond the Lyapunov time; the relevant comparison is the statistical structure of the attractor over long horizons.
Figure 9. Space-time plots of u(x, t) for the Kuramoto–Sivashinsky equation (d = 256), comparing ground truth numerical simulations (left) against emulator rollouts (right) over 1,000 timesteps. Each row corresponds to a different method. Trajectory-level correspondence is not expected beyond the Lyapunov time due to chaos; the relevant comparison is the statistical structure of the attractor rather than pointwise agreement.
Figure 10. Vorticity rollouts for 2D Kolmogorov flow (Re = 10⁴, α = 0.1) at selected timesteps. Each row shows autoregressive predictions from a different model alongside the ground truth (top row). For quantitative analysis, see Table 2.
Original abstract

Chaos arises in many complex dynamical systems, from weather to power grids, but is difficult to accurately model using data-driven emulators, including neural operator architectures. For chaotic systems, the inherent sensitivity to initial conditions makes exact long-term forecasts theoretically infeasible, meaning that traditional squared-error losses often fail when trained on noisy data. Recent work has focused on training emulators to match the statistical properties of chaotic attractors by introducing regularization based on handcrafted local features and summary statistics, as well as learned statistics extracted from a diverse dataset of trajectories. In this work, we propose a family of adversarial optimal transport objectives that jointly learn high-quality summary statistics and a physically consistent emulator. We theoretically analyze and experimentally validate a Sinkhorn divergence formulation (2-Wasserstein) and a WGAN-style dual formulation (1-Wasserstein). Our experiments across a variety of chaotic systems, including systems with high-dimensional chaotic attractors, show that emulators trained with our approach exhibit significantly improved long-term statistical fidelity.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity audit, and an axiom-and-free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a family of adversarial optimal transport objectives—specifically a Sinkhorn divergence formulation based on the 2-Wasserstein distance and a WGAN-style dual formulation for the 1-Wasserstein distance—to jointly learn summary statistics and train neural emulators for chaotic dynamical systems. The central claim is that this regularization yields emulators with significantly improved long-term statistical fidelity to the attractors of chaotic systems, outperforming baselines that rely on handcrafted local features or learned statistics from trajectory datasets, as supported by theoretical analysis and experiments on a variety of chaotic systems including high-dimensional attractors.

Significance. If the central claims hold, the work provides a principled, automatic alternative to handcrafted or pre-learned statistics for regularizing data-driven emulators of chaotic dynamics. This could improve the reliability of long-term statistical predictions in applications such as weather modeling and power-grid simulation, where exact trajectory matching is infeasible due to sensitivity to initial conditions. The joint learning of statistics and emulator via optimal transport is a notable strength relative to prior regularization approaches.

minor comments (3)
  1. The abstract and introduction would benefit from a brief, explicit statement of the precise baseline methods (handcrafted features and learned-statistic approaches) and the quantitative metrics used to assess long-term statistical fidelity, to allow readers to immediately gauge the scope of the claimed improvements.
  2. In the experimental section, additional detail on the number of independent runs, standard deviations or confidence intervals for the reported fidelity metrics, and the precise definition of 'long-term' (e.g., integration horizon relative to Lyapunov time) would strengthen reproducibility and interpretation of the results.
  3. Notation for the adversarial objectives (Sinkhorn and dual formulations) should be introduced with a short table or inline reminder of the key variables (e.g., the role of the critic network and the regularization parameter) to improve readability for readers less familiar with optimal transport.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and for recommending minor revision. The referee's description accurately reflects the manuscript's contributions regarding adversarial optimal transport regularization for emulators of chaotic systems. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper defines its core adversarial optimal transport objectives (Sinkhorn 2-Wasserstein divergence and WGAN-style 1-Wasserstein dual) directly from standard optimal transport theory and applies them to jointly optimize summary statistics and the emulator. No load-bearing step in the abstract or described approach reduces the claimed predictions or statistical fidelity improvements to quantities fitted from the target data by construction, nor relies on self-citations for uniqueness theorems, ansatzes, or renaming of known results. The central claim rests on experimental comparison to handcrafted and learned-statistic baselines, which supplies independent validation rather than tautological equivalence to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

An abstract-only review surfaces no explicit free parameters, axioms, or invented entities; the method relies on standard optimal transport and adversarial training concepts from prior literature.

pith-pipeline@v0.9.0 · 5469 in / 962 out tokens · 119384 ms · 2026-05-09T22:45:48.433096+00:00 · methodology

