pith. sign in

arxiv: 2605.31547 · v1 · pith:2E3TZWUQnew · submitted 2026-05-29 · 💻 cs.LG · math.DS· stat.ML

The Dynamic-Probabilistic Consistency Gap in Chaotic Surrogate Modeling

Pith reviewed 2026-06-28 23:20 UTC · model grok-4.3

classification 💻 cs.LG math.DSstat.ML
keywords dynamic-probabilistic consistency gapchaotic surrogate modelingdynamical systems reconstructionextended Kalman filteruncertainty estimationLorenz-96Jacobian covariance
0
0 comments X

The pith

Open-loop Gaussian rollout objectives create a dynamic-probabilistic consistency gap by penalizing Jacobian-generated covariance growth in chaotic surrogate models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that finite-horizon probabilistic training objectives in dynamical systems reconstruction can degrade learned dynamics or decouple predictive uncertainty from local tangent dynamics in chaotic systems. It isolates three mechanisms behind this gap: core collapse, noise masking, and blind uncertainty. The authors introduce KAFFEE, a differentiable extended Kalman filter framework that evaluates likelihood on local innovations while transporting covariance through learned Jacobians, and demonstrate that it reduces the failure modes, improves dynamical invariant reconstruction, and supports in-context Bayesian filtering on hyperchaotic systems and foundation model adaptation.

Core claim

Open-loop Gaussian rollout objectives penalize Jacobian-generated covariance growth in chaotic systems, encouraging optimization shortcuts that weaken physical expansion or decouple uncertainty from the tangent dynamics. KAFFEE closes this dynamic-probabilistic consistency gap by evaluating likelihood on innovations and transporting covariance via Jacobians, yielding better invariant reconstruction on stochastic hyperchaotic Lorenz-96 and largely preserving zero-shot dynamics across 13 systems during probabilistic adaptation of a foundation model.

What carries the argument

The dynamic-probabilistic consistency (DPC) gap, produced when open-loop Gaussian objectives penalize covariance growth arising from learned local Jacobians in chaotic regimes.

If this is right

  • KAFFEE reduces core collapse, noise masking, and blind uncertainty on stochastic hyperchaotic Lorenz-96.
  • KAFFEE improves reconstruction of dynamical invariants relative to open-loop objectives.
  • KAFFEE maintains competitive predictive scores while enabling in-context Bayesian filtering.
  • KAFFEE largely preserves zero-shot dynamics when adapting a DSR foundation model across 13 chaotic systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The gap may reduce trustworthiness of uncertainty estimates for long-horizon forecasts in chaotic domains.
  • Similar decoupling could occur in other rollout-based probabilistic losses beyond Gaussian objectives.
  • KAFFEE-style local filtering may offer a route to consistent uncertainty in high-dimensional surrogate models outside the tested systems.

Load-bearing premise

That likelihood evaluation on local innovations combined with Jacobian-based covariance transport in a differentiable extended Kalman filter will close the consistency gap without creating new inconsistencies in chaotic regimes.

What would settle it

Direct comparison of whether covariance growth under KAFFEE remains aligned with the learned Jacobians on a new hyperchaotic system where open-loop training produced measurable decoupling.

Figures

Figures reproduced from arXiv: 2605.31547 by Andre Herz, Daniel Durstewitz, Georgia Koppe, Matthijs Pals.

Figure 1
Figure 1. Figure 1: The DPC gap in probabilistic DSR. Left: Finite-horizon scores SH can improve through optimization shortcuts that can degrade dynamical reconstruction quality: core collapse, noise masking, or blind uncertainty. KAFFEE mitigates these by pairing filtered innovation scoring with Jacobian-based covariance propagation. Right: A mechanistic taxonomy of these failure modes. average, but also indicate when local … view at source ↗
Figure 2
Figure 2. Figure 2: Stochastic Lorenz-96 (20D) results. Left: DPC gap. (a) Held-out NLL, (b) deterministic state-space divergence (Dstsp), (c) unstable-volume growth error, and (d) stochastic Dstsp. Right: Invariant reconstruction and dynamically grounded local uncertainty. (e) Median reconstruction of positive deterministic Lyapunov exponents λi . Only filtered innovations outperform the deter￾ministic prior. (f) Correspondi… view at source ↗
Figure 3
Figure 3. Figure 3: a shows the resulting trade-off. After 10 optimization steps on 13 systems and 20 seeds, KAFFEE matches or improves the held-out NLL of the open-loop control while drifting less from the pretrained autonomous core. Yet, the autonomous geometry differs substantially. Noise-only adaptation preserves the core by construction, but performs worse in NLL. KAFFEE thus preserves most of the zero-shot DynaMix dynam… view at source ↗
Figure 4
Figure 4. Figure 4: Lorenz-96 forecast-skill spectrum across horizons. Predictive scores targeted at marginal calibration alone do not certify a dynamically grounded probabilistic surrogate. The no-transport baselines remain competitive in NLL/CRPS (a-b), but its uncertainty is locally blind (c). Filtered objectives maintain Jacobian-transported uncertainty across horizons. Lines show medians across 20 seeds; shaded bands are… view at source ↗
Figure 5
Figure 5. Figure 5: Attenuation on a noisy linear spiral. From left to right: optimization traces, eigenvalues of learned Jacobians, deterministic autonomous rollouts from the common initialization, and stochastic state rollouts from that same initialization. Both models share the same linear Gaussian state-space parametrization and initialization. The noisy state predictive objective over-damps the learned autonomous map, wh… view at source ↗
Figure 6
Figure 6. Figure 6: Observation-noise sweep for the linear spiral. The noisy-state predictive objective becomes progressively more contractive as observation noise increases. KAFFEE remains closer to the true stable transition because it separates latent-state inference from observation noise. Error bars show 95% confidence intervals over 10 seeds. Van der Pol limit cycle. We next use a nonlinear but non-chaotic system: a noi… view at source ↗
Figure 7
Figure 7. Figure 7: shows the same qualitative failure in a nonlinear setting. The noisy-state predictive NLL learns an overcontractive autonomous map: its shell-off rollout does not return to the limit cycle and instead collapses to a spurious fixed point. The stochastic shell-on rollout can partly “mask” this deterministic failure, but it does not repair the learned core. KAFFEE preserves the transient and returns to the co… view at source ↗
Figure 8
Figure 8. Figure 8: Van der Pol covariance propagation map. (top row) One-step transported reference￾covariance trace over the same sampled state cloud: left for the true flow Jacobian, center for the noisy-state predictive objective, right for KAFFEE. (bottom row) Analogous twenty-step covariance propagation map with the same ordering of panels. Overall, the noisy-state predictive objective yields damped/contracted uncertain… view at source ↗
read the original abstract

Dynamical systems reconstruction (DSR) aims to learn surrogate models that capture the dynamics underlying time-series data. Reliably deploying these surrogates requires uncertainty estimates consistent with the learned dynamics. We expose a dynamic-probabilistic consistency (DPC) gap: the pursuit of finite-horizon probabilistic objectives can degrade dynamics or decouple predictive uncertainty from the local tangent dynamics it ought to reflect. We isolate three mechanisms behind this gap: core collapse, noise masking, and blind uncertainty. Specifically, we show that open-loop Gaussian rollout objectives can penalize Jacobian-generated covariance growth in chaotic systems, encouraging optimization shortcuts that weaken physical expansion or decouple uncertainty from it. To mitigate this gap, we propose KAFFEE (Kalman-Aware Framework For Ergodic Emulation), a differentiable extended Kalman filter-based training framework that evaluates likelihood on local predictive residuals (innovations) while transporting covariance through learned local Jacobians. On stochastic hyperchaotic Lorenz-96, KAFFEE reduces the identified failure modes, improves reconstruction of dynamical invariants relative to open-loop objectives, and maintains competitive predictive scores. We further show that the DPC gap appears when probabilistically adapting a DSR foundation model across 13 chaotic systems, where KAFFEE enables in-context Bayesian filtering while largely preserving zero-shot dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to identify a dynamic-probabilistic consistency (DPC) gap in dynamical systems reconstruction for chaotic systems, where open-loop Gaussian rollout objectives degrade dynamics or decouple uncertainty via three mechanisms (core collapse, noise masking, blind uncertainty). It proposes KAFFEE, a differentiable extended Kalman filter training framework using innovations likelihood and learned local Jacobians for covariance transport. Experiments reportedly show reduced failure modes and improved dynamical invariants on stochastic hyperchaotic Lorenz-96, plus better preservation of zero-shot dynamics when adapting a foundation model across 13 systems.

Significance. If the central claim holds, the work identifies a practically relevant inconsistency between probabilistic training objectives and chaotic dynamics that affects uncertainty-aware surrogate modeling. The KAFFEE approach offers a concrete, differentiable mechanism to enforce consistency via EKF-style innovations, which could improve reliability of long-horizon predictions and invariant preservation in scientific applications of DSR.

major comments (2)
  1. [Section describing KAFFEE and its EKF implementation] The central claim that KAFFEE closes the DPC gap rests on the assumption that learned local Jacobians in the differentiable EKF transport covariance consistently with the tangent dynamics. No direct diagnostic (e.g., comparison of propagated covariance to local Lyapunov structure or ensemble statistics beyond training horizon) is provided for the hyperchaotic Lorenz-96 case, leaving the causal link between the innovations objective and the reported improvements unverified.
  2. [Experimental results on stochastic hyperchaotic Lorenz-96] The isolation of the three mechanisms (core collapse, noise masking, blind uncertainty) and their attribution to open-loop Gaussian rollouts is load-bearing for the motivation. The experimental section reports overall improvements but lacks targeted ablations or metrics that separately quantify each mechanism's reduction under KAFFEE versus baselines.
minor comments (2)
  1. [Abstract] The abstract states that KAFFEE 'maintains competitive predictive scores' but does not specify the exact scores or baselines used for comparison.
  2. [Methods] Notation for the innovation covariance and how the likelihood is evaluated on local predictive residuals should be made explicit with an equation reference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and will revise the manuscript accordingly to strengthen the evidence.

read point-by-point responses
  1. Referee: [Section describing KAFFEE and its EKF implementation] The central claim that KAFFEE closes the DPC gap rests on the assumption that learned local Jacobians in the differentiable EKF transport covariance consistently with the tangent dynamics. No direct diagnostic (e.g., comparison of propagated covariance to local Lyapunov structure or ensemble statistics beyond training horizon) is provided for the hyperchaotic Lorenz-96 case, leaving the causal link between the innovations objective and the reported improvements unverified.

    Authors: We agree that a direct diagnostic would provide stronger verification of covariance transport consistency with tangent dynamics. The current manuscript supports the claim via theoretical derivation of the DPC gap and empirical improvements in invariants and failure modes. In revision we will add a diagnostic comparing EKF-propagated covariance growth against the local Lyapunov spectrum computed from the learned Jacobians, plus ensemble statistics on the hyperchaotic Lorenz-96 case. revision: yes

  2. Referee: [Experimental results on stochastic hyperchaotic Lorenz-96] The isolation of the three mechanisms (core collapse, noise masking, blind uncertainty) and their attribution to open-loop Gaussian rollouts is load-bearing for the motivation. The experimental section reports overall improvements but lacks targeted ablations or metrics that separately quantify each mechanism's reduction under KAFFEE versus baselines.

    Authors: The three mechanisms are isolated via theoretical analysis of open-loop Gaussian objectives in chaotic systems. Experiments demonstrate overall reduction in associated failure modes under KAFFEE. We acknowledge the value of separate quantification and will add targeted ablations in revision, including Jacobian-norm statistics for core collapse, calibration metrics for noise masking, and variance analysis for blind uncertainty. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on empirical evaluation of proposed framework on external benchmarks

full rationale

The paper defines the dynamic-probabilistic consistency gap conceptually, identifies mechanisms via analysis of open-loop objectives, and introduces KAFFEE as a differentiable EKF-based alternative. All reported improvements (on stochastic Lorenz-96 and 13-system adaptation) are measured against standard chaotic benchmarks using independent metrics for dynamical invariants and predictive scores. No equations, fitted parameters, or self-citations are shown reducing any central claim to its own inputs by construction; the framework and gap analysis remain self-contained against external data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard domain assumptions from Kalman filtering applied to chaotic dynamics; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption Local linear approximations via Jacobians are sufficient to transport covariance in the training objective for chaotic systems.
    Invoked by the differentiable extended Kalman filter component of KAFFEE.

pith-pipeline@v0.9.1-grok · 5771 in / 1260 out tokens · 30925 ms · 2026-06-28T23:20:11.451758+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

83 extracted references · 8 canonical work pages · 6 internal anchors

  1. [1]

    D Durstewitz, G Koppe, and M. I Thurm. Reconstructing computational system dynamics from neural data with recurrent neural networks.Nature Reviews Neuroscience, 24(11):693–710, 2023

  2. [2]

    Generalized Teacher Forcing for Learning Chaotic Dynamics

    F Hess, Z Monfared, M Brenner, and D Durstewitz. Generalized Teacher Forcing for Learning Chaotic Dynamics. InProceedings of the 40th International Conference on Machine Learning (ICML), volume 202 ofProceedings of Machine Learning Research, pages 13017–13049, 2023

  3. [4]

    Out-of-Domain Generalization in Dynamical Systems Reconstruction

    N Göring, F Hess, M Brenner, Z Monfared, and D Durstewitz. Out-of-Domain Generalization in Dynamical Systems Reconstruction. InProceedings of the 41st International Conference on Machine Learning, ICML’24, 2024

  4. [5]

    Invariant Measures in Time-Delay Coordinates for Unique Dynamical System Identification.Phys

    J Botvinick-Greenhouse, R Martin, and Y Yang. Invariant Measures in Time-Delay Coordinates for Unique Dynamical System Identification.Phys. Rev. Lett., 135(16):167202, 2025

  5. [6]

    Y Lu, E Orlova, and R Willett

    R Jiang, P. Y Lu, E Orlova, and R Willett. Training neural operators to preserve invariant measures of chaotic attractors. InProceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, 2023

  6. [7]

    Reconstructing bifurcation diagrams of chaotic circuits with reservoir computing.Phys

    H Luo, Y Du, H Fan, X Wang, J Guo, and X Wang. Reconstructing bifurcation diagrams of chaotic circuits with reservoir computing.Phys. Rev. E, 109(2):024210, 2024

  7. [8]

    T Yang, and N Chandramoorthy

    J Park, N. T Yang, and N Chandramoorthy. When are dynamical systems learned from time series data statistically accurate? InAdvances in Neural Information Processing Systems, volume 37, pages 43975–44008, 2024

  8. [9]

    Y Wan, J

    Y Schiff, Z. Y Wan, J. B Parker, S Hoyer, V Kuleshov, F Sha, and L Zepeda-Núñez. DySLIM: Dynamics stable learning by invariant measure for chaotic systems. InProceedings of the 41st International Conference on Machine Learning, ICML’24, 2024

  9. [10]

    A Gottwald and S Reich

    G. A Gottwald and S Reich. Supervised learning from noisy observations: Combining machine-learning techniques with data assimilation.Physica D: Nonlinear Phenomena, 423:132911, 2021. ISSN 01672789. 10

  10. [11]

    F Psaros, X Meng, Z Zou, L Guo, and G

    A. F Psaros, X Meng, Z Zou, L Guo, and G. E Karniadakis. Uncertainty quantification in scientific machine learning: Methods, metrics, and comparisons.Journal of Computational Physics, 477:111902, 2023. ISSN 00219991

  11. [12]

    Cambridge University Press, 1 edition, 2015

    S Reich and C Cotter.Probabilistic Forecasting and Bayesian Data Assimilation. Cambridge University Press, 1 edition, 2015. ISBN 978-1-107-06939-8 978-1-107-66391-6 978-1-107-70680-4

  12. [13]

    On the Kalman filter error covariance collapse into the unstable subspace

    A Trevisan and L Palatella. On the Kalman filter error covariance collapse into the unstable subspace. Nonlinear Processes in Geophysics, 18(2):243–250, 2011

  13. [14]

    D. J. C MacKay. Bayesian interpolation.Neural Comput., 4(3):415–447, 1992. ISSN 0899-7667

  14. [15]

    E Rasmussen and C

    C. E Rasmussen and C. K. I Williams.Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006

  15. [16]

    P Eckmann and D Ruelle

    J. P Eckmann and D Ruelle. Ergodic theory of chaos and strange attractors.Rev. Mod. Phys., 57(3): 617–656, 1985

  16. [17]

    Data Assimilation for Chaotic Dynamics

    A Carrassi, M Bocquet, J Demaeyer, C Grudzien, P Raanes, and S Vannitsem. Data Assimilation for Chaotic Dynamics. In S. K Park and L Xu, editors,Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications (Vol. IV), pages 1–42. Springer International Publishing, Cham, 2022. ISBN 978-3-030-77721-0 978-3-030-77722-7

  17. [18]

    J Hemmer and D Durstewitz

    C. J Hemmer and D Durstewitz. True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  18. [19]

    B Lai, A Bao, and W Gilpin

    J. B Lai, A Bao, and W Gilpin. Panda: A pretrained forecast model for chaotic dynamics. InThe Fourteenth International Conference on Learning Representations, 2026

  19. [20]

    Position: A Dynamical Systems Perspective is Needed to Advance Time Series Modeling

    C Liu, B Zhao, J Ding, and Y Li. ChaosNexus: A Foundation Model for ODE-based Chaotic System Forecasting with Hierarchical Multi-scale Awareness.arXiv preprint arXiv:2602.16864, 2025

  20. [21]

    Computational aspects of maximum likelihood estimation and reduction in sensitivity function calculations.IEEE Transactions on Automatic Control, 19(6):774–783, 1974

    N Gupta and R Mehra. Computational aspects of maximum likelihood estimation and reduction in sensitivity function calculations.IEEE Transactions on Automatic Control, 19(6):774–783, 1974. ISSN 0018-9286

  21. [22]

    Parameter estimation in non-linear state-space models by automatic differentiation of non-linear kalman filters

    A Gorad, Z Zhao, and S Särkkä. Parameter estimation in non-linear state-space models by automatic differentiation of non-linear kalman filters. In2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6, 2020

  22. [23]

    Institute of Mathematical Statistics Textbooks

    S Särkkä and L Svensson.Bayesian Filtering and Smoothing. Institute of Mathematical Statistics Textbooks. Cambridge University Press, second edition, 2023

  23. [24]

    Sigma-point filtering and smoothing based parameter estimation in nonlinear dynamic systems.Journal of advances in information fusion, 11(1):15, 2016

    J Kokkala, A Solin, and S Särkkä. Sigma-point filtering and smoothing based parameter estimation in nonlinear dynamic systems.Journal of advances in information fusion, 11(1):15, 2016

  24. [25]

    Backprop KF: Learning discriminative deterministic state estimators

    T Haarnoja, A Ajay, S Levine, and P Abbeel. Backprop KF: Learning discriminative deterministic state estimators. InProceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, pages 4383–4391, 2016

  25. [26]

    C Tanoh, M Deistler, J

    I. C Tanoh, M Deistler, J. H Macke, and S Linderman. Identifying multi-compartment Hodgkin-Huxley models with high-density extracellular voltage recordings. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026

  26. [27]

    N Lorenz

    E. N Lorenz. Predictability: A problem partly solved. InProc. Seminar on Predictability, Reading, UK, ECMWF, volume 1, pages 1–18, 1996

  27. [28]

    Bifurcations in the Learning of Recurrent Neural Networks

    K Doya. Bifurcations in the Learning of Recurrent Neural Networks. InProceedings of the 1992 IEEE International Symposium on Circuits and Systems, 1992

  28. [29]

    A framework for machine learning of model error in dynamical systems

    M Levine and A Stuart. A framework for machine learning of model error in dynamical systems. Communications of the American Mathematical Society, 2(7):283–344, 2022. ISSN 2692-3688

  29. [30]

    Scheduled sampling for sequence prediction with recurrent Neural networks

    S Bengio, O Vinyals, N Jaitly, and N Shazeer. Scheduled sampling for sequence prediction with recurrent Neural networks. InProceedings of the 29th International Conference on Neural Information Processing Systems - Volume 1, NIPS’15, pages 1171–1179, 2015

  30. [31]

    Sequence Level Training with Recurrent Neural Networks

    M Ranzato, S Chopra, M Auli, and W Zaremba. Sequence Level Training with Recurrent Neural Networks. In4th International Conference on Learning Representations, ICLR 2016, 2016. 11

  31. [32]

    Continuous-time system identification with neural networks: Model structures and fitting criteria.European Journal of Control, 59:69–81, 2021

    M Forgione and D Piga. Continuous-time system identification with neural networks: Model structures and fitting criteria.European Journal of Control, 59:69–81, 2021

  32. [33]

    Nonlinear state-space identification using deep encoder networks

    G Beintema, R Toth, and M Schoukens. Nonlinear state-space identification using deep encoder networks. InProceedings of the 3rd Conference on Learning for Dynamics and Control, volume 144 ofProceedings of Machine Learning Research, pages 241–250, 2021

  33. [34]

    I Beintema, M Schoukens, and R Tóth

    G. I Beintema, M Schoukens, and R Tóth. Deep subspace encoders for nonlinear system identification. Automatica, 156(C), 2023. ISSN 0005-1098

  34. [35]

    E Rasmussen, and R Murray-Smith

    A Girard, C. E Rasmussen, and R Murray-Smith. Multiple-step ahead prediction for non linear dynamic systems–a Gaussian process treatment with propagation of the uncertainty. InSixteenth Annual Conference on Neural Information Processing Systems, volume 15, pages 529–536, 2002

  35. [36]

    Probabilistic recurrent state-space models

    A Doerr, C Daniel, M Schiegg, N.-T Duy, S Schaal, M Toussaint, and T Sebastian. Probabilistic recurrent state-space models. InInternational Conference on Machine Learning, pages 1280–1289, 2018

  36. [37]

    Revisiting structured variational autoencoders

    Y Zhao and S Linderman. Revisiting structured variational autoencoders. InInternational Conference on Machine Learning, pages 42046–42057, 2023

  37. [38]

    M Mikhaeil, L

    M Brenner, F Hess, J. M Mikhaeil, L. F Bereska, Z Monfared, P.-C Kuo, and D Durstewitz. Tractable Dendritic RNNs for Reconstructing Nonlinear Dynamical Systems. InProceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, 2022

  38. [39]

    E Turner and M Sahani

    R. E Turner and M Sahani. Two problems with variational expectation maximisation for time series models. InBayesian Time Series Models, pages 104–124. Cambridge University Press, 2011. ISBN 978-0-511-98467-9

  39. [40]

    Mind the Gap when Conditioning Amortised Inference in Sequential Latent-Variable Models

    J Bayer, M Soelch, A Mirchev, B Kayalibay, and P van der Smagt. Mind the Gap when Conditioning Amortised Inference in Sequential Latent-Variable Models. InInternational Conference on Learning Representations, 2021

  40. [41]

    A Fuller.Measurement Error Models

    W. A Fuller.Measurement Error Models. Wiley Series in Probability and Statistics. John Wiley & Sons, New York, 1987

  41. [42]

    P Buonaccorsi

    J Staudenmayer and J. P Buonaccorsi. Measurement Error in Linear Autoregressive Models.Journal of the American Statistical Association, 100(471):841–852, 2005. ISSN 01621459

  42. [43]

    M Mikhaeil, Z Monfared, and D Durstewitz

    J. M Mikhaeil, Z Monfared, and D Durstewitz. On the difficulty of learning chaotic dynamics with RNNs. InProceedings of the 36th International Conference on Neural Information Processing Systems, NIPS ’22, 2022

  43. [44]

    E Sa˘gtekin, C Bredenberg, and C Savin

    A. E Sa˘gtekin, C Bredenberg, and C Savin. Error Forcing in Recurrent Neural Networks. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  44. [45]

    Teacher Forcing as Generalized Bayes: Optimization Geometry Mismatch in Switching Surrogates for Chaotic Dynamics

    A Herz, D Durstewitz, and G Koppe. Teacher Forcing as Generalized Bayes: Optimization Geometry Mismatch in Switching Surrogates for Chaotic Dynamics.arXiv preprint arXiv:2604.25904, 2026

  45. [46]

    Identifying nonlinear dynamical systems via generative recurrent neural networks with applications to fMRI.PLOS Computational Biology, 15(8):1–35, 2019

    G Koppe, H Toutounji, P Kirsch, S Lis, and D Durstewitz. Identifying nonlinear dynamical systems via generative recurrent neural networks with applications to fMRI.PLOS Computational Biology, 15(8):1–35, 2019

  46. [47]

    J Hemmer, Z Monfared, and D Durstewitz

    M Brenner, C. J Hemmer, Z Monfared, and D Durstewitz. Almost-Linear RNNs Yield Highly Interpretable Symbolic Codes in Dynamical Systems Reconstruction. InAdvances in Neural Information Processing Systems (NeurIPS), 2024

  47. [48]

    N Lorenz

    E. N Lorenz. Deterministic Nonperiodic Flow.Journal of the Atmospheric Sciences, 20(2):130–141, 1963

  48. [49]

    Double projection for reconstructing dynamical systems: Between stochastic and deterministic regimes.arXiv preprint arXiv:2510.01089, 2026

    V Sip, M Breyton, S Petkoski, and V Jirsa. Double projection for reconstructing dynamical systems: Between stochastic and deterministic regimes.arXiv preprint arXiv:2510.01089, 2026

  49. [50]

    L Wong, R

    X Li, T.-K. L Wong, R. T. Q Chen, and D Duvenaud. Scalable Gradients for Stochastic Differential Equations. InProceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 ofProceedings of Machine Learning Research, pages 3870–3882, 2020

  50. [51]

    Data assimilation in the geosciences: An overview of methods, issues, and perspectives.WIREs Climate Change, 9(5):e535, 2018

    A Carrassi, M Bocquet, L Bertino, and G Evensen. Data assimilation in the geosciences: An overview of methods, issues, and perspectives.WIREs Climate Change, 9(5):e535, 2018. ISSN 1757-7780, 1757-7799. 12

  51. [52]

    Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: A case study with the Lorenz 96 model

    J Brajard, A Carrassi, M Bocquet, and L Bertino. Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: A case study with the Lorenz 96 model. Journal of Computational Science, 44:101171, 2020. ISSN 18777503

  52. [53]

    Online learning of both state and dynamics using ensemble Kalman filters.Foundations of Data Science, 3(3):305–330, 2021

    M Bocquet, A Farchi, and Q Malartic. Online learning of both state and dynamics using ensemble Kalman filters.Foundations of Data Science, 3(3):305–330, 2021. ISSN 2639-8001

  53. [54]

    A comparison of combined data assimilation and machine learning methods for offline and online model error correction.Journal of Computational Science, 55:101468, 2021

    A Farchi, M Bocquet, P Laloyaux, M Bonavita, and Q Malartic. A comparison of combined data assimilation and machine learning methods for offline and online model error correction.Journal of Computational Science, 55:101468, 2021. ISSN 18777503

  54. [55]

    S Finn, C Durand, S Cheng, Y Chen, I Pasmans, and A Carrassi

    M Bocquet, A Farchi, T. S Finn, C Durand, S Cheng, Y Chen, I Pasmans, and A Carrassi. Accurate deep learning-based filtering for chaotic dynamics by identifying instabilities without an ensemble.Chaos: An Interdisciplinary Journal of Nonlinear Science, 34(9):091104, 2024. ISSN 1054-1500, 1089-7682

  55. [56]

    Auto-differentiable data assimilation: Co-learning of states, dynamics, and filtering algorithms.arXiv preprint arXiv:2603.20891, 2026

    M Adrian, D Sanz-Alonso, and R Willett. Auto-differentiable data assimilation: Co-learning of states, dynamics, and filtering algorithms.arXiv preprint arXiv:2603.20891, 2026

  56. [57]

    Probabilistic Retrofitting of Learned Simulators

    C Diaconu, M Cranmer, R. E Turner, T Marwah, and P Mukhopadhyay. Probabilistic Retrofitting of Learned Simulators.arXiv preprint arXiv:2603.01949, 2026

  57. [58]

    E Sa˘gtekin, F Pei, M Gloeckler, and J Macke

    M Pals, A. E Sa˘gtekin, F Pei, M Gloeckler, and J Macke. Inferring stochastic low-rank recurrent neural networks from neural data. InAdvances in Neural Information Processing Systems 37 (NeurIPS 2024), volume 37, pages 18225–18264, 2024

  58. [59]

    A Wan and R Van Der Merwe

    E. A Wan and R Van Der Merwe. The unscented Kalman filter for nonlinear estimation. InProceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No. 00EX373), pages 153–158, 2000

  59. [60]

    The ensemble Kalman filter: Theoretical formulation and practical implementation.Ocean Dynamics, 53(4):343–367, 2003

    G Evensen. The ensemble Kalman filter: Theoretical formulation and practical implementation.Ocean Dynamics, 53(4):343–367, 2003

  60. [61]

    Rao–Blackwellised Particle Filtering for Dynamic Bayesian Networks

    A Doucet, N de Freitas, K Murphy, and S Russell. Rao–Blackwellised Particle Filtering for Dynamic Bayesian Networks. InProceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI), 2000

  61. [62]

    Y Zhao and I. M Park. Variational online learning of neural dynamics.Frontiers in computational neuroscience, 14:71, 2020

  62. [63]

    M Dowling, Y Zhao, and I. M Park. eXponential FAmily dynamical systems (XFADS): Large-scale nonlinear gaussian state-space modeling.Advances in Neural Information Processing Systems, 37:13458– 13488, 2024

  63. [64]

    Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics

    M Brenner, F Hess, G Koppe, and D Durstewitz. Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics. InProceedings of the 41st International Conference on Machine Learning (ICML), 2024

  64. [65]

    G Chang, K

    P. G Chang, K. P Murphy, and M Jones. On diagonal approximations to the extended Kalman filter for online training of Bayesian neural networks. InContinual Lifelong Learning Workshop at ACML 2022, 2022

  65. [66]

    G Chang, G Durán-Martín, A Shestopaloff, M Jones, and K

    P. G Chang, G Durán-Martín, A Shestopaloff, M Jones, and K. P Murphy. Low-rank extended Kalman filtering for online learning of neural networks from streaming data. In S Chandar, R Pascanu, H Sedghi, and D Precup, editors,Proceedings of The 2nd Conference on Lifelong Learning Agents, volume 232 of Proceedings of Machine Learning Research, pages 1025–1071, 2023

  66. [67]

    The Rank-Reduced Kalman Filter: Approximate Dynamical- Low-Rank Filtering In High Dimensions

    J Schmidt, P Hennig, J Nick, and F Tronarp. The Rank-Reduced Kalman Filter: Approximate Dynamical- Low-Rank Filtering In High Dimensions. InThirty-Seventh Conference on Neural Information Processing Systems, 2023

  67. [68]

    Chaotic dynamics and the role of covariance inflation for reduced rank Kalman filters with model error.Nonlinear Processes in Geophysics, 25(3):633–648, 2018

    C Grudzien, A Carrassi, and M Bocquet. Chaotic dynamics and the role of covariance inflation for reduced rank Kalman filters with model error.Nonlinear Processes in Geophysics, 25(3):633–648, 2018. ISSN 1607-7946

  68. [69]

    H Jazwinski.Stochastic Processes and Filtering Theory, volume 64 ofMathematics in Science and Engineering

    A. H Jazwinski.Stochastic Processes and Filtering Theory, volume 64 ofMathematics in Science and Engineering. Elsevier, 1970. ISBN 978-0-12-381550-7

  69. [70]

    Variational Sequential Monte Carlo

    C Naesseth, S Linderman, R Ranganath, and D Blei. Variational Sequential Monte Carlo. InProceed- ings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine Learning Research, 2018. 13

  70. [71]

    A Le, M Igl, T Rainforth, T Jin, and F Wood

    T. A Le, M Igl, T Rainforth, T Jin, and F Wood. Auto-Encoding Sequential Monte Carlo. InInternational Conference on Learning Representations, 2018

  71. [72]

    J Maddison, J Lawson, G Tucker, N Heess, M Norouzi, A Mnih, A Doucet, and Y Teh

    C. J Maddison, J Lawson, G Tucker, N Heess, M Norouzi, A Mnih, A Doucet, and Y Teh. Filtering Variational Objectives. InAdvances in Neural Information Processing Systems, volume 30, 2017

  72. [73]

    Stochastic stability of the discrete-time extended Kalman filter.IEEE Transactions on Automatic Control, 44(4):714–728, 1999

    K Reif, S Gunther, E Yaz, and R Unbehauen. Stochastic stability of the discrete-time extended Kalman filter.IEEE Transactions on Automatic Control, 44(4):714–728, 1999. ISSN 00189286

  73. [74]

    Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

    Y Bengio, N Léonard, and A Courville. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation.arXiv preprint arXiv:1308.3432, 2013

  74. [75]

    Gaussian Error Linear Units (GELUs)

    D Hendrycks and K Gimpel. Gaussian Error Linear Units (GELUs).arXiv preprint arXiv:1606.08415, 2023

  75. [76]

    Adam: A Method for Stochastic Optimization

    D. P Kingma and J Ba. Adam: A Method for Stochastic Optimization.arXiv preprint arXiv:1412.6980, 2014

  76. [77]

    On the Variance of the Adaptive Learning Rate and Beyond

    L Liu, H Jiang, P He, W Chen, X Liu, J Gao, and J Han. On the Variance of the Adaptive Learning Rate and Beyond. InInternational Conference on Learning Representations, 2020

  77. [78]

    R Hershey and P

    J. R Hershey and P. A Olsen. Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models. In2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP ’07, pages IV–317–IV–320, 2007

  78. [79]

    Lyapunov Characteristic Exponents for smooth dynamical systems and for hamiltonian systems; a method for computing all of them

    G Benettin, L Galgani, A Giorgilli, and J.-M Strelcyn. Lyapunov Characteristic Exponents for smooth dynamical systems and for hamiltonian systems; a method for computing all of them. Part 1: Theory. Meccanica, 15(1):9–20, 1980. ISSN 0025-6455, 1572-9648

  79. [80]

    Comparison of Different Methods for Computing Lyapunov Exponents.Progress of Theoretical Physics, 83(5):875–893, 1990

    K Geist, U Parlitz, and W Lauterborn. Comparison of Different Methods for Computing Lyapunov Exponents.Progress of Theoretical Physics, 83(5):875–893, 1990. ISSN 0033-068X

  80. [81]

    perfect surrogate model

    W Gilpin. Chaos as an interpretable benchmark for forecasting and data-driven modelling. InAdvances in Neural Information Processing Systems, volume 34, 2021. 14 A Theory Details and Proofs This section supplements the theory in the main text: the proof of volume pressure localization, a noisy-state attenuation diagnostic, and the anti-contractive mechani...

Showing first 80 references.