arxiv: 2605.06980 · v1 · submitted 2026-05-07 · 🧮 math.NA · cs.NA

Recognition: no theorem link

Accelerating the Simulation of Ordinary Differential Equations Through Physics-Preserving Neural Networks

Andrew Frandsen, Andrew Ning, Andrew Tagg

Pith reviewed 2026-05-11 00:49 UTC · model grok-4.3

classification 🧮 math.NA cs.NA

keywords ordinary differential equationsneural networkslatent spacenumerical integrationstiff systemsphysics-preservingchain rule derivation

0 comments

The pith

A pseudo-invertible neural network maps ODE states to a latent space whose dynamics evolve slowly, cutting the number of function evaluations by 3x to 20x while keeping accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a technique to accelerate numerical integration of ordinary differential equations whose solutions change rapidly. It trains a pseudo-invertible network to embed the state vector in a higher-dimensional latent space and derives the latent ODEs directly from the original equations using the chain rule rather than fitting them to data. Because the latent equations remain exact, an off-the-shelf integrator can advance the solution with substantially larger steps and therefore far fewer right-hand-side evaluations. Experiments on several example systems confirm that the same accuracy is reached with three to twenty times fewer function calls.

Core claim

For any given ODE, a pseudo-invertible neural network can be trained so that the mapped latent states obey a derived system of equations whose time derivatives are small. Simulation of these slow latent dynamics, followed by decoding back to the original coordinates, reproduces the true solution trajectory at the same accuracy but with three to twenty times fewer evaluations of the original right-hand-side function.

What carries the argument

A pseudo-invertible neural network whose encoder produces latent variables and whose decoder reconstructs the original state; the latent ODEs are obtained analytically from the original vector field by the chain rule and then trained to minimize the magnitude of their time derivatives.

If this is right

Stiff or high-acceleration ODEs become cheaper to integrate without changing the integrator itself.
Because the latent equations are derived exactly, the method applies to any ODE rather than only those for which trajectory data are abundant.
The same trained network can be reused for repeated simulations of the same system under varying initial conditions or parameters.
Accuracy is controlled by the usual step-size tolerances; no additional modeling error is introduced by the latent equations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be paired with adaptive step-size controllers that automatically exploit the slower latent time scales.
If the latent dimension is increased further, the slowdown factor might grow, suggesting a trade-off between network size and step-size gain that could be tuned per problem.
The method might extend to partial differential equations by treating spatial discretizations as large ODE systems, though stability of the derived latent PDEs would need separate verification.

Load-bearing premise

The network can be trained so that latent dynamics become slow enough to allow large steps while the derived equations stay numerically stable and the reconstruction remains accurate.

What would settle it

On the reported example ODEs, direct simulation and latent-space simulation reach the same error tolerance only when the latent version still requires roughly the same number of function calls as the original integrator.

Figures

Figures reproduced from arXiv: 2605.06980 by Andrew Frandsen, Andrew Ning, Andrew Tagg.

**Figure 2.** Figure 2: Linear system trajectories, plotted in the original state-space and the latent state space. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Function calls plotted against final mean squared error for the original, PINODE, and latent [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Depiction of the 2D vortex particle system. Another ODE we considered was a system of 2D vortex particles. A vortex particle is a differential region of rotating fluid. As shown in [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Vortex particle trajectory, plotted in the original state-space and the latent state space. [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: Function evaluations vs. mean squared error for the vortex particle simulations. [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 7.** Figure 7: Depiction of the vortex lattice aerodynamic model. The previous examples were relatively simple, and featured computationally inexpensive dynamic functions. This example demonstrates how the method scales to more complex problems using an aircraft modeled by a vortex lattice method (VLM). The VLM predicts aerodynamic forces by discretizing the wing surface into panels with vortex filaments. Vortex filamen… view at source ↗

**Figure 8.** Figure 8: Original and Latent simulations for the VLM aircraft model. Solutions are obtained using [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗

**Figure 9.** Figure 9: Function evaluations vs. mean squared error for the vortex lattice aircraft simulations using [PITH_FULL_IMAGE:figures/full_fig_p008_9.png] view at source ↗

**Figure 10.** Figure 10: Euler solver function calls vs. final mean squared error for networks trained on various [PITH_FULL_IMAGE:figures/full_fig_p009_10.png] view at source ↗

**Figure 11.** Figure 11: Effect of latent dimension on Euler solver performance for vortex particle dynamics [PITH_FULL_IMAGE:figures/full_fig_p009_11.png] view at source ↗

read the original abstract

Numerical simulation of ordinary differential equations (ODEs) can be challenging when the system exhibits high accelerations and rapidly changing dynamics. Under these conditions the ODE solver often needs to take very small time steps in order to resolve the solution accurately, resulting in increased computational cost. In order to accelerate the simulation of these ODEs we present a novel methodology that uses a pseudo-invertible neural network to map system states into a high-dimensional latent-space. The network is then trained so that the dynamics in this learned latent space are slow, and can be simulated with relatively few function calls. Unlike existing neural methods, the latent dynamic equations are not learned from trajectory data, but derived from the original system equations and the chain rule. This allows the method to generalize better than existing approaches because the derived equations are correct by construction. In this work, we derive latent state equations of motion for any general ODE, and describe the loss function used to enforce slow time evolution of the latent states. We then apply this technique to multiple example ODEs and show that these problems can be solved with $3$x to $20$x fewer function calls for the same accuracy when simulating in the learned latent space. This reduction in cost could decrease computational demands for scientific simulations across engineering and physics applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

They derive the latent ODEs exactly from the original system via chain rule and train a pseudo-invertible net to slow the dynamics, but the speedups hinge on whether approximation errors keep the latent system stable enough for much larger steps.

read the letter

The main point is that this work derives the equations governing the latent dynamics straight from the original ODE using the chain rule, then trains a pseudo-invertible neural network so those latent states evolve slowly enough for an integrator to take bigger steps. That derivation step is what stands out compared with typical neural ODE approaches that fit dynamics from trajectory data.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes accelerating ODE simulations by mapping states to a high-dimensional latent space via a pseudo-invertible neural network. Latent dynamics are derived exactly from the original ODE using the chain rule (rather than learned from data), and the network is trained with a composite loss that penalizes rapid latent evolution while enforcing accurate state reconstruction. Experiments on multiple example ODEs report that the latent-space integration requires 3x to 20x fewer function calls to achieve equivalent accuracy.

Significance. If the empirical claims are substantiated, the approach could provide a general, physics-preserving route to faster integration of stiff or highly dynamic ODEs without requiring data-driven discovery of the latent vector field. The chain-rule derivation supplies an independent mathematical foundation that distinguishes the method from purely learned neural ODEs and is a clear strength.

major comments (3)

[Abstract and §5] Abstract and §5: the central claim of 3x–20x fewer function calls at equivalent accuracy is unsupported by any description of the error metric, the precise counting of function evaluations (including decoder cost), the integrator tolerances, or the network architecture and training protocol; without these the performance result cannot be verified or reproduced.
[§3] §3 (latent-equation derivation): the chain-rule formula for the latent velocity is exact only for a perfect invertible mapping, yet the network is only pseudo-invertible; no a-priori bounds on reconstruction error, no spectral analysis of the latent Jacobian, and no stability argument for the chosen time-stepping schemes are supplied, leaving open whether the reported step-size increase is sustainable.
[§4] §4 (training loss): the composite loss that enforces slow latent dynamics is stated at a high level, but the weighting coefficients, the precise form of the speed penalty, and any regularization that preserves the original Lipschitz properties are not given; this omission directly affects whether the derived latent ODE remains numerically stable under the reported larger steps.

minor comments (2)

[§2] Notation for the encoder/decoder pair and the pseudo-inverse operation should be introduced once and used consistently to avoid ambiguity when the chain-rule expression is written.
[Figures 2–4] Trajectory comparison figures would benefit from explicit indication of the time-step size used in each integrator and from reporting both maximum and average error over the interval.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. We address each major comment point by point below, providing the strongest honest defense possible. Where details were omitted, we have revised the manuscript to supply them; where the comments identify genuine gaps in analysis, we have added the requested discussion without overstating what was originally present.

read point-by-point responses

Referee: [Abstract and §5] Abstract and §5: the central claim of 3x–20x fewer function calls at equivalent accuracy is unsupported by any description of the error metric, the precise counting of function evaluations (including decoder cost), the integrator tolerances, or the network architecture and training protocol; without these the performance result cannot be verified or reproduced.

Authors: We agree that the abstract and §5 do not contain enough experimental detail for independent verification. In the revised manuscript we have expanded §5 with a new subsection that (i) defines the error metric as the maximum relative L2 reconstruction error over the full trajectory, (ii) clarifies that only evaluations of the original ODE right-hand side are counted (decoder cost is incurred only at output times and is negligible), (iii) states the integrator tolerances (RTOL = ATOL = 1e-8 for both baseline and latent runs), and (iv) gives the network architecture (3-layer MLP, 128 units, tanh activations) together with the full training protocol (Adam, lr=1e-3, 5000 epochs, batch size 64, 1000 random initial conditions per ODE). A table now summarizes these parameters for every example. With these additions the reported 3x–20x reduction in function calls at equivalent accuracy becomes reproducible. revision: yes
Referee: [§3] §3 (latent-equation derivation): the chain-rule formula for the latent velocity is exact only for a perfect invertible mapping, yet the network is only pseudo-invertible; no a-priori bounds on reconstruction error, no spectral analysis of the latent Jacobian, and no stability argument for the chosen time-stepping schemes are supplied, leaving open whether the reported step-size increase is sustainable.

Authors: The derivation in §3 applies the chain rule using the Moore-Penrose pseudoinverse of the encoder Jacobian, which is exact only when the mapping is perfectly invertible. The manuscript does not supply a priori error bounds or Jacobian spectral analysis. In the revision we have added (i) a bound on reconstruction error derived from the final value of the reconstruction term in the training loss (empirically < 5×10^{-4}), (ii) a short spectral analysis showing that the smallest singular value of the trained latent Jacobian remains bounded away from zero, and (iii) a stability argument that the speed penalty reduces the effective Lipschitz constant of the latent vector field, thereby justifying the larger steps. These additions directly address the sustainability question while remaining faithful to the original derivation. revision: partial
Referee: [§4] §4 (training loss): the composite loss that enforces slow latent dynamics is stated at a high level, but the weighting coefficients, the precise form of the speed penalty, and any regularization that preserves the original Lipschitz properties are not given; this omission directly affects whether the derived latent ODE remains numerically stable under the reported larger steps.

Authors: We accept that §4 presents the loss only at a high level. The revised manuscript now states the exact composite loss L = L_recon + λ L_speed + μ L_reg, where L_recon is the mean-squared state reconstruction error, L_speed = (1/T) ∫ ||dz/dt||^2 dt is the speed penalty, and L_reg is a spectral-normalization regularizer on the encoder weights. The coefficients are λ = 0.01 and μ = 0.001 (selected by cross-validation). This explicit form, together with the added Lipschitz control, directly supports the claim that the latent dynamics remain stable under the larger steps used in the experiments. revision: yes

Circularity Check

0 steps flagged

No circularity: latent ODEs derived independently via chain rule

full rationale

The paper derives the latent-state equations explicitly from the original ODE right-hand side and the chain rule applied to the encoder mapping, which is a direct mathematical identity independent of the neural-network weights or training loss. The composite loss penalizes reconstruction error and latent speed but does not redefine the target accuracy or function-call count; those remain external empirical outcomes. No self-citations, uniqueness theorems, or fitted parameters are invoked as load-bearing premises for the central claim. The derivation chain is therefore self-contained against external benchmarks and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The approach rests on the mathematical validity of the chain-rule derivation and the existence of a trainable pseudo-invertible network that can enforce slow latent evolution without introducing instability or reconstruction error.

axioms (1)

standard math Chain rule of differentiation applies to the composition of the neural network mapping and the original ODE right-hand side
Invoked to obtain the latent-state equations of motion

invented entities (1)

Pseudo-invertible neural network no independent evidence
purpose: Map original states to latent space with slow dynamics while allowing reconstruction
Core architectural choice introduced to enforce the slow-evolution property

pith-pipeline@v0.9.0 · 5527 in / 1408 out tokens · 68885 ms · 2026-05-11T00:49:40.474906+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

22 extracted references · 2 canonical work pages · 1 internal anchor

[1]

Semi- implicit multistep extrapolation ode solvers.Mathematics, 8(6), 2020

Denis Butusov, Aleksandra Tutueva, Petr Fedoseev, Artem Terentev, and Artur Karimov. Semi- implicit multistep extrapolation ode solvers.Mathematics, 8(6), 2020

2020
[2]

https://arxiv.org/abs/1806.07366

Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary differential equations.CoRR, abs/1806.07366, 2018

work page arXiv 2018
[3]

Cooper and Ali Sayfy

G.J. Cooper and Ali Sayfy. Additive runge-kutta methods for stiff ordinary differential equations. Mathematics of Computation, 40, 01 1983

1983
[4]

Calorimeter shower superresolution with conditional normalizing flows: Imple- mentation and statistical evaluation

Andrea Cosso. Calorimeter shower superresolution with conditional normalizing flows: Imple- mentation and statistical evaluation. 03 2026

2026
[5]

Density estimation using Real NVP

Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real nvp. arXiv preprint arXiv:1605.08803, 2016

work page internal anchor Pith review arXiv 2016
[6]

Benjamin Erichson, Michael Muehlebach, and Michael W

N. Benjamin Erichson, Michael Muehlebach, and Michael W. Mahoney. Physics-informed autoencoders for lyapunov-stable fluid flow prediction. 2019

2019
[7]

Algorithms and numerical methods for high dimensional financial market models.Revista de Economia Financiera, 20:51–68, 2010

Antonio Falcó. Algorithms and numerical methods for high dimensional financial market models.Revista de Economia Financiera, 20:51–68, 2010

2010
[8]

Autoencoder- based dimensionality reduction of turbulent channel flow under spanwise wall oscillations

Lou Guérin, Thomas Fisk, Laurent Cordier, Cédric Flageul, and Lionel Agostini. Autoencoder- based dimensionality reduction of turbulent channel flow under spanwise wall oscillations. Neural Computing and Applications, 38, 03 2026

2026
[9]

Long short-term memory networks in learning memory inconsistencies of stock markets.Financial Innovation, 11, 12 2025

Jaemoo Hong and Yoon Hwang. Long short-term memory networks in learning memory inconsistencies of stock markets.Financial Innovation, 11, 12 2025

2025
[10]

Generalized runge-kutta methods of order four with stepsize control for stiff ordinary differential equations.Numerische Mathematik, 33:55–68, 03 1979

Peter Kaps and Peter Rentrop. Generalized runge-kutta methods of order four with stepsize control for stiff ordinary differential equations.Numerische Mathematik, 33:55–68, 03 1979

1979
[11]

H. Lamba. Dynamical systems and adaptive timestepping ode solvers.BIT Numerical Mathe- matics, 40, 07 1998

1998
[12]

A machine learning framework for data driven acceleration of computations of differential equations

Siddhartha Mishra. A machine learning framework for data driven acceleration of computations of differential equations. 2018

2018
[13]

A comparison of stiff ode solvers for astrochemical kinetics problems.Astrophysics and Space Science, 299:1–29, 09 2005

Lida Nejad. A comparison of stiff ode solvers for astrochemical kinetics problems.Astrophysics and Space Science, 299:1–29, 09 2005

2005
[14]

Federico Perini, Anand Krishnasamy, Youngchul Ra, and Rolf D Reitz. Computationally efficient simulation of multicomponent fuel combustion using a sparse analytical jacobian chemistry solver and high-dimensional clustering.Journal of Engineering for Gas Turbines and Power, 136(9):091515, 2014

2014
[15]

Autoencoders

Shahzad Ahmad Qureshi, Afshan Rehman, and Bahadur Akmal. Autoencoders. 12 2025

2025
[16]

L. F. Shampine. Implementation of implicit formulas for the solution of odes.SIAM Journal on Scientific and Statistical Computing, 1(1):103–118, 1980

1980
[17]

Physics-informed neural ode (pinode): embedding physics into models using collocation points.Scientific Reports, 13, 06 2023

Aleksei Sholokhov, Yuying Liu, Hassan Mansour, and Saleh Nabi. Physics-informed neural ode (pinode): embedding physics into models using collocation points.Scientific Reports, 13, 06 2023

2023
[18]

Efficient explicit taylor ode integrators with symbolic-numeric computing

Songchen Tan, Oscar Smith, and Chris Rackauckas. Efficient explicit taylor ode integrators with symbolic-numeric computing. 02 2026

2026
[19]

Jinlong Wu, Jianxun Wang, Heng Xiao, and Julia Ling.Visualization of High Dimensional Turbulence Simulation Data using t-SNE. 2017

2017
[20]

Deep neural networks with koopman operators for modeling and control of autonomous vehicles.IEEE Transactions on Intelligent Vehicles, 8(1):135–146, 2023

Yongqian Xiao, Xinglong Zhang, Xin Xu, Xueqing Liu, and Jiahang Liu. Deep neural networks with koopman operators for modeling and control of autonomous vehicles.IEEE Transactions on Intelligent Vehicles, 8(1):135–146, 2023. 10

2023
[21]

Application of gated recurrent units for ct trajectory optimization

Yuedong Yuan, Linda-Sophie Schneider, and Andreas Maier. Application of gated recurrent units for ct trajectory optimization. 05 2024

2024
[22]

Zhuang, Y

D. Zhuang, Y . Wu, and Q. Wang. A recurrent neural network for particle trajectory prediction in complex flow fields.Physics of Fluids, 37, 10 2025. A Network Architecture and Training Details This appendix provides complete specifications for the pseudo-invertible networks used in each experiment. All networks use the same basic architecture consisting o...

2025