pith. machine review for the scientific record.

arxiv: 2605.07157 · v1 · submitted 2026-05-08 · 💻 cs.LG

Recognition: 2 theorem links · Lean Theorem

Learned Lagrangian Models of PDEs via Euler-Lagrange Residual Minimization

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 01:13 UTC · model grok-4.3

classification 💻 cs.LG
keywords Lagrangian neural networks · PDE forecasting · Euler-Lagrange residual · symplectic integration · physics-informed machine learning · optimization-based integrators · wave equations · learned dynamics

The pith

A learned continuous Lagrangian can forecast PDE dynamics stably over long times by minimizing the Euler-Lagrange residual on local patches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the first approach that learns a continuous Lagrangian directly from data and then uses it to forecast the evolution of systems governed by partial differential equations. It replaces standard time-stepping with an optimization procedure that minimizes the squared Euler-Lagrange residual on small space-time patches, preserving the conservative structure of the underlying dynamics. Because the integrator operates locally and scales linearly with domain size through Jacobi iteration, it avoids the global coupling that slows conventional discretizations and allows the same learned model to handle arbitrary boundary conditions and spatially varying coefficients. If the method works as claimed, physics-informed neural networks could produce reliable long-range predictions for wave and similar conservative systems without the phase or conservation drift that usually appears in learned simulators.

Core claim

The central claim is that a neural network representing a continuous Lagrangian can be trained and then integrated by repeatedly solving a local optimization problem that drives the Euler-Lagrange residual to zero on mesh-free space-time patches; this near-symplectic construction decouples model error from integration error, yields forecasts whose accuracy matches classical symplectic integrators on the double pendulum and on one- and two-dimensional wave equations, and extends without retraining to spatially varying dynamics or arbitrary boundaries.
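
For orientation, the residual in that claim has a standard form for a first-order field Lagrangian density $\mathcal{L}(q, q_t, q_x)$. The display below is that textbook form, not the authors' exact notation; the paper's patch construction is mesh-free rather than grid-based.

\[
r[q] \;=\; \frac{\partial \mathcal{L}}{\partial q} \;-\; \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial q_t} \;-\; \frac{d}{dx}\frac{\partial \mathcal{L}}{\partial q_x},
\qquad
q^\star \;=\; \operatorname*{arg\,min}_q \int_{\text{patch}} r[q]^2 \, dx \, dt,
\]

with $q$ held fixed wherever the patch boundary carries data, boundary conditions, or previously forecast values.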

What carries the argument

The optimization-based integrator that minimizes the squared Euler-Lagrange residual via a mesh-free near-symplectic construction on local space-time patches, solved by Jacobi iteration.
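
As a concrete toy version of that machinery, the sketch below forecasts one space-time patch of the 1D wave equation by gradient descent on the squared Euler-Lagrange residual, holding the first two time levels fixed as data. It is an assumption-laden illustration, not the paper's implementation: it uses a regular grid and the analytical wave Lagrangian rather than a mesh-free patch and a learned network, and the patch size, step sizes, and optimizer are arbitrary choices.

    import jax
    import jax.numpy as jnp

    c, dx, dt = 1.0, 0.2, 0.1
    nx, nt = 16, 8                      # one local space-time patch

    def el_residual(q):                 # q has shape (nt, nx)
        # for L = (q_t^2 - c^2 q_x^2)/2 the Euler-Lagrange residual is
        # q_tt - c^2 q_xx; central differences approximate it on the interior
        q_tt = (q[2:, 1:-1] - 2 * q[1:-1, 1:-1] + q[:-2, 1:-1]) / dt**2
        q_xx = (q[1:-1, 2:] - 2 * q[1:-1, 1:-1] + q[1:-1, :-2]) / dx**2
        return q_tt - c**2 * q_xx

    def loss(free, fixed):
        # the first two time levels are given; later levels are the unknowns
        q = jnp.concatenate([fixed, free], axis=0)
        return jnp.mean(el_residual(q) ** 2)

    x = jnp.linspace(0.0, (nx - 1) * dx, nx)
    fixed = jnp.stack([jnp.sin(2 * jnp.pi * x),
                       jnp.sin(2 * jnp.pi * (x - c * dt))])  # travelling wave
    free = jnp.tile(fixed[-1], (nt - 2, 1))  # warm start from last known level

    grad_fn = jax.jit(jax.grad(loss))
    for _ in range(5000):           # plain gradient descent stands in for the
        free = free - 1e-4 * grad_fn(free, fixed)  # paper's patch-local solver

Swapping the analytical residual for one differentiated out of a learned Lagrangian is what the paper proposes; nothing in the loop above depends on how the residual is produced, which is the sense in which model error and integration error stay separate.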

If this is right

  • The learned model achieves error levels comparable to classical symplectic integrators on the tested wave equations and double pendulum.
  • The same trained network works for spatially varying dynamics and arbitrary boundary conditions without retraining.
  • The integrator scales linearly with domain size because it relies on local patches and Jacobi iteration (see the sketch after this list).
  • No special architectural constraints are placed on the neural network, so the method can be combined with existing physics-guided learning techniques.
  • Model error and integration error remain separable because the integrator is formulated as an optimization problem rather than a fixed time-stepping scheme.
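
The scaling bullet above refers to Jacobi iteration; the sketch below shows its generic structure on the model problem $u_{xx} = f$ rather than on the paper's Euler-Lagrange residual. Each sweep updates every unknown once from its neighbors' previous values, so the cost per sweep is linear in the number of unknowns and the updates can run in parallel. The problem, stencil, and iteration count are illustrative assumptions.

    import numpy as np

    n = 256
    dx = 1.0 / (n - 1)
    f = np.sin(2 * np.pi * np.linspace(0.0, 1.0, n))
    u = np.zeros(n)                  # Dirichlet ends u[0] = u[-1] = 0

    # Jacobi iteration for the discretized u_xx = f:
    # u[i] <- (u[i-1] + u[i+1] - dx^2 * f[i]) / 2
    for _ in range(20000):           # each sweep costs O(n): linear in size
        u_next = u.copy()
        u_next[1:-1] = 0.5 * (u[2:] + u[:-2] - dx**2 * f[1:-1])
        u = u_next

Plain Jacobi converges slowly; the claim at stake is about per-sweep cost and locality, not iteration count.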

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The local-patch optimization could be applied to other conservative systems whose Lagrangians are harder to discretize globally, such as certain fluid or plasma models.
  • Because the method does not enforce a particular network structure, it may allow simpler or more expressive architectures to be used for Lagrangian learning than current physics-informed approaches require.
  • If the residual minimization is stable under mesh refinement, the same framework might serve as a general-purpose integrator for any learned conservative model, not only neural ones.
  • The linear scaling suggests that the approach could be useful for very large spatial domains where global implicit solvers become prohibitive.

Load-bearing premise

Minimizing the squared Euler-Lagrange residual via Jacobi iteration on local patches will reliably separate model error from integration error and produce stable long-range forecasts without introducing new instabilities or requiring global coupling.

What would settle it

Train the network on snapshots from a 2D wave equation whose coefficients vary in space, then run the integrator forward for many hundreds of time units and check whether the integrated solution conserves total energy to within a small tolerance while matching a high-fidelity reference solver.
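
A hedged sketch of that acceptance test, assuming a forecast trajectory q[t, x] from the learned integrator, a matched run q_ref from a high-fidelity reference solver, the standard wave-equation energy with spatially varying speed c(x), and illustrative tolerances (none of these values come from the paper):

    import numpy as np

    def total_energy(q, c, dx, dt):
        # E(t) = sum_x (q_t^2 + c(x)^2 q_x^2) / 2 * dx, the usual wave energy
        q_t = np.gradient(q, dt, axis=0)
        q_x = np.gradient(q, dx, axis=1)
        return 0.5 * np.sum(q_t**2 + c**2 * q_x**2, axis=1) * dx

    def settles_it(q, q_ref, c, dx, dt, energy_tol=1e-3, l2_tol=1e-2):
        E = total_energy(q, c, dx, dt)
        drift = np.max(np.abs(E - E[0]) / np.abs(E[0]))  # |E_t - E_0| / E_0
        rel_l2 = np.linalg.norm(q - q_ref) / np.linalg.norm(q_ref)
        return bool(drift < energy_tol and rel_l2 < l2_tol)

Passing both checks over many hundreds of time units would directly support the stability and conservation claims; failing the drift check while passing the L2 check would point at integration error, and the reverse at model error, which is exactly the decoupling the paper advertises.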

Figures

Figures reproduced from arXiv: 2605.07157 by Eric Forgoston, Lyra Zhornyak, M. Ani Hsieh.

Figure 1. The construction of ELM at three scales.
Figure 2. Relative error in energy $|E_t - E_0|/E_0$ over 100,000 s for a double pendulum with initial angles $q_1 = 3\pi/7$ and $q_2 = 3\pi/4$. The double pendulum has the analytical Lagrangian $L = \dot{q}_1^2 + \tfrac{1}{2}\dot{q}_2^2 + \dot{q}_1 \dot{q}_2 \cos(q_1 - q_2) + 2\cos q_1 + \cos q_2$. Results are shown for ELM and for four classical integrators using an LNN-style estimate of the acceleration. All methods share the same learned Lagrangian. Faint lines ar…
Figure 3. (a) Wave amplitude heatmaps, (b) relative error in energy, and (c) spatial profile for the 1D wave equation.
Figure 4. (a) Center amplitude, (b) relative energy error, and (c) spatial snapshots in time for the 2D wave equation.
Figure 5. Demonstrations of (a) partial reflection and (b) diffraction and interference. Both panels use …
Figure 6. Relative $L^2$ error vs. the exact solution over the full 10,000 s (224 periods) for the methods of …
Figure 7. Spatio-temporal heatmaps $q(x, t)$ over 10,000 s (224 periods), extending Figure 3a. The figure shows the first ∼20 periods and the last ∼10. Parameters for each method are listed in Appendix B.
Figure 8. Relative $L^2$ error vs. eigenmode ground truth for the 2D wave over 1,000 s, for the same system and methods as …
original abstract

We present the first method to directly use a learned continuous Lagrangian to forecast the dynamics of systems governed by partial differential equations, exploiting the inherent conservative structure to achieve stable long-range predictions. We develop an optimization-based integrator that minimizes the squared Euler-Lagrange residual via a mesh-free near-symplectic construction on local space-time patches. Different from integrators for analytical models, integrators for learned models should decouple model error (phase error) from integration error (conservation error). By relying on optimization rather than time-stepping, we bypass the global coupling inherent to fixed discretizations, which slows time- and space-stepping and complicates learning. Our method scales linearly with domain size via Jacobi iteration, and places no structural requirements on the learned network, allowing it to be coupled with existing physics-guided machine learning (ML) methods. We validate our approach on a learned representation of a double pendulum, a one-dimensional wave equation, and a two-dimensional wave equation. Our method achieves error comparable to classical symplectic methods while generalizing to spatially varying dynamics and arbitrary boundary conditions without retraining.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce the first method for directly using a learned continuous Lagrangian to forecast dynamics of PDE-governed systems. It develops a mesh-free near-symplectic integrator that minimizes the squared Euler-Lagrange residual via Jacobi iteration on local space-time patches, aiming to decouple model (phase) error from integration (conservation) error. This bypasses global coupling of fixed discretizations, scales linearly with domain size, imposes no structural requirements on the neural network, and is validated on a double pendulum, 1D wave equation, and 2D wave equation, achieving errors comparable to classical symplectic methods while generalizing to spatially varying dynamics and arbitrary boundary conditions.

Significance. If the central claims hold, the work would be significant for physics-informed machine learning by enabling stable long-term forecasting in conservative PDE systems through direct use of learned Lagrangians. It credits the mesh-free construction, linear scaling, and compatibility with existing methods as practical advantages over time-stepping approaches. The approach addresses a genuine gap in applying variational principles to learned models for fields.

major comments (2)
  1. [Abstract] The claim that the integrator 'decouples model error (phase error) from integration error (conservation error)' by relying on optimization rather than time-stepping is load-bearing for the central contribution, yet no convergence analysis, error bounds, or proof of global consistency is provided for the local Jacobi iteration when the Lagrangian is only approximately learned. For PDEs the EL operator includes spatial derivatives, so local patches must consistently approximate derivatives and interface conditions; without this, residual mismatches could recouple errors and introduce instabilities.
  2. [Validation] Validation on the double pendulum and the 1D/2D wave equations: the abstract states that the method 'achieves error comparable to classical symplectic methods' while 'generalizing to spatially varying dynamics,' but it reports no quantitative error metrics (e.g., L2 norms, long-term drift), no ablation studies on patch size or Jacobi iteration count, and no optimization convergence details. This leaves the support for stable long-range predictions and the decoupling assumption only moderately substantiated.
minor comments (2)
  1. [Abstract] The abstract could specify the neural network architectures, training losses, and hyperparameter choices used for the learned Lagrangians to aid reproducibility.
  2. [Methods] Notation for the Euler-Lagrange residual and the precise form of the Jacobi update on patches would benefit from an explicit equation or pseudocode in the methods section for clarity.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their careful reading and constructive feedback on our manuscript. The comments highlight important aspects of the theoretical foundations and empirical validation that we address point by point below. We have revised the manuscript to strengthen the presentation where possible while maintaining an honest account of the work's scope.

point-by-point responses
  1. Referee: [Abstract] The claim that the integrator 'decouples model error (phase error) from integration error (conservation error)' by relying on optimization rather than time-stepping is load-bearing for the central contribution, yet no convergence analysis, error bounds, or proof of global consistency is provided for the local Jacobi iteration when the Lagrangian is only approximately learned. For PDEs the EL operator includes spatial derivatives, so local patches must consistently approximate derivatives and interface conditions; without this, residual mismatches could recouple errors and introduce instabilities.

    Authors: We agree that the decoupling claim is central and that the manuscript lacks a formal convergence analysis or error bounds for the approximate-Lagrangian case. Deriving such guarantees is difficult without strong assumptions on network approximation quality that may not hold in practice. The local patch-wise minimization is intended to avoid the sequential error accumulation of time-stepping integrators, and the mesh-free construction permits the learned model to adapt to spatial derivatives and boundaries without explicit interface enforcement. Experiments on the wave equations show no observed instabilities over long horizons. In the revised manuscript we have added a dedicated limitations paragraph in the discussion section that qualifies the decoupling statement, notes the absence of theoretical bounds, and summarizes the empirical evidence from the reported trajectories. revision: partial

  2. Referee: [Validation] Validation on the double pendulum and the 1D/2D wave equations: the abstract states that the method 'achieves error comparable to classical symplectic methods' while 'generalizing to spatially varying dynamics,' but it reports no quantitative error metrics (e.g., L2 norms, long-term drift), no ablation studies on patch size or Jacobi iteration count, and no optimization convergence details. This leaves the support for stable long-range predictions and the decoupling assumption only moderately substantiated.

    Authors: We accept that the original validation relied primarily on qualitative trajectory plots and that explicit quantitative metrics, ablations, and convergence diagnostics would strengthen the claims. The revised manuscript now includes a table of L2 error norms and long-term drift statistics for all three test systems, with direct numerical comparison to classical symplectic integrators. Additional ablation experiments varying patch size and Jacobi iteration count have been added to the supplementary material, confirming robustness within the ranges used in the main results. Residual norms at convergence are also reported for each experiment to document optimization behavior. revision: yes

standing simulated objections not resolved
  • A rigorous convergence analysis or error bounds for the local Jacobi iteration under approximate learned Lagrangians cannot be supplied in the current revision.

Circularity Check

0 steps flagged

Optimization-based EL residual minimization is constructed directly from variational principles; no reduction of performance claims to self-defined fits or self-citations.

full rationale

The derivation chain begins from the standard Euler-Lagrange equations and applies an optimization procedure (Jacobi iteration on local patches) to minimize the residual. This construction is independent of the learned Lagrangian parameters and does not rename or refit any input quantity as an output prediction. The claimed decoupling of phase error from conservation error follows from the choice of optimization over time-stepping, which is a standard design choice rather than a tautology. No load-bearing step invokes a self-citation whose uniqueness theorem or ansatz is required to justify the central result; the method is evaluated against external benchmarks such as classical symplectic integrators rather than against itself. Minor self-citations on related Lagrangian learning may be present but do not carry the performance claims.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The approach rests on standard variational mechanics and numerical optimization; no new physical entities are introduced. The learned Lagrangian is parameterized by a neural network whose architecture details are unspecified in the abstract.

free parameters (1)
  • Neural network hyperparameters
    Architecture, width, and training choices for the learned Lagrangian are not detailed and must be selected to make the residual minimization effective.
axioms (2)
  • standard math: The Euler-Lagrange equations correctly describe the dynamics of the underlying conservative system
    Invoked as the residual that is minimized to obtain the forecast.
  • domain assumption: Local patch optimization via Jacobi iteration converges to a near-symplectic update
    Required for the claimed linear scaling and decoupling of errors.

pith-pipeline@v0.9.0 · 5494 in / 1343 out tokens · 33716 ms · 2026-05-11T01:13:08.368191+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
