pith. machine review for the scientific record.

arxiv: 2604.05230 · v1 · submitted 2026-04-06 · 💻 cs.LG · cs.AI · cs.NA · math.NA · math.OC

Recognition: 2 theorem links · Lean Theorem

Curvature-Aware Optimization for High-Accuracy Physics-Informed Neural Networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:52 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.NA · math.NA · math.OC
keywords physics-informed neural networks · natural gradient · quasi-Newton optimization · BFGS · differential equations · PINNs · scientific machine learning

The pith

Curvature-aware optimizers like natural gradient and self-scaling BFGS accelerate PINN convergence to high accuracy on complex differential equations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces efficient implementations of natural gradient, self-scaling BFGS, and Broyden optimizers for physics-informed neural networks. These methods are shown to speed up training on challenging problems such as the Helmholtz equation, Stokes flow, inviscid Burgers equation, Euler equations, and stiff ODEs. The work also develops new PINN approaches for the Burgers and Euler equations and compares them to traditional numerical solvers. It further addresses scaling these optimizers for large batched datasets.

Core claim

By using curvature-aware optimization techniques, including the natural gradient and quasi-Newton methods, PINNs can achieve faster convergence and higher accuracy when solving partial and ordinary differential equations that are difficult for standard optimizers.
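
For concreteness, a PINN turns the differential equation into a least-squares objective over collocation points, and it is this loss that the optimizers above must drive to very small values. The sketch below is ours, not the paper's: a minimal residual loss for a 1D Poisson problem u''(x) = f(x) with zero Dirichlet boundaries, where `model` is any hypothetical PyTorch network mapping x to u(x).

    # Minimal PINN loss sketch (illustrative, not the paper's code):
    # 1D Poisson u''(x) = f(x) on [0, 1] with u(0) = u(1) = 0.
    import torch

    def pinn_loss(model, x_interior, x_boundary, f):
        x = x_interior.requires_grad_(True)
        u = model(x)
        du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]    # u'
        d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]  # u''
        residual = d2u - f(x)             # PDE residual at collocation points
        bc = model(x_boundary)            # Dirichlet boundary mismatch
        return (residual ** 2).mean() + (bc ** 2).mean()

Ill-conditioning of exactly this kind of composite loss is what motivates curvature-aware updates in the first place.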

What carries the argument

Natural Gradient optimizer and Self-Scaling BFGS/Broyden quasi-Newton methods, which approximate the curvature of the loss landscape to provide better update directions for training PINNs.
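
To make the mechanism concrete, here is a minimal sketch of one self-scaled BFGS step on the inverse-Hessian approximation, in the Oren–Luenberger form described in the cited quasi-Newton literature ([3], [58]); it is an assumed textbook reconstruction, not the paper's implementation.

    # One self-scaled BFGS update of the inverse-Hessian approximation H
    # (illustrative sketch). s = x_new - x_old, y = grad_new - grad_old;
    # a Wolfe-type line search is assumed to keep the curvature y @ s > 0.
    import numpy as np

    def ssbfgs_update(H, s, y):
        ys = float(y @ s)
        if ys <= 1e-12:                  # curvature condition failed: skip
            return H
        tau = ys / float(y @ H @ y)      # self-scaling factor tau_k
        rho = 1.0 / ys
        I = np.eye(len(s))
        V = I - rho * np.outer(s, y)
        return tau * (V @ H @ V.T) + rho * np.outer(s, s)

With tau = 1 this reduces to plain BFGS; the scaling is intended to keep the spectrum of H well conditioned from step to step.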

If this is right

  • High-accuracy solutions for inviscid flows and stiff systems become feasible with PINNs.
  • Quasi-Newton methods can be scaled for batched training in large-scale scientific ML.
  • New PINN formulations for Burgers and Euler equations match high-order numerical accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • These optimizers might reduce the need for architecture-specific tuning in PINNs across different physical domains.
  • Integration with other ML techniques could further improve performance on high-dimensional problems.
  • Similar curvature-aware methods could apply to other scientific computing neural network tasks beyond PINNs.

Load-bearing premise

That the curvature information from these optimizers can be computed and applied efficiently at scale for batched training without prohibitive memory or time costs.
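
One hedged way to see why this premise is at least plausible: natural-gradient-type steps can be taken matrix-free, touching the Gramian G = JᵀJ of the PDE residuals only through Jacobian-vector products inside a conjugate-gradient solve, so G is never materialized. In the sketch below, `jvp` and `vjp` are assumed closures supplied by an autodiff framework (e.g. JAX); nothing here is the paper's code.

    # Matrix-free damped natural-gradient direction via conjugate gradient
    # (illustrative sketch). Solves (J^T J + damping * I) delta = grad
    # without forming the Gramian; only jvp/vjp sweeps are needed.
    import numpy as np

    def natural_gradient_step(grad, jvp, vjp, damping=1e-6, iters=50):
        def G_mv(v):                      # (J^T J + damping * I) @ v
            return vjp(jvp(v)) + damping * v

        delta = np.zeros_like(grad)
        r = grad.copy()                   # CG residual b - G @ 0
        p = r.copy()
        rs = float(r @ r)
        for _ in range(iters):
            Gp = G_mv(p)
            alpha = rs / float(p @ Gp)
            delta += alpha * p
            r -= alpha * Gp
            rs_new = float(r @ r)
            if rs_new < 1e-16:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return delta                      # update direction for the parameters

The open question the premise raises is whether the handful of jvp/vjp sweeps per CG iteration stays affordable once the residuals are evaluated over large batches.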

What would settle it

A demonstration that on one of the tested equations, such as the inviscid Burgers, the new optimizers fail to reach the accuracy of high-order numerical methods within reasonable training time or resources.

Figures

Figures reproduced from arXiv:2604.05230 by Anas Jnini, Elham Kiyani, George Em Karniadakis, Johannes Müller, Jorge F. Urbán, Khemraj Shukla, Marius Zeinhofer, Nazanin Ahmadi Daryakenari.

Figure 1: Schematic illustration of PINN training and optimization methods considered in this work. Our benchmarks include PDEs of elliptic (Helmholtz and Stokes), parabolic (2D viscous Burgers), and hyperbolic (inviscid Burgers and 1D Euler) type, and a stiff PK–PD ODE system, characterized by oscillatory solutions, nonlinear diffusion, shock-induced discontinuities, and stiffness. The methods studied here are BFGS, SS…

Figure 2: Loss landscapes for two representative PDEs, shown along random directions and projections onto pairs of singular vectors. (a) Top row: 1D Euler equation (59) (Sod problem), projected along orthogonal random directions, leading singular vectors, and the next dominant pair (left to right). Bottom row: Stokes equation (42), following the same projection scheme. Axes correspond to (𝛼, 𝛽).

Figure 3: Visualization of the NG (top), SSBroyden (middle), and SOAP (bottom) training dynamics for the 2D Helmholtz problem with 𝑎1 = 1, 𝑎2 = 4, and 𝑘 = 1; shown are side-by-side snapshot comparisons of the updates of the optimizers (left) and the current error of the prediction (right).

Figure 4: 2D Helmholtz problem with 𝑎1 = 1, 𝑎2 = 4, and 𝑘 = 1. SSBroyden prediction, reference solution given by Equation (39), and the pointwise absolute error |𝑢pred − 𝑢ref|.

Figure 5: 2D Helmholtz problem with 𝑎1 = 1, 𝑎2 = 4, and 𝑘 = 1. Training loss and relative 𝐿2 error, comparing BFGS, SSBFGS, SSBroyden, NG, and SPRING. The SSBroyden (PyTorch) and SSBFGS (PyTorch) results are obtained using SciPy-based implementations executed on the CPU, while SSBFGS (JAX) is implemented using Optax in JAX.

Figure 6: 2D Helmholtz problem with 𝑎1 = 6 and 𝑎2 = 6: (a) 𝑘 = 1 and (b) 𝑘 = 100. During training for 𝑘 = 100, collocation points are sampled progressively from [−0.2, 0.2]² for 10,000 iterations, [−0.4, 0.4]² for 15,000 iterations, [−0.7, 0.7]² for 20,000 iterations, and finally the full domain [−1, 1]² for 40,000 iterations.

Figure 7: 2D Helmholtz with 𝑎1 = 10, 𝑎2 = 10, and 𝑘 = 1: SSBroyden prediction, exact solution (39), and pointwise absolute error |𝑢pred − 𝑢exact|.

Figure 8: 2D Helmholtz with 𝑎1 = 10, 𝑎2 = 10, and 𝑘 = 1: Training loss and relative 𝐿2 error, comparing BFGS, SSBroyden, and NG descent.

Figure 9: 3D Helmholtz with 𝑎1 = 4, 𝑎2 = 4, 𝑎3 = 3, and 𝑘 = 1. The network consists of a periodic Fourier feature embedding (𝑘max = 2), followed by a fully connected network with four hidden layers, each containing 30 neurons with tanh activations, and a linear output layer.

Figure 10: Stokes equation: The left subfigure presents the loss history for Stokes flow using various combinations of optimizers and line-search routines. Each experiment starts with a warm-up phase employing first-order optimizers (Adam and SOAP), followed by a switch to quasi-Newton optimizers combined with different line-search strategies. The point of transition between optimizers is marked by a vertical dashed…

Figure 11: Stokes equation: Streamlines obtained using various optimization strategies: (a) Adam, (b) SOAP, (c) SOAP + SSBFGS (trust-region), (d) SOAP + SSBroyden (Armijo–Wolfe), (e) SOAP + SSBroyden (zoom), (f) Adam + SSBFGS (trust-region), (g) SOAP + SSBFGS (Wolfe), (h) SOAP + SSBFGS (zoom), and (i) NG. The NG achieves comparable accuracy to self-scaled optimizers while significantly reducing runtime. Panel (j) sh…

Figure 12: 2D viscous Burgers equation: Solution obtained using different optimization methods. (a) Best results from a quasi-Newton and line search method (SSBroyden with zoom line search), (b) NG, and (c) SOAP optimization. In each panel, the left subfigure shows the PINN solution at 𝑡 = 0.5, the middle subfigure shows the spatial distribution of the absolute error (concentrated at the shock location), and the righ…

Figure 13: Inviscid Burgers with Roe linearization: Predicted solutions at 𝑡 = 1 with the LRPINN for both optimizers, together with a reference numerical solution obtained with a third-order WENO scheme. The left panel shows the solution for all 𝑥 ∈ [−1, 1], whereas the right panel shows a zoomed-in view of the shock region (𝑥 = 0).

Figure 14: Inviscid Burgers equation using relaxation and entropy inequality: Loss history for the inviscid Burgers flow obtained using different combinations of optimizers and line-search strategies, incorporating flux relaxation and an entropy inequality. The training procedure begins with a warm-up phase using the first-order Adam optimizer, followed by a transition to quasi-Newton optimizers combined with variou…

Figure 15: Inviscid Burgers equation using relaxation and entropy inequality: The solution of Burgers' equation, incorporating an entropy inequality and flux relaxation, is obtained using a physics-informed neural network (PINN) trained with the SSBroyden optimizer and a zoom line-search strategy. Panel (a) compares the PINN solution with a DGSEM solution of order 𝑁 = 3, where 𝑁 represents the polynomial degree used…

Figure 16: 1D Euler equations with Roe linearization: Neural network predictions for the density, velocity, and pressure at 𝑡 = 1. The exact solution is also plotted for reference.

Figure 17: 1D Euler equations with Roe linearization: Neural network prediction for the density, before and after the inviscid part of the training process. (Reported alongside: SSBroyden reaches relative 𝐿1 errors of (9.2, 8.0, 5.0) × 10⁻⁴ for (𝜌, 𝑢, 𝑝) in 160 s; NG reaches (6.5, 9.1, 3.3) × 10⁻³ in 452 s; both with 4833 parameters.)

Figure 18: 1D Euler equations with HLLC flux: PINN architecture for the 1D Euler equation using adaptive viscosity and the HLLC flux.

Figure 19: Euler equations with HLLC flux vs. analytical solution: Comparison of the 1D Euler solution for the Sod shock tube problem obtained using the PINN framework augmented with the HLLC flux against the analytical solution. The top row displays the density 𝜌, velocity 𝑢, pressure 𝑝, and local Mach number 𝑀𝑎 (from left to right) at 𝑡 = 0.15. The bottom row presents the corresponding pointwise absolute error dist…

Figure 20: Euler equations with HLLC flux vs. WENO solution: Comparison of the 1D Euler solution for the Sod shock tube problem obtained using the PINN framework augmented with the HLLC flux against a numerical reference solution computed using a WENO scheme [32]. The top row shows the density 𝜌, velocity 𝑢, pressure 𝑝, and local Mach number 𝑀𝑎 (from left to right) at 𝑡 = 0.15. The bottom row presents the correspondi…

Figure 21: Stiff PK–PD ODE system: Reference vs. PINN (NG). Comparison of the exact solution and the PINN prediction obtained with the NG optimizer, showing accurate reconstruction of the multiscale stiff dynamics.

Figure 22: Stiff PK–PD ODE system: Time-resolved absolute error for each state variable, shown on a logarithmic scale, comparing PINN solutions trained with different optimization methods. The results illustrate optimizer-dependent performance in resolving stiff multiscale dynamics.

Figure 23: Roofline analysis for the SSBFGS and NG optimizers applied to the inviscid Burgers, Euler, and Stokes equations. The solid black line denotes the theoretical roofline, and the vertical dashed line marks the ridge point, which delineates the transition from the memory-bound to the compute-bound regime.

Figure 24: Comparison of batch training using Adam and the sSSBFGS approach (Algorithm 2) for the 1D Poisson equation. Notably, in FP32 precision, the sSSBFGS algorithm reaches loss values on the order of 10⁻⁷, significantly outperforming Adam.
read the original abstract

Efficient and robust optimization is essential for neural networks, enabling scientific machine learning models to converge rapidly to very high accuracy -- faithfully capturing complex physical behavior governed by differential equations. In this work, we present advanced optimization strategies to accelerate the convergence of physics-informed neural networks (PINNs) for challenging partial (PDEs) and ordinary differential equations (ODEs). Specifically, we provide efficient implementations of the Natural Gradient (NG) optimizer, Self-Scaling BFGS and Broyden optimizers, and demonstrate their performance on problems including the Helmholtz equation, Stokes flow, inviscid Burgers equation, Euler equations for high-speed flows, and stiff ODEs arising in pharmacokinetics and pharmacodynamics. Beyond optimizer development, we also propose new PINN-based methods for solving the inviscid Burgers and Euler equations, and compare the resulting solutions against high-order numerical methods to provide a rigorous and fair assessment. Finally, we address the challenge of scaling these quasi-Newton optimizers for batched training, enabling efficient and scalable solutions for large data-driven problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces efficient implementations of curvature-aware optimizers (Natural Gradient, self-scaling BFGS, and Broyden) for physics-informed neural networks (PINNs) applied to challenging PDEs and ODEs, including the Helmholtz equation, Stokes flow, inviscid Burgers equation, Euler equations for high-speed flows, and stiff ODEs from pharmacokinetics/pharmacodynamics. It proposes new PINN formulations for the inviscid Burgers and Euler equations with comparisons against high-order numerical methods, and develops a scaling approach to enable batched training with these quasi-Newton methods.

Significance. If the reported convergence and accuracy improvements are substantiated, the work would provide practical, scalable tools for high-fidelity scientific machine learning on differential equations. The explicit comparisons to high-order numerics and the batched-training scaling strategy are strengths that support reproducibility and broader applicability.

minor comments (2)
  1. [Abstract] The claim of performance gains on multiple benchmark problems is stated without any quantitative metrics, error norms, or convergence rates; adding a short summary of key results (e.g., final L2 errors or iteration counts) would make the abstract self-contained.
  2. [Section on batched training] Implementation details for the batched quasi-Newton scaling (mentioned in the final paragraph) should include explicit pseudocode or complexity analysis to clarify how curvature information is maintained across batches without prohibitive memory cost; a generic illustration of one such scheme follows below.
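
To make that request concrete: one generic way to carry curvature across batches with bounded memory is a limited-memory two-loop recursion over the m most recent curvature pairs, costing O(mn) storage instead of the O(n²) of a dense inverse-Hessian. The sketch below is the textbook L-BFGS direction computation (Nocedal and Wright [56]), offered purely as an illustration of the kind of pseudocode the report asks for; it is not the manuscript's Algorithm 2.

    # Textbook L-BFGS two-loop recursion (illustration only, not Algorithm 2).
    # s_hist, y_hist hold the m most recent pairs s = x_{k+1} - x_k and
    # y = g_{k+1} - g_k, oldest first; memory is O(m * n).
    import numpy as np

    def two_loop_direction(grad, s_hist, y_hist):
        q = grad.copy()
        alphas = []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):  # newest -> oldest
            rho = 1.0 / float(y @ s)
            a = rho * float(s @ q)
            q -= a * y
            alphas.append((rho, a))
        if s_hist:                                            # H0 = gamma * I
            s, y = s_hist[-1], y_hist[-1]
            q *= float(s @ y) / float(y @ y)
        for (s, y), (rho, a) in zip(zip(s_hist, y_hist), reversed(alphas)):
            b = rho * float(y @ q)
            q += (a - b) * s
        return -q                                             # descent direction

Whether the manuscript's batched scheme maintains its history this way or differently is exactly what explicit pseudocode would settle.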

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work on curvature-aware optimizers for PINNs and for recommending minor revision. The assessment correctly identifies the key contributions, including efficient implementations of Natural Gradient, self-scaling BFGS, and Broyden methods, new PINN formulations for inviscid Burgers and Euler equations with high-order numerical comparisons, and the batched scaling strategy for quasi-Newton methods. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The manuscript presents algorithmic implementations of established curvature-aware optimizers (Natural Gradient, self-scaling BFGS, Broyden) together with empirical benchmarks on standard PDE/ODE test problems. No derivation chain is claimed that reduces a first-principles result or prediction to its own fitted inputs by construction. The work is framed as engineering and numerical demonstration rather than a closed mathematical derivation; therefore no self-definitional, fitted-input-renamed-as-prediction, or self-citation-load-bearing steps are present. All performance claims rest on direct comparison with high-order numerical references, which are external to the optimizer formulations themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces no explicit free parameters, axioms, or invented entities beyond standard assumptions of neural network training and differential equation modeling.

pith-pipeline@v0.9.0 · 5524 in / 1197 out tokens · 78171 ms · 2026-05-10T18:52:34.396723+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. GRAFT-ATHENA: Self-Improving Agentic Teams for Autonomous Discovery and Evolutionary Numerical Algorithms

    cs.LG 2026-05 unverdicted novelty 6.0

    GRAFT-ATHENA projects combinatorial method choices into factored trees that embed as fingerprints in a metric space, enabling an agentic system to accumulate experience across domains and autonomously discover new num...

  2. Convergence Analysis of Newton's Method for Neural Networks in the Overparameterized Limit

    cs.LG 2026-05 unverdicted novelty 6.0

    In the infinite-width limit, regularized Newton's method for neural networks converges exponentially to global minimizers with uniform rates across the frequency spectrum using the Newton neural tangent kernel.

  3. A Physics-Informed Neural Network for Solving the Quasi-static Magnetohydrodynamic Equations

    physics.plasm-ph 2026-04 unverdicted novelty 6.0

    A PINN solves the time-dependent quasi-static MHD equations in axisymmetric tokamak geometry without training data and reproduces vertical plasma displacement seen in ground-truth simulations.

  4. Lightweight Geometric Adaptation for Training Physics-Informed Neural Networks

    cs.LG 2026-04 unverdicted novelty 5.0

    A secant-based adaptive correction augments first-order optimizers to improve convergence speed, stability, and accuracy when training PINNs on challenging PDEs.

  5. Two-scale Neural Networks for Singularly Perturbed Dynamical Systems with Multiple Parameters

    math.NA 2026-05 unverdicted novelty 4.0

    A neural network augmented with the geometric mean of multiple small parameters approximates solutions to singularly perturbed dynamical systems with satisfactory accuracy on tested coupled cases.

Reference graph

Works this paper leans on

77 extracted references · 29 canonical work pages · cited by 5 Pith papers · 2 internal anchors

  1. [1]

    Physics-informed machine learning in biomedical science and engineering

    Ahmadi, N., Cao, Q., Humphrey, J.D., Karniadakis, G.E., 2025. Physics-informed machine learning in biomedical science and engineering. arXiv preprint arXiv:2510.05433

  2. [2]

    Representation meets optimization: Training PINNs and PIKANs for gray-box discovery in systems pharmacology

    Ahmadi Daryakenari, N., Shukla, K., Karniadakis, G.E., 2026. Representation meets optimization: Training PINNs and PIKANs for gray-box discovery in systems pharmacology. Computers in Biology and Medicine 201, 111393. URL: https://www.sciencedirect.com/science/article/pii/S0010482525017470, doi:https://doi.org/10.1016/j.compbiomed.2025.111393

  3. [3]

    Global and superlinear convergence of a class of self-scaling methods with inexact line searches

    Al-Baali, M., 1998. Global and superlinear convergence of a class of self-scaling methods with inexact line searches. Computational Optimization and Applications 9, 191–203. doi:10.1023/A:1018315205474

  4. [4]

    Wide interval for efficient self-scaling quasi-newton algorithms

    Al-Baali, M., Khalfan, H., 2005. Wide interval for efficient self-scaling quasi-newton algorithms. Optimization Methods and Software 20, 679–691. doi:10.1080/10556780410001709448

  5. [5]

    Broyden’s quasi-newton methods for a nonlinear system of equations and unconstrained optimization: A review and open problems

    Al-Baali, M., Spedicato, E., Maggioni, F., 2014. Broyden’s quasi-newton methods for a nonlinear system of equations and unconstrained optimization: A review and open problems. Optimization Methods and Software 29, 937–954. doi:10.1080/10556788.2013.856909

  6. [6]

    Information geometry

    Amari, S., 1997. Information geometry. Contemporary Mathematics 203, 4

  7. [7]

    Natural gradient works efficiently in learning

    Amari, S., 1998. Natural gradient works efficiently in learning. Neural Computation 10, 251–276. doi:10.1162/089976698300017746

  8. [8]

    Functional neural wavefunction optimization

    Armegioiu, V., Carrasquilla, J., Mishra, S., Müller, J., Nys, J., Zeinhofer, M., Zhang, H., 2025. Functional neural wavefunction optimization. arXiv preprint arXiv:2507.10835

  9. [9]

    On the optimization of deep networks: Implicit acceleration by overparameterization

    Arora, S., Cohen, N., Hazan, E., 2018. On the optimization of deep networks: Implicit acceleration by overparameterization, in: International Conference on Machine Learning, PMLR. pp. 244–253

  10. [10]

    A progressive batching l-bfgs method for machine learning, in: International Conference on Machine Learning, PMLR

    Bollapragada, R., Nocedal, J., Mudigere, D., Shi, H.J., Tang, P.T.P., 2018. A progressive batching l-bfgs method for machine learning, in: International Conference on Machine Learning, PMLR. pp. 620–629

  11. [11]

    The convergence of a class of double-rank minimization algorithms 1

    Broyden, C.G., 1970. The convergence of a class of double-rank minimization algorithms 1. general considerations. IMA Journal of Applied Mathematics 6, 76–90. doi:10.1093/imamat/6.1.76

  12. [12]

    Nektar++: An open-source spectral/hp element framework

    Cantwell, C.D., Moxey, D., Comerford, A., Bolis, A., Rocco, G., Mengaldo, G., De Grazia, D., Yakovlev, S., Lombard, J.E., Ekelschot, D., et al., 2015. Nektar++: An open-source spectral/hp element framework. Computer physics communications 192, 205–219

  13. [13]

    On discretely entropy conservative and entropy stable discontinuous Galerkin methods

    Chan, J., 2018. On discretely entropy conservative and entropy stable discontinuous Galerkin methods. Journal of Computational Physics 362, 346–374

  14. [14]

    Separable physics-informed neural networks

    Cho, J., Nam, S., Yang, H., Yun, S.B., Hong, Y., Park, E., 2023. Separable physics-informed neural networks. Advances in Neural Information Processing Systems 36, 23761–23788

  15. [15]

    Trust region methods

    Conn, A.R., Gould, N.I., Toint, P.L., 2000. Trust region methods. SIAM

  16. [16]

    Kronecker-factored approximate curvature for physics-informed neural networks

    Dangel, F., Müller, J., Zeinhofer, M., 2024. Kronecker-factored approximate curvature for physics-informed neural networks. Advances in Neural Information Processing Systems 37, 34582–34636

  17. [17]

    CMINNs: Compartment model informed neural networks—unlocking drug dynamics

    Daryakenari, N.A., Wang, S., Karniadakis, G., 2025. CMINNs: Compartment model informed neural networks—unlocking drug dynamics. Computers in Biology and Medicine 184, 109392

  18. [18]

    Variable Metric Method for Minimization

    Davidon, W.C., 1959. Variable Metric Method for Minimization. Technical Report ANL-5990. Argonne National Laboratory

  19. [19]

    Numerical Methods for Unconstrained Optimization and Nonlinear Equations

    Dennis Jr, J., Schnabel, R.B., 1996. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. SIAM

  20. [20]

    Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

    Duchi, J., Hazan, E., Singer, Y., 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research 12, 2121–2159. URL:http://jmlr.org/papers/v12/duchi11a.html

  21. [21]

    Gradient-annihilated PINNs for solving Riemann problems: Application to relativistic hydrodynamics

    Ferrer-Sánchez, A., Martín-Guerrero, J.D., de Austri-Bazan, R.R., Torres-Forné, A., Font, J.A., 2024. Gradient-annihilated PINNs for solving Riemann problems: Application to relativistic hydrodynamics. Computer Methods in Applied Mechanics and Engineering 424, 116906. URL: https://www.sciencedirect.com/science/article/pii/S0045782524001622, doi:https://doi.org/10.1016...

  22. [22–23]

    NekRS, a GPU-accelerated spectral element Navier–Stokes solver

    Fischer, P., Kerkemeier, S., Min, M., Lan, Y.H., Phillips, M., Rathnayake, T., Merzari, E., Tomboulides, A., Karakus, A., Chalmers, N., et al., 2022. NekRS, a GPU-accelerated spectral element Navier–Stokes solver. Parallel Computing 114, 102982

  24. [24]

    A new approach to variable metric algorithms

    Fletcher, R., 1970. A new approach to variable metric algorithms. The Computer Journal 13, 317–322. doi:10.1093/comjnl/13.3.317

  25. [25]

    A rapidly convergent descent method for minimization

    Fletcher, R., Powell, M.J.D., 1963. A rapidly convergent descent method for minimization. The Computer Journal 6, 163–168. doi:10.1093/comjnl/6.2.163

  26. [26]

    A family of variable-metric methods derived by variational means

    Goldfarb, D., 1970. A family of variable-metric methods derived by variational means. Mathematics of Computation 24, 23–26. doi:10.1090/S0025-5718-1970-0258249-6

  27. [27]

    A Kaczmarz-inspired approach to accelerate the optimization of neural network wavefunctions

    Goldshlager, G., Abrahamsen, N., Lin, L., 2024. A Kaczmarz-inspired approach to accelerate the optimization of neural network wavefunctions. Journal of Computational Physics 516, 113351

  28. [28]

    A sketch-and-project analysis of subsampled natural gradient algorithms

    Goldshlager, G., Hu, J., Lin, L., 2025. A sketch-and-project analysis of subsampled natural gradient algorithms. arXiv preprint arXiv:2508.21022

  29. [29]

    Deep learning

    Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y., 2016. Deep learning. volume 1. MIT press Cambridge

  30. [30]

    Shampoo: Preconditioned stochastic tensor optimization, in: Proceedings of the 35th International Conference on Machine Learning (ICML)

    Gupta, V., Koren, T., Singer, Y., 2018. Shampoo: Preconditioned stochastic tensor optimization, in: Proceedings of the 35th International Conference on Machine Learning (ICML)

  31. [31]

    Improving energy natural gradient descent through woodbury, momentum, and randomization, in: The Thirty-ninth Annual Conference on Neural Information Processing Systems

    Guzman-Cordero, A., Dangel, F., Goldshlager, G., Zeinhofer, M., 2025. Improving energy natural gradient descent through woodbury, momentum, and randomization, in: The Thirty-ninth Annual Conference on Neural Information Processing Systems. URL: https://openreview.net/forum?id=5YMZfufpfY

  32. [32]

    A provably entropy stable subcell shock capturing approach for high order split form DG for the compressible Euler equations

    Hennemann, S., Rueda-Ramírez, A.M., Hindenlang, F.J., Gassner, G.J., 2021. A provably entropy stable subcell shock capturing approach for high order split form DG for the compressible Euler equations. Journal of Computational Physics 426, 109935

  33. [33]

    Numerical methods for conservation laws: From analysis to algorithms

    Hesthaven, J.S., 2017. Numerical methods for conservation laws: From analysis to algorithms. SIAM

  34. [34]

    State-space models are accurate and efficient neural operators for dynamical systems

    Hu, Z., Daryakenari, N.A., Shen, Q., Kawaguchi, K., Karniadakis, G.E., 2024. State-space models are accurate and efficient neural operators for dynamical systems. arXiv preprint arXiv:2409.03231

  35. [35]

    Dual cone gradient descent for training physics-informed neural networks

    Hwang, Y., Lim, D.Y., 2024. Dual cone gradient descent for training physics-informed neural networks. Advances in Neural Information Processing Systems 37, 98563–98595

  36. [36]

    Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems

    Jagtap, A.D., Kharazmi, E., Karniadakis, G.E., 2020. Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems. Computer Methods in Applied Mechanics and Engineering 365, 113028

  37. [37]

    Physics-informed neural networks for inverse problems in supersonic flows

    Jagtap, A.D., Mao, Z., Adams, N., Karniadakis, G.E., 2022. Physics-informed neural networks for inverse problems in supersonic flows. Journal of Computational Physics 466, 111402

  38. [38]

    Dual natural gradient descent for scalable training of physics-informed neural networks

    Jnini, A., Vella, F., 2025. Dual natural gradient descent for scalable training of physics-informed neural networks. Transactions on Machine Learning Research URL:https://openreview.net/forum?id=GDHVRy6SDd

  39. [39]

    Gauss-Newton Natural Gradient Descent for Physics-informed Computational Fluid Dynamics

    Jnini, A., Vella, F., Zeinhofer, M., 2025. Gauss-Newton Natural Gradient Descent for Physics-informed Computational Fluid Dynamics. Computers & Fluids , 106955

  40. [40]

    Spectral/hp element methods for computational fluid dynamics

    Karniadakis, G., Sherwin, S., 2013. Spectral/hp element methods for computational fluid dynamics. American Chemical Society

  41. [41]

    Deep learning without poor local minima

    Kawaguchi, K., 2016. Deep learning without poor local minima. Advances in neural information processing systems 29

  42. [42]

    Adam: A Method for Stochastic Optimization

    Kingma, D.P., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980

  43. [43]

    A framework based on symbolic regression coupled with extended physics-informed neural networks for gray-box learning of equations of motion from data

    Kiyani, E., Shukla, K., Karniadakis, G.E., Karttunen, M., 2023. A framework based on symbolic regression coupled with extended physics-informed neural networks for gray-box learning of equations of motion from data. Computer Methods in Applied Mechanics and Engineering 415, 116258. doi:https://doi.org/10.1016/j.cma.2023.116258

  44. [44]

    Optimizing the optimizer for physics-informed neural networks and kolmogorov-arnold networks

    Kiyani, E., Shukla, K., Urbán, J.F., Darbon, J., Karniadakis, G.E., 2025. Optimizing the optimizer for physics-informed neural networks and kolmogorov-arnold networks. Computer Methods in Applied Mechanics and Engineering 446, 118308

  45. [45]

    Introduction to optimization methods for training sciml models

    Kopaničáková, A., Riccietti, E., 2026. Introduction to optimization methods for training sciml models. arXiv preprint arXiv:2601.10222

  46. [46]

    Characterizing possible failure modes in physics-informed neural networks

    Krishnapriyan, A., Gholami, A., Zhe, S., Kirby, R., Mahoney, M.W., 2021. Characterizing possible failure modes in physics-informed neural networks, in: Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W. (Eds.), Advances in Neural Information Processing Systems, Curran Associates, Inc., pp. 26548–26560. URL: https://proceedings.neurips.cc/p...

  47. [47]

    Deep learning

    LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. nature 521, 436–444

  48. [48]

    Visualizing the loss landscape of neural nets

    Li, H., Xu, Z., Taylor, G., Studer, C., Goldstein, T., 2018. Visualizing the loss landscape of neural nets. Advances in neural information processing systems 31

  49. [49]

    Can we remove the square-root in adaptive gradient methods? A second-order perspective

    Lin, W., Dangel, F., Eschenhagen, R., Bae, J., Turner, R.E., Makhzani, A., 2024. Can we remove the square-root in adaptive gradient methods? A second-order perspective. International Conference on Machine Learning

  50. [50]

    Understanding and improving Shampoo and SOAP via Kullback-Leibler minimization

    Lin, W., Lowe, S.C., Dangel, F., Eschenhagen, R., Xu, Z., Grosse, R.B., 2025. Understanding and improving Shampoo and SOAP via Kullback-Leibler minimization. arXiv preprint arXiv:2509.03378

  51. [51]

    Locally linearized physics informed neural networks for riemann problems of hyperbolic conservation laws

    Liu, J., Zheng, S., Song, X., Xu, D., 2024. Locally linearized physics informed neural networks for riemann problems of hyperbolic conservation laws. Physics of Fluids 36, 116135. doi:10.1063/5.0238865

  52. [52]

    Decoupled weight decay regularization, in: International Conference on Learning Representations

    Loshchilov, I., Hutter, F., 2019. Decoupled weight decay regularization, in: International Conference on Learning Representations. URL: https://openreview.net/forum?id=Bkg6RiCqY7

  53. [53]

    Viscous and resistive eddies near a sharp corner

    Moffatt, H.K., 1964. Viscous and resistive eddies near a sharp corner. Journal of Fluid Mechanics 18, 1–18

  54. [54]

    Achieving High Accuracy with PINNs via Energy Natural Gradient Descent, in: International Conference on Machine Learning, PMLR

    Müller, J., Zeinhofer, M., 2023. Achieving High Accuracy with PINNs via Energy Natural Gradient Descent, in: International Conference on Machine Learning, PMLR. pp. 25471–25485

  55. [55]

    Position: Optimization in SciML should employ the function space geometry

    Müller, J., Zeinhofer, M., 2024. Position: Optimization in SciML should employ the function space geometry, in: Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., Berkenkamp, F. (Eds.), Proceedings of the 41st International Conference on Machine Learning, PMLR. pp. 36705–36722. URL: https://proceedings.mlr.press/v235/muller24d.html

  56. [56]

    Numerical optimization

    Nocedal, J., Wright, S.J., 2006. Numerical optimization. Springer

  57. [57]

    Invariance properties of the natural gradient in overparametrised systems

    van Oostrum, J., Müller, J., Ay, N., 2023. Invariance properties of the natural gradient in overparametrised systems. Information Geometry 6, 51–67

  58. [58]

    Self-scaling variable metric (SSVM) algorithms, part I: Criteria and sufficient conditions for scaling a class of algorithms

    Oren, S.S., Luenberger, D.G., 1974. Self-scaling variable metric (SSVM) algorithms, part I: Criteria and sufficient conditions for scaling a class of algorithms. Management Science 20, 845–862. doi:10.1287/mnsc.20.5.845

  59. [59]

    Optimistix: modular optimisation in JAX and Equinox

    Rader, J., Lyons, T., Kidger, P., 2024. Optimistix: modular optimisation in JAX and Equinox. arXiv preprint arXiv:2402.09983

  60. [60]

    Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations

    Raissi, M., Perdikaris, P., Karniadakis, G., 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, 686–707. URL: https://www.sciencedirect.com/science/article/pii/S0021999118307125, doi:https://doi.org/10.1016/j.jc...

  61. [61]

    Adaptive numerical simulations with Trixi.jl: A case study of Julia for scientific computing

    Ranocha, H., Schlottke-Lakemper, M., Winters, A.R., Faulhaber, E., Chan, J., Gassner, G.J., 2021. Adaptive numerical simulations with Trixi.jl: A case study of Julia for scientific computing. arXiv preprint arXiv:2108.06476

  62. [62]

    Approximate riemann solvers, parameter vectors, and difference schemes

    Roe, P., 1981. Approximate riemann solvers, parameter vectors, and difference schemes. Journal of Computational Physics 43, 357–372. URL:https://www.sciencedirect.com/science/article/pii/0021999181901285,doi:https://doi.org/10.1016/ 0021-9991(81)90128-5

  63. [63]

    Anagram: A natural gradient relative to adapted model for efficient PINNs learning

    Schwencke, N., Furtlehner, C., 2025. Anagram: A natural gradient relative to adapted model for efficient PINNs learning. URL: https://arxiv.org/abs/2412.10782, arXiv:2412.10782

  64. [64]

    Amstramgram: Adaptive multi-cutoff strategy modification for anagram

    Schwencke, N., Rousselot, C., Shilova, A., Furtlehner, C., 2025. Amstramgram: Adaptive multi-cutoff strategy modification for anagram. URL: https://arxiv.org/abs/2510.15998, arXiv:2510.15998

  65. [65]

    Conditioning of quasi-newton methods for function minimization

    Shanno, D.F., 1970. Conditioning of quasi-newton methods for function minimization. Mathematics of Computation 24, 647–656. doi:10.1090/S0025-5718-1970-0274029-X

  66. [66]

    Condition numbers and equilibration of matrices

    van der Sluis, A., 1969. Condition numbers and equilibration of matrices. Numerische Mathematik 14, 14–23. doi:10.1007/BF02165968

  67. [67]

    The numerical viscosity of entropy stable schemes for systems of conservation laws

    Tadmor, E., 1987. The numerical viscosity of entropy stable schemes for systems of conservation laws. I. Mathematics of Computation 49, 91–103

  68. [68]

    Divide the gradient by a running average of its recent magnitude

    Tieleman, T., Hinton, G., 2017. Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning. Technical report, University of Toronto

  69. [69]

    Restoration of the contact surface in the hll-riemann solver

    Toro, E.F., Spruce, M., Speares, W., 1994. Restoration of the contact surface in the hll-riemann solver. Shock waves 4, 25–34

  70. [70]

    From PINNs to PIKANs: Recent advances in physics-informed machine learning

    Toscano, J.D., Oommen, V., Varghese, A.J., Zou, Z., Ahmadi Daryakenari, N., Wu, C., Karniadakis, G.E., 2025. From PINNs to PIKANs: Recent advances in physics-informed machine learning. Machine Learning for Computational Science and Engineering 1, 15

  71. [71]

    Unveiling the optimization process of physics informed neural networks: How accurate and competitive can PINNs be?

    Urbán, J.F., Stefanou, P., Pons, J.A., 2025. Unveiling the optimization process of physics informed neural networks: How accurate and competitive can pinns be? Journal of Computational Physics 523, 113656

  72. [72]

    An approximate Riemann solver approach in physics-informed neural networks for hyperbolic conservation laws

    Urbán, J.F., Pons, J.A., 2025. An approximate Riemann solver approach in physics-informed neural networks for hyperbolic conservation laws. Physics of Fluids 37, 096114. doi:10.1063/5.0285282

  73. [73]

    SOAP: Improving and Stabilizing Shampoo using Adam

    Vyas, N., Morwani, D., Zhao, R., Kwun, M., Shapira, I., Brandfonbrener, D., Janson, L., Kakade, S., 2024. SOAP: Improving and stabilizing Shampoo using Adam. arXiv preprint arXiv:2409.11321

  74. [74]

    Understanding and mitigating gradient flow pathologies in physics-informed neural networks

    Wang, S., Teng, Y., Perdikaris, P., 2021. Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM Journal on Scientific Computing 43, A3055–A3081

  75. [75]

    Roofline: an insightful visual performance model for multicore architectures

    Williams, S., Waterman, A., Patterson, D., 2009. Roofline: an insightful visual performance model for multicore architectures. Communications of the ACM 52, 65–76

  76. [76]

    Convergence conditions for ascent methods

    Wolfe, P., 1969. Convergence conditions for ascent methods. SIAM review 11, 226–235

  77. [77]

    Towards understanding convergence and generalization of adamw

    Zhou, P., Xie, X., Lin, Z., Yan, S., 2024. Towards understanding convergence and generalization of adamw. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 6486–6493