pith. machine review for the scientific record.

arxiv: 2604.05230 · v1 · submitted 2026-04-06 · 💻 cs.LG · cs.AI · cs.NA · math.NA · math.OC

Recognition: 2 theorem links · Lean Theorem

Curvature-Aware Optimization for High-Accuracy Physics-Informed Neural Networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:52 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.NA · math.NA · math.OC
keywords physics-informed neural networks · natural gradient · quasi-Newton optimization · BFGS · differential equations · PINNs · scientific machine learning

The pith

Curvature-aware optimizers like natural gradient and self-scaling BFGS accelerate PINN convergence to high accuracy on complex differential equations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces efficient implementations of natural gradient, self-scaling BFGS, and Broyden optimizers for physics-informed neural networks. These methods are shown to speed up training on challenging problems such as the Helmholtz equation, Stokes flow, inviscid Burgers equation, Euler equations, and stiff ODEs. The work also develops new PINN approaches for the Burgers and Euler equations and compares them to traditional numerical solvers. It further addresses scaling these optimizers for large batched datasets.

Core claim

By using curvature-aware optimization techniques, including the natural gradient and quasi-Newton methods, PINNs can achieve faster convergence and higher accuracy when solving partial and ordinary differential equations that are difficult for standard optimizers.
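
For concreteness, a PINN turns the differential equation into a least-squares objective over collocation points, and it is this loss that the optimizers above must drive to very small values. The sketch below is ours, not the paper's: a minimal residual loss for a 1D Poisson problem u''(x) = f(x) with zero Dirichlet boundaries, where `model` is any hypothetical PyTorch network mapping x to u(x).

    # Minimal PINN loss sketch (illustrative, not the paper's code):
    # 1D Poisson u''(x) = f(x) on [0, 1] with u(0) = u(1) = 0.
    import torch

    def pinn_loss(model, x_interior, x_boundary, f):
        x = x_interior.requires_grad_(True)
        u = model(x)
        du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]    # u'
        d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]  # u''
        residual = d2u - f(x)             # PDE residual at collocation points
        bc = model(x_boundary)            # Dirichlet boundary mismatch
        return (residual ** 2).mean() + (bc ** 2).mean()

Ill-conditioning of exactly this kind of composite loss is what motivates curvature-aware updates in the first place.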

What carries the argument

Natural Gradient optimizer and Self-Scaling BFGS/Broyden quasi-Newton methods, which approximate the curvature of the loss landscape to provide better update directions for training PINNs.
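
To make the mechanism concrete, here is a minimal sketch of one self-scaled BFGS step on the inverse-Hessian approximation, in the Oren–Luenberger form described in the cited quasi-Newton literature ([3], [58]); it is an assumed textbook reconstruction, not the paper's implementation.

    # One self-scaled BFGS update of the inverse-Hessian approximation H
    # (illustrative sketch). s = x_new - x_old, y = grad_new - grad_old;
    # a Wolfe-type line search is assumed to keep the curvature y @ s > 0.
    import numpy as np

    def ssbfgs_update(H, s, y):
        ys = float(y @ s)
        if ys <= 1e-12:                  # curvature condition failed: skip
            return H
        tau = ys / float(y @ H @ y)      # self-scaling factor tau_k
        rho = 1.0 / ys
        I = np.eye(len(s))
        V = I - rho * np.outer(s, y)
        return tau * (V @ H @ V.T) + rho * np.outer(s, s)

With tau = 1 this reduces to plain BFGS; the scaling is intended to keep the spectrum of H well conditioned from step to step.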

If this is right

  • High-accuracy solutions for inviscid flows and stiff systems become feasible with PINNs.
  • Quasi-Newton methods can be scaled for batched training in large-scale scientific ML.
  • New PINN formulations for Burgers and Euler equations match high-order numerical accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • These optimizers might reduce the need for architecture-specific tuning in PINNs across different physical domains.
  • Integration with other ML techniques could further improve performance on high-dimensional problems.
  • Similar curvature-aware methods could apply to other scientific computing neural network tasks beyond PINNs.

Load-bearing premise

That the curvature information from these optimizers can be computed and applied efficiently at scale for batched training without prohibitive memory or time costs.
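
One hedged way to see why this premise is at least plausible: natural-gradient-type steps can be taken matrix-free, touching the Gramian G = JᵀJ of the PDE residuals only through Jacobian-vector products inside a conjugate-gradient solve, so G is never materialized. In the sketch below, `jvp` and `vjp` are assumed closures supplied by an autodiff framework (e.g. JAX); nothing here is the paper's code.

    # Matrix-free damped natural-gradient direction via conjugate gradient
    # (illustrative sketch). Solves (J^T J + damping * I) delta = grad
    # without forming the Gramian; only jvp/vjp sweeps are needed.
    import numpy as np

    def natural_gradient_step(grad, jvp, vjp, damping=1e-6, iters=50):
        def G_mv(v):                      # (J^T J + damping * I) @ v
            return vjp(jvp(v)) + damping * v

        delta = np.zeros_like(grad)
        r = grad.copy()                   # CG residual b - G @ 0
        p = r.copy()
        rs = float(r @ r)
        for _ in range(iters):
            Gp = G_mv(p)
            alpha = rs / float(p @ Gp)
            delta += alpha * p
            r -= alpha * Gp
            rs_new = float(r @ r)
            if rs_new < 1e-16:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return delta                      # update direction for the parameters

The open question the premise raises is whether the handful of jvp/vjp sweeps per CG iteration stays affordable once the residuals are evaluated over large batches.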

What would settle it

A demonstration that on one of the tested equations, such as the inviscid Burgers, the new optimizers fail to reach the accuracy of high-order numerical methods within reasonable training time or resources.

Figures

Figures reproduced from arXiv:2604.05230 by Anas Jnini, Elham Kiyani, George Em Karniadakis, Johannes Müller, Jorge F. Urbán, Khemraj Shukla, Marius Zeinhofer, Nazanin Ahmadi Daryakenari.

Figure 1: Schematic illustration of PINN training and optimization methods considered in this work. Our benchmarks include PDEs of elliptic (Helmholtz and Stokes), parabolic (2D viscous Burgers), and hyperbolic (inviscid Burgers and 1D Euler) type, and a stiff PK–PD ODE system, characterized by oscillatory solutions, nonlinear diffusion, shock-induced discontinuities, and stiffness. The methods studied here are BFGS, SS…

Figure 2: Loss landscapes for two representative PDEs, shown along random directions and projections onto pairs of singular vectors. (a) Top row: 1D Euler equation (59) (Sod problem), projected along orthogonal random directions, leading singular vectors, and the next dominant pair (left to right). Bottom row: Stokes equation (42), following the same projection scheme. Axes correspond to (𝛼, 𝛽).

Figure 3: Visualization of the NG (top), SSBroyden (middle), and SOAP (bottom) training dynamics for the 2D Helmholtz problem with 𝑎1 = 1, 𝑎2 = 4, and 𝑘 = 1; shown are side-by-side snapshot comparisons of the updates of the optimizers (left) and the current error of the prediction (right).

Figure 4: 2D Helmholtz problem with 𝑎1 = 1, 𝑎2 = 4, and 𝑘 = 1. SSBroyden prediction, reference solution given by Equation (39), and the pointwise absolute error |𝑢pred − 𝑢ref|.

Figure 5: 2D Helmholtz problem with 𝑎1 = 1, 𝑎2 = 4, and 𝑘 = 1. Training loss and relative 𝐿2 error, comparing BFGS, SSBFGS, SSBroyden, NG, and SPRING. The SSBroyden (PyTorch) and SSBFGS (PyTorch) results are obtained using SciPy-based implementations executed on the CPU, while SSBFGS (JAX) is implemented using Optax in JAX.

Figure 6: 2D Helmholtz problem with 𝑎1 = 6 and 𝑎2 = 6: (a) 𝑘 = 1 and (b) 𝑘 = 100. During training for 𝑘 = 100, collocation points are sampled progressively from [−0.2, 0.2]² for 10,000 iterations, [−0.4, 0.4]² for 15,000 iterations, [−0.7, 0.7]² for 20,000 iterations, and finally the full domain [−1, 1]² for 40,000 iterations.

Figure 7: 2D Helmholtz with 𝑎1 = 10, 𝑎2 = 10, and 𝑘 = 1: SSBroyden prediction, exact solution (39), and pointwise absolute error |𝑢pred − 𝑢exact|.

Figure 8: 2D Helmholtz with 𝑎1 = 10, 𝑎2 = 10, and 𝑘 = 1: Training loss and relative 𝐿2 error, comparing BFGS, SSBroyden, and NG descent.

Figure 9: 3D Helmholtz with 𝑎1 = 4, 𝑎2 = 4, 𝑎3 = 3, and 𝑘 = 1. The network consists of a periodic Fourier feature embedding (𝑘max = 2), followed by a fully connected network with four hidden layers, each containing 30 neurons with tanh activations, and a linear output layer.

Figure 10: Stokes equation: The left subfigure presents the loss history for Stokes flow using various combinations of optimizers and line-search routines. Each experiment starts with a warm-up phase employing first-order optimizers (Adam and SOAP), followed by a switch to quasi-Newton optimizers combined with different line-search strategies. The point of transition between optimizers is marked by a vertical dashed…

Figure 11: Stokes equation: Streamlines obtained using various optimization strategies: (a) Adam, (b) SOAP, (c) SOAP + SSBFGS (trust-region), (d) SOAP + SSBroyden (Armijo–Wolfe), (e) SOAP + SSBroyden (zoom), (f) Adam + SSBFGS (trust-region), (g) SOAP + SSBFGS (Wolfe), (h) SOAP + SSBFGS (zoom), and (i) NG. The NG achieves comparable accuracy to self-scaled optimizers while significantly reducing runtime. Panel (j) sh…

Figure 12: 2D viscous Burgers equation: Solution obtained using different optimization methods. (a) Best results from a quasi-Newton and line search method (SSBroyden with zoom line search), (b) NG, and (c) SOAP optimization. In each panel, the left subfigure shows the PINN solution at 𝑡 = 0.5, the middle subfigure shows the spatial distribution of the absolute error (concentrated at the shock location), and the righ…

Figure 13: Inviscid Burgers with Roe linearization: Predicted solutions at 𝑡 = 1 with the LRPINN for both optimizers, together with a reference numerical solution obtained with a third-order WENO scheme. The left panel shows the solution for all 𝑥 ∈ [−1, 1], whereas the right panel shows a zoomed-in view of the shock region (𝑥 = 0).

Figure 14: Inviscid Burgers equation using relaxation and entropy inequality: Loss history for the inviscid Burgers flow obtained using different combinations of optimizers and line-search strategies, incorporating flux relaxation and an entropy inequality. The training procedure begins with a warm-up phase using the first-order Adam optimizer, followed by a transition to quasi-Newton optimizers combined with variou…

Figure 15: Inviscid Burgers equation using relaxation and entropy inequality: The solution of Burgers' equation, incorporating an entropy inequality and flux relaxation, is obtained using a physics-informed neural network (PINN) trained with the SSBroyden optimizer and a zoom line-search strategy. Panel (a) compares the PINN solution with a DGSEM solution of order 𝑁 = 3, where 𝑁 represents the polynomial degree used…

Figure 16: 1D Euler equations with Roe linearization: Neural network predictions for the density, velocity, and pressure at 𝑡 = 1. The exact solution is also plotted for reference.

Figure 17: 1D Euler equations with Roe linearization: Neural network prediction for the density, before and after the inviscid part of the training process. (Reported alongside: SSBroyden reaches relative 𝐿1 errors of (9.2, 8.0, 5.0) × 10⁻⁴ for (𝜌, 𝑢, 𝑝) in 160 s; NG reaches (6.5, 9.1, 3.3) × 10⁻³ in 452 s; both with 4833 parameters.)

Figure 18: 1D Euler equations with HLLC flux: PINN architecture for the 1D Euler equation using adaptive viscosity and the HLLC flux.

Figure 19: Euler equations with HLLC flux vs. analytical solution: Comparison of the 1D Euler solution for the Sod shock tube problem obtained using the PINN framework augmented with the HLLC flux against the analytical solution. The top row displays the density 𝜌, velocity 𝑢, pressure 𝑝, and local Mach number 𝑀𝑎 (from left to right) at 𝑡 = 0.15. The bottom row presents the corresponding pointwise absolute error dist…

Figure 20: Euler equations with HLLC flux vs. WENO solution: Comparison of the 1D Euler solution for the Sod shock tube problem obtained using the PINN framework augmented with the HLLC flux against a numerical reference solution computed using a WENO scheme [32]. The top row shows the density 𝜌, velocity 𝑢, pressure 𝑝, and local Mach number 𝑀𝑎 (from left to right) at 𝑡 = 0.15. The bottom row presents the correspondi…

Figure 21: Stiff PK–PD ODE system: Reference vs. PINN (NG). Comparison of the exact solution and the PINN prediction obtained with the NG optimizer, showing accurate reconstruction of the multiscale stiff dynamics.

Figure 22: Stiff PK–PD ODE system: Time-resolved absolute error for each state variable, shown on a logarithmic scale, comparing PINN solutions trained with different optimization methods. The results illustrate optimizer-dependent performance in resolving stiff multiscale dynamics.

Figure 23: Roofline analysis for the SSBFGS and NG optimizers applied to the inviscid Burgers, Euler, and Stokes equations. The solid black line denotes the theoretical roofline, and the vertical dashed line marks the ridge point, which delineates the transition from the memory-bound to the compute-bound regime.

Figure 24: Comparison of batch training using Adam and the sSSBFGS approach (Algorithm 2) for the 1D Poisson equation. Notably, in FP32 precision, the sSSBFGS algorithm reaches loss values on the order of 10⁻⁷, significantly outperforming Adam.
read the original abstract

Efficient and robust optimization is essential for neural networks, enabling scientific machine learning models to converge rapidly to very high accuracy -- faithfully capturing complex physical behavior governed by differential equations. In this work, we present advanced optimization strategies to accelerate the convergence of physics-informed neural networks (PINNs) for challenging partial (PDEs) and ordinary differential equations (ODEs). Specifically, we provide efficient implementations of the Natural Gradient (NG) optimizer, Self-Scaling BFGS and Broyden optimizers, and demonstrate their performance on problems including the Helmholtz equation, Stokes flow, inviscid Burgers equation, Euler equations for high-speed flows, and stiff ODEs arising in pharmacokinetics and pharmacodynamics. Beyond optimizer development, we also propose new PINN-based methods for solving the inviscid Burgers and Euler equations, and compare the resulting solutions against high-order numerical methods to provide a rigorous and fair assessment. Finally, we address the challenge of scaling these quasi-Newton optimizers for batched training, enabling efficient and scalable solutions for large data-driven problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces efficient implementations of curvature-aware optimizers (Natural Gradient, self-scaling BFGS, and Broyden) for physics-informed neural networks (PINNs) applied to challenging PDEs and ODEs, including the Helmholtz equation, Stokes flow, inviscid Burgers equation, Euler equations for high-speed flows, and stiff ODEs from pharmacokinetics/pharmacodynamics. It proposes new PINN formulations for the inviscid Burgers and Euler equations with comparisons against high-order numerical methods, and develops a scaling approach to enable batched training with these quasi-Newton methods.

Significance. If the reported convergence and accuracy improvements are substantiated, the work would provide practical, scalable tools for high-fidelity scientific machine learning on differential equations. The explicit comparisons to high-order numerics and the batched-training scaling strategy are strengths that support reproducibility and broader applicability.

minor comments (2)
  1. [Abstract] The claim of performance gains on multiple benchmark problems is stated without any quantitative metrics, error norms, or convergence rates; adding a short summary of key results (e.g., final L2 errors or iteration counts) would make the abstract self-contained.
  2. [Section on batched training] Implementation details for the batched quasi-Newton scaling (mentioned in the final paragraph) should include explicit pseudocode or complexity analysis to clarify how curvature information is maintained across batches without prohibitive memory cost; a generic illustration of one such scheme follows below.
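
To make that request concrete: one generic way to carry curvature across batches with bounded memory is a limited-memory two-loop recursion over the m most recent curvature pairs, costing O(mn) storage instead of the O(n²) of a dense inverse-Hessian. The sketch below is the textbook L-BFGS direction computation (Nocedal and Wright [56]), offered purely as an illustration of the kind of pseudocode the report asks for; it is not the manuscript's Algorithm 2.

    # Textbook L-BFGS two-loop recursion (illustration only, not Algorithm 2).
    # s_hist, y_hist hold the m most recent pairs s = x_{k+1} - x_k and
    # y = g_{k+1} - g_k, oldest first; memory is O(m * n).
    import numpy as np

    def two_loop_direction(grad, s_hist, y_hist):
        q = grad.copy()
        alphas = []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):  # newest -> oldest
            rho = 1.0 / float(y @ s)
            a = rho * float(s @ q)
            q -= a * y
            alphas.append((rho, a))
        if s_hist:                                            # H0 = gamma * I
            s, y = s_hist[-1], y_hist[-1]
            q *= float(s @ y) / float(y @ y)
        for (s, y), (rho, a) in zip(zip(s_hist, y_hist), reversed(alphas)):
            b = rho * float(y @ q)
            q += (a - b) * s
        return -q                                             # descent direction

Whether the manuscript's batched scheme maintains its history this way or differently is exactly what explicit pseudocode would settle.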

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work on curvature-aware optimizers for PINNs and for recommending minor revision. The assessment correctly identifies the key contributions, including efficient implementations of Natural Gradient, self-scaling BFGS, and Broyden methods, new PINN formulations for inviscid Burgers and Euler equations with high-order numerical comparisons, and the batched scaling strategy for quasi-Newton methods. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The manuscript presents algorithmic implementations of established curvature-aware optimizers (Natural Gradient, self-scaling BFGS, Broyden) together with empirical benchmarks on standard PDE/ODE test problems. No derivation chain is claimed that reduces a first-principles result or prediction to its own fitted inputs by construction. The work is framed as engineering and numerical demonstration rather than a closed mathematical derivation; therefore no self-definitional, fitted-input-renamed-as-prediction, or self-citation-load-bearing steps are present. All performance claims rest on direct comparison with high-order numerical references, which are external to the optimizer formulations themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces no explicit free parameters, axioms, or invented entities beyond standard assumptions of neural network training and differential equation modeling.

pith-pipeline@v0.9.0 · 5524 in / 1197 out tokens · 78171 ms · 2026-05-10T18:52:34.396723+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. GRAFT-ATHENA: Self-Improving Agentic Teams for Autonomous Discovery and Evolutionary Numerical Algorithms

    cs.LG 2026-05 unverdicted novelty 6.0

    GRAFT-ATHENA projects combinatorial method choices into factored trees that embed as fingerprints in a metric space, enabling an agentic system to accumulate experience across domains and autonomously discover new num...

  2. Convergence Analysis of Newton's Method for Neural Networks in the Overparameterized Limit

    cs.LG 2026-05 unverdicted novelty 6.0

    In the infinite-width limit, regularized Newton's method for neural networks converges exponentially to global minimizers with uniform rates across the frequency spectrum using the Newton neural tangent kernel.

  3. A Physics-Informed Neural Network for Solving the Quasi-static Magnetohydrodynamic Equations

    physics.plasm-ph 2026-04 unverdicted novelty 6.0

    A PINN solves the time-dependent quasi-static MHD equations in axisymmetric tokamak geometry without training data and reproduces vertical plasma displacement seen in ground-truth simulations.

  4. Lightweight Geometric Adaptation for Training Physics-Informed Neural Networks

    cs.LG 2026-04 unverdicted novelty 5.0

    A secant-based adaptive correction augments first-order optimizers to improve convergence speed, stability, and accuracy when training PINNs on challenging PDEs.

  5. Two-scale Neural Networks for Singularly Perturbed Dynamical Systems with Multiple Parameters

    math.NA 2026-05 unverdicted novelty 4.0

    A neural network augmented with the geometric mean of multiple small parameters approximates solutions to singularly perturbed dynamical systems with satisfactory accuracy on tested coupled cases.

Reference graph

Works this paper leans on

77 extracted references · 29 canonical work pages · cited by 5 Pith papers · 2 internal anchors

  1. [1]

    Physics-informed machine learning in biomedical science and engineering

    Ahmadi, N., Cao, Q., Humphrey, J.D., Karniadakis, G.E., 2025. Physics-informed machine learning in biomedical science and engineering. arXiv preprint arXiv:2510.05433

  2. [2]

    Representation meets optimization: Training PINNs and PIKANs for gray-box discovery in systems pharmacology

    Ahmadi Daryakenari, N., Shukla, K., Karniadakis, G.E., 2026. Representation meets optimization: Training PINNs and PIKANs for gray-box discovery in systems pharmacology. Computers in Biology and Medicine 201, 111393. URL: https://www.sciencedirect.com/science/article/pii/S0010482525017470, doi:https://doi.org/10.1016/j.compbiomed.2025.111393

  3. [3]

    Global and superlinear convergence of a class of self-scaling methods with inexact line searches

    Al-Baali, M., 1998. Global and superlinear convergence of a class of self-scaling methods with inexact line searches. Computational Optimization and Applications 9, 191–203. doi:10.1023/A:1018315205474

  4. [4]

    Wide interval for efficient self-scaling quasi-newton algorithms

    Al-Baali, M., Khalfan, H., 2005. Wide interval for efficient self-scaling quasi-newton algorithms. Optimization Methods and Software 20, 679–691. doi:10.1080/10556780410001709448

  5. [5]

    Broyden’s quasi-newton methods for a nonlinear system of equations and unconstrained optimization: A review and open problems

    Al-Baali, M., Spedicato, E., Maggioni, F., 2014. Broyden’s quasi-newton methods for a nonlinear system of equations and unconstrained optimization: A review and open problems. Optimization Methods and Software 29, 937–954. doi:10.1080/10556788.2013.856909

  6. [6]

    Information geometry

    Amari, S., 1997. Information geometry. Contemporary Mathematics 203, 4

  7. [7]

    Natural gradient works efficiently in learning

    Amari, S., 1998. Natural gradient works efficiently in learning. Neural Computation 10, 251–276. doi:10.1162/089976698300017746

  8. [8]

    Functional neural wavefunction optimization

    Armegioiu, V., Carrasquilla, J., Mishra, S., Müller, J., Nys, J., Zeinhofer, M., Zhang, H., 2025. Functional neural wavefunction optimization. arXiv preprint arXiv:2507.10835

  9. [9]

    On the optimization of deep networks: Implicit acceleration by overparameterization

    Arora, S., Cohen, N., Hazan, E., 2018. On the optimization of deep networks: Implicit acceleration by overparameterization, in: International Conference on Machine Learning, PMLR. pp. 244–253

  10. [10]

    A progressive batching l-bfgs method for machine learning, in: International Conference on Machine Learning, PMLR

    Bollapragada, R., Nocedal, J., Mudigere, D., Shi, H.J., Tang, P.T.P., 2018. A progressive batching l-bfgs method for machine learning, in: International Conference on Machine Learning, PMLR. pp. 620–629

  11. [11]

    The convergence of a class of double-rank minimization algorithms 1

    Broyden, C.G., 1970. The convergence of a class of double-rank minimization algorithms 1. general considerations. IMA Journal of Applied Mathematics 6, 76–90. doi:10.1093/imamat/6.1.76

  12. [12]

    Nektar++: An open-source spectral/hp element framework

    Cantwell, C.D., Moxey, D., Comerford, A., Bolis, A., Rocco, G., Mengaldo, G., De Grazia, D., Yakovlev, S., Lombard, J.E., Ekelschot, D., et al., 2015. Nektar++: An open-source spectral/hp element framework. Computer physics communications 192, 205–219

  13. [13]

    On discretely entropy conservative and entropy stable discontinuous Galerkin methods

    Chan, J., 2018. On discretely entropy conservative and entropy stable discontinuous Galerkin methods. Journal of Computational Physics 362, 346–374

  14. [14]

    Separable physics-informed neural networks

    Cho, J., Nam, S., Yang, H., Yun, S.B., Hong, Y., Park, E., 2023. Separable physics-informed neural networks. Advances in Neural Information Processing Systems 36, 23761–23788

  15. [15]

    Trust region methods

    Conn, A.R., Gould, N.I., Toint, P.L., 2000. Trust region methods. SIAM

  16. [16]

    Kronecker-factored approximate curvature for physics-informed neural networks

    Dangel, F., Müller, J., Zeinhofer, M., 2024. Kronecker-factored approximate curvature for physics-informed neural networks. Advances in Neural Information Processing Systems 37, 34582–34636

  17. [17]

    CMINNs: Compartment model informed neural networks—unlocking drug dynamics

    Daryakenari, N.A., Wang, S., Karniadakis, G., 2025. CMINNs: Compartment model informed neural networks—unlocking drug dynamics. Computers in Biology and Medicine 184, 109392

  18. [18]

    Variable Metric Method for Minimization

    Davidon, W.C., 1959. Variable Metric Method for Minimization. Technical Report ANL-5990. Argonne National Laboratory

  19. [19]

    Numerical Methods for Unconstrained Optimization and Nonlinear Equations

    Dennis Jr, J., Schnabel, R.B., 1996. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. SIAM

  20. [20]

    Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

    Duchi, J., Hazan, E., Singer, Y., 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research 12, 2121–2159. URL:http://jmlr.org/papers/v12/duchi11a.html

  21. [21]

    Gradient-annihilated PINNs for solving Riemann problems: Application to relativistic hydrodynamics

    Ferrer-Sánchez, A., Martín-Guerrero, J.D., de Austri-Bazan, R.R., Torres-Forné, A., Font, J.A., 2024. Gradient-annihilated PINNs for solving Riemann problems: Application to relativistic hydrodynamics. Computer Methods in Applied Mechanics and Engineering 424, 116906. URL: https://www.sciencedirect.com/science/article/pii/S0045782524001622, doi:https://doi.org/10.1016...

  22. [22–23]

    NekRS, a GPU-accelerated spectral element Navier–Stokes solver

    Fischer, P., Kerkemeier, S., Min, M., Lan, Y.H., Phillips, M., Rathnayake, T., Merzari, E., Tomboulides, A., Karakus, A., Chalmers, N., et al., 2022. NekRS, a GPU-accelerated spectral element Navier–Stokes solver. Parallel Computing 114, 102982

  24. [24]

    A new approach to variable metric algorithms

    Fletcher, R., 1970. A new approach to variable metric algorithms. The Computer Journal 13, 317–322. doi:10.1093/comjnl/13.3.317

  25. [25]

    A rapidly convergent descent method for minimization

    Fletcher, R., Powell, M.J.D., 1963. A rapidly convergent descent method for minimization. The Computer Journal 6, 163–168. doi:10.1093/comjnl/6.2.163

  26. [26]

    A family of variable-metric methods derived by variational means

    Goldfarb, D., 1970. A family of variable-metric methods derived by variational means. Mathematics of Computation 24, 23–26. doi:10.1090/S0025-5718-1970-0258249-6

  27. [27]

    A Kaczmarz-inspired approach to accelerate the optimization of neural network wavefunctions

    Goldshlager, G., Abrahamsen, N., Lin, L., 2024. A Kaczmarz-inspired approach to accelerate the optimization of neural network wavefunctions. Journal of Computational Physics 516, 113351

  28. [28]

    A sketch-and-project analysis of subsampled natural gradient algorithms

    Goldshlager, G., Hu, J., Lin, L., 2025. A sketch-and-project analysis of subsampled natural gradient algorithms. arXiv preprint arXiv:2508.21022

  29. [29]

    Deep learning

    Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y., 2016. Deep learning. volume 1. MIT press Cambridge

  30. [30]

    Shampoo: Preconditioned stochastic tensor optimization, in: Proceedings of the 35th International Conference on Machine Learning (ICML)

    Gupta, V., Koren, T., Singer, Y., 2018. Shampoo: Preconditioned stochastic tensor optimization, in: Proceedings of the 35th International Conference on Machine Learning (ICML)

  31. [31]

    Improving energy natural gradient descent through woodbury, momentum, and randomization, in: The Thirty-ninth Annual Conference on Neural Information Processing Systems

    Guzman-Cordero, A., Dangel, F., Goldshlager, G., Zeinhofer, M., 2025. Improving energy natural gradient descent through woodbury, momentum, and randomization, in: The Thirty-ninth Annual Conference on Neural Information Processing Systems. URL: https://openreview.net/forum?id=5YMZfufpfY

  32. [32]

    A provably entropy stable subcell shock capturing approach for high order split form DG for the compressible Euler equations

    Hennemann, S., Rueda-Ramírez, A.M., Hindenlang, F.J., Gassner, G.J., 2021. A provably entropy stable subcell shock capturing approach for high order split form DG for the compressible Euler equations. Journal of Computational Physics 426, 109935

  33. [33]

    Numerical methods for conservation laws: From analysis to algorithms

    Hesthaven, J.S., 2017. Numerical methods for conservation laws: From analysis to algorithms. SIAM

  34. [34]

    State-space models are accurate and efficient neural operators for dynamical systems

    Hu, Z., Daryakenari, N.A., Shen, Q., Kawaguchi, K., Karniadakis, G.E., 2024. State-space models are accurate and efficient neural operators for dynamical systems. arXiv preprint arXiv:2409.03231

  35. [35]

    Dual cone gradient descent for training physics-informed neural networks

    Hwang, Y., Lim, D.Y., 2024. Dual cone gradient descent for training physics-informed neural networks. Advances in Neural Information Processing Systems 37, 98563–98595

  36. [36]

    Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems

    Jagtap, A.D., Kharazmi, E., Karniadakis, G.E., 2020. Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems. Computer Methods in Applied Mechanics and Engineering 365, 113028

  37. [37]

    Physics-informed neural networks for inverse problems in supersonic flows

    Jagtap, A.D., Mao, Z., Adams, N., Karniadakis, G.E., 2022. Physics-informed neural networks for inverse problems in supersonic flows. Journal of Computational Physics 466, 111402

  38. [38]

    Dual natural gradient descent for scalable training of physics-informed neural networks

    Jnini, A., Vella, F., 2025. Dual natural gradient descent for scalable training of physics-informed neural networks. Transactions on Machine Learning Research URL:https://openreview.net/forum?id=GDHVRy6SDd

  39. [39]

    Gauss-Newton Natural Gradient Descent for Physics-informed Computational Fluid Dynamics

    Jnini, A., Vella, F., Zeinhofer, M., 2025. Gauss-Newton Natural Gradient Descent for Physics-informed Computational Fluid Dynamics. Computers & Fluids , 106955

  40. [40]

    Spectral/hp element methods for computational fluid dynamics

    Karniadakis, G., Sherwin, S., 2013. Spectral/hp element methods for computational fluid dynamics. American Chemical Society

  41. [41]

    Deep learning without poor local minima

    Kawaguchi, K., 2016. Deep learning without poor local minima. Advances in neural information processing systems 29

  42. [42]

    Adam: A Method for Stochastic Optimization

    Kingma, D.P., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980

  43. [43]

    A framework based on symbolic regression coupled with extended physics-informed neural networks for gray-box learning of equations of motion from data

    Kiyani, E., Shukla, K., Karniadakis, G.E., Karttunen, M., 2023. A framework based on symbolic regression coupled with extended physics-informed neural networks for gray-box learning of equations of motion from data. Computer Methods in Applied Mechanics and Engineering 415, 116258. doi:https://doi.org/10.1016/j.cma.2023.116258

  44. [44]

    Optimizing the optimizer for physics-informed neural networks and kolmogorov-arnold networks

    Kiyani, E., Shukla, K., Urbán, J.F., Darbon, J., Karniadakis, G.E., 2025. Optimizing the optimizer for physics-informed neural networks and kolmogorov-arnold networks. Computer Methods in Applied Mechanics and Engineering 446, 118308

  45. [45]

    Introduction to optimization methods for training sciml models

    Kopaničáková, A., Riccietti, E., 2026. Introduction to optimization methods for training sciml models. arXiv preprint arXiv:2601.10222

  46. [46]

    Characterizing possible failure modes in physics-informed neural networks

    Krishnapriyan, A., Gholami, A., Zhe, S., Kirby, R., Mahoney, M.W., 2021. Characterizing possible failure modes in physics-informed neural networks, in: Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W. (Eds.), Advances in Neural Information Processing Systems, Curran Associates, Inc., pp. 26548–26560. URL: https://proceedings.neurips.cc/p...

  47. [47]

    Deep learning

    LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. nature 521, 436–444

  48. [48]

    Visualizing the loss landscape of neural nets

    Li, H., Xu, Z., Taylor, G., Studer, C., Goldstein, T., 2018. Visualizing the loss landscape of neural nets. Advances in neural information processing systems 31

  49. [49]

    Can we remove the square-root in adaptive gradient methods? A second-order perspective

    Lin, W., Dangel, F., Eschenhagen, R., Bae, J., Turner, R.E., Makhzani, A., 2024. Can we remove the square-root in adaptive gradient methods? A second-order perspective. International Conference on Machine Learning

  50. [50]

    Understanding and improving Shampoo and SOAP via Kullback-Leibler minimization

    Lin, W., Lowe, S.C., Dangel, F., Eschenhagen, R., Xu, Z., Grosse, R.B., 2025. Understanding and improving Shampoo and SOAP via Kullback-Leibler minimization. arXiv preprint arXiv:2509.03378

  51. [51]

    Locally linearized physics informed neural networks for riemann problems of hyperbolic conservation laws

    Liu, J., Zheng, S., Song, X., Xu, D., 2024. Locally linearized physics informed neural networks for riemann problems of hyperbolic conservation laws. Physics of Fluids 36, 116135. doi:10.1063/5.0238865

  52. [52]

    Decoupled weight decay regularization, in: International Conference on Learning Representations

    Loshchilov, I., Hutter, F., 2019. Decoupled weight decay regularization, in: International Conference on Learning Representations. URL: https://openreview.net/forum?id=Bkg6RiCqY7

  53. [53]

    Viscous and resistive eddies near a sharp corner

    Moffatt, H.K., 1964. Viscous and resistive eddies near a sharp corner. Journal of Fluid Mechanics 18, 1–18

  54. [54]

    Achieving High Accuracy with PINNs via Energy Natural Gradient Descent, in: International Conference on Machine Learning, PMLR

    Müller, J., Zeinhofer, M., 2023. Achieving High Accuracy with PINNs via Energy Natural Gradient Descent, in: International Conference on Machine Learning, PMLR. pp. 25471–25485

  55. [55]

    Position: Optimization in SciML should employ the function space geometry

    Müller, J., Zeinhofer, M., 2024. Position: Optimization in SciML should employ the function space geometry, in: Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., Berkenkamp, F. (Eds.), Proceedings of the 41st International Conference on Machine Learning, PMLR. pp. 36705–36722. URL: https://proceedings.mlr.press/v235/muller24d.html

  56. [56]

    Numerical optimization

    Nocedal, J., Wright, S.J., 2006. Numerical optimization. Springer

  57. [57]

    Invariance properties of the natural gradient in overparametrised systems

    van Oostrum, J., Müller, J., Ay, N., 2023. Invariance properties of the natural gradient in overparametrised systems. Information Geometry 6, 51–67

  58. [58]

    Self-scaling variable metric (SSVM) algorithms, part I: Criteria and sufficient conditions for scaling a class of algorithms

    Oren, S.S., Luenberger, D.G., 1974. Self-scaling variable metric (SSVM) algorithms, part I: Criteria and sufficient conditions for scaling a class of algorithms. Management Science 20, 845–862. doi:10.1287/mnsc.20.5.845

  59. [59]

    Optimistix: modular optimisation in JAX and Equinox

    Rader, J., Lyons, T., Kidger, P., 2024. Optimistix: modular optimisation in JAX and Equinox. arXiv preprint arXiv:2402.09983

  60. [60]

    Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations

    Raissi, M., Perdikaris, P., Karniadakis, G., 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, 686–707. URL: https://www.sciencedirect.com/science/article/pii/S0021999118307125, doi:https://doi.org/10.1016/j.jc...

  61. [61]

    Adaptive numerical simulations with Trixi.jl: A case study of Julia for scientific computing

    Ranocha, H., Schlottke-Lakemper, M., Winters, A.R., Faulhaber, E., Chan, J., Gassner, G.J., 2021. Adaptive numerical simulations with Trixi.jl: A case study of Julia for scientific computing. arXiv preprint arXiv:2108.06476

  62. [62]

    Approximate riemann solvers, parameter vectors, and difference schemes

    Roe, P., 1981. Approximate riemann solvers, parameter vectors, and difference schemes. Journal of Computational Physics 43, 357–372. URL:https://www.sciencedirect.com/science/article/pii/0021999181901285,doi:https://doi.org/10.1016/ 0021-9991(81)90128-5

  63. [63]

    Anagram: A natural gradient relative to adapted model for efficient PINNs learning

    Schwencke, N., Furtlehner, C., 2025. Anagram: A natural gradient relative to adapted model for efficient PINNs learning. URL: https://arxiv.org/abs/2412.10782, arXiv:2412.10782

  64. [64]

    Amstramgram: Adaptive multi-cutoff strategy modification for anagram

    Schwencke, N., Rousselot, C., Shilova, A., Furtlehner, C., 2025. Amstramgram: Adaptive multi-cutoff strategy modification for anagram. URL: https://arxiv.org/abs/2510.15998, arXiv:2510.15998

  65. [65]

    Conditioning of quasi-newton methods for function minimization

    Shanno, D.F., 1970. Conditioning of quasi-newton methods for function minimization. Mathematics of Computation 24, 647–656. doi:10.1090/S0025-5718-1970-0274029-X

  66. [66]

    Condition numbers and equilibration of matrices

    van der Sluis, A., 1969. Condition numbers and equilibration of matrices. Numerische Mathematik 14, 14–23. doi:10.1007/BF02165968

  67. [67]

    The numerical viscosity of entropy stable schemes for systems of conservation laws

    Tadmor, E., 1987. The numerical viscosity of entropy stable schemes for systems of conservation laws. I. Mathematics of Computation 49, 91–103

  68. [68]

    Divide the gradient by a running average of its recent magnitude

    Tieleman, T., Hinton, G., 2017. Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning. Technical report, University of Toronto

  69. [69]

    Restoration of the contact surface in the hll-riemann solver

    Toro, E.F., Spruce, M., Speares, W., 1994. Restoration of the contact surface in the hll-riemann solver. Shock waves 4, 25–34

  70. [70]

    From PINNs to PIKANs: Recent advances in physics-informed machine learning

    Toscano, J.D., Oommen, V., Varghese, A.J., Zou, Z., Ahmadi Daryakenari, N., Wu, C., Karniadakis, G.E., 2025. From PINNs to PIKANs: Recent advances in physics-informed machine learning. Machine Learning for Computational Science and Engineering 1, 15

  71. [71]

    Unveiling the optimization process of physics informed neural networks: How accurate and competitive can PINNs be?

    Urbán, J.F., Stefanou, P., Pons, J.A., 2025. Unveiling the optimization process of physics informed neural networks: How accurate and competitive can pinns be? Journal of Computational Physics 523, 113656

  72. [72]

    An approximate Riemann solver approach in physics-informed neural networks for hyperbolic conservation laws

    Urbán, J.F., Pons, J.A., 2025. An approximate Riemann solver approach in physics-informed neural networks for hyperbolic conservation laws. Physics of Fluids 37, 096114. doi:10.1063/5.0285282

  73. [73]

    SOAP: Improving and Stabilizing Shampoo using Adam

    Vyas, N., Morwani, D., Zhao, R., Kwun, M., Shapira, I., Brandfonbrener, D., Janson, L., Kakade, S., 2024. SOAP: Improving and stabilizing Shampoo using Adam. arXiv preprint arXiv:2409.11321

  74. [74]

    Understanding and mitigating gradient flow pathologies in physics-informed neural networks

    Wang, S., Teng, Y., Perdikaris, P., 2021. Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM Journal on Scientific Computing 43, A3055–A3081

  75. [75]

    Roofline: an insightful visual performance model for multicore architectures

    Williams, S., Waterman, A., Patterson, D., 2009. Roofline: an insightful visual performance model for multicore architectures. Communications of the ACM 52, 65–76

  76. [76]

    Convergence conditions for ascent methods

    Wolfe, P., 1969. Convergence conditions for ascent methods. SIAM review 11, 226–235

  77. [77]

    Towards understanding convergence and generalization of adamw

    Zhou, P., Xie, X., Lin, Z., Yan, S., 2024. Towards understanding convergence and generalization of adamw. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 6486–6493