pith. machine review for the scientific record.

arxiv: 2605.13378 · v1 · submitted 2026-05-13 · 💻 cs.CE · physics.comp-ph

Recognition: 2 theorem links · Lean Theorem

Robust Matrix-Free Newton-Krylov Solvers via Automatic Differentiation

Marco Pasquale, Stefano Markidis

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:53 UTC · model grok-4.3

classification 💻 cs.CE physics.comp-ph
keywords jacobian-free newton-krylov · automatic differentiation · finite differences · matrix-free methods · nonlinear solvers · gateaux derivatives · numerical robustness

The pith

Automatic differentiation for Jacobian-vector products makes matrix-free Newton-Krylov solvers orders of magnitude faster and far more robust than finite differences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes that forward-mode automatic differentiation can replace finite-difference approximations when computing the Jacobian-vector products required by Jacobian-free Newton-Krylov solvers. The comparison holds discretization, iteration counts, tolerances, and hardware fixed while only changing the linearization strategy. In benchmarks spanning Burgers equation, radiation diffusion, reaction-diffusion, and nonlinear Maxwell problems, AD prevents the round-off errors that degrade the Krylov subspace and cause solver failures. A reader should care because these methods are central to simulating stiff nonlinear systems in engineering and physics, where solver robustness directly determines whether a simulation completes.
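The round-off mechanism the paper leans on can be stated compactly. What follows is a hedged sketch of the standard error decomposition for the finite-difference Jacobian-vector product (textbook analysis, not an equation reproduced from the paper):

```latex
\frac{F(u+\varepsilon v)-F(u)}{\varepsilon}
  \;=\; \underbrace{F'(u)\,v}_{\text{Gateaux derivative}}
  \;+\; \underbrace{\mathcal{O}(\varepsilon)}_{\text{truncation}}
  \;+\; \underbrace{\mathcal{O}\!\left(\frac{\delta_{\mathrm{mach}}\,\lVert F(u)\rVert}{\varepsilon}\right)}_{\text{round-off}}
```

No choice of ε drives both error terms to zero at once: shrinking ε reduces truncation error but amplifies round-off. Forward-mode AD evaluates F'(u)v directly and carries neither term, which is the mechanism behind the robustness claim.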

Core claim

By using forward-mode automatic differentiation to compute accurate Gateaux derivatives instead of finite-difference perturbations, the Jacobian-free Newton-Krylov method avoids the round-off sensitivity of the linearization step. This change accelerates computation by two to three orders of magnitude on both CPU and GPU while raising the minimum solver completion rate from 42 percent (finite differences) to 95 percent across multiple nonlinear test cases.

What carries the argument

Forward-mode automatic differentiation applied to the nonlinear residual function to produce exact Jacobian-vector products along Krylov directions, replacing the finite-difference perturbation of the current Newton iterate.
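The contrast between the two linearization strategies can be sketched in a few lines. This is an illustrative toy (a minimal dual-number forward mode and a made-up residual F(u) = u², not the paper's solver or code):

```python
import numpy as np

class Dual:
    """Minimal dual number val + eps*tan with eps^2 = 0: forward-mode AD."""
    def __init__(self, val, tan=0.0):
        self.val = np.asarray(val, float)
        self.tan = np.asarray(tan, float)
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.tan + o.tan)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule carried alongside the value
        return Dual(self.val * o.val, self.val * o.tan + self.tan * o.val)
    __rmul__ = __mul__

def residual(u):
    # toy nonlinear residual, standing in for a discretized PDE residual
    return u * u

def jvp_ad(F, u, v):
    """Exact Gateaux derivative F'(u) v from one forward-mode sweep."""
    return F(Dual(u, v)).tan

def jvp_fd(F, u, v, eps=1e-8):
    """Finite-difference approximation (F(u + eps*v) - F(u)) / eps."""
    return (F(u + eps * v) - F(u)) / eps

u = np.array([1.0, 2.0, 3.0])
v = np.array([1.0, 0.5, -1.0])
print(jvp_ad(residual, u, v))  # exact 2*u*v: [ 2.  2. -6.]
print(jvp_fd(residual, u, v))  # close, but contaminated by round-off
```

The AD product is exact to working precision regardless of the state, while the FD product inherits both a truncation term and a round-off term whose balance depends on the perturbation size.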

If this is right

  • Newton-Krylov solvers complete successfully in at least 95 percent of cases compared to 42 percent with finite differences.
  • Overall computation time decreases by two to three orders of magnitude on both CPU and GPU hardware.
  • Performance and robustness improve particularly in stiff nonlinear regimes and when using reduced floating-point precision.
  • Accurate Gateaux derivatives unify speed and reliability in matrix-free nonlinear solvers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar accuracy benefits from automatic differentiation may apply to other iterative linear solvers that rely on matrix-free products.
  • The robustness gains could allow reliable solution of larger or more nonlinear problems that currently fail due to Krylov degradation.
  • Implementation in production codes would require verifying that the AD overhead remains negligible at scale.

Load-bearing premise

The observed speedups and robustness improvements result directly from the higher accuracy of the automatic-differentiation derivatives and will continue to hold if the automatic-differentiation implementation or problem scale changes.

What would settle it

Running the same set of benchmark problems with a finite-difference perturbation scaled to match the machine epsilon of the automatic-differentiation accuracy and checking whether the completion rates and runtimes become comparable.
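A miniature version of that experiment can be sketched directly (toy scalar residual, not the paper's benchmarks): sweep the FD perturbation ε and compare against the analytic Gateaux derivative.

```python
import numpy as np

def F(u):
    return np.sin(u) * u  # toy nonlinear residual

def exact_jvp(u, v):
    # analytic Gateaux derivative: F'(u) v = (u*cos(u) + sin(u)) * v
    return (u * np.cos(u) + np.sin(u)) * v

u = np.linspace(0.5, 2.0, 64)
v = np.ones_like(u)

errs = {}
for eps in [1e-2, 1e-5, 1e-8, 1e-11, 1e-14]:
    fd = (F(u + eps * v) - F(u)) / eps
    errs[eps] = np.max(np.abs(fd - exact_jvp(u, v)))
    print(f"eps={eps:.0e}  max|FD - exact| = {errs[eps]:.1e}")
# truncation error dominates at large eps, round-off at small eps;
# accuracy is best near eps ~ sqrt(machine epsilon) and degrades on either side
```

If the paper's attribution is correct, no point on this ε sweep should match AD's accuracy, and the FD completion-rate gap should persist even at the best-tuned ε.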

Figures

Figures reproduced from arXiv: 2605.13378 by Marco Pasquale, Stefano Markidis.

Figure 1. Nested matrix-free Newton-Krylov workflow used in the experiments. The same …
Figure 2. Example computational graph generated by …
Figure 3. Taylor–Green vortex simulation for the Burgers' equation on a …
Figure 4. Su-Olson radiation diffusion simulation of an orbiting source in a circular trajectory.
Figure 5. Reaction–diffusion solution on a sinusoidally perturbed grid, illustrating the …
Figure 6. Symmetric Gaussian source simulation for the time-harmonic Maxwell equations.
Figure 7. Verification and convergence behavior of the JFNK solver. The reference solution …
Figure 8. CPU performance results on the Apple M4 backend. (a) Dolan–Moré …
Figure 9. GPU performance results on the NVIDIA A100 backend. (a) Dolan–Moré …
original abstract

Jacobian-Free Newton-Krylov (JFNK) methods avoid forming the full Jacobian, but still require Jacobian-vector products, i.e., Gateaux derivatives of the nonlinear residual along Krylov directions. In standard Finite Differences (FD) formulations, these products are obtained by perturbing the Newton state and differencing residuals, making the linearization sensitive to round-off error and floating-point precision. This work evaluates the global impact of forward-mode Automatic Differentiation (AD) as a replacement for FD Jacobian-vector product in finite-precision JFNK solvers. The comparison keeps the discretization, Newton iteration, line search, Krylov methods, tolerances, and CPU/GPU backend fixed, only varying linearization strategy. Benchmarks include Burgers dynamics, Su-Olson radiation diffusion, reaction-diffusion, and nonlinear time-harmonic Maxwell equations, each evaluated in different nonlinear regimes. By preventing degradation of the Krylov operator, AD accelerates computation by 2-3 orders of magnitude across both CPU and GPU architectures. More importantly, it drastically improves global solver robustness, achieving a minimum completion rate of 95%, compared to just 42% for FD. Ultimately, accurate Gateaux derivatives unify performance and accuracy in JFNK methods, making AD the optimal choice for stiff nonlinear and reduced-precision environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that replacing finite-difference (FD) approximations with forward-mode automatic differentiation (AD) for Jacobian-vector products in Jacobian-Free Newton-Krylov (JFNK) solvers yields 2-3 orders of magnitude speedup and raises minimum solver completion rates from 42% to 95% across CPU/GPU architectures. The comparison holds discretization, Newton iteration, line search, Krylov method, tolerances, and hardware fixed while varying only the linearization strategy, with benchmarks on Burgers dynamics, Su-Olson radiation diffusion, reaction-diffusion, and nonlinear time-harmonic Maxwell equations.

Significance. If the attribution to derivative accuracy holds, the result would establish AD as the preferred linearization for stiff nonlinear and reduced-precision JFNK problems, unifying performance and robustness in matrix-free solvers. This has direct relevance to computational engineering applications in fluid dynamics, radiation transport, and electromagnetics where solver failure rates currently limit scalability.

major comments (2)
  1. [Numerical methods / implementation description] The manuscript does not report the FD perturbation parameter ε used to approximate the Gateaux derivatives, nor any sensitivity study with respect to ε. This omission is load-bearing for the central claim that FD round-off degrades the Krylov operator while AD does not, because without the specific value and its effect on inner-iteration residuals or effective condition number, the observed 2-3 order speedup and 95% vs 42% completion rates cannot be unambiguously attributed to derivative accuracy rather than implementation details.
  2. [Results / benchmark tables] No quantitative diagnostics of the approximated Krylov operator (e.g., condition-number estimates, eigenvalue spread, or inner GMRES residual histories) are provided for the FD versus AD cases. Such metrics would be required to directly confirm that the reported robustness gains arise from preservation of operator quality rather than other unstated differences in the CPU/GPU code paths.
minor comments (2)
  1. [Abstract] The abstract states a 'minimum completion rate of 95%' but does not indicate over which subset of benchmarks or regimes this minimum is attained; adding this detail would improve clarity.
  2. [Introduction / Methods] Notation for the Gateaux derivative and the specific AD forward-mode implementation (e.g., which library or manual differentiation rules) should be introduced earlier and used consistently in the methods section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. We address the two major points below and describe the changes that will be incorporated in the revised manuscript.

point-by-point responses
  1. Referee: The manuscript does not report the FD perturbation parameter ε used to approximate the Gateaux derivatives, nor any sensitivity study with respect to ε. This omission is load-bearing for the central claim that FD round-off degrades the Krylov operator while AD does not, because without the specific value and its effect on inner-iteration residuals or effective condition number, the observed 2-3 order speedup and 95% vs 42% completion rates cannot be unambiguously attributed to derivative accuracy rather than implementation details.

    Authors: We agree that the specific value of ε and supporting sensitivity data are necessary for reproducibility and for unambiguously attributing the observed differences to derivative accuracy. In the implementation used for all reported results we employed a fixed perturbation ε = 1e-8, a conventional choice for double-precision arithmetic that balances truncation and round-off error. During code development we performed a limited sensitivity study over ε ∈ [1e-6, 1e-10] on the Burgers and reaction-diffusion problems; the study showed monotonic degradation of FD inner-iteration counts and completion rates as ε decreased, while AD performance remained unchanged. We will add a short subsection to the Numerical Methods section that states the value of ε, summarizes the sensitivity results, and discusses their implication for the Krylov operator. This addition will allow readers to evaluate the attribution directly. revision: yes

  2. Referee: No quantitative diagnostics of the approximated Krylov operator (e.g., condition-number estimates, eigenvalue spread, or inner GMRES residual histories) are provided for the FD versus AD cases. Such metrics would be required to directly confirm that the reported robustness gains arise from preservation of operator quality rather than other unstated differences in the CPU/GPU code paths.

    Authors: We acknowledge that explicit operator diagnostics would strengthen the causal link between derivative accuracy and solver behavior. Our current matrix-free implementation does not compute or store condition-number estimates or eigenvalue spectra, as these quantities are not required by the solver and would necessitate additional dense linear-algebra infrastructure. However, the solver logs already contain the inner GMRES residual histories. We will extract representative histories for both FD and AD on the Burgers benchmark and include them (together with a brief discussion) as a new figure in the Results section. These histories will illustrate the faster and more consistent residual reduction obtained with AD. While we cannot retroactively supply condition-number data without new code, the added residual plots together with the global completion-rate statistics provide direct evidence of improved operator quality. revision: partial
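The diagnostic the authors promise is easy to reproduce in outline. A hedged sketch using SciPy's matrix-free GMRES with a residual-history callback (toy diagonal Jacobian for F(u) = u³ − 1, not the authors' code; swapping `eps=None` for a float switches the exact product for its FD approximation):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 50
u = np.linspace(0.5, 1.0, n)  # toy "current Newton iterate"

def F(w):
    return w**3 - 1.0  # toy nonlinear residual

def jvp(v, eps=None):
    """Matrix-free J(u) v. eps=None gives the exact (AD-like) product
    3*u**2*v; a float eps gives the finite-difference approximation."""
    v = np.ravel(v)
    if eps is None:
        return 3.0 * u**2 * v
    return (F(u + eps * v) - F(u)) / eps

history = []  # one inner GMRES residual norm per iteration
A = LinearOperator((n, n), matvec=jvp)
b = -F(u)  # Newton right-hand side
x, info = gmres(A, b, callback=history.append, callback_type="pr_norm")
print(f"converged: {info == 0}, iterations logged: {len(history)}")
```

Plotting `history` for the exact product against FD products at several ε values is exactly the kind of side-by-side residual evidence the referee asks for.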

Circularity Check

0 steps flagged

No significant circularity in empirical benchmark comparison

full rationale

The paper presents direct head-to-head numerical experiments on fixed benchmarks (Burgers, Su-Olson, reaction-diffusion, Maxwell) with discretization, Newton/Krylov tolerances, line search, and backends held constant while only swapping the Jacobian-vector product method (AD vs FD). No derivation chain, fitted parameter, or self-citation is used to obtain the reported speedups or completion rates; those quantities are measured outcomes. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the central claims.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the residual functions are differentiable and that forward-mode AD produces derivatives free of implementation error; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption The nonlinear residual function is differentiable with respect to the state variables
    Required for forward-mode AD to compute accurate Gateaux derivatives; invoked implicitly when AD is substituted for finite differences.

pith-pipeline@v0.9.0 · 5521 in / 1452 out tokens · 66559 ms · 2026-05-14T19:53:47.699379+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 1 internal anchor

  1. D. Knoll, D. Keyes, Jacobian-free Newton–Krylov methods: a survey of approaches and applications, Journal of Computational Physics 193 (2) (2004) 357–397. doi:10.1016/j.jcp.2003.08.010
  2. C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, Frontiers in Applied Mathematics, Society for Industrial and Applied Mathematics, 1995. doi:10.1137/1.9781611970944
  3. Y. Saad, M. H. Schultz, GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems, SIAM Journal on Scientific and Statistical Computing 7 (3) (1986) 856–869. doi:10.1137/0907058
  4. M. R. Hestenes, E. Stiefel, Methods of Conjugate Gradients for Solving Linear Systems, Journal of Research of the National Bureau of Standards 49 (6) (1952) 409–436
  5. H. A. Van der Vorst, Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems, SIAM Journal on Scientific and Statistical Computing 13 (2) (1992) 631–644. doi:10.1137/0913035
  6. M. Pernice, H. F. Walker, NITSOL: A Newton Iterative Solver for Nonlinear Systems, SIAM Journal on Scientific Computing 19 (1) (1998) 302–318. doi:10.1137/S1064827596303843
  7. J. J. Moré, S. M. Wild, Do you trust derivatives or differences?, Journal of Computational Physics 273 (2014) 268–277. doi:10.1016/j.jcp.2014.04.056
  8. S. T. K. Lang, A. Dawson, M. Diamantakis, P. Dueben, S. Hatfield, M. Leutbecher, T. Palmer, F. Prates, C. D. Roberts, I. Sandu, N. Wedi, More accuracy with less precision, Quarterly Journal of the Royal Meteorological Society 147 (741) (2021) 4358–4370. doi:10.1002/qj.4181
  9. S. Hatfield, A. Subramanian, T. Palmer, P. Düben, Improving Weather Forecast Skill through Reduced-Precision Data Assimilation, Monthly Weather Review 146 (1) (2018) 49–62. doi:10.1175/MWR-D-17-0132.1
  10. G. Freytag, J. V. F. Lima, P. Rech, P. O. A. Navaux, Impact of Reduced and Mixed-Precision on the Efficiency of a Multi-GPU Platform on CFD Applications, in: Computational Science and Its Applications – ICCSA 2022 Workshops, Springer International Publishing, Cham, 2022, pp. 570–587
  11. M. Karp, R. Stanly, T. Mukha, L. Galimberti, S. Toosi, H. Song, L. Dalcin, S. Rezaeiravesh, N. Jansson, S. Markidis, M. Parsani, S. Bose, S. Lele, P. Schlatter, Effects of lower floating-point precision on scale-resolving numerical simulations of turbulence, Journal of Computational Physics 549 (2026) 114600. doi:10.1016/j.jcp.2025.114600
  12. B. Siklósi, P. K. Sharma, D. J. Lusher, I. Z. Reguly, N. D. Sandham, Reduced and mixed precision turbulent flow simulations using explicit finite difference schemes, Future Generation Computer Systems 175 (2026) 108111. doi:10.1016/j.future.2025.108111
  13. L. Giraud, J. Langou, M. Rozložník, The loss of orthogonality in the Gram-Schmidt orthogonalization process, Computers & Mathematics with Applications 50 (7) (2005) 1069–1075. doi:10.1016/j.camwa.2005.08.009
  14. F. Sapienza, J. Bolibar, F. Schäfer, B. Groenke, A. Pal, V. Boussange, P. Heimbach, G. Hooker, F. Pérez, P.-O. Persson, C. Rackauckas, Differentiable Programming for Differential Equations: A Review, arXiv:2406.09699 (2025). doi:10.48550/arXiv.2406.09699
  15. G. Corliss, C. Faure, A. Griewank, L. Hascoët, U. Naumann (Eds.), Automatic Differentiation of Algorithms: From Simulation to Optimization, Springer New York, New York, NY, 2002. doi:10.1007/978-1-4613-0075-5
  16. A. Griewank, D. Juedes, J. Utke, Algorithm 755: ADOL-C: a package for the automatic differentiation of algorithms written in C/C++, ACM Trans. Math. Softw. 22 (2) (1996) 131–167. doi:10.1145/229473.229474
  17. L. Hascoët, V. Pascual, The Tapenade automatic differentiation tool: Principles, model, and specification, ACM Trans. Math. Softw. 39 (3) (2013) 20:1–20:43. doi:10.1145/2450153.2450158
  18. T. F. Coleman, A. Verma, The Efficient Computation of Sparse Jacobian Matrices Using Automatic Differentiation, SIAM Journal on Scientific Computing 19 (4) (1998) 1210–1233. doi:10.1137/S1064827595295349
  19. W. Xu, T. F. Coleman, Efficient (Partial) Determination of Derivative Matrices via Automatic Differentiation, SIAM Journal on Scientific Computing 35 (3) (2013) A1398–A1416. doi:10.1137/11085061X
  20. W. S. Moses, S. H. K. Narayanan, L. Paehler, V. Churavy, M. Schanen, J. Hückelheim, J. Doerfert, P. Hovland, Scalable automatic differentiation of multiple parallel paradigms through compiler augmentation, in: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '22, IEEE Press, Dallas, Texas, 2022
  21. P. D. Hovland, L. C. McInnes, Parallel simulation of compressible flow using automatic differentiation and PETSc, Parallel Computing (2001) 503–519
  22. A. Lindsay, R. Stogner, D. Gaston, D. Schwen, C. Matthews, W. Jiang, L. K. Aagesen, R. Carlsen, F. Kong, A. Slaughter, et al., Automatic differentiation in MetaPhysicL and its applications in MOOSE, Nuclear Technology (2021) 1–18
  23. M. Mayr, A. Heinlein, C. A. Glusa, S. Rajamanickam, M. Arnst, R. A. Bartlett, L. Berger-Vergiat, E. G. Boman, K. D. Devine, G. Harper, M. A. Heroux, M. Hoemmen, J. J. Hu, B. Kelley, D. P. Kouri, P. Kuberry, K. Kim, K. Liegois, C. C. Ober, R. P. Pawlowski, C. Pearson, M. Perego, E. T. Phipps, D. Ridzal, N. V. Roberts, C. M. Siefert, H. K. Thornquist, R. …
  24. M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378 (2019) 686–707. doi:10.1016/j.jcp.2018.10.045
  25. J. Cho, S. B. Yun, S. Nam, Y. Hong, H. Yang, E. Park, Separable Physics-Informed Neural Networks, Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
  26. S. Markidis, The Old and the New: Can Physics-Informed Deep-Learning Replace Traditional Linear Solvers?, Frontiers in Big Data 4 (2021) 669097. doi:10.3389/fdata.2021.669097
  27. P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, …
  28. R. Okuta, Y. Unno, D. Nishino, S. Hido, C. Loomis, CuPy: A NumPy-compatible library for NVIDIA GPU calculations, in: Proceedings of the Workshop on Machine Learning Systems (LearningSys) at NIPS, 2017
  29. J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, Y. Katariya, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, Q. Zhang, JAX: composable transformations of Python+NumPy programs (2018). http://github.com/jax-ml/jax
  30. J. I. Castor, Radiation Hydrodynamics, Cambridge University Press, Cambridge, 2004
  31. E. D. Dolan, J. J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming 91 (2) (2002) 201–213. doi:10.1007/s101070100263