Predictivity and Utility of Neural Surrogates of Multiscale PDEs
Pith reviewed 2026-05-10 00:38 UTC · model grok-4.3
The pith
Neural surrogates of multiscale PDEs cannot recover information lost to coarse-graining and spectral bias.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In many multi-scale problems, no architecture or training procedure can fully recover what the coarse representation discards. Spectral bias causes neural networks to under-resolve high-frequency content, and coarse-graining compounds the problem through irreversible information loss, as illustrated by two simple examples that also reveal error accumulation over time.
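The irreversibility half of this claim can be made concrete in a few lines. In this toy numpy sketch (my illustration, not the paper's example), two different fine-grid states map to exactly the same coarse-grid state, so no model trained on coarse data alone can tell them apart:

```python
import numpy as np

# Coarse-graining is many-to-one: a fine-grid mode that vanishes at every
# coarse node makes two different fine states share one coarse image.
n_fine, n_coarse = 64, 16
x = np.arange(n_fine) / n_fine

u1 = np.sin(2 * np.pi * 3 * x)              # resolved mode only
u2 = u1 + np.sin(2 * np.pi * n_coarse * x)  # plus a mode that is zero at
                                            # every coarse grid point
stride = n_fine // n_coarse
c1, c2 = u1[::stride], u2[::stride]         # subsampling restriction

print(np.max(np.abs(c1 - c2)))   # ≈ 0 up to round-off: identical coarse states
print(np.max(np.abs(u1 - u2)))   # 1.0: the fine states differ by a full mode
```

Subsampling stands in here for a generic restriction operator; any coarse-graining with a nontrivial null space has the same many-to-one character.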
What carries the argument
Spectral bias in neural networks combined with irreversible information loss from coarse-graining of multiscale PDEs.
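The spectral-bias half can be illustrated with a toy setup (again my sketch, not the paper's example): a small tanh network trained by plain gradient descent fits the low-frequency component of a two-mode target while leaving the high-frequency component essentially untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 256, endpoint=False)
y = np.sin(2 * np.pi * x) + np.sin(2 * np.pi * 20 * x)  # low + high mode

# One hidden layer of 64 tanh units; step-like features spread over [0, 1].
W1 = rng.normal(0.0, 10.0, (64, 1))
b1 = rng.uniform(-10.0, 10.0, (64, 1))
W2 = rng.normal(0.0, 0.1, (1, 64))
b2 = np.zeros((1, 1))
X, Y = x[None, :], y[None, :]
n = X.shape[1]

for _ in range(10000):                       # full-batch gradient descent
    H = np.tanh(W1 @ X + b1)
    P = W2 @ H + b2
    E = P - Y                                # residual (MSE gradient up to 2/n)
    gW2 = E @ H.T / n; gb2 = E.mean(axis=1, keepdims=True)
    dH = (W2.T @ E) * (1.0 - H ** 2)
    gW1 = dH @ X.T / n; gb1 = dH.mean(axis=1, keepdims=True)
    W2 -= 0.01 * gW2; b2 -= 0.01 * gb2
    W1 -= 0.01 * gW1; b1 -= 0.01 * gb1

r = np.abs(np.fft.rfft(Y[0] - P[0])) / n     # residual amplitude per mode
low_err, high_err = r[1], r[20]
print(low_err, high_err)  # low-frequency residual ends up well below the high one
```

The network's smooth features simply cannot represent the k = 20 mode, so its residual stays near the initial amplitude while the k = 1 residual decays — the frequency-dependent convergence that Rahaman et al. and the NTK literature formalize.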
If this is right
- Neural surrogates provide reliable value only on low-dimensional manifolds or for short-to-medium prediction horizons where unresolved scales do not dominate.
- Medium-range weather prediction on reanalysis data occupies a favorable regime because the data already incorporate some scale filtering and the forecast window remains short.
- Genuinely chaotic multi-scale problems will exhibit accumulating errors that no purely data-driven surrogate can eliminate.
- Hybrid neural-classical models are needed to handle the unresolved scales that coarse representations discard.
- Reporting standards must distinguish interpolation on observed manifolds from true out-of-distribution prediction.
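The error-accumulation point in the bullets above can be sketched with a stand-in for a chaotic system (a hypothetical toy, not the paper's example): a surrogate that matches the true dynamics to within 1e-8 per step still loses the trajectory within a few dozen steps.

```python
def f(u):
    """Chaotic logistic map at r = 4 (Lyapunov exponent ln 2),
    standing in for the fine-scale dynamics."""
    return 4.0 * u * (1.0 - u)

def g(u):
    """'Surrogate': the same map with a tiny per-step bias of 1e-8,
    mimicking a learned model with excellent one-step accuracy."""
    return f(u) + 1e-8

u_true = u_surr = 0.2
errs = []
for _ in range(60):
    u_true, u_surr = f(u_true), g(u_surr)
    errs.append(abs(u_true - u_surr))

print(errs[5], max(errs[40:]))  # error grows roughly like 1e-8 * 2**t,
                                # then saturates at O(1) once decorrelated
```

No amount of architecture tuning changes this arithmetic: any nonzero one-step error is amplified at the system's Lyapunov rate, which is why purely data-driven surrogates cannot eliminate long-horizon divergence.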
Where Pith is reading between the lines
- Classical numerical methods or scale-aware closures should remain the default for high-fidelity turbulence or climate simulations until hybrids are proven.
- Architectures that embed multi-scale structure explicitly, rather than learning it from coarse data, may be required for broader applicability.
- The same limits likely constrain other machine-learning emulators trained on downsampled physical data in domains like ocean modeling or materials science.
Load-bearing premise
The spectral bias and information loss seen in the simple examples will block recovery in all genuinely chaotic multi-scale scenarios.
What would settle it
A neural surrogate that, after training on coarse data alone, reproduces both the high-frequency statistics and long-term chaotic trajectories of a reference fine-scale solver for a turbulent flow or similar multiscale PDE.
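One way such a settling experiment could be scored (a hypothetical acceptance check of my own devising, not from the paper): compare the energy spectra of surrogate and reference fields above a cutoff wavenumber, alongside long-horizon trajectory statistics.

```python
import numpy as np

def energy_spectrum(u):
    """1D energy spectrum E(k) = |u_hat(k)|^2 of a periodic field."""
    uh = np.fft.rfft(u) / len(u)
    return np.abs(uh) ** 2

def high_freq_match(u_ref, u_surr, k_cut, tol=0.1):
    """Pass if the surrogate's total energy above k_cut is within tol
    (relative) of the reference -- one half of the proposed test; the
    other half is long-term attractor/trajectory statistics."""
    e_ref = energy_spectrum(u_ref)[k_cut:].sum()
    e_surr = energy_spectrum(u_surr)[k_cut:].sum()
    return abs(e_surr - e_ref) <= tol * e_ref

# A smoothed (coarse-grained) copy of a broadband field fails the check:
rng = np.random.default_rng(1)
u = rng.standard_normal(512)
u_smooth = np.convolve(u, np.ones(8) / 8, mode="same")
print(high_freq_match(u, u_smooth, k_cut=64))  # False: high-k energy was lost
```

A surrogate trained on coarse data would have to pass this check without ever having seen the above-cutoff content, which is precisely what the information-loss argument says it cannot do in general.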
read the original abstract
Scientific machine learning models are increasingly spoken of as universal emulators for classical numerical solvers for multi-scale partial differential equations, but most apparent successes can be explained by facts that also define their limits. Many successful benchmarks live on low-dimensional solution manifolds where any competent reduced model will interpolate well. More fundamentally, neural surrogates systematically under-resolve high-frequency content due to spectral bias, and coarse-graining compounds this problem through irreversible information loss. In many multi-scale problems, no architecture or training procedure can fully recover what the coarse representation discards. Two simple examples are used to characterize spectral bias, coarse-graining and error accumulation. We discuss why medium-range weather prediction on reanalysis data sits in a favorable sweet spot and why this will not generalize to genuinely chaotic multi-scale scenarios. We identify domains where neural surrogates offer genuine value, propose further research on neural-classical hybrids, and call for better reporting standards.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that neural surrogates for multiscale PDEs often succeed only on low-dimensional manifolds and are fundamentally limited by spectral bias and irreversible information loss from coarse-graining, such that in many cases no architecture or training procedure can recover the discarded information. This is illustrated via two simple examples characterizing spectral bias, coarse-graining, and error accumulation; the manuscript contrasts this with favorable cases like medium-range weather prediction on reanalysis data, identifies domains of genuine utility, and advocates hybrid neural-classical methods plus improved reporting standards.
Significance. If the core argument holds, the paper provides a useful corrective to over-optimistic claims about neural emulators in scientific machine learning, helping to delineate realistic scopes of applicability and encouraging hybrid approaches. Its conceptual framing and call for better standards add value as a perspective piece even without new theorems.
major comments (2)
- [Abstract] The central claim that 'in many multi-scale problems, no architecture or training procedure can fully recover what the coarse representation discards' extrapolates from the two simple examples without a general theorem, an information-theoretic characterization of the PDE class, or a demonstration that multi-resolution, wavelet, or hybrid physics-informed architectures must fail in chaotic regimes.
- [Section on the two examples] The examples are described as characterizing spectral bias and irreversible loss, but without explicit PDE definitions, quantitative metrics (e.g., error spectra or recovery rates), or ablations against alternative architectures, it is unclear whether they suffice to support the broad extrapolation to 'genuinely chaotic multi-scale scenarios'.
minor comments (2)
- The manuscript would benefit from numbered sections and explicit cross-references when discussing the examples and their implications for weather prediction.
- A few additional citations to recent literature on spectral bias in neural PDE solvers would help situate the argument.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive review. The manuscript is intended as a perspective piece that uses illustrative examples to highlight fundamental limitations of neural surrogates for multiscale PDEs, rather than as a comprehensive theoretical treatise. We address each major comment below and indicate the revisions we will make.
read point-by-point responses
-
Referee: [Abstract] The central claim that 'in many multi-scale problems, no architecture or training procedure can fully recover what the coarse representation discards' extrapolates from the two simple examples without a general theorem, an information-theoretic characterization of the PDE class, or a demonstration that multi-resolution, wavelet, or hybrid physics-informed architectures must fail in chaotic regimes.
Authors: We agree that the central claim is supported by the two illustrative examples and by established results on spectral bias rather than by a general theorem. The paper is positioned as a perspective that delineates realistic scopes of applicability and advocates hybrid methods, not as a proof of impossibility across all architectures and regimes. We will revise the abstract to explicitly state that the claim is grounded in the provided examples and known neural network properties, while acknowledging that certain multi-resolution or hybrid approaches may partially mitigate (but not always eliminate) the information loss in chaotic settings. A full information-theoretic characterization of the entire PDE class lies beyond the scope of this work. revision: partial
-
Referee: [Section on the two examples] The examples are described as characterizing spectral bias and irreversible loss, but without explicit PDE definitions, quantitative metrics (e.g., error spectra or recovery rates), or ablations against alternative architectures, it is unclear whether they suffice to support the broad extrapolation to 'genuinely chaotic multi-scale scenarios'.
Authors: The two examples are explicitly defined in the manuscript (a linear heat equation variant to illustrate spectral bias and a coarse-grained advection-diffusion equation to show irreversible loss and error accumulation). We will add quantitative metrics, including error spectra and recovery rates, to make the characterization more precise. The examples are deliberately minimal to isolate the mechanisms; exhaustive ablations against every alternative architecture are not practical here. We will include a short discussion clarifying why the fundamental limitations persist even for multi-resolution methods in genuinely chaotic regimes, thereby strengthening the basis for the extrapolation. revision: yes
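The 'recovery rates' promised in this response could take a form like the following (a hypothetical helper of my own, not the paper's definition): the per-band ratio of predicted to true spectral energy.

```python
import numpy as np

def recovery_rate(u_true, u_pred, bands):
    """Fraction of true spectral energy reproduced in each wavenumber band:
    r_b = sum_{k in b} |u_pred_hat(k)|^2 / sum_{k in b} |u_true_hat(k)|^2."""
    t = np.abs(np.fft.rfft(u_true)) ** 2
    p = np.abs(np.fft.rfft(u_pred)) ** 2
    return [p[lo:hi].sum() / t[lo:hi].sum() for lo, hi in bands]

# A sharp spectral low-pass "prediction" recovers low bands but not high ones.
rng = np.random.default_rng(2)
u = rng.standard_normal(256)
uh = np.fft.rfft(u)
uh[32:] = 0.0                        # discard everything above k = 32
u_lp = np.fft.irfft(uh, n=256)

rates = recovery_rate(u, u_lp, bands=[(1, 32), (32, 128)])
print(rates)  # ≈ [1.0, 0.0]
```

A metric of this shape would let the revised examples report, per band, how much of the discarded content a surrogate actually reconstructs, rather than a single aggregate error.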
Circularity Check
No circularity: conceptual critique with no derivations or self-referential predictions
full rationale
The paper presents a conceptual discussion of limitations in neural surrogates for multiscale PDEs, relying on known properties of spectral bias and information loss in coarse-graining. It offers no mathematical derivations, fitted parameters, or predictions that reduce to their own inputs by construction. The two simple examples are used illustratively to characterize phenomena rather than to derive a general result. No self-citations function as load-bearing premises, and the central claims are framed as extrapolations from examples without claiming a formal theorem. The analysis therefore rests on externally established properties of neural network behavior rather than on its own conclusions.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
- [2] C. Huang and K. Duraisamy. Predictive reduced order modeling of chaotic multi-scale problems using adaptively sampled projections. Journal of Computational Physics, 491:112356, 2023.
- [3] K. Um, R. Brand, Y. Fei, P. Holl, and N. Thuerey. Solver-in-the-loop: Learning from differentiable physics to interact with iterative PDE-solvers. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
- [4] D. Kochkov, J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer. Machine learning–accelerated computational fluid dynamics. Proceedings of the National Academy of Sciences, 118(21):e2101784118, 2021.
- [5] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. M. Stuart, and A. Anandkumar. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations (ICLR), 2021.
- [6] N. McGreivy and A. Hakim. Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations. Nature Machine Intelligence, 6:1256–1269, 2024.
- [7] N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. A. Hamprecht, Y. Bengio, and A. Courville. On the spectral bias of neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), PMLR 97:5301–5310, 2019.
- [8] A. Jacot, F. Gabriel, and C. Hongler. Neural tangent kernel: Convergence and generalization in neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
- [9]
- [10] M. Tancik, P. Srinivasan, B. Mildenhall, S. Fridovich-Keil, N. Raghavan, U. Singhal, R. Ramamoorthi, J. Barron, and R. Ng. Fourier features let networks learn high frequency functions in low dimensional domains. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
- [11] V. Oommen, A. Bora, Z. Zhang, and G. E. Karniadakis. Integrating neural operators with diffusion models improves spectral representation in turbulence modeling. arXiv preprint arXiv:2409.08477, 2024.
- [12] C. Snyder and T. M. Hamill. Leading Lyapunov vectors of a turbulent baroclinic jet in a quasigeostrophic model. Journal of the Atmospheric Sciences, 60(4):683–688, 2003.
- [13] H. Hersbach, B. Bell, P. Berrisford, et al. The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society, 146:1999–2049, 2020.
- [14] V. Raman, S. Prakash, and M. Gamba. Nonidealities in rotating detonation engines. Annual Review of Fluid Mechanics, 55(1):639–674, 2023.
- [15] J. W. Bennewitz, J. R. Burr, and C. F. Lietz. Characteristic timescales for rotating detonation rocket engines. In AIAA Propulsion and Energy 2021 Forum, AIAA Paper 2021-3671, 2021.
- [16] A. Pinkus. n-Widths in Approximation Theory. Springer-Verlag, Berlin, 1985.
- [17] A. J. Linot and M. D. Graham. Stabilized neural ordinary differential equations for long-time forecasting of dynamical systems. Journal of Computational Physics, 474:111838, 2023.
- [18] P. Lippe, B. Veeling, P. Perdikaris, R. Turner, and J. Brandstetter. PDE-Refiner: Achieving accurate long rollouts with neural PDE solvers. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
- [19] A. J. Chorin, O. H. Hald, and R. Kupferman. Optimal prediction and the Mori-Zwanzig representation of irreversible processes. Proceedings of the National Academy of Sciences, 97(7):2968–2973, 2000.
- [20] Q. He, M. Perego, A. A. Howard, G. E. Karniadakis, and P. Stinis. A hybrid deep neural operator/finite element method for ice-sheet modeling. Journal of Computational Physics, 492:112428, 2023.
- [21]
- [22] S. Bhola and K. Duraisamy. Flow matching operators for residual-augmented probabilistic learning of partial differential equations. arXiv preprint arXiv:2512.12749, 2024.
- [23] B. List, L.-W. Chen, and N. Thuerey. Learned turbulence modelling with differentiable fluid solvers. Nature Machine Intelligence, 6:1–12, 2024.
- [24] P. Bolgiani, C. Calvo-Sancho, J. Díaz-Fernández, L. Quitián-Hernández, M. Sastre, D. Santos-Muñoz, L. M. Farfán, S. Fernández-González, and M. L. Martín. Wind kinetic energy climatology and effective resolution for the ERA5 reanalysis. Climate Dynamics, 59:737–752, 2022.
- [25] S. Vannitsem. Predictability of large-scale atmospheric motions: Lyapunov exponents and error dynamics. Chaos, 27(3):032101, 2017.
- [26] I.-G. Farcaş, R. P. Gundevia, R. Munipalli, and K. E. Willcox. Distributed computing for physics-based data-driven reduced modeling at scale: Application to a rotating detonation rocket engine. Computer Physics Communications, 313:109619, 2025.
- [27] B. Ronen, D. Jacobs, Y. Kasten, and S. Kritchman. The convergence rate of neural networks for learned functions of different frequencies. Advances in Neural Information Processing Systems, 2019.
- [28] S. Brunton and J.-N. Kutz. Promising directions of machine learning for partial differential equations. Nature Computational Science, 4:483–494, 2024.
- [29] K. Bhattacharya, N. Kovachki, A. Rajan, A. M. Stuart, and M. Trautner. Learning Homogenization for Elliptic Operators. SIAM Journal on Numerical Analysis, 2024.
- [30] A. Bietti and J. Mairal. On the inductive bias of neural tangent kernels. Advances in Neural Information Processing Systems, 2019.
- [31] B. Alkin, M. Bleeker, R. Kurle, T. Kronlachner, R. Sonnleitner, M. Dorfer, and J. Brandstetter. AB-UPT: Scaling neural CFD surrogates for high-fidelity automotive aerodynamics simulations via anchored-branched universal physics transformers. arXiv preprint arXiv:2502.09692, 2025.
- [32] C. Wentland, F. Rizzi, J.-L. Barnett, and I. K. Tezaur. The role of interface boundary conditions and sampling strategies for Schwarz-based coupling of projection-based reduced order models. Journal of Computational and Applied Mathematics, 465:116584, 2025.