Predictivity and Utility of Neural Surrogates of Multiscale PDEs
Pith reviewed 2026-05-10 00:38 UTC · model grok-4.3
The pith
Neural surrogates of multiscale PDEs cannot recover information lost to coarse-graining and spectral bias.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In many multi-scale problems, no architecture or training procedure can fully recover what the coarse representation discards. Spectral bias causes neural networks to under-resolve high-frequency content, and coarse-graining compounds the problem through irreversible information loss, as illustrated by two simple examples that also reveal error accumulation over time.
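The irreversibility half of this claim can be made concrete in a few lines. In this toy numpy sketch (my illustration, not the paper's example), two different fine-grid states map to exactly the same coarse-grid state, so no model trained on coarse data alone can tell them apart:

```python
import numpy as np

# Coarse-graining is many-to-one: a fine-grid mode that vanishes at every
# coarse node makes two different fine states share one coarse image.
n_fine, n_coarse = 64, 16
x = np.arange(n_fine) / n_fine

u1 = np.sin(2 * np.pi * 3 * x)              # resolved mode only
u2 = u1 + np.sin(2 * np.pi * n_coarse * x)  # plus a mode that is zero at
                                            # every coarse grid point
stride = n_fine // n_coarse
c1, c2 = u1[::stride], u2[::stride]         # subsampling restriction

print(np.max(np.abs(c1 - c2)))   # ≈ 0 up to round-off: identical coarse states
print(np.max(np.abs(u1 - u2)))   # 1.0: the fine states differ by a full mode
```

Subsampling stands in here for a generic restriction operator; any coarse-graining with a nontrivial null space has the same many-to-one character.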
What carries the argument
Spectral bias in neural networks combined with irreversible information loss from coarse-graining of multiscale PDEs.
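The spectral-bias half can be illustrated with a toy setup (again my sketch, not the paper's example): a small tanh network trained by plain gradient descent fits the low-frequency component of a two-mode target while leaving the high-frequency component essentially untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 256, endpoint=False)
y = np.sin(2 * np.pi * x) + np.sin(2 * np.pi * 20 * x)  # low + high mode

# One hidden layer of 64 tanh units; step-like features spread over [0, 1].
W1 = rng.normal(0.0, 10.0, (64, 1))
b1 = rng.uniform(-10.0, 10.0, (64, 1))
W2 = rng.normal(0.0, 0.1, (1, 64))
b2 = np.zeros((1, 1))
X, Y = x[None, :], y[None, :]
n = X.shape[1]

for _ in range(10000):                       # full-batch gradient descent
    H = np.tanh(W1 @ X + b1)
    P = W2 @ H + b2
    E = P - Y                                # residual (MSE gradient up to 2/n)
    gW2 = E @ H.T / n; gb2 = E.mean(axis=1, keepdims=True)
    dH = (W2.T @ E) * (1.0 - H ** 2)
    gW1 = dH @ X.T / n; gb1 = dH.mean(axis=1, keepdims=True)
    W2 -= 0.01 * gW2; b2 -= 0.01 * gb2
    W1 -= 0.01 * gW1; b1 -= 0.01 * gb1

r = np.abs(np.fft.rfft(Y[0] - P[0])) / n     # residual amplitude per mode
low_err, high_err = r[1], r[20]
print(low_err, high_err)  # low-frequency residual ends up well below the high one
```

The network's smooth features simply cannot represent the k = 20 mode, so its residual stays near the initial amplitude while the k = 1 residual decays — the frequency-dependent convergence that Rahaman et al. and the NTK literature formalize.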
If this is right
- Neural surrogates provide reliable value only on low-dimensional manifolds or for short-to-medium prediction horizons where unresolved scales do not dominate.
- Medium-range weather prediction on reanalysis data occupies a favorable regime because the data already incorporate some scale filtering and the forecast window remains short.
- Genuinely chaotic multi-scale problems will exhibit accumulating errors that no purely data-driven surrogate can eliminate.
- Hybrid neural-classical models are needed to handle the unresolved scales that coarse representations discard.
- Reporting standards must distinguish interpolation on observed manifolds from true out-of-distribution prediction.
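The error-accumulation point in the bullets above can be sketched with a stand-in for a chaotic system (a hypothetical toy, not the paper's example): a surrogate that matches the true dynamics to within 1e-8 per step still loses the trajectory within a few dozen steps.

```python
def f(u):
    """Chaotic logistic map at r = 4 (Lyapunov exponent ln 2),
    standing in for the fine-scale dynamics."""
    return 4.0 * u * (1.0 - u)

def g(u):
    """'Surrogate': the same map with a tiny per-step bias of 1e-8,
    mimicking a learned model with excellent one-step accuracy."""
    return f(u) + 1e-8

u_true = u_surr = 0.2
errs = []
for _ in range(60):
    u_true, u_surr = f(u_true), g(u_surr)
    errs.append(abs(u_true - u_surr))

print(errs[5], max(errs[40:]))  # error grows roughly like 1e-8 * 2**t,
                                # then saturates at O(1) once decorrelated
```

No amount of architecture tuning changes this arithmetic: any nonzero one-step error is amplified at the system's Lyapunov rate, which is why purely data-driven surrogates cannot eliminate long-horizon divergence.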
Where Pith is reading between the lines
- Classical numerical methods or scale-aware closures should remain the default for high-fidelity turbulence or climate simulations until hybrids are proven.
- Architectures that embed multi-scale structure explicitly, rather than learning it from coarse data, may be required for broader applicability.
- The same limits likely constrain other machine-learning emulators trained on downsampled physical data in domains like ocean modeling or materials science.
Load-bearing premise
The spectral bias and information loss seen in the simple examples will block recovery in all genuinely chaotic multi-scale scenarios.
What would settle it
A neural surrogate that, after training on coarse data alone, reproduces both the high-frequency statistics and long-term chaotic trajectories of a reference fine-scale solver for a turbulent flow or similar multiscale PDE.
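One way such a settling experiment could be scored (a hypothetical acceptance check of my own devising, not from the paper): compare the energy spectra of surrogate and reference fields above a cutoff wavenumber, alongside long-horizon trajectory statistics.

```python
import numpy as np

def energy_spectrum(u):
    """1D energy spectrum E(k) = |u_hat(k)|^2 of a periodic field."""
    uh = np.fft.rfft(u) / len(u)
    return np.abs(uh) ** 2

def high_freq_match(u_ref, u_surr, k_cut, tol=0.1):
    """Pass if the surrogate's total energy above k_cut is within tol
    (relative) of the reference -- one half of the proposed test; the
    other half is long-term attractor/trajectory statistics."""
    e_ref = energy_spectrum(u_ref)[k_cut:].sum()
    e_surr = energy_spectrum(u_surr)[k_cut:].sum()
    return abs(e_surr - e_ref) <= tol * e_ref

# A smoothed (coarse-grained) copy of a broadband field fails the check:
rng = np.random.default_rng(1)
u = rng.standard_normal(512)
u_smooth = np.convolve(u, np.ones(8) / 8, mode="same")
print(high_freq_match(u, u_smooth, k_cut=64))  # False: high-k energy was lost
```

A surrogate trained on coarse data would have to pass this check without ever having seen the above-cutoff content, which is precisely what the information-loss argument says it cannot do in general.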
read the original abstract
Scientific machine learning models are increasingly spoken of as universal emulators for classical numerical solvers for multi-scale partial differential equations, but most apparent successes can be explained by facts that also define their limits. Many successful benchmarks live on low-dimensional solution manifolds where any competent reduced model will interpolate well. More fundamentally, neural surrogates systematically under-resolve high-frequency content due to spectral bias, and coarse-graining compounds this problem through irreversible information loss. In many multi-scale problems, no architecture or training procedure can fully recover what the coarse representation discards. Two simple examples are used to characterize spectral bias, coarse-graining and error accumulation. We discuss why medium-range weather prediction on reanalysis data sits in a favorable sweet spot and why this will not generalize to genuinely chaotic multi-scale scenarios. We identify domains where neural surrogates offer genuine value, propose further research on neural-classical hybrids, and call for better reporting standards.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that neural surrogates for multiscale PDEs often succeed only on low-dimensional manifolds and are fundamentally limited by spectral bias and irreversible information loss from coarse-graining, such that in many cases no architecture or training procedure can recover the discarded information. This is illustrated via two simple examples characterizing spectral bias, coarse-graining, and error accumulation; the manuscript contrasts this with favorable cases like medium-range weather prediction on reanalysis data, identifies domains of genuine utility, and advocates hybrid neural-classical methods plus improved reporting standards.
Significance. If the core argument holds, the paper provides a useful corrective to over-optimistic claims about neural emulators in scientific machine learning, helping to delineate realistic scopes of applicability and encouraging hybrid approaches. Its conceptual framing and call for better standards add value as a perspective piece even without new theorems.
major comments (2)
- [Abstract] The central claim that 'in many multi-scale problems, no architecture or training procedure can fully recover what the coarse representation discards' extrapolates from the two simple examples without a general theorem, an information-theoretic characterization of the PDE class, or a demonstration that multi-resolution, wavelet, or hybrid physics-informed architectures must fail in chaotic regimes.
- [Section on the two examples] The examples are described as characterizing spectral bias and irreversible loss, but without explicit PDE definitions, quantitative metrics (e.g., error spectra or recovery rates), or ablations against alternative architectures, it is unclear whether they suffice to support the broad extrapolation to 'genuinely chaotic multi-scale scenarios'.
minor comments (2)
- The manuscript would benefit from numbered sections and explicit cross-references when discussing the examples and their implications for weather prediction.
- A few additional citations to recent literature on spectral bias in neural PDE solvers would help situate the argument.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive review. The manuscript is intended as a perspective piece that uses illustrative examples to highlight fundamental limitations of neural surrogates for multiscale PDEs, rather than as a comprehensive theoretical treatise. We address each major comment below and indicate the revisions we will make.
read point-by-point responses
-
Referee: [Abstract] The central claim that 'in many multi-scale problems, no architecture or training procedure can fully recover what the coarse representation discards' extrapolates from the two simple examples without a general theorem, an information-theoretic characterization of the PDE class, or a demonstration that multi-resolution, wavelet, or hybrid physics-informed architectures must fail in chaotic regimes.
Authors: We agree that the central claim is supported by the two illustrative examples and by established results on spectral bias rather than by a general theorem. The paper is positioned as a perspective that delineates realistic scopes of applicability and advocates hybrid methods, not as a proof of impossibility across all architectures and regimes. We will revise the abstract to explicitly state that the claim is grounded in the provided examples and known neural network properties, while acknowledging that certain multi-resolution or hybrid approaches may partially mitigate (but not always eliminate) the information loss in chaotic settings. A full information-theoretic characterization of the entire PDE class lies beyond the scope of this work. revision: partial
-
Referee: [Section on the two examples] The examples are described as characterizing spectral bias and irreversible loss, but without explicit PDE definitions, quantitative metrics (e.g., error spectra or recovery rates), or ablations against alternative architectures, it is unclear whether they suffice to support the broad extrapolation to 'genuinely chaotic multi-scale scenarios'.
Authors: The two examples are explicitly defined in the manuscript (a linear heat equation variant to illustrate spectral bias and a coarse-grained advection-diffusion equation to show irreversible loss and error accumulation). We will add quantitative metrics, including error spectra and recovery rates, to make the characterization more precise. The examples are deliberately minimal to isolate the mechanisms; exhaustive ablations against every alternative architecture are not practical here. We will include a short discussion clarifying why the fundamental limitations persist even for multi-resolution methods in genuinely chaotic regimes, thereby strengthening the basis for the extrapolation. revision: yes
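The 'recovery rates' promised in this response could take a form like the following (a hypothetical helper of my own, not the paper's definition): the per-band ratio of predicted to true spectral energy.

```python
import numpy as np

def recovery_rate(u_true, u_pred, bands):
    """Fraction of true spectral energy reproduced in each wavenumber band:
    r_b = sum_{k in b} |u_pred_hat(k)|^2 / sum_{k in b} |u_true_hat(k)|^2."""
    t = np.abs(np.fft.rfft(u_true)) ** 2
    p = np.abs(np.fft.rfft(u_pred)) ** 2
    return [p[lo:hi].sum() / t[lo:hi].sum() for lo, hi in bands]

# A sharp spectral low-pass "prediction" recovers low bands but not high ones.
rng = np.random.default_rng(2)
u = rng.standard_normal(256)
uh = np.fft.rfft(u)
uh[32:] = 0.0                        # discard everything above k = 32
u_lp = np.fft.irfft(uh, n=256)

rates = recovery_rate(u, u_lp, bands=[(1, 32), (32, 128)])
print(rates)  # ≈ [1.0, 0.0]
```

A metric of this shape would let the revised examples report, per band, how much of the discarded content a surrogate actually reconstructs, rather than a single aggregate error.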
Circularity Check
No circularity: conceptual critique with no derivations or self-referential predictions
full rationale
The paper presents a conceptual discussion of limitations in neural surrogates for multiscale PDEs, relying on known properties of spectral bias and information loss in coarse-graining. It offers no mathematical derivations, fitted parameters, or predictions that reduce to their own inputs by construction. The two simple examples are used illustratively to characterize phenomena rather than to derive a general result. No self-citations function as load-bearing premises, and the central claims are framed as extrapolations from examples without claiming a formal theorem. The analysis therefore rests on externally established properties of neural network behavior rather than on its own conclusions.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
- [2] C. Huang and K. Duraisamy. Predictive reduced order modeling of chaotic multi-scale problems using adaptively sampled projections. Journal of Computational Physics, 491:112356, 2023.
- [3] K. Um, R. Brand, Y. Fei, P. Holl, and N. Thuerey. Solver-in-the-loop: Learning from differentiable physics to interact with iterative PDE-solvers. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
- [4] D. Kochkov, J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer. Machine learning–accelerated computational fluid dynamics. Proceedings of the National Academy of Sciences, 118(21):e2101784118, 2021.
- [5] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. M. Stuart, and A. Anandkumar. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations (ICLR), 2021.
- [6] N. McGreivy and A. Hakim. Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations. Nature Machine Intelligence, 6:1256–1269, 2024.
- [7] N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. A. Hamprecht, Y. Bengio, and A. Courville. On the spectral bias of neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), PMLR 97:5301–5310, 2019.
- [8] A. Jacot, F. Gabriel, and C. Hongler. Neural tangent kernel: Convergence and generalization in neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
- [9]
- [10] M. Tancik, P. Srinivasan, B. Mildenhall, S. Fridovich-Keil, N. Raghavan, U. Singhal, R. Ramamoorthi, J. Barron, and R. Ng. Fourier features let networks learn high frequency functions in low dimensional domains. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
- [11] V. Oommen, A. Bora, Z. Zhang, and G. E. Karniadakis. Integrating neural operators with diffusion models improves spectral representation in turbulence modeling. arXiv preprint arXiv:2409.08477, 2024.
- [12] C. Snyder and T. M. Hamill. Leading Lyapunov vectors of a turbulent baroclinic jet in a quasigeostrophic model. Journal of the Atmospheric Sciences, 60(4):683–688, 2003.
- [13] H. Hersbach, B. Bell, P. Berrisford, et al. The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society, 146:1999–2049, 2020.
- [14] V. Raman, S. Prakash, and M. Gamba. Nonidealities in rotating detonation engines. Annual Review of Fluid Mechanics, 55(1):639–674, 2023.
- [15] J. W. Bennewitz, J. R. Burr, and C. F. Lietz. Characteristic timescales for rotating detonation rocket engines. In AIAA Propulsion and Energy 2021 Forum, AIAA Paper 2021-3671, 2021.
- [16] A. Pinkus. n-Widths in Approximation Theory. Springer-Verlag, Berlin, 1985.
- [17] A. J. Linot and M. D. Graham. Stabilized neural ordinary differential equations for long-time forecasting of dynamical systems. Journal of Computational Physics, 474:111838, 2023.
- [18] P. Lippe, B. Veeling, P. Perdikaris, R. Turner, and J. Brandstetter. PDE-Refiner: Achieving accurate long rollouts with neural PDE solvers. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
- [19] A. J. Chorin, O. H. Hald, and R. Kupferman. Optimal prediction and the Mori-Zwanzig representation of irreversible processes. Proceedings of the National Academy of Sciences, 97(7):2968–2973, 2000.
- [20] Q. He, M. Perego, A. A. Howard, G. E. Karniadakis, and P. Stinis. A hybrid deep neural operator/finite element method for ice-sheet modeling. Journal of Computational Physics, 492:112428, 2023.
- [21]
- [22] S. Bhola and K. Duraisamy. Flow matching operators for residual-augmented probabilistic learning of partial differential equations. arXiv preprint arXiv:2512.12749, 2024.
- [23] B. List, L.-W. Chen, and N. Thuerey. Learned turbulence modelling with differentiable fluid solvers. Nature Machine Intelligence, 6:1–12, 2024.
- [24] P. Bolgiani, C. Calvo-Sancho, J. Díaz-Fernández, L. Quitián-Hernández, M. Sastre, D. Santos-Muñoz, L. M. Farfán, S. Fernández-González, and M. L. Martín. Wind kinetic energy climatology and effective resolution for the ERA5 reanalysis. Climate Dynamics, 59:737–752, 2022.
- [25] S. Vannitsem. Predictability of large-scale atmospheric motions: Lyapunov exponents and error dynamics. Chaos, 27(3):032101, 2017.
- [26] I.-G. Farcaş, R. P. Gundevia, R. Munipalli, and K. E. Willcox. Distributed computing for physics-based data-driven reduced modeling at scale: Application to a rotating detonation rocket engine. Computer Physics Communications, 313:109619, 2025.
- [27] B. Ronen, D. Jacobs, Y. Kasten, and S. Kritchman. The convergence rate of neural networks for learned functions of different frequencies. Advances in Neural Information Processing Systems, 2019.
- [28] S. Brunton and J.-N. Kutz. Promising directions of machine learning for partial differential equations. Nature Computational Science, 4:483–494, 2024.
- [29] K. Bhattacharya, N. Kovachki, A. Rajan, A. M. Stuart, and M. Trautner. Learning Homogenization for Elliptic Operators. SIAM Journal on Numerical Analysis, 2024.
- [30] A. Bietti and J. Mairal. On the inductive bias of neural tangent kernels. Advances in Neural Information Processing Systems, 2019.
- [31] B. Alkin, M. Bleeker, R. Kurle, T. Kronlachner, R. Sonnleitner, M. Dorfer, and J. Brandstetter. AB-UPT: Scaling neural CFD surrogates for high-fidelity automotive aerodynamics simulations via anchored-branched universal physics transformers. arXiv preprint arXiv:2502.09692, 2025.
- [32] C. Wentland, F. Rizzi, J.-L. Barnett, and I. K. Tezaur. The role of interface boundary conditions and sampling strategies for Schwarz-based coupling of projection-based reduced order models. Journal of Computational and Applied Mathematics, 465:116584, 2025.