Higher-Order Fourier Neural Operator: Explicit Mode Mixer for Nonlinear PDEs

Alexandre Allauzen; Alex Colagrande; Eva Feillet; Paul Caillon

arXiv: 2606.28122 · v1 · pith:3RBGCJKTnew · submitted 2026-06-26 · 💻 cs.CE · cs.AI · cs.CV

Pith reviewed 2026-06-29 01:46 UTC · model grok-4.3

keywords neural operators · Fourier neural operator · nonlinear PDEs · spectral convolution · mode mixing · higher-order mixing · operator learning

The pith

Explicit n-linear mixing of Fourier modes in one layer lets neural operators capture nonlinear PDE interactions more efficiently than stacking many standard layers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Neural operators map functions to functions, and the Fourier Neural Operator works well for linear constant-coefficient PDEs because its spectral convolution treats Fourier modes independently. Nonlinear PDEs produce structured interactions between those modes through polynomial terms, so the paper replaces the standard convolution with a higher-order version that mixes multiple modes together explicitly in the Fourier domain. The resulting Higher-Order FNO keeps the same resolution independence and computational scaling as the original FNO while delivering higher accuracy on benchmark problems. The improvement is largest on strongly nonlinear cases such as the Poisson equation driven by polynomial forcing, where one HO-FNO layer beats FNO models that use up to sixteen layers.
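
A worked example, not taken from the paper, makes the contrast concrete. Assume a 1D periodic field $u(x,t) = \sum_k \hat u_k(t)\, e^{ikx}$; the diffusion coefficient $\nu$ and the quadratic term $u^2$ are illustrative choices, not the paper's benchmark equations.

```latex
% Linear, constant-coefficient case: each Fourier mode evolves on its own.
\partial_t u = \nu\,\partial_x^2 u
\quad\Longrightarrow\quad
\frac{d\hat u_k}{dt} = -\nu k^2\,\hat u_k .

% Adding a quadratic term couples every pair (p, q) with p + q = k.
\partial_t u = \nu\,\partial_x^2 u + u^2
\quad\Longrightarrow\quad
\frac{d\hat u_k}{dt} = -\nu k^2\,\hat u_k + \sum_{p+q=k} \hat u_p\,\hat u_q .
```

A per-mode multiplier can reproduce the first right-hand side exactly. The sum in the second is bilinear in the coefficients, which is the kind of term an explicit mode mixer is meant to represent.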

Core claim

The paper replaces the diagonal spectral convolution of FNO with a Higher-Order Spectral Convolution that performs explicit n-linear mixing of Fourier modes. This mixing is chosen to match the mode-coupling structure that polynomial nonlinearities induce in the governing PDE. On standard operator-learning benchmarks the architecture matches or exceeds prior spectral operators, transformers, and state-space models; the largest gains appear in highly nonlinear regimes, where a single layer already surpasses FNO stacks of depth sixteen on the Poisson equation with polynomial right-hand side.

What carries the argument

Higher-Order Spectral Convolution: an n-linear operator applied to Fourier coefficients that mixes several modes jointly instead of modulating each coefficient independently.
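
The steps listed in the Figure 1 caption (transform to Fourier space, mix modes into pseudo-modes, keep the lowest ones, apply a learned linear map) can be sketched for order 2 as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `order2_spectral_layer`, the weight tensors `R_lin` and `R_quad`, the one-sided convolution over retained modes, and the final inverse transform (the caption is truncated at that point) are all hypothetical choices.

```python
import numpy as np

def order2_spectral_layer(v, R_lin, R_quad, n_modes):
    """Toy order-2 spectral layer on a 1D periodic grid (illustrative only).

    v      : real array, shape (channels, grid_points)
    R_lin  : complex array, shape (channels, channels_out, n_modes)
    R_quad : complex array, same shape, acting on quadratic pseudo-modes
    """
    n = v.shape[-1]
    # Normalized coefficients, so values do not depend on the grid size.
    v_hat = np.fft.rfft(v, axis=-1, norm="forward")[:, :n_modes]

    # Assumed mixing rule: q_k = sum_{i+j=k} v_hat_i * v_hat_j per channel,
    # summed over the retained non-negative modes only (a simplification).
    quad = np.stack([np.convolve(c, c)[:n_modes] for c in v_hat])

    out_hat = np.zeros((R_lin.shape[1], n // 2 + 1), dtype=complex)
    out_hat[:, :n_modes] = (
        np.einsum("iok,ik->ok", R_lin, v_hat)    # diagonal, FNO-style term
        + np.einsum("iok,ik->ok", R_quad, quad)  # bilinear mixing term
    )
    return np.fft.irfft(out_hat, n=n, axis=-1, norm="forward")
```

With `R_quad` set to zero the sketch reduces to a linear, per-mode spectral multiplier; with `R_lin` set to zero the output scales quadratically with the input, which is the bilinear behaviour the summary describes.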

If this is right

HO-FNO retains the efficiency and multi-resolution capability of FNO architectures.
It produces consistent accuracy gains over earlier spectral neural operators on standard nonlinear PDE benchmarks.
A single HO-FNO layer outperforms FNO models with up to 16 layers on the Poisson equation with polynomial forcing.
Performance is on par with or better than state-of-the-art transformers and state-space models, with larger margins in highly nonlinear regimes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same explicit mixing idea could be tried with other orthogonal bases when the nonlinearity produces known coupling rules.
Reducing required depth may lower the cost of training operator networks for high-resolution physics simulations.
If mode interactions are the dominant source of difficulty, similar n-linear blocks might improve non-Fourier spectral architectures as well.

Load-bearing premise

Structured interactions between Fourier modes in nonlinear PDEs are well captured by explicit n-linear mixing in the spectral domain rather than requiring deeper stacking or other mechanisms.

What would settle it

On the Poisson equation with polynomial forcing, train a single-layer HO-FNO and a 16-layer FNO under identical conditions and measure whether the single-layer error remains lower; reversal of that ordering would falsify the claimed advantage.
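
For readers who want to set up a small version of this test, here is a hedged sketch of how input/output pairs for such a problem could be generated. It assumes a 1D periodic domain of length 2π and the form -u'' = a^p with the mean of the source removed; the paper's actual dataset definition is not given in this summary, and the function name `poisson_polynomial_source` is hypothetical.

```python
import numpy as np

def poisson_polynomial_source(a, p):
    """Solve -u'' = a**p - mean(a**p) on a periodic 1D grid of length 2*pi.

    Assumed toy setting, not the paper's dataset: the k = 0 mode of the
    source is dropped so that a periodic, zero-mean solution exists.
    """
    n = a.shape[-1]
    f_hat = np.fft.rfft(a ** p)            # spectrum of the polynomial source
    k = np.arange(n // 2 + 1)              # integer wavenumbers
    u_hat = np.zeros_like(f_hat)
    u_hat[1:] = f_hat[1:] / k[1:] ** 2     # -(ik)^2 = k^2
    return np.fft.irfft(u_hat, n=n)
```

For a = cos(x) and p = 2 the source is 1/2 + cos(2x)/2, so the exact solution is cos(2x)/8: the output lives on a mode that is absent from the input, which is why this family stresses architectures that act on each mode separately.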

Figures

Figures reproduced from arXiv: 2606.28122 by Alexandre Allauzen, Alex Colagrande, Eva Feillet, Paul Caillon.

**Figure 1.** Overview of the proposed HO-FNO architecture, adapted from [Li et al., 2020]. Top: the input a is lifted by L, processed by L HO-FNO layers, and projected by P to obtain the output u. Bottom: each HO-Fourier layer transforms an intermediate representation v to Fourier space, mixes the N Fourier modes into higher-order pseudo-modes, keeps the lowest M pseudo-modes, applies a learned linear transform R, and …

**Figure 2.** Test MSE as a function of the number of layers on the Polynomial-Source Poisson datasets for p = 1, 2, 3, and 5. Solid lines denote the mean over runs, and shaded bands indicate one standard deviation. Lower values indicate better performance. Tables with quantitative results are provided in Appendix I.

**Figure 3.** Efficiency comparison on NS, Airfoil, and Pipe after normalization with respect to FNO for …

**Figure 4.** Example of mode mixing for a signal with …

**Figure 5.** Sample from the Polynomial-Source Poisson dataset with …

**Figure 6.** Sample from the Polynomial-Source Poisson dataset with …

**Figure 7.** Sample from the Polynomial-Source Poisson dataset with …

**Figure 8.** Resolution equivariance of FNO and HO-FNO on the Darcy flow dataset.

**Figure 9.** Test MSE as a function of the number of layers on the Polynomial-Source Poisson datasets for p = 1, 2, 3, and 5. Solid lines denote the mean over runs, and shaded bands indicate one standard deviation. Lower values indicate better performance.

**Figure 11.** Baselines. For the efficiency analysis, we consider one representative model from each main architectural category. As a frequency-based baseline, we use FNO, since it is the model on which our proposed architecture builds. As a state-space model, we use LaMO, which is also the strongest competitor in the experiments reported in …

**Figure 10.** Efficiency comparison on NS, Airfoil, and Pipe on a single Nvidia A100 GPU: …

**Figure 11.** Efficiency comparison on NS, Airfoil, and Pipe after normalization with respect to FNO for …

**Figure 12.** Spectrum of the predictions of FNO and HO-FNO (order 2) on the Airfoil benchmark.

**Figure 13.** Spectrum of the predictions of FNO and HO-FNO (order 2) on the Pipe benchmark.

**Figure 14.** Spectrum of the predictions of FNO and HO-FNO (order 2) on the Navier–Stokes …

**Figure 15.** Comparison of the network blocks used in our experiments. We compare the standard …

**Figure 16.** Qualitative visualization of predictions of FNO and HO-FNO (order 2) on the Airfoil …

**Figure 17.** Qualitative visualization of predictions of FNO and HO-FNO (order 2) on the Pipe dataset.

**Figure 18.** Qualitative visualization of predictions of FNO and HO-FNO (order 2) on the Navier–…

Original abstract

Neural operators provide deep neural networks for learning mappings between function spaces. Among them, the Fourier Neural Operator (FNO) is particularly effective: its spectral convolution relies on low-dimensional Fourier-domain representations and can handle inputs at different resolutions. This design aligns well with settings where the Fourier basis diagonalizes the underlying operator, such as linear, constant-coefficient PDEs on periodic domains, in which Fourier modes evolve independently. However, nonlinear PDEs may benefit from an additional inductive bias, as they exhibit structured interactions between modes, governed by polynomial nonlinearities. To capture this inductive bias, we introduce the Higher-Order Spectral Convolution, a spectral mixer that extends FNO from diagonal modulation to explicit n-linear mode mixing, aligned with the dynamics of nonlinear PDEs. Our experiments on standard benchmarks show that the proposed Higher-Order FNO (HO-FNO) retains the efficiency of FNO-based architectures and consistently improves over other spectral neural operators. HO-FNO also performs on par with or better than state-of-the-art transformers and state-space models on several datasets, with stronger gains in highly nonlinear regimes, such as the Poisson equation with polynomial forcing, where a single HO-FNO layer outperforms FNO models with up to 16 layers. We open-source our code for reproducibility at: https://github.com/AlexColagrande/HO-FNO.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HO-FNO adds explicit n-linear spectral mixing to target polynomial nonlinearities, but the single-layer win over 16-layer FNO needs parameter-matched controls to separate bias from capacity.

The new piece here is the Higher-Order Spectral Convolution, which replaces FNO's diagonal modulation with explicit n-linear terms over mode tuples. That matches the structure of polynomial nonlinearities in PDEs, where modes do not evolve independently. The abstract frames this as a targeted inductive bias rather than a generic deepening of the network.

The work does one thing cleanly: it keeps the resolution-independent Fourier basis while adding the mixing that linear FNO lacks. Open-sourcing the code is also useful for anyone who wants to test the operator on their own nonlinear problems.

The soft spot is the headline comparison. A single HO-FNO layer beating FNO stacks up to 16 layers on the Poisson equation with polynomial forcing is presented as evidence for the mode-mixing bias. Without reported parameter counts, FLOPs, or ablations that hold total capacity fixed, the gain could simply reflect the extra learnable coefficients in the n-linear terms. The abstract supplies no error bars, dataset sizes, or statistical tests, so the central claim cannot be evaluated from the given text.

The stress-test note lands: the performance delta needs to be shown to survive capacity matching before it can be attributed to the structural alignment. If the full paper contains those controls and the numbers still hold, the result strengthens; if not, the architectural claim weakens.

This is for people already working inside the FNO or spectral-operator literature who want a direct extension for nonlinear regimes. It is not a general-purpose architecture paper. The idea is specific enough and the claim sharp enough that a serious referee should see it, even if the experiments require tightening.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Higher-Order Fourier Neural Operator (HO-FNO), which extends FNO via a Higher-Order Spectral Convolution that performs explicit n-linear mixing of Fourier modes, intended to capture the structured mode interactions that polynomial nonlinearities produce in PDEs. It claims consistent improvements over spectral operators on benchmarks, competitive performance with transformers and state-space models, and a key result that a single HO-FNO layer outperforms standard FNO models with up to 16 layers on the Poisson equation with polynomial forcing. The code is open-sourced.

Significance. If the central experimental claims hold after parameter-matched verification, the work supplies a targeted inductive bias for spectral neural operators on nonlinear PDEs, potentially improving sample efficiency and depth requirements in operator learning. The open-sourced implementation is a clear strength supporting reproducibility.

major comments (2)

[Abstract] Abstract and results on Poisson equation: the headline claim that a single HO-FNO layer outperforms FNO models with up to 16 layers supplies no parameter counts, FLOPs, dataset sizes, error bars, or statistical tests, leaving open whether gains arise from n-linear mode mixing or from unmatched capacity in the additional learnable coefficients over mode tuples.
[Method] Definition of Higher-Order Spectral Convolution (method section): the extension from diagonal modulation to n-linear terms introduces per-tuple coefficients whose total parameter count is not compared against the stacked FNO baselines, so the performance delta cannot yet be attributed to the claimed structural alignment with nonlinear dynamics rather than expressivity.

minor comments (1)

[Method] Notation for the mixing order n and the precise tensor contraction in the spectral convolution should be stated with an explicit equation to aid implementation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive comments. We will revise the manuscript to include the requested details on parameter counts, FLOPs, and statistical measures to strengthen the presentation of our results.

Referee: [Abstract] Abstract and results on Poisson equation: the headline claim that a single HO-FNO layer outperforms FNO models with up to 16 layers supplies no parameter counts, FLOPs, dataset sizes, error bars, or statistical tests, leaving open whether gains arise from n-linear mode mixing or from unmatched capacity in the additional learnable coefficients over mode tuples.

Authors: We agree that additional details are needed to fully substantiate the claim. In the revised manuscript, we will augment the abstract and the results section with parameter counts, FLOPs, dataset sizes, error bars, and statistical tests for the Poisson equation experiments. This will allow readers to assess whether the performance improvements stem from the n-linear mixing or from differences in model capacity.
revision: yes

Referee: [Method] Definition of Higher-Order Spectral Convolution (method section): the extension from diagonal modulation to n-linear terms introduces per-tuple coefficients whose total parameter count is not compared against the stacked FNO baselines, so the performance delta cannot yet be attributed to the claimed structural alignment with nonlinear dynamics rather than expressivity.

Authors: We acknowledge the importance of parameter-matched comparisons. The revised version will include an explicit comparison of the total parameter counts for the Higher-Order Spectral Convolution against the standard FNO layers and the multi-layer baselines. We will also discuss how the additional coefficients are structured to align with polynomial nonlinearities, supporting the attribution to the inductive bias.
revision: yes

Circularity Check

0 steps flagged

No circularity: new architecture introduced as explicit design choice with empirical validation

The paper proposes the Higher-Order Spectral Convolution as a novel inductive bias extension to FNO, motivated by the structure of nonlinear PDEs but not derived from or reduced to any fitted parameters, self-citations, or prior results by the same authors. The central claim (single-layer outperformance on Poisson) is presented as an experimental outcome rather than a first-principles prediction that collapses to inputs by construction. No load-bearing self-citation chains, ansatzes smuggled via citation, or renaming of known results appear in the provided text. This is the common case of an honest architectural contribution evaluated empirically.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central contribution is an architectural extension whose justification rests on a domain assumption about mode interactions rather than new free parameters or invented physical entities.

axioms (1)

Domain assumption: Nonlinear PDEs exhibit structured interactions between Fourier modes governed by polynomial nonlinearities.
This inductive bias is invoked to motivate the higher-order mixer.
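
The assumption can be checked numerically on a toy signal. The sketch below is an illustration rather than anything from the paper: the helper `active_modes`, the grid size 64, and the test signal cos(3x) are arbitrary choices. It lists which Fourier modes are populated after raising a single-mode signal to the power p.

```python
import numpy as np

def active_modes(signal, tol=1e-9):
    """Indices of Fourier modes whose normalized amplitude exceeds tol."""
    amp = np.abs(np.fft.rfft(signal, norm="forward"))
    return np.flatnonzero(amp > tol).tolist()

x = 2 * np.pi * np.arange(64) / 64
u = np.cos(3 * x)                      # a single active mode, k = 3

for p in (1, 2, 3):
    print(p, active_modes(u ** p))
# 1 [3]
# 2 [0, 6]
# 3 [3, 9]
```

The populated modes follow from the product-to-sum identities for the cosine: a degree-p term only reaches wavenumbers that are sums of p input wavenumbers (with sign), which is the structure the axiom refers to.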

invented entities (1)

Higher-Order Spectral Convolution (no independent evidence)
Purpose: explicit n-linear mode mixing aligned with nonlinear PDE dynamics.
New component introduced to extend FNO; no independent evidence outside the architecture itself is provided in the abstract.

Reference graph

Works this paper leans on

84 extracted references · 10 canonical work pages

[1] Green's functions and boundary value problems. 2011.
[2] Global Atmospheric and Oceanic Modelling: Fundamental Equations. 2022.
[3] LeVeque, Randall. 2007.
[4] Johnson, C. Numerical solution of partial differential equations by the finite element method.
[5] R. J. LeVeque. Finite Volume Methods for Hyperbolic Problems.
[6] Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning. arXiv preprint arXiv:2506.10973.
[7] Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895.
[8] Spherical fourier neural operators: Learning stable dynamics on the sphere. International conference on machine learning, 2023.
[9] Learning neural operators on riemannian manifolds. arXiv preprint arXiv:2302.08166.
[10] Geometry-informed neural operator for large-scale 3d pdes. Advances in Neural Information Processing Systems.
[11] Geometric Operator Learning with Optimal Transport. arXiv preprint arXiv:2507.20065.
[12] Fourier neural operator with learned deformations for pdes on general geometries. Journal of Machine Learning Research.
[13] On universal approximation and error bounds for Fourier neural operators. Journal of Machine Learning Research.
[14] Neural operator: Learning maps between function spaces with applications to pdes. Journal of Machine Learning Research.
[15] AROMA: Preserving spatial structure for latent PDE modeling with local neural fields. Advances in Neural Information Processing Systems.
[16] Cybenko, G. Math. Control Signals Systems, 1989.
[17] Universal approximation capability of … Circuits, Systems and Signal Processing, 1996. doi:10.1007/BF01188988.
[18] Attention is all you need. Advances in neural information processing systems.
[19] Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
[20] An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
[21] … and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis. Highly accurate protein structure prediction with … Nature, 2021. doi:10.1038/s41586-021-03819-2.
[22] Universal physics transformers: A framework for efficiently scaling neural operators. Advances in Neural Information Processing Systems.
[23] Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics. arXiv preprint arXiv:2507.02748.
[24] Logic and the 2-Simplicial Transformer. arXiv preprint arXiv:1909.00668.
[25] Fast and Simplex: 2-Simplicial Attention in Triton. arXiv preprint arXiv:2507.02754.
[26] Systematic generalization with edge transformers. Advances in Neural Information Processing Systems.
[27] Representational strengths and limitations of transformers. Advances in Neural Information Processing Systems.
[28] Stationary … SIAM Journal on Applied Mathematics, 1984. doi:10.1137/0144008.
[29] … and Haberland, Matt and Reddy, Tyler and Cournapeau, David and Burovski, Evgeni and Peterson, Pearu and Weckesser, Warren and Bright, Jonathan and … Nature Methods, 2020. doi:10.1038/s41592-019-0686-2.
[30] Towards stability of autoregressive neural operators. arXiv preprint arXiv:2306.10619.
[31] A standard test set for numerical approximations to the shallow water equations in spherical geometry. Journal of Computational Physics, 1992. doi:10.1016/S0021-9991(05)80016-6.
[32] Dedalus: A flexible framework for numerical simulations with spectral methods. Physical Review Research, 2020.
[33] The well: a large-scale collection of diverse physics simulations for machine learning. Advances in Neural Information Processing Systems.
[34] Hersbach, Hans and Bell, Bill and Berrisford, Paul and Hirahara, Shoji and Horányi, András and Muñoz-Sabater, Joaquín and Nicolas, Julien and Peubey, Carole and Radu, Raluca and Schepers, Dinand and Simmons, Adrian and Soci, Cornel and Abdalla, Saleh and Abellan, Xavier and Balsamo, Gianpaolo and Bechtold, Peter and Biavati, Gionata and Bidlot, Jean and B... doi:10.1002/qj.3803.
[35] Cheung, Lawrence C. and Zaki, Tamer A. An exact representation of the nonlinear triad interaction terms in spectral space. 2014. doi:10.1017/jfm.2014.179.
[36] Pdebench: An extensive benchmark for scientific machine learning. Advances in Neural Information Processing Systems.
[37] U-no: U-shaped neural operators. arXiv preprint arXiv:2204.11127.
[38] Dynamic Schwartz-Fourier Neural Operator for Enhanced Expressive Power. Transactions on Machine Learning Research.
[39] A library for learning neural operators. arXiv preprint arXiv:2412.10354.
[40] Deeponet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193.
[41] An exact representation of the nonlinear triad interaction terms in spectral space. Journal of fluid mechanics, 2014.
[42] Computing nearly singular solutions using pseudo-spectral methods. Journal of Computational Physics, 2007.
[43] Volterra models and three-layer perceptrons. IEEE Transactions on neural networks, 1997.
[44] Volterra neural networks (vnns). Journal of Machine Learning Research.
[45] Killip, Rowan and Tao, Terence and Vișan, Monica. The cubic nonlinear Schr…
[46] The world of the complex Ginzburg-Landau equation. Reviews of modern physics, 2002.
[47] An energy stable method for the Swift–Hohenberg equation with quadratic–cubic nonlinearity. Computer Methods in Applied Mechanics and Engineering, 2019.
[48] Galerkin neural networks: A framework for approximating variational equations with error control. SIAM Journal on Scientific Computing, 2021.
[49] Choose a transformer: Fourier or galerkin. Advances in neural information processing systems.
[50] Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485.
[51] Georgios Kissas and Jacob H. Seidman and Leonardo Ferreira Guilhoto and Victor M. Preciado and George J. Pappas and Paris Perdikaris. Journal of Machine Learning Research.
[52] Continuum attention for neural operators. arXiv preprint arXiv:2406.06486.
[53] Convolution goes higher-order: a biologically inspired mechanism empowers image classification. arXiv preprint arXiv:2412.06740.
[54] U-Net: Convolutional Networks for Biomedical Image Segmentation. 2015.
[55] Deep Residual Learning for Image Recognition. 2015.
[56] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. 2021.
[57] Lu, Lu and Jin, Pengzhan and Pang, Guofei and Zhang, Zhongqiang and Karniadakis, George Em. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence. doi:10.1038/s42256-021-00302-5.
[58] Multiwavelet-based Operator Learning for Differential Equations. 2021.
[59] U-FNO – An enhanced Fourier neural operator-based deep-learning model for multiphase flow. 2022.
[60] Factorized Fourier Neural Operators. 2023.
[61] Solving High-Dimensional PDEs with Latent Spectral Models. 2023.
[62] Choose a Transformer: Fourier or Galerkin. 2021.
[63] Ht-net: Hierarchical transformer based operator learning model for multiscale pdes.
[64] Transformer for Partial Differential Equations' Operator Learning. 2023.
[65] GNOT: A General Neural Operator Transformer for Operator Learning. 2023.
[66] Scalable Transformer for PDE Surrogate Modeling. 2023.
[67] Improved Operator Learning by Orthogonal Attention. 2024.
[68] Transolver: A Fast Transformer Solver for PDEs on General Geometries. 2024.
[69] Latent Mamba Operator for Partial Differential Equations. Forty-second International Conference on Machine Learning.
[70] Decoupled Weight Decay Regularization. International Conference on Learning Representations.
[71] Ilya Loshchilov and Frank Hutter. 2017.
[72] PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers. 2023.
[73] Mixture of neural operators: Incorporating historical information for longer rollouts. ICLR 2024 Workshop on AI4DifferentialEquations In Science.
[74] MSPT: Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[75] Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems. International Conference on Machine Learning, 2025.
[76] Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries. 2025.
[77] Multi-grid tensorized fourier neural operator for high-resolution pdes. arXiv preprint arXiv:2310.00120.
[78] Spectral methods. 1996.
[79] Klimenko, A. Y. and Pope, S. B. Physics of Fluids, 2003. doi:10.1063/1.1575754.
[80] Nonlinear electrostatics: the Poisson–Boltzmann equation. European Journal of Physics, 2018.

