pith. machine review for the scientific record.

arxiv: 2605.09096 · v1 · submitted 2026-05-09 · 💻 cs.LG

Recognition: no theorem link

Bridging Spectral Operator Learning and U-Net Hierarchies: SpectraNet for Stable Autoregressive PDE Surrogates

Elias Ioup, Enrique Hernández Noguera, Julian Simeonov, Mahdi Abdelguerfi, Md Meftahul Ferdaus

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:50 UTC · model grok-4.3

classification 💻 cs.LG
keywords: neural operators · spectral convolutions · U-Net hierarchy · PDE surrogates · autoregressive modeling · semigroup consistency · residual target · Navier-Stokes

The pith

SpectraNet embeds truncated spectral convolutions in a U-Net hierarchy with residual-target blocks and semigroup-consistency loss to convert exponential rollout error into linear drift for PDE surrogates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Neural operators for time-dependent PDEs face exponential error growth from spectral methods and loss of resolution invariance from hierarchical ones. SpectraNet resolves this by placing truncated spectral convolutions inside U-Net levels and parametrizing each step as a residual from the prior state. Training uses a semigroup-consistency loss that enforces the operator composition property over multiple steps. On Navier-Stokes at nu=1e-5 the model reaches lower test L2 error than canonical FNO while using 2.33x fewer parameters, and free rollouts remain bounded to T=100 where FNO diverges. The same architecture trained at native 128^2 resolution improves further while FNO degrades.

Core claim

SpectraNet composes truncated spectral convolutions inside a U-Net hierarchy with a Residual-Target Spectral Block trained under a Semigroup-Consistency Loss. The residual-target parametrization replaces L^T stability blow-up with linear T*delta drift, and the spectral path's parameter count is Theta(L w^2 M^2), independent of grid N. Under a single unified protocol against 16 published neural-operator baselines on Navier-Stokes nu=1e-5 at 64x64, SpectraNet reaches test relative L2 = 0.0822 at 2.04M parameters and wins five of six rows in a cross-PDE comparison against FNO.
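The Theta(L w^2 M^2) scaling can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, assuming the Figure 1 configuration (w=32, encoder truncations {12, 6, 3}, bottleneck M=1) and counting each complex spectral weight as two real parameters; lift/head MLPs, decoder blocks, and biases are omitted, so this illustrates the grid-independence of the spectral path, not the paper's exact 2.04M accounting:

```python
# Rough spectral-path parameter count for one pass through the hierarchy.
# Assumed: one spectral layer per level; per layer, a (w x w) complex mixing
# matrix over an (M x M) block of retained modes = 2 * w^2 * M^2 real params.
w = 32                       # channel width from Figure 1
truncations = [12, 6, 3, 1]  # three encoder levels plus the bottleneck
spectral_params = sum(2 * w * w * M * M for M in truncations)
print(spectral_params)       # 389120 -- no dependence on the 64x64 grid size N
```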

What carries the argument

The Residual-Target Spectral Block, which parametrizes each autoregressive step as a residual update via truncated spectral convolutions placed inside U-Net hierarchy levels and trained with a semigroup-consistency loss that penalizes deviation from operator composition.
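A minimal PyTorch sketch of the two pieces this names, written from the descriptions above rather than from the authors' repository: a truncated spectral convolution (for brevity it keeps only the low-frequency corner of the rFFT, where a full implementation would also retain negative-frequency rows), and a one-step map parametrized as a residual from the prior state. All class and variable names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TruncatedSpectralConv2d(nn.Module):
    """2-D spectral convolution retaining only the lowest M x M Fourier
    modes; its parameter count (w * w * M * M complex weights) is
    independent of the spatial grid size N."""
    def __init__(self, width: int, modes: int):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (width * width)
        self.weight = nn.Parameter(
            scale * torch.randn(width, width, modes, modes, dtype=torch.cfloat)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, w, N, N). Transform, mix channels on the retained block, invert.
        n = x.shape[-1]
        x_ft = torch.fft.rfft2(x)
        out_ft = torch.zeros_like(x_ft)
        m = self.modes
        out_ft[..., :m, :m] = torch.einsum(
            "bixy,ioxy->boxy", x_ft[..., :m, :m], self.weight
        )
        return torch.fft.irfft2(out_ft, s=(n, n))

class ResidualTargetStep(nn.Module):
    """One autoregressive step expressed as omega_{t+1} = omega_t + delta,
    keeping the learned map close to the identity -- the mechanism credited
    above with taming the one-step Lipschitz constant."""
    def __init__(self, width: int = 32, modes: int = 12):
        super().__init__()
        self.lift = nn.Conv2d(1, width, kernel_size=1)
        self.spectral = TruncatedSpectralConv2d(width, modes)
        self.head = nn.Conv2d(width, 1, kernel_size=1)  # stand-in for the MLP head

    def forward(self, omega_t: torch.Tensor) -> torch.Tensor:
        delta = self.head(F.gelu(self.spectral(self.lift(omega_t))))
        return omega_t + delta  # residual target: the network predicts only the increment
```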

Load-bearing premise

The residual-target parametrization and semigroup-consistency loss will continue to bound rollout error linearly for initial conditions and time horizons outside the training distribution.

What would settle it

Measure rollout relative L2 error on time horizons or initial conditions outside the training distribution and check whether the growth remains linear in the number of steps or becomes exponential.
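A sketch of that probe, with hypothetical model and data interfaces (nothing here is the paper's protocol): roll the one-step surrogate forward, record per-step relative L2 against a reference solver, and check which growth law fits.

```python
import numpy as np

def rollout_rel_l2(step_fn, omega0, reference):
    """Free rollout; per-step relative L2 error against reference states.
    step_fn: hypothetical one-step surrogate; reference: array (T+1, N, N)
    from a trusted solver, with reference[0] == omega0."""
    errs, state = [], omega0
    for t in range(1, reference.shape[0]):
        state = step_fn(state)
        errs.append(np.linalg.norm(state - reference[t])
                    / np.linalg.norm(reference[t]))
    return np.asarray(errs)

def drift_is_linear(errs, tol=0.3):
    """Crude diagnostic: a slope of log(err) vs log(t) near 1 suggests
    linear T*delta drift; err growing linearly in log-space against t
    itself (not log t) would instead indicate exponential L^T growth."""
    t = np.arange(1, len(errs) + 1)
    slope = np.polyfit(np.log(t), np.log(errs), 1)[0]
    return abs(slope - 1.0) < tol
```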

Figures

Figures reproduced from arXiv: 2605.09096 by Elias Ioup, Enrique Hernández Noguera, Julian Simeonov, Mahdi Abdelguerfi, Md Meftahul Ferdaus.

Figure 1. SpectraNet architecture. Input (B, 64, 64, 10) plus the (x, y)-grid is lifted to w=32 channels; a three-level encoder at truncations M_ℓ ∈ {12, 6, 3} feeds a single-block bottleneck (M_3 = 1); a mirrored decoder upsamples with additive skips (⊕); a two-layer 1×1 MLP head (hidden 4w, GeLU) emits the residual update, summed with ω_t to recover ω̂_{t+1} (Theorem 1).

Figure 2. SpectraNet on the cost–accuracy Pareto frontier (NS ν=1e-5, 64×64). SpectraNet (red) occupies the lower-left frontier on (L2, params) and (L2, H100 latency at B=1). Five of six lightweight Pareto-frontier points are SpectraNet variants. The full-attention NSL Transformer (green star) wins raw L2 at 3.3× the GPU latency at B=1 and 75× at B=32.

Figure 3. Free-rollout behavior beyond the training horizon. Free rollout to T=100 for SpectraNet, the SpectraNet w=48 plain ablation, FNO, and the NSL Transformer (200 NS ν=1e-5 test trajectories). (a) Fraction of the 200 test trajectories that have diverged (NaN or vorticity energy past the blow-up threshold) by step T. SpectraNet variants and the Transformer remain at 0.0 throughout; FNO jumps to 1.0 between T=2…

Figure 4. CPU deployment Pareto frontier. SpectraNet (red) and width-scaling variants occupy the lightweight CPU frontier; the NSL Transformer's ~10 s per sample rules it out for edge or interactive use. Median over 50 batches on Intel i5-1155G7, single-thread, B=1.

Figure 5. GPU latency at B=1 vs parameter count for the unified-protocol leaderboard. Canonical SpectraNet sits in the lower-left quadrant (2.04M parameters, 32.8 ms); the full-attention NSL Transformer pays a ~3.3× latency premium for its accuracy, and the largest U-FNO and U-Net baselines pay a parameter-cost premium of one to two orders of magnitude without a matching accuracy gain.

Figure 6. Single-sample latency on GPU and CPU (B=1). Left: NVIDIA H100 (80 GB), FP32, single stream. Right: Intel i5-1155G7, single-thread, FP32. The attention models that win raw L2 on H100 (Transformer, Galerkin, Transolver, GNOT) move right by an order of magnitude on the CPU plot and become unsuitable for consumer-CPU deployment; canonical SpectraNet stays sub-200 ms on both axes.

Figure 7. GPU throughput at B=32 (samples per second). The training-equivalent batch regime widens the gap between the spectral models (FNO, F-FNO, SpectraNet) and the attention-based models, because the latter incur an O(N^2) memory cost per sample that limits parallel batch processing.

Figure 8. Peak GPU memory (MiB) measured via torch.cuda.max_memory_allocated() during the timed region at B=32. Canonical SpectraNet's peak memory is bounded by the spectral-truncation budget and is independent of the spatial grid (Proposition 2); the O(N^2)-attention models scale with the token count squared and would become infeasible at 128^2 without sliding-window approximations.
Original abstract

Neural operators for time-dependent PDEs face a structural tension: spectral architectures (FNO and descendants) inherit exponential rollout-error growth from their one-step Lipschitz constant, while hierarchical U-Net operators trade resolution invariance for multi-scale detail. We introduce SpectraNet, an autoregressive neural operator that composes truncated spectral convolutions inside a U-Net hierarchy with a Residual-Target Spectral Block trained under a Semigroup-Consistency Loss. The residual-target parametrization replaces L^T stability blow-up with linear T*delta drift, and the spectral path's parameter count is Theta(L w^2 M^2), independent of grid N. Under a single unified protocol against 16 published neural-operator baselines on Navier-Stokes nu=1e-5 at 64x64, SpectraNet reaches test relative L2 = 0.0822 at 2.04M parameters -- 2.33x fewer than canonical FNO at ~20% lower error -- and wins five of six rows in a cross-PDE comparison against FNO (NS at nu in {1e-4, 1e-3}, PDEBench Shallow-Water 2D and Diffusion-Reaction, with the Active-Matter row going to FNO inside its seed spread). Trained from scratch at native 128^2 under the same protocol, SpectraNet improves to 0.0724 while FNO regresses to 0.3080. Free rollout stays bounded for T=100 where FNO diverges across all 200 test trajectories. On consumer CPU at B=1, SpectraNet runs sub-200ms while the full-attention Transformer that wins raw L2 pays ~60x latency; we do not claim to beat that Transformer on raw L2, only to dominate the lightweight (<=5M parameter, sub-200ms CPU) Pareto frontier. Source code: https://github.com/Enrikkk/spectranet

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SpectraNet, an autoregressive neural operator that composes truncated spectral convolutions inside a U-Net hierarchy with a Residual-Target Spectral Block trained under a Semigroup-Consistency Loss. It claims this replaces exponential L^T rollout-error growth with linear T*delta drift, yielding test relative L2=0.0822 at 2.04M parameters on NS nu=1e-5 (64x64) -- outperforming FNO with 2.33x fewer parameters and ~20% lower error -- while winning five of six cross-PDE rows, maintaining bounded free rollouts to T=100 where FNO diverges on all 200 test trajectories, and offering sub-200ms CPU inference.

Significance. If the stability and efficiency claims hold, the work would be significant for neural PDE surrogates by bridging spectral resolution invariance with U-Net multi-scale detail, directly addressing rollout instability while reporting concrete wins on parameter count, cross-PDE performance, and latency on the lightweight frontier.

major comments (2)
  1. [§4 (Experiments), free-rollout paragraph and NS results] The bounded T=100 rollouts and linear-drift claim are shown exclusively on the 200 in-distribution test trajectories drawn from the same distribution used for training (nu=1e-5 NS at 64x64 and the other PDEBench cases). No OOD probes with altered IC statistics, extended horizons, or perturbed forcings are reported, leaving open whether the residual-target + semigroup loss structurally enforces linear error growth or merely fits the benchmark trajectory statistics.
  2. [§3 (Methodology), Semigroup-Consistency Loss and Residual-Target Spectral Block] The manuscript provides no full derivation of the loss, no explicit equation for how the residual-target parametrization interacts with the semigroup consistency term, and no details on exact data splits or the rollout-error analysis protocol. This absence makes it difficult to verify that the design replaces L^T blow-up with linear drift independently of the training distribution.
minor comments (2)
  1. The parameter-count and latency comparisons would be clearer if the exact FNO and baseline configurations (modes, layers, etc.) used in the unified protocol were tabulated alongside SpectraNet.
  2. A short reproducibility statement listing random seeds, exact hyper-parameters, and data-split indices should be added to support the reported L2 numbers and win rates.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point-by-point below, providing clarifications on the experimental scope and committing to revisions that strengthen the methodological exposition without misrepresenting the current results.

Point-by-point responses
  1. Referee: The bounded T=100 rollouts and linear-drift claim are shown exclusively on the 200 in-distribution test trajectories drawn from the same distribution used for training (nu=1e-5 NS at 64x64 and the other PDEBench cases). No OOD probes with altered IC statistics, extended horizons, or perturbed forcings are reported, leaving open whether the residual-target + semigroup loss structurally enforces linear error growth or merely fits the benchmark trajectory statistics.

    Authors: We appreciate the referee's emphasis on this distinction. Our free-rollout experiments and linear-drift observations are indeed confined to in-distribution test trajectories drawn from the standard benchmark distributions. The Residual-Target Spectral Block and Semigroup-Consistency Loss are designed to encourage multi-step consistency by penalizing deviations from the semigroup property, which we show empirically yields bounded rollouts to T=100 on these trajectories (where FNO diverges). While this does not constitute a formal proof of distribution-independent structural enforcement, the cross-PDE wins and the contrast with FNO on identical data provide supporting evidence that the behavior is not an artifact of fitting narrow statistics. In the revised manuscript we will expand §4 with an explicit discussion of the in-distribution scope of the results and the associated limitations for out-of-distribution generalization. revision: partial

  2. Referee: The manuscript provides no full derivation of the loss, no explicit equation for how the residual-target parametrization interacts with the semigroup consistency term, and no details on exact data splits or the rollout-error analysis protocol. This absence makes it difficult to verify that the design replaces L^T blow-up with linear drift independently of the training distribution.

    Authors: We agree that these elements were insufficiently detailed in the submitted version. The Semigroup-Consistency Loss measures the discrepancy between a direct multi-step prediction and the iterated application of the one-step operator, while the residual-target parametrization expresses the update relative to the identity map to keep the effective Lipschitz constant near unity. In the revision we will insert a complete derivation of the loss, the explicit interaction equations between the residual block and the consistency term, the precise data-split ratios employed, and the autoregressive rollout protocol used for error accumulation. These additions will appear in §3 and will allow readers to verify the intended mechanism for replacing exponential with linear error growth. revision: yes
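A sketch of the mechanism as paraphrased in this response, not the paper's stated loss (the weighting, horizons, and operator interface are assumptions): the penalty compares a direct k-step prediction against k iterated applications of the one-step map.

```python
import torch

def semigroup_consistency_loss(step, x0, k):
    """Penalize deviation from the semigroup property S_k = (S_1)^k.
    `step(x, n)` is a hypothetical operator advancing n steps directly."""
    direct = step(x0, k)              # one call predicting k steps at once
    iterated = x0
    for _ in range(k):
        iterated = step(iterated, 1)  # k chained one-step predictions
    return torch.mean((direct - iterated) ** 2)
```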

Circularity Check

1 step flagged

Semigroup-consistency loss and residual-target design enforce linear-drift claim by construction

specific steps
  1. self definitional [Abstract]
    "We introduce SpectraNet, an autoregressive neural operator that composes truncated spectral convolutions inside a U-Net hierarchy with a Residual-Target Spectral Block trained under a Semigroup-Consistency Loss. The residual-target parametrization replaces L^T stability blow-up with linear T*delta drift"

    The architecture is defined to use residual-target updates and is trained with a loss constructed to enforce semigroup consistency; the paper then presents the resulting linear error growth as a derived property of the parametrization. The stability guarantee is therefore equivalent to the modeling decision rather than obtained from an external mathematical step or data-driven verification.

full rationale

The paper's stability argument is introduced via two modeling choices (residual-target Spectral Block and Semigroup-Consistency Loss) whose explicit purpose is to convert exponential rollout blow-up into linear T*delta drift. The central claim therefore reduces to the design ansatz rather than an independent derivation from the PDE or data; empirical results on in-distribution trajectories provide supporting evidence but do not break the definitional link. No self-citation chains, uniqueness theorems, or fitted-input renamings appear in the load-bearing steps.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The central claim rests on a new architectural block and loss whose details are only sketched in the abstract; the truncation level M and U-Net widths are free choices that control the parameter count but are not fitted to the final performance metric.

free parameters (2)
  • spectral truncation M
    Number of retained Fourier modes that sets the Theta(L w^2 M^2) parameter scaling independent of grid size N.
  • U-Net depth L and width w
    Hyperparameters defining the hierarchy that are chosen by the authors to balance capacity and speed.
axioms (1)
  • domain assumption The underlying PDE time evolution satisfies the semigroup property
    Invoked to motivate and define the Semigroup-Consistency Loss.
invented entities (1)
  • Residual-Target Spectral Block no independent evidence
    purpose: To replace exponential L^T stability blow-up with linear T*delta drift in autoregressive rollouts
    New parametrization introduced by the authors to address the Lipschitz-constant limitation of standard spectral operators.

pith-pipeline@v0.9.0 · 5686 in / 1585 out tokens · 63360 ms · 2026-05-12T01:50:02.335940+00:00 · methodology


Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 1 internal anchor

  1. Fourier Neural Operator for Parametric Partial Differential Equations. ICLR.
  2. Factorized Fourier Neural Operators. ICLR.
  3. U-FNO: An enhanced Fourier neural operator-based deep-learning model for multiphase flow. Advances in Water Resources.
  4. U-NO: U-shaped Neural Operators. TMLR.
  5. Solving High-Dimensional PDEs with Latent Spectral Models. ICML.
  6. Multiwavelet-based Operator Learning for Differential Equations. NeurIPS.
  7. Wu, Haixu, and others. Neural-Solver-Library: A Library for Advanced Neural …
  8. Choose a Transformer: Fourier or Galerkin. NeurIPS.
  9. Scalable Transformer for PDE Surrogate Modeling. NeurIPS.
  10. Transformer for Partial Differential Equations' Operator Learning. TMLR.
  11. Transolver: A Fast Transformer Solver for PDEs on General Geometries. ICML.
  12. GNOT: A General Neural Operator Transformer for Operator Learning. ICML.
  13. Convolutional Neural Operators for Robust and Accurate Learning of PDEs. NeurIPS.
  14. Koopman Neural Operator as a Mesh-Free Solver of Non-linear PDEs. Journal of Computational Physics.
  15. Improved Operator Learning by Orthogonal Attention. ICML.
  16. Neural Operator: Learning Maps Between Function Spaces. JMLR.
  17. PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers. NeurIPS.
  18. Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. Science Advances.
  19. Deep Residual Learning for Image Recognition. CVPR.
  20. Sequence Level Training with Recurrent Neural Networks. ICLR.
  21. KAN: Kolmogorov-Arnold Networks. arXiv preprint arXiv:2404.19756.
  22. Takamoto, Makoto; Praditia, Timothy; Leiteritz, Raphael; MacKinlay, Daniel; Alesiani, Francesco; Pflüger, Dirk; Niepert, Mathias. PDEBench: An Extensive Benchmark for Scientific Machine Learning. NeurIPS Datasets and Benchmarks Track.
  23. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI.
  24. Decoupled Weight Decay Regularization. ICLR.
  25. Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates. SPIE Defense + Commercial Sensing: Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications.
  26. Brandstetter, Johannes; Worrall, Daniel E.; Welling, Max. Message Passing Neural PDE Solvers. ICLR.
  27. Ohana, Ruben; McCabe, Michael; Meyer, Lucas; Morel, Rudy; Agocs, Fruzsina J.; Beneitez, Miguel; Berger, Marsha; Burkhart, Blakesley; Dalziel, Stuart B.; Fielding, Drummond B.; Fortunato, Daniel; Goldberg, Jared A.; Hirashima, Keiya; Jiang, Yan-Fei; Kerswell, Rich R.; Maddu, Suryanarayana; Miller, Jonah; and others. The Well: A Large-Scale Collection of Diverse Physics Simulations for Machine Learning. NeurIPS.
  28. A Comprehensive Review of Phase-Averaged and Phase-Resolving Wave Models for Coastal Modeling Applications. arXiv preprint arXiv:2511.21856.
  29. An algorithm for modelling differential processes utilising a ratio-coupled loss. Quarterly Journal of the Royal Meteorological Society, 2026.
  30. Forecasting rogue waves in oceanic waters. 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA).