ELADO: Elliptic PDE Assessment Datasets for Operator Learning

Frank Ehebrecht; Martin Atzmueller; Toni Scharle

arxiv: 2606.20771 · v1 · pith:I6YSDJ64new · submitted 2026-06-18 · 💻 cs.LG


Pith reviewed 2026-06-26 18:17 UTC · model grok-4.3

classification 💻 cs.LG

keywords: neural operators, elliptic PDEs, benchmark datasets, operator learning, failure modes, heavy-tailed distributions, spectral shift, input sensitivity


The pith

ELADO benchmark datasets isolate how heavy-tailed targets, spectral shifts, and input sensitivity degrade neural operator accuracy on elliptic PDEs in ways mean relative L2 error misses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds ELADO as a collection of datasets for training and testing neural operators on elliptic PDEs such as Poisson's equation and the Helmholtz equation with non-constant coefficients. A controllable data-generating process produces separate datasets that each target one difficulty: heavy-tailed solution values from light-tailed coefficients, shifts in the spectral content of inputs, heavy tails in the frequency domain of solutions, measured input sensitivity via local Lipschitz constants, and controlled effects of input signal complexity. Evaluations across multiple neural operator models show clear accuracy drops on these datasets. A reader would care because operator learning for physical systems needs to handle these naturally occurring challenges rather than just average-case performance.
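To make the coefficient-to-solution map concrete, here is a minimal sketch of a variable-coefficient Poisson solve. It assumes a 1D problem, -(a u')' = f on (0, 1), with homogeneous Dirichlet boundaries and a standard finite-difference discretization. None of these choices are taken from the paper, whose domain, boundary conditions, and solver are not stated in this summary. The function name `solve_poisson_1d` is hypothetical.

```python
import numpy as np

def solve_poisson_1d(a_mid, f_inner):
    """Solve -(a u')' = f on (0, 1) with u(0) = u(1) = 0 (illustrative setup).

    a_mid:   coefficient a at the n + 1 cell midpoints, shape (n + 1,)
    f_inner: right-hand side f at the n interior nodes, shape (n,)
    """
    n = f_inner.shape[0]
    h = 1.0 / (n + 1)
    # Interior node i sits between midpoints i and i + 1.
    main = (a_mid[:-1] + a_mid[1:]) / h**2
    off = -a_mid[1:-1] / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.solve(A, f_inner)

# Sanity check with a = 1 and f = 1, where the exact solution is u = x (1 - x) / 2.
n = 63
u = solve_poisson_1d(np.ones(n + 1), np.ones(n))
```

A neural operator in this setting would be trained to approximate the map from the sampled coefficient field `a_mid` to the solution `u`.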

Core claim

ELADO supplies datasets generated by a controllable process around Poisson's and Helmholtz equations that separately isolate heavy-tailed solution distributions, spectral input shifts, frequency-domain heavy tails, input sensitivity, and signal complexity effects; across tested neural operator architectures these factors each produce substantial prediction degradation that the mean relative L2 error metric can obscure.
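The point about the mean relative L2 error can be illustrated with a small sketch that reports upper quantiles alongside the mean. The data below is synthetic and the helper names `relative_l2` and `summarize` are hypothetical; the paper's own evaluation code is not shown in this summary.

```python
import numpy as np

def relative_l2(pred, target):
    """Per-sample relative L2 error for arrays of shape (n_samples, n_points)."""
    num = np.linalg.norm(pred - target, axis=1)
    den = np.linalg.norm(target, axis=1)
    return num / den

def summarize(errors):
    """Mean, median, 99th percentile, and maximum of the per-sample errors."""
    return {
        "mean": float(np.mean(errors)),
        "median": float(np.median(errors)),
        "q99": float(np.quantile(errors, 0.99)),
        "max": float(np.max(errors)),
    }

# Synthetic illustration: 98 accurate predictions and 2 complete failures.
rng = np.random.default_rng(0)
target = rng.normal(size=(100, 64))
pred = target * 1.01   # 1% relative error on every sample
pred[:2] = 0.0         # two samples predicted as zero (100% relative error)
stats = summarize(relative_l2(pred, target))
```

In this toy case the mean stays near 3% while the 99th percentile and the maximum sit at 100%, which is the kind of gap an average-only report would hide.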

What carries the argument

A controllable data-generating process that creates separate datasets each isolating one source of difficulty for elliptic PDE operator learning.

If this is right

Heavy-tailed solution distributions from light-tailed coefficient fields reduce prediction accuracy.
Spectral distribution shifts between training and test inputs degrade operator performance.
Input sensitivity, quantified by empirical local Lipschitz analysis, leads to larger errors on certain inputs.
Heavy tails in the frequency domain of solutions arise even from light-tailed coefficients and hurt accuracy.
Standard mean relative L2 error can mask these specific failure modes on elliptic PDE tasks.
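One way to probe input sensitivity empirically is to perturb an input in random directions and record the largest output change per unit of input change. The sketch below does this for any callable `op`. It is an assumption about what an "empirical local Lipschitz analysis" could look like, not the paper's procedure, and random probing only gives a lower bound on the true local constant.

```python
import numpy as np

def local_lipschitz(op, a, eps=1e-3, n_dirs=16, seed=0):
    """Estimate max ||op(a + d) - op(a)|| / ||d|| over random d with ||d|| = eps."""
    rng = np.random.default_rng(seed)
    base = op(a)
    best = 0.0
    for _ in range(n_dirs):
        d = rng.normal(size=a.shape)
        d *= eps / np.linalg.norm(d)   # rescale the direction to norm eps
        ratio = np.linalg.norm(op(a + d) - base) / eps
        best = max(best, ratio)
    return best

# For a linear map a -> 3 a, every direction gives a ratio of 3.
est = local_lipschitz(lambda a: 3.0 * a, np.ones(32))
```

The same function could be applied to a trained model or to a reference solver to compare how strongly each amplifies small input changes.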

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

New evaluation metrics that separately track tail behavior and spectral properties may be needed alongside average error.
Architectures could incorporate explicit normalization or frequency-aware layers to mitigate the isolated sensitivities.
The same isolation approach could be applied to time-dependent or nonlinear PDEs to check whether the same failure modes dominate.

Load-bearing premise

The controllable data-generating process isolates each targeted difficulty without creating confounding interactions among them.

What would settle it

Training the same neural operator architectures on the ELADO datasets and finding that mean relative L2 error fully tracks the observed accuracy losses with no extra degradation attributable to heavy tails, spectral shift, or input sensitivity.

Figures

Figures reproduced from arXiv: 2606.20771 by Frank Ehebrecht, Martin Atzmueller, Toni Scharle.

**Figure 1.** Histograms for A1-train. The values of the input field a are short-tailed, while the values of the output field u are heavy-tailed. The right panel shows the distribution of per-sample maxima of u; the shaded region marks samples exceeding the 95th percentile. The A1-tail dataset is constructed by drawing exclusively from this region.
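The tail construction described in the Figure 1 caption can be sketched as a percentile filter on per-sample maxima. This version filters an existing array of solutions; whether the paper instead samples the generator until enough tail samples are found is not stated here, and the name `tail_indices` is hypothetical.

```python
import numpy as np

def tail_indices(u, q=0.95):
    """Indices of samples whose per-sample maximum of u exceeds the q-quantile."""
    maxima = u.reshape(u.shape[0], -1).max(axis=1)
    threshold = np.quantile(maxima, q)
    return np.flatnonzero(maxima > threshold)

# 100 synthetic samples whose maxima are 1, 2, ..., 100.
u = np.zeros((100, 8))
u[:, 0] = np.arange(1, 101)
idx = tail_indices(u)
```

With 100 samples and `q=0.95`, the filter keeps the five samples with the largest maxima.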

**Figure 2.** Joint distribution of the relative L2 prediction error of a trained FNO model and the local amplification factor λ for samples from A1-train, shown as a 2D histogram (log-scale color). The black curve shows the conditional mean of λ: within the oversampling range (blue) it is averaged over the fixed bins; beyond it, a sliding-window average is used. Dashed vertical lines indicate the bin boundaries used …

**Figure 3.** Histograms of the relative L2 prediction error (log-scaled y-axis) for the four train → test configurations of the oversampling experiment. The red dashed line marks the mean relative error ⟨L2⟩ (reported in each tile); the shaded regions show the oversampling interval. The errors were only evaluated on this interval. $\sigma_{s,0}$ and $\sigma_{\mathrm{jump}}$ continuously over the intervals $\tilde{\sigma}_{s,0} \in [0.01, 0.07]$, $\tilde{\sigma}_{\mathrm{jump}} \in [0.01,$ …

**Figure 4.** Distributional characteristics of C1-train. Left: distribution of spectral centroid values; the shaded region marks samples exceeding the 95th percentile, from which C1-tail is constructed. Center: distribution of per-sample maximum values of u. Right: joint distribution of spectral centroid values and maximum value of a (log-scale color). The spectral centroid distribution is right-skewed, producing a he…
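The spectral centroid mentioned in the Figure 4 caption is commonly defined as the power-weighted mean frequency of a signal. The sketch below computes it for a 1D field with the zero-frequency bin excluded. The paper's exact definition (dimension, weighting, and which field it is applied to) is not given in this summary, so treat every one of those choices as an assumption.

```python
import numpy as np

def spectral_centroid(field):
    """Power-weighted mean frequency index of a 1D signal, ignoring the mean (k = 0)."""
    power = np.abs(np.fft.rfft(field)) ** 2
    k = np.arange(power.shape[0])
    power, k = power[1:], k[1:]   # drop the zero-frequency bin
    return float(np.sum(k * power) / np.sum(power))

# A pure sine with 5 periods on the unit interval has its centroid at k = 5.
x = np.linspace(0.0, 1.0, 128, endpoint=False)
c = spectral_centroid(np.sin(2 * np.pi * 5 * x))
```

Computing this value per sample and applying a 95th-percentile cut would give a tail subset analogous to the one the caption describes for C1-tail.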

**Figure 5.** Visualization of datasets B1-ramp and D1-ramp. These were used as a diagnostic tool to find the parameters of datasets B1-test and D1-test.

Original abstract

We introduce ELADO (Elliptic PDE Assessment Datasets for Operator Learning), a systematic benchmark suite constructed to show and quantify failure modes of neural operator architectures when learning solution operators of elliptic PDEs. While the benchmarks of existing datasets focus on average-case performance, the ELADO datasets are constructed to highlight challenges that arise naturally in elliptic PDE problems. In particular, we construct several datasets built around Poisson's equation and the Helmholtz equation, each with non-constant coefficients. We define a controllable data-generating process to create datasets that are each designed to isolate a distinct source of difficulty. Specifically, these are (1) heavy-tailed solution distributions arising from light-tailed coefficient field distributions, (2) spectral distribution shift of the input data, (3) heavy-tailed distributions in the frequency domain of solutions, arising from light-tailed coefficient field distributions, (4) input sensitivity of learned operators, quantified by an empirical local Lipschitz analysis, and (5) the effect of input signal complexity on prediction accuracy under controlled amplitude normalization. We evaluate several neural operator architectures across all datasets and show that heavy-tailed targets, spectral shift, and input sensitivity each cause substantial degradation of the prediction accuracy that standard datasets and metrics (e.g., the mean relative $L^2$ error) may obscure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ELADO builds new elliptic PDE datasets targeting specific operator failures, but the isolation of those failures is not shown to be clean.


The main things to know are that the authors created five datasets for Poisson and Helmholtz equations using a controllable data-generating process, each meant to hit one difficulty: heavy-tailed solutions from light-tailed coefficients, spectral shift in inputs, heavy-tailed frequency content, input sensitivity via local Lipschitz analysis, and input complexity under amplitude normalization. Their tests on neural operators show that these difficulties cause accuracy drops missed by the standard mean relative L2 error. This moves past average-case benchmarks and tries to make the hard cases explicit.

The construction is the actual novelty. Prior operator learning datasets do not isolate these modes in the same controlled way for elliptic problems, so the idea of a systematic suite is useful for people who need to check robustness.

The soft spot is the isolation claim. Elliptic operators couple coefficient statistics to solution tails, spectral content, and local sensitivity through the Green's function, so a change aimed at heavy tails will typically shift other properties too. The paper gives no evidence of cross-checks, such as confirming spectral content stays normal on the heavy-tail dataset or reporting how the other axes behave across sets. Without those diagnostics or any quantitative results with error bars, the attribution of performance drops to distinct causes stays unverified.

This is for researchers testing neural operators on PDEs who want benchmarks that stress real failure modes rather than average performance. A reader focused on reliable learned solvers would find the targeted cases worth looking at.

It deserves peer review because benchmark work can shape how the field measures progress, even if this version needs stronger validation of the data generation.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces ELADO, a benchmark suite of datasets for neural operator learning on elliptic PDEs (Poisson and Helmholtz with non-constant coefficients). It defines a controllable data-generating process intended to isolate five distinct difficulties: (1) heavy-tailed solution distributions from light-tailed coefficient fields, (2) spectral distribution shift in inputs, (3) heavy-tailed frequency content in solutions, (4) input sensitivity measured by empirical local Lipschitz constants, and (5) input complexity under amplitude normalization. Evaluations of several neural operator architectures on these datasets are reported to show that heavy-tailed targets, spectral shift, and input sensitivity produce substantial accuracy degradation that is obscured by standard mean relative L² error on conventional benchmarks.

Significance. If the claimed isolation holds and the datasets are released with reproducible generation code, ELADO would supply a useful addition to the operator-learning literature by moving beyond average-case metrics to targeted stress tests for elliptic problems. The empirical focus and explicit construction of failure-mode-specific data are strengths that could guide architecture improvements, provided the attribution of performance drops to individual factors is secured.

major comments (2)

[Data Generation / Methods] The central claim that each dataset isolates one targeted difficulty without confounding interactions is load-bearing for all reported conclusions, yet the manuscript provides no verification (e.g., cross-measurement of spectral content or local Lipschitz constants on the heavy-tailed-coefficient dataset) that the DGP achieves orthogonality; elliptic Green's functions couple coefficient statistics to both solution tails and eigenstructure, so the attribution step remains unsecured.
[Evaluation Results] The abstract asserts 'substantial degradation' and that standard metrics 'may obscure' it, but no quantitative tables, error bars, or comparisons against baseline datasets are supplied in the available text; without these numbers the practical magnitude of the claimed effect cannot be assessed.

minor comments (2)

[Abstract] The abstract lists five datasets but does not name the specific neural operator architectures evaluated or the precise PDE instances (e.g., domain, boundary conditions) used for each.
[Methods] Notation for the empirical local Lipschitz analysis and the amplitude-normalization procedure should be defined explicitly before the results section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions we will make.


Referee: [Data Generation / Methods] The central claim that each dataset isolates one targeted difficulty without confounding interactions is load-bearing for all reported conclusions, yet the manuscript provides no verification (e.g., cross-measurement of spectral content or local Lipschitz constants on the heavy-tailed-coefficient dataset) that the DGP achieves orthogonality; elliptic Green's functions couple coefficient statistics to both solution tails and eigenstructure, so the attribution step remains unsecured.

Authors: We agree that explicit verification of isolation is necessary to support the attribution of performance drops to individual factors. The data-generating process was designed with independent controls on coefficient statistics, spectral properties, and amplitude, but we did not include cross-dataset measurements in the original submission. In the revised version we will add these verifications, reporting spectral content, frequency-domain statistics, and empirical local Lipschitz constants across all datasets to quantify any residual confounding. We will also discuss the role of the elliptic Green's function and how the chosen parameter ranges limit unintended couplings. revision: yes

Referee: [Evaluation Results] The abstract asserts 'substantial degradation' and that standard metrics 'may obscure' it, but no quantitative tables, error bars, or comparisons against baseline datasets are supplied in the available text; without these numbers the practical magnitude of the claimed effect cannot be assessed.

Authors: We apologize that the quantitative results were not presented with sufficient detail in the reviewed manuscript. The full paper contains evaluations of multiple neural operator architectures on each dataset, but we will expand the evaluation section to include complete tables of mean relative L² errors with standard deviations across repeated runs, together with direct comparisons against conventional benchmarks. These additions will make the magnitude of the reported degradation and the limitations of average-case metrics explicit. revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical benchmark construction with no derivations or self-referential loops


The paper presents an empirical benchmark suite for neural operators on elliptic PDEs, defining controllable data-generating processes to create datasets targeting specific difficulties (heavy-tailed solutions, spectral shifts, etc.). No derivation chain, first-principles predictions, fitted parameters renamed as outputs, or load-bearing self-citations appear in the abstract or described structure. The work is self-contained as dataset construction and evaluation against external architectures and metrics, with no steps reducing by construction to their own inputs. Attribution of failure modes to isolated factors is an empirical claim open to external verification rather than a logical reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities; the contribution is dataset construction using standard elliptic PDEs.



Reference graph

Works this paper leans on

11 extracted references · 7 canonical work pages · 2 internal anchors

[1] Azizzadenesheli, K., Kovachki, N., Li, Z., Liu-Schiaffini, M., Kossaifi, J., Anandkumar, A.: Neural operators for accelerating scientific simulations and design. Nature Reviews Physics 6(5), 320–328 (2024)

[2] Chen, T., Chen, H.: Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks 6(4), 911–917 (1995)

[3] Gupta, J.K., Brandstetter, J.: Towards multi-spatiotemporal-scale generalized PDE modeling. arXiv preprint arXiv:2209.15616 (2022)

[4] Kovachki, N.B., Li, Z.Y., Liu, B., Azizzadenesheli, K., Bhattacharya, K., Stuart, A.M., Anandkumar, A.: Neural operator: Learning maps between function spaces. arXiv preprint arXiv:2108.08481 (2021)

[5] Li, Z., Huang, D.Z., Liu, B., Anandkumar, A.: Fourier neural operator with learned deformations for PDEs on general geometries. JMLR 24(388), 1–26 (2023)

[6] Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., Anandkumar, A.: Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895 (2020)

[7] Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Stuart, A., Bhattacharya, K., Anandkumar, A.: Multipole graph neural operator for parametric partial differential equations. Proc. NeurIPS 33, 6755–6766 (2020)

[8] Liu, L., Cai, W.: Multiscale DeepONet for nonlinear operators in oscillatory function spaces for building seismic wave responses. arXiv preprint arXiv:2111.04860 (2021)

[9] Liu, L., Nath, K., Cai, W.: A causality-DeepONet for causal responses of linear dynamical systems. arXiv preprint arXiv:2209.08397 (2022)

[10] Lu, L., Jin, P., Karniadakis, G.E.: DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193 (2019)

[11] Takamoto, M., Praditia, T., Leiteritz, R., MacKinlay, D., Alesiani, F., Pflüger, D., Niepert, M.: PDEBench: An extensive benchmark for scientific machine learning (2024), https://arxiv.org/abs/2210.07182
