pith. sign in

arxiv: 2605.15806 · v2 · pith:RHPKBY35new · submitted 2026-05-15 · 💻 cs.LG

Martingale Neural Operators: Learning Stochastic Marginals via Doob-Meyer Factorization

Pith reviewed 2026-05-20 20:50 UTC · model grok-4.3

classification 💻 cs.LG
keywords neural operatorsstochastic PDEsDoob-Meyer decompositionmartingaleuncertainty quantificationGaussian residualWasserstein distancelow-rank covariance
0
0 comments X

The pith

Martingale Neural Operators map initial conditions directly to the conditional mean and covariance of terminal laws in stochastic PDEs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard neural operators collapse to the conditional mean on stochastic PDEs and lose the variance and tail behavior needed for uncertainty quantification. The paper translates the Doob-Meyer theorem, which splits any semimartingale into a predictable drift and an unpredictable zero-mean martingale, into an architectural prior. The resulting Martingale Neural Operator outputs both the drift-like mean and a low-rank factor B_φ whose Gram matrix is positive semi-definite by construction. Experiments on 1D SPDEs, rough volatility, and 2D tasks show large reductions in Wasserstein distance while preserving one-shot evaluation and resolution invariance.

Core claim

MNO maps an initial condition directly to the conditional mean and covariance of the terminal law, parameterized by a drift-like mean and a low-rank factor B_φ with B_φ^T B_φ positive semi-definite by construction. For the experiments a Gaussian residual instantiation is used. This yields up to 120× lower Wasserstein distance on φ^4 field theory and 68× on stochastic Burgers while evaluating roughly 3× faster than a conditional diffusion baseline at matched training budgets.

What carries the argument

The Doob-Meyer factorization, which decomposes a semimartingale into predictable drift plus zero-mean martingale, realized by outputting a mean field together with a low-rank factor whose product remains positive semi-definite.

If this is right

  • MNO supplies one-shot uncertainty estimates for SPDE solutions without separate Monte Carlo rollouts.
  • The architecture inherits resolution invariance and fast evaluation from standard neural operators.
  • Performance holds on turbulent and rough-volatility problems but degrades on near-deterministic systems such as Gray-Scott.
  • Training cost remains comparable to deterministic operators while delivering distribution-level output.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The low-rank martingale factor could be replaced by other structured residuals to target non-Gaussian tails.
  • The same factorization idea might transfer to surrogate modeling of stochastic processes outside PDEs, such as path-dependent financial models.
  • Because the covariance is produced explicitly, gradient-based optimization over the predicted distribution becomes feasible without sampling.

Load-bearing premise

The Doob-Meyer decomposition can be realized inside a neural operator by a low-rank Gaussian residual that still captures the full stochastic structure of the target SPDEs.

What would settle it

Run many independent Monte Carlo trajectories of a chosen SPDE from the same initial condition, compute the empirical terminal mean and covariance, and check whether the MNO prediction lies inside the sampling error bars of those statistics.

Figures

Figures reproduced from arXiv: 2605.15806 by Kai Hidajat.

Figure 1
Figure 1. Figure 1: Stochastic Burgers’ Equation results. For a representative test initial condition, the figure [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Rough volatility results: comparison of MNO against sequential stochastic baselines across [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: ϕ 4 field theory benchmark (SPDEBench). MNO achieves W2 = 0.0055 while Neural SPDE obtains W2 = 0.6572. The figure shows the learned terminal mean, variance field, and a representative marginal density induced by MNO’s low-rank residual factor. At H = 0.1, MNO achieves mean W2 = 0.0257 on the terminal marginal across five seeds, compared to Neural SDE (W2 = 0.0668, a 2.6× gap) and Neural CDE (W2 = 0.0675).… view at source ↗
Figure 4
Figure 4. Figure 4: 1D zero-shot super-resolution. MNO (trained at resolution 32) achieves stable or improving [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Generative efficiency comparison on stochastic Burgers’. MNO predicts terminal moments [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Diagnostic I: residual centering and scaling. The reported diagnostic records (i) near-zero [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Diagnostic II: Martingale Mirror. Left: one-shot MNO conditioned on u0 matches the plotted terminal marginal density for fBm with H = 0.1. Center: autoregressive reuse produces a wider density; the variance grows closer to linearly than to t 2H = t 0.2 . Right: AR paths have Hˆ ≈ 0.486, while ground-truth paths have Hˆ ≈ 0.101, showing that repeated Markovian stepping changes the temporal scaling [PITH_FU… view at source ↗
Figure 8
Figure 8. Figure 8: Diagnostic III: resolution-transfer variance. MNO is trained at resolution 64 and evaluated [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Diagnostic IV: head-separation audit. Across the three synthetic cases the trained MNO [PITH_FULL_IMAGE:figures/full_fig_p014_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Diagnostic V: uncertainty decomposition. The plotted bars compare an under-trained [PITH_FULL_IMAGE:figures/full_fig_p015_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: 2D turbulent flow (Kolmogorov-forced, NS-style, [PITH_FULL_IMAGE:figures/full_fig_p021_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: 2D resolution transfer. Models are trained at [PITH_FULL_IMAGE:figures/full_fig_p022_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: 2D Gray-Scott reaction-diffusion. The reported run gives mean RMSE [PITH_FULL_IMAGE:figures/full_fig_p023_13.png] view at source ↗
read the original abstract

Neural operators excel as deterministic surrogates, but inevitably collapse to the conditional mean when applied to stochastic PDEs, discarding the variance and tail structure upon which uncertainty quantification depends. Recovering this structure typically requires Monte Carlo rollouts or grafted generative models, both of which surrender the one-shot efficiency and resolution invariance that define the operator paradigm. To resolve this, we draw on the Doob-Meyer theorem, which establishes that any semimartingale fundamentally decomposes into a predictable drift and an unpredictable, zero-mean martingale. Translating this theorem into an architectural prior, we introduce the Martingale Neural Operator (MNO). MNO maps an initial condition directly to the conditional mean and covariance of the terminal law, parameterized by a drift-like mean and a low-rank factor $B_\phi$ with $B_\phi^\top B_\phi$ positive semi-definite by construction. For our experiments, we use a Gaussian residual instantiation. Across 1D SPDEs, rough volatility, and 2D operator tasks, MNO reduces Wasserstein distance by up to $120\times$ on $\phi^4$ field theory and $68\times$ on stochastic Burgers, evaluating $\sim 3\times$ faster than a conditional diffusion baseline at matched wall-clock training budgets. On 2D tasks, MNO is comparable to FNO on zero-shot resolution transfer and turbulent flow, while quasi-deterministic systems such as Gray-Scott remain a failure mode.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the Martingale Neural Operator (MNO), which translates the Doob-Meyer decomposition into a neural operator architecture. MNO maps an initial condition to the conditional mean and covariance of the terminal law of an SPDE, with the covariance parameterized via a low-rank factor B_φ such that B_φ^T B_φ is positive semi-definite by construction. A Gaussian residual instantiation is used in experiments. The approach is evaluated on 1D SPDEs (including ϕ⁴ field theory and stochastic Burgers), rough volatility, and 2D operator tasks, reporting up to 120× reduction in Wasserstein distance relative to baselines while maintaining resolution invariance and faster evaluation than a conditional diffusion model; quasi-deterministic systems such as Gray-Scott are identified as a failure mode.

Significance. If the empirical claims are supported by controlled experiments with error bars and the Gaussian low-rank residual proves sufficient for the reported Wasserstein metrics, the work would offer a novel architectural prior that recovers stochastic structure in neural operators without Monte Carlo rollouts or grafted generative models. This could advance efficient uncertainty quantification for SPDEs while preserving the one-shot, resolution-invariant properties of the operator paradigm. The explicit grounding in the Doob-Meyer theorem and the built-in PSD guarantee on the covariance factor are technical strengths.

major comments (2)
  1. Abstract: The reported Wasserstein reductions (120× on ϕ⁴ field theory, 68× on stochastic Burgers) and the 3× speed-up versus the conditional diffusion baseline are central to the empirical claim, yet the abstract and available text provide no information on the number of Monte Carlo samples used to estimate ground-truth terminal laws, the number of independent training runs, error bars, or hyperparameter controls. This absence makes it impossible to assess whether the gains are robust or sensitive to implementation details.
  2. Method (low-rank Gaussian residual instantiation): The architecture maps to conditional mean plus low-rank covariance B_φ^T B_φ and instantiates the residual as Gaussian. For the ϕ⁴ and stochastic Burgers benchmarks, whose terminal marginals are known to be non-Gaussian, it is unclear whether matching only the first two moments is sufficient to explain the cited Wasserstein improvements or whether higher-order structure is being captured implicitly; the Gray-Scott failure mode suggests the approximation can break down when the martingale component deviates from this form.
minor comments (2)
  1. Notation: The construction of the covariance from the low-rank factor B_φ is described at a high level; an explicit equation showing the mapping from network output to the PSD matrix would improve clarity.
  2. The abstract states that MNO is 'comparable to FNO on zero-shot resolution transfer and turbulent flow'; a table or figure quantifying the resolution-transfer error for both models would make this comparison precise.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment below and describe the revisions we intend to make.

read point-by-point responses
  1. Referee: Abstract: The reported Wasserstein reductions (120× on ϕ⁴ field theory, 68× on stochastic Burgers) and the 3× speed-up versus the conditional diffusion baseline are central to the empirical claim, yet the abstract and available text provide no information on the number of Monte Carlo samples used to estimate ground-truth terminal laws, the number of independent training runs, error bars, or hyperparameter controls. This absence makes it impossible to assess whether the gains are robust or sensitive to implementation details.

    Authors: We agree that the absence of these experimental details limits the ability to evaluate robustness. The current manuscript does not report the Monte Carlo sample count used for ground-truth terminal laws, the number of independent training runs, error bars, or hyperparameter controls in the abstract or main text. In the revised version we will (i) update the abstract with a concise statement of the evaluation protocol and (ii) insert a new “Experimental Setup” subsection that specifies the Monte Carlo sample size (10 000 trajectories for terminal-law estimation), the number of independent runs (five seeds), standard-deviation error bars, and the hyperparameter search procedure. These additions will make the reported Wasserstein reductions and speed-up claims directly verifiable. revision: yes

  2. Referee: Method (low-rank Gaussian residual instantiation): The architecture maps to conditional mean plus low-rank covariance B_φ^T B_φ and instantiates the residual as Gaussian. For the ϕ⁴ and stochastic Burgers benchmarks, whose terminal marginals are known to be non-Gaussian, it is unclear whether matching only the first two moments is sufficient to explain the cited Wasserstein improvements or whether higher-order structure is being captured implicitly; the Gray-Scott failure mode suggests the approximation can break down when the martingale component deviates from this form.

    Authors: We acknowledge that the terminal marginals of the ϕ⁴ and stochastic Burgers problems are non-Gaussian. The MNO architecture, derived from the Doob-Meyer decomposition, is explicitly constructed to recover the conditional mean and a low-rank covariance factor; the Gaussian residual is an instantiation chosen for tractability. The observed Wasserstein improvements arise from correctly predicting these first- and second-order statistics, which mean-only baselines omit. We do not claim that higher-order moments are recovered beyond what the low-rank Gaussian supplies. The Gray-Scott case is included precisely to illustrate the breakdown that occurs when the martingale component deviates from this form. To remove ambiguity we will expand the Method and Discussion sections to (a) state the Gaussian assumption explicitly, (b) note that Wasserstein distance is sensitive to the moments we target, and (c) discuss the conditions under which the approximation remains adequate. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation grounded in external theorem and data-driven training

full rationale

The paper translates the external Doob-Meyer theorem into an architectural prior, with MNO learning to output a drift-like conditional mean and low-rank factor B_φ (PSD enforced by construction via B_φ^T B_φ) plus a Gaussian residual instantiation for experiments. This parameterization is a design choice to represent the martingale component, not a self-definitional loop where outputs equal inputs by fiat. No fitted parameters are relabeled as predictions, no load-bearing self-citations appear, and no uniqueness theorems or ansatzes are imported from prior author work. Performance metrics (Wasserstein reductions on φ⁴ and Burgers) are empirical results from training and evaluation rather than tautological reductions. The architecture remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on the applicability of the Doob-Meyer theorem to the stochastic processes studied and on the sufficiency of a low-rank Gaussian parameterization to represent the martingale component.

free parameters (1)
  • low-rank factor B_φ
    Learned parameterization of the covariance structure; its rank and values are determined during training.
axioms (1)
  • domain assumption Any semimartingale decomposes into a predictable drift and a zero-mean martingale per the Doob-Meyer theorem.
    Invoked to justify mapping initial conditions to mean and covariance of the terminal law.
invented entities (1)
  • Martingale Neural Operator (MNO) no independent evidence
    purpose: Neural operator architecture that outputs both conditional mean and covariance via martingale factorization.
    New model class introduced in the paper.

pith-pipeline@v0.9.0 · 5790 in / 1372 out tokens · 57015 ms · 2026-05-20T20:50:51.919254+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.