Martingale Neural Operators: Learning Stochastic Marginals via Doob-Meyer Factorization

Kai Hidajat

arxiv: 2605.15806 · v2 · pith:RHPKBY35new · submitted 2026-05-15 · 💻 cs.LG

Pith reviewed 2026-05-20 20:50 UTC · model grok-4.3

classification 💻 cs.LG

keywords neural operators · stochastic PDEs · Doob-Meyer decomposition · martingale · uncertainty quantification · Gaussian residual · Wasserstein distance · low-rank covariance

The pith

Martingale Neural Operators map initial conditions directly to the conditional mean and covariance of terminal laws in stochastic PDEs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard neural operators collapse to the conditional mean on stochastic PDEs and lose the variance and tail behavior needed for uncertainty quantification. The paper translates the Doob-Meyer theorem, which splits any semimartingale into a predictable drift and an unpredictable zero-mean martingale, into an architectural prior. The resulting Martingale Neural Operator outputs both the drift-like mean and a low-rank factor B_φ whose Gram matrix is positive semi-definite by construction. Experiments on 1D SPDEs, rough volatility, and 2D tasks show large reductions in Wasserstein distance while preserving one-shot evaluation and resolution invariance.

Core claim

MNO maps an initial condition directly to the conditional mean and covariance of the terminal law, parameterized by a drift-like mean and a low-rank factor B_φ with B_φ^T B_φ positive semi-definite by construction. For the experiments a Gaussian residual instantiation is used. This yields up to 120× lower Wasserstein distance on φ^4 field theory and 68× on stochastic Burgers while evaluating roughly 3× faster than a conditional diffusion baseline at matched training budgets.
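
As a minimal sketch of what "positive semi-definite by construction" means in practice, the following NumPy snippet builds a covariance from a low-rank factor and draws Gaussian residual samples around a mean. The arrays `mean` and `B` are random stand-ins for the outputs of the mean head and of B_φ, the sizes `n = 16` and `r = 3` are arbitrary, and none of these names come from the paper.

```python
import numpy as np

def sample_terminal(mean, B, n_samples, rng):
    """Draw x = mean + B^T z with z ~ N(0, I_r), so that Cov[x] = B^T B."""
    r, _ = B.shape
    z = rng.standard_normal((n_samples, r))   # one r-dimensional latent per sample
    return mean + z @ B                       # shape (n_samples, n)

rng = np.random.default_rng(0)
n, r = 16, 3                                      # grid size and rank (assumed values)
mean = np.sin(np.linspace(0.0, 2.0 * np.pi, n))   # stand-in for the predicted mean
B = 0.1 * rng.standard_normal((r, n))             # stand-in for the factor B_phi

cov = B.T @ B                                # n x n Gram matrix, rank at most r
eigvals = np.linalg.eigvalsh(cov)            # all >= 0 up to round-off

samples = sample_terminal(mean, B, 20_000, rng)
emp_cov = np.cov(samples, rowvar=False)      # empirical covariance of the samples
max_cov_error = np.abs(emp_cov - cov).max()
```

Each draw needs only an r-dimensional standard normal, so under this parameterization the n × n covariance never has to be factorized to sample from it.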

What carries the argument

The Doob-Meyer factorization, which decomposes a semimartingale into predictable drift plus zero-mean martingale, realized by outputting a mean field together with a low-rank factor whose product remains positive semi-definite.
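
In symbols, the factorization described above could be written as follows. This is a sketch in assumed notation: the symbols $X_t$, $A_t$, $M_t$, $m_\theta$, $\Sigma_\phi$, $r$ and $n$ are illustrative choices, not taken from the paper, and the Gaussian law is stated for a field discretized on $n$ grid points.

```latex
% Process = initial value + predictable drift A + zero-mean martingale M
X_t = X_0 + A_t + M_t, \qquad \mathbb{E}[M_t \mid \mathcal{F}_0] = 0 .

% Gaussian residual instantiation: mean head m_\theta and
% low-rank factor B_\phi(u_0) \in \mathbb{R}^{r \times n}
u_T \mid u_0 \;\sim\; \mathcal{N}\!\big(m_\theta(u_0),\, \Sigma_\phi(u_0)\big),
\qquad \Sigma_\phi(u_0) = B_\phi(u_0)^\top B_\phi(u_0) .

% Positive semi-definite by construction: for every v \in \mathbb{R}^n
v^\top \Sigma_\phi(u_0)\, v = \lVert B_\phi(u_0)\, v \rVert_2^2 \;\ge\; 0 .
```

The last line is the whole argument for the PSD guarantee: no constraint on the entries of $B_\phi$ is needed, because any Gram matrix has a non-negative quadratic form.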

If this is right

MNO supplies one-shot uncertainty estimates for SPDE solutions without separate Monte Carlo rollouts.
The architecture inherits resolution invariance and fast evaluation from standard neural operators.
Performance holds on turbulent and rough-volatility problems but degrades on near-deterministic systems such as Gray-Scott.
Training cost remains comparable to deterministic operators while delivering distribution-level output.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The low-rank martingale factor could be replaced by other structured residuals to target non-Gaussian tails.
The same factorization idea might transfer to surrogate modeling of stochastic processes outside PDEs, such as path-dependent financial models.
Because the covariance is produced explicitly, gradient-based optimization over the predicted distribution becomes feasible without sampling.
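
One way to make the last point concrete is a closed-form Gaussian negative log-likelihood that never draws a sample. The sketch below assumes a covariance of the form B^T B + s² I; the diagonal term s² (`noise_std`) is an assumption added here because B^T B alone is singular when the rank is below the grid size, and it is not described in the source. The function name and all example values are hypothetical.

```python
import numpy as np

def lowrank_gaussian_nll(x, mean, B, noise_std):
    """NLL of x under N(mean, B^T B + noise_std^2 I), using only r x r solves.

    Uses the Woodbury identity for the quadratic form and the matrix
    determinant lemma for the log-determinant.
    """
    r, n = B.shape
    d = x - mean
    s2 = noise_std ** 2
    C = s2 * np.eye(r) + B @ B.T                 # small r x r system
    Bd = B @ d
    quad = (d @ d - Bd @ np.linalg.solve(C, Bd)) / s2
    _, logdet_C = np.linalg.slogdet(C)
    logdet = (n - r) * np.log(s2) + logdet_C     # log det(B^T B + s2 I_n)
    return 0.5 * (quad + logdet + n * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
n, r = 32, 4                                     # assumed grid size and rank
mean = np.zeros(n)                               # stand-in for the predicted mean
B = 0.5 * rng.standard_normal((r, n))            # stand-in for B_phi
noise_std = 0.1                                  # assumed diagonal noise level
x = rng.standard_normal(n)                       # stand-in for an observed terminal field

nll = lowrank_gaussian_nll(x, mean, B, noise_std)
```

Because the expression is a smooth function of `mean` and `B`, it could in principle be differentiated directly by an autodiff framework; whether the paper trains with this loss is not stated in the text above.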

Load-bearing premise

The Doob-Meyer decomposition can be realized inside a neural operator by a low-rank Gaussian residual that still captures the full stochastic structure of the target SPDEs.

What would settle it

Run many independent Monte Carlo trajectories of a chosen SPDE from the same initial condition, compute the empirical terminal mean and covariance, and check whether the MNO prediction lies inside the sampling error bars of those statistics.
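
The check described above can be sketched on a toy problem. The snippet below uses a scalar Ornstein-Uhlenbeck SDE as a stand-in for an SPDE, because its terminal mean and variance are known in closed form. The "predicted" moments are set to those analytic values in place of a trained model's output, and every name and parameter value is hypothetical.

```python
import numpy as np

def simulate_ou_terminal(x0, theta, sigma, T, n_steps, n_paths, rng):
    """Simulate dX = -theta * X dt + sigma dW with the exact one-step transition."""
    dt = T / n_steps
    decay = np.exp(-theta * dt)
    step_std = sigma * np.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    x = np.full(n_paths, x0, dtype=float)
    for _ in range(n_steps):
        x = decay * x + step_std * rng.standard_normal(n_paths)
    return x

def inside_error_bars(predicted, estimate, std_err, k=3.0):
    """True if the prediction lies within k standard errors of the estimate."""
    return abs(predicted - estimate) <= k * std_err

rng = np.random.default_rng(0)
x0, theta, sigma, T = 1.0, 1.5, 0.8, 1.0
terminal = simulate_ou_terminal(x0, theta, sigma, T,
                                n_steps=100, n_paths=10_000, rng=rng)

# Empirical terminal statistics and their sampling errors
emp_mean = terminal.mean()
emp_var = terminal.var(ddof=1)
se_mean = np.sqrt(emp_var / terminal.size)
se_var = emp_var * np.sqrt(2.0 / (terminal.size - 1))   # Gaussian approximation

# Stand-in for a surrogate's output: the analytic OU terminal moments
pred_mean = x0 * np.exp(-theta * T)
pred_var = sigma ** 2 * (1.0 - np.exp(-2.0 * theta * T)) / (2.0 * theta)

mean_ok = inside_error_bars(pred_mean, emp_mean, se_mean)
var_ok = inside_error_bars(pred_var, emp_var, se_var)
```

For a field-valued problem the same comparison would be made per grid point or per covariance entry, and the standard-error formula for the variance would need replacing if the terminal law is far from Gaussian.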

Figures

Figures reproduced from arXiv: 2605.15806 by Kai Hidajat.

**Figure 2.** Rough volatility results: comparison of MNO against sequential stochastic baselines across …

**Figure 3.** ϕ⁴ field theory benchmark (SPDEBench). MNO achieves W₂ = 0.0055 while Neural SPDE obtains W₂ = 0.6572. The figure shows the learned terminal mean, variance field, and a representative marginal density induced by MNO’s low-rank residual factor. At H = 0.1, MNO achieves mean W₂ = 0.0257 on the terminal marginal across five seeds, compared to Neural SDE (W₂ = 0.0668, a 2.6× gap) and Neural CDE (W₂ = 0.0675). …

**Figure 4.** 1D zero-shot super-resolution. MNO (trained at resolution 32) achieves stable or improving …

**Figure 5.** Generative efficiency comparison on stochastic Burgers’. MNO predicts terminal moments …

**Figure 6.** Diagnostic I: residual centering and scaling. The reported diagnostic records (i) near-zero …

**Figure 7.** Diagnostic II: Martingale Mirror. Left: one-shot MNO conditioned on u₀ matches the plotted terminal marginal density for fBm with H = 0.1. Center: autoregressive reuse produces a wider density; the variance grows closer to linearly than to t^{2H} = t^{0.2}. Right: AR paths have Ĥ ≈ 0.486, while ground-truth paths have Ĥ ≈ 0.101, showing that repeated Markovian stepping changes the temporal scaling …

**Figure 8.** Diagnostic III: resolution-transfer variance. MNO is trained at resolution 64 and evaluated …

**Figure 9.** Diagnostic IV: head-separation audit. Across the three synthetic cases the trained MNO …

**Figure 10.** Diagnostic V: uncertainty decomposition. The plotted bars compare an under-trained …

**Figure 11.** 2D turbulent flow (Kolmogorov-forced, NS-style, …

**Figure 12.** 2D resolution transfer. Models are trained at …

**Figure 13.** 2D Gray-Scott reaction-diffusion. The reported run gives mean RMSE …
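
The scaling effect described in the Figure 7 caption can be reproduced with a few lines of arithmetic. The sketch below assumes that autoregressive reuse draws fresh, independent noise at every step, each with the one-step marginal variance dt^{2H}. That is an assumption about the diagnostic, not a detail given in the caption, and all function names are hypothetical.

```python
import numpy as np

def one_shot_variance(t, H):
    """Marginal variance of unit-scale fBm at time t: t^(2H)."""
    return t ** (2.0 * H)

def chained_variance(t, H, dt):
    """Variance after chaining t/dt independent steps of variance dt^(2H) each."""
    n_steps = np.round(t / dt)
    return n_steps * dt ** (2.0 * H)

def scaling_exponent(times, variances):
    """Fit Var ~ t^(2H) on a log-log scale and return the implied H."""
    slope, _ = np.polyfit(np.log(times), np.log(variances), 1)
    return slope / 2.0

H, dt = 0.1, 0.1
times = dt * np.arange(1, 11)                  # t = 0.1, 0.2, ..., 1.0

h_one_shot = scaling_exponent(times, one_shot_variance(times, H))
h_chained = scaling_exponent(times, chained_variance(times, H, dt))

# Ratio of chained to one-shot variance at the final time t = 1
variance_inflation = chained_variance(times[-1], H, dt) / one_shot_variance(times[-1], H)
```

Under this assumption the chained variance is exactly linear in t, so the fitted exponent is 0.5 regardless of the true H, and at t = 1 the chained density is about 6.3 times wider in variance than the one-shot marginal. That is qualitatively the pattern the caption reports (Ĥ ≈ 0.486 against a ground truth of Ĥ ≈ 0.101).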

Original abstract

Neural operators excel as deterministic surrogates, but inevitably collapse to the conditional mean when applied to stochastic PDEs, discarding the variance and tail structure upon which uncertainty quantification depends. Recovering this structure typically requires Monte Carlo rollouts or grafted generative models, both of which surrender the one-shot efficiency and resolution invariance that define the operator paradigm. To resolve this, we draw on the Doob-Meyer theorem, which establishes that any semimartingale fundamentally decomposes into a predictable drift and an unpredictable, zero-mean martingale. Translating this theorem into an architectural prior, we introduce the Martingale Neural Operator (MNO). MNO maps an initial condition directly to the conditional mean and covariance of the terminal law, parameterized by a drift-like mean and a low-rank factor $B_\phi$ with $B_\phi^\top B_\phi$ positive semi-definite by construction. For our experiments, we use a Gaussian residual instantiation. Across 1D SPDEs, rough volatility, and 2D operator tasks, MNO reduces Wasserstein distance by up to $120\times$ on $\phi^4$ field theory and $68\times$ on stochastic Burgers, evaluating $\sim 3\times$ faster than a conditional diffusion baseline at matched wall-clock training budgets. On 2D tasks, MNO is comparable to FNO on zero-shot resolution transfer and turbulent flow, while quasi-deterministic systems such as Gray-Scott remain a failure mode.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MNO translates Doob-Meyer into a neural operator to output mean and low-rank covariance directly, but the Gaussian residual may only match moments rather than full non-Gaussian marginals.

The central idea is that the Martingale Neural Operator uses the Doob-Meyer theorem to map initial conditions directly to the conditional mean and covariance of the terminal law in stochastic PDEs, with a low-rank factor carrying the martingale part. The novelty lies in turning the decomposition into an architectural bias for neural operators: the model keeps the efficiency of operator learning while adding variance information without extra sampling. The reported results are strong, with up to 120× lower Wasserstein distance on ϕ⁴ field theory and 68× on stochastic Burgers, and roughly 3× faster evaluation than a conditional diffusion baseline. On 2D tasks it is comparable to FNO for resolution transfer.

The soft spot is the Gaussian residual. Many of the target SPDEs have non-Gaussian marginals, so matching only the mean and covariance might not fully explain the reported gains in distributional distance. The Gray-Scott failure mode already shows where the approach breaks down. More detail on experimental controls, and on whether higher moments are checked, would help.

The paper is aimed at researchers doing uncertainty quantification with neural operators on stochastic systems, and readers who care about efficient surrogates for SPDEs would get something out of it. It has a clear theorem-based idea and some empirical backing, so it deserves a serious referee. I would send it out for peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the Martingale Neural Operator (MNO), which translates the Doob-Meyer decomposition into a neural operator architecture. MNO maps an initial condition to the conditional mean and covariance of the terminal law of an SPDE, with the covariance parameterized via a low-rank factor B_φ such that B_φ^T B_φ is positive semi-definite by construction. A Gaussian residual instantiation is used in experiments. The approach is evaluated on 1D SPDEs (including ϕ⁴ field theory and stochastic Burgers), rough volatility, and 2D operator tasks, reporting up to 120× reduction in Wasserstein distance relative to baselines while maintaining resolution invariance and faster evaluation than a conditional diffusion model; quasi-deterministic systems such as Gray-Scott are identified as a failure mode.

Significance. If the empirical claims are supported by controlled experiments with error bars and the Gaussian low-rank residual proves sufficient for the reported Wasserstein metrics, the work would offer a novel architectural prior that recovers stochastic structure in neural operators without Monte Carlo rollouts or grafted generative models. This could advance efficient uncertainty quantification for SPDEs while preserving the one-shot, resolution-invariant properties of the operator paradigm. The explicit grounding in the Doob-Meyer theorem and the built-in PSD guarantee on the covariance factor are technical strengths.

major comments (2)

Abstract: The reported Wasserstein reductions (120× on ϕ⁴ field theory, 68× on stochastic Burgers) and the 3× speed-up versus the conditional diffusion baseline are central to the empirical claim, yet the abstract and available text provide no information on the number of Monte Carlo samples used to estimate ground-truth terminal laws, the number of independent training runs, error bars, or hyperparameter controls. This absence makes it impossible to assess whether the gains are robust or sensitive to implementation details.
Method (low-rank Gaussian residual instantiation): The architecture maps to conditional mean plus low-rank covariance B_φ^T B_φ and instantiates the residual as Gaussian. For the ϕ⁴ and stochastic Burgers benchmarks, whose terminal marginals are known to be non-Gaussian, it is unclear whether matching only the first two moments is sufficient to explain the cited Wasserstein improvements or whether higher-order structure is being captured implicitly; the Gray-Scott failure mode suggests the approximation can break down when the martingale component deviates from this form.

minor comments (2)

Notation: The construction of the covariance from the low-rank factor B_φ is described at a high level; an explicit equation showing the mapping from network output to the PSD matrix would improve clarity.
The abstract states that MNO is 'comparable to FNO on zero-shot resolution transfer and turbulent flow'; a table or figure quantifying the resolution-transfer error for both models would make this comparison precise.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment below and describe the revisions we intend to make.

Referee: Abstract: The reported Wasserstein reductions (120× on ϕ⁴ field theory, 68× on stochastic Burgers) and the 3× speed-up versus the conditional diffusion baseline are central to the empirical claim, yet the abstract and available text provide no information on the number of Monte Carlo samples used to estimate ground-truth terminal laws, the number of independent training runs, error bars, or hyperparameter controls. This absence makes it impossible to assess whether the gains are robust or sensitive to implementation details.

Authors: We agree that the absence of these experimental details limits the ability to evaluate robustness. The current manuscript does not report the Monte Carlo sample count used for ground-truth terminal laws, the number of independent training runs, error bars, or hyperparameter controls in the abstract or main text. In the revised version we will (i) update the abstract with a concise statement of the evaluation protocol and (ii) insert a new “Experimental Setup” subsection that specifies the Monte Carlo sample size (10,000 trajectories for terminal-law estimation), the number of independent runs (five seeds), standard-deviation error bars, and the hyperparameter search procedure. These additions will make the reported Wasserstein reductions and speed-up claims directly verifiable.
revision: yes

Referee: Method (low-rank Gaussian residual instantiation): The architecture maps to conditional mean plus low-rank covariance B_φ^T B_φ and instantiates the residual as Gaussian. For the ϕ⁴ and stochastic Burgers benchmarks, whose terminal marginals are known to be non-Gaussian, it is unclear whether matching only the first two moments is sufficient to explain the cited Wasserstein improvements or whether higher-order structure is being captured implicitly; the Gray-Scott failure mode suggests the approximation can break down when the martingale component deviates from this form.

Authors: We acknowledge that the terminal marginals of the ϕ⁴ and stochastic Burgers problems are non-Gaussian. The MNO architecture, derived from the Doob-Meyer decomposition, is explicitly constructed to recover the conditional mean and a low-rank covariance factor; the Gaussian residual is an instantiation chosen for tractability. The observed Wasserstein improvements arise from correctly predicting these first- and second-order statistics, which mean-only baselines omit. We do not claim that higher-order moments are recovered beyond what the low-rank Gaussian supplies. The Gray-Scott case is included precisely to illustrate the breakdown that occurs when the martingale component deviates from this form. To remove ambiguity we will expand the Method and Discussion sections to (a) state the Gaussian assumption explicitly, (b) note that Wasserstein distance is sensitive to the moments we target, and (c) discuss the conditions under which the approximation remains adequate.
revision: partial
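
The claim that mean-only baselines lose out on Wasserstein distance can be illustrated with the closed-form W₂ between one-dimensional Gaussians. The sketch below is a worked example under the assumption that the target marginal is itself Gaussian. The numbers are invented for illustration and are not taken from the paper's benchmarks.

```python
import numpy as np

def w2_gaussian_1d(m1, s1, m2, s2):
    """Closed-form 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return np.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

# Illustrative target marginal (assumed values)
true_mean, true_std = 0.3, 0.5

# Mean-only predictor: a point mass at the exact mean (standard deviation 0)
w2_mean_only = w2_gaussian_1d(true_mean, 0.0, true_mean, true_std)

# Two-moment predictor with small errors in both mean and standard deviation
w2_two_moment = w2_gaussian_1d(true_mean + 0.01, 0.48, true_mean, true_std)

improvement = w2_mean_only / w2_two_moment
```

Even a perfect mean estimate leaves W₂ equal to the target's standard deviation, so a predictor that also gets the spread roughly right can reduce W₂ by a large factor. For a non-Gaussian target, a Gaussian with matched moments still leaves a nonzero W₂, which is the gap the referee's second major comment asks about.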

Circularity Check

0 steps flagged

No significant circularity; derivation grounded in external theorem and data-driven training

The paper translates the external Doob-Meyer theorem into an architectural prior, with MNO learning to output a drift-like conditional mean and low-rank factor B_φ (PSD enforced by construction via B_φ^T B_φ) plus a Gaussian residual instantiation for experiments. This parameterization is a design choice to represent the martingale component, not a self-definitional loop where outputs equal inputs by fiat. No fitted parameters are relabeled as predictions, no load-bearing self-citations appear, and no uniqueness theorems or ansatzes are imported from prior author work. Performance metrics (Wasserstein reductions on φ⁴ and Burgers) are empirical results from training and evaluation rather than tautological reductions. The architecture remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the applicability of the Doob-Meyer theorem to the stochastic processes studied and on the sufficiency of a low-rank Gaussian parameterization to represent the martingale component.

free parameters (1)

low-rank factor B_φ
Learned parameterization of the covariance structure; its rank and values are determined during training.

axioms (1)

domain assumption: Any semimartingale decomposes into a predictable drift and a zero-mean martingale per the Doob-Meyer theorem.
Invoked to justify mapping initial conditions to mean and covariance of the terminal law.

invented entities (1)

Martingale Neural Operator (MNO) · no independent evidence
purpose: Neural operator architecture that outputs both conditional mean and covariance via martingale factorization.
New model class introduced in the paper.

pith-pipeline@v0.9.0 · 5790 in / 1372 out tokens · 57015 ms · 2026-05-20T20:50:51.919254+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation to the paper passage: unclear
Paper passage: "MNO maps an initial condition directly to the conditional mean and covariance of the terminal law, parameterized by a drift-like mean and a low-rank factor B_φ with B_φ^T B_φ positive semi-definite by construction. For our experiments, we use a Gaussian residual instantiation."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation to the paper passage: unclear
Paper passage: "We draw on the Doob-Meyer theorem, which establishes that any semimartingale fundamentally decomposes into a predictable drift and an unpredictable, zero-mean martingale."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.