arxiv: 2601.20756 · v2 · submitted 2026-01-28 · 💻 cs.LG · stat.ML

Recognition: 2 theorem links

· Lean Theorem

Supervised Guidance Training for Infinite-Dimensional Diffusion Models

Elizabeth L. Baker , Alexander Denker , Jes Frellsen

Authors on Pith no claims yet

Pith reviewed 2026-05-16 10:27 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords diffusion modelsinfinite-dimensionalDoob h-transformscore matchingBayesian inverse problemsposterior samplingfunction space

0 comments

The pith

Infinite-dimensional diffusion models can be conditioned to sample from posteriors using an extension of Doob's h-transform and supervised guidance training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a way to condition score-based diffusion models defined on infinite-dimensional function spaces so they sample from a posterior given noisy observations. It proves that an infinite-dimensional version of Doob's h-transform yields a decomposition of the conditional score into an unconditional score plus a guidance term. Because the guidance term cannot be evaluated directly, the authors introduce a simulation-free score-matching objective called Supervised Guidance Training that fine-tunes a pre-trained model to learn the guidance. The result is the first practical function-space method for turning an expressive diffusion prior into accurate posterior samples for Bayesian inverse problems.

Core claim

We prove that the models can be conditioned using an infinite-dimensional extension of Doob's h-transform, and that the conditional score decomposes into an unconditional score and a guidance term. As the guidance term is intractable, we propose a simulation-free score matching objective (called Supervised Guidance Training) enabling efficient and stable posterior sampling. We illustrate the theory with numerical examples on Bayesian inverse problems in function spaces.

What carries the argument

The infinite-dimensional Doob's h-transform, which decomposes the conditional score into the unconditional score plus an additive guidance term, together with the supervised guidance training objective that approximates the intractable guidance via score matching.

If this is right

A pre-trained diffusion model can be fine-tuned once to produce samples from the posterior without any simulation of the reverse process.
The same procedure applies directly to inverse problems whose unknowns are functions, such as those governed by partial differential equations.
Posterior sampling becomes possible in function space while retaining the expressive power of the original diffusion prior.
Training remains stable because the objective is simulation-free and uses only score-matching losses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The guidance decomposition may extend to other conditioning signals besides likelihood terms, such as constraints or external controls.
Function-space operation could mitigate dimensionality issues when the underlying data truly lives on a continuum rather than a fixed grid.
The supervised objective might be combined with existing finite-dimensional guidance techniques to handle hybrid discrete-continuous settings.

Load-bearing premise

The prior distribution either lies in the Cameron-Martin space or is absolutely continuous with respect to a Gaussian measure.

What would settle it

Generate samples from the supervised-guidance-trained model on a discretized Bayesian inverse problem whose true posterior is independently computable by MCMC or another exact method, then compare empirical moments or densities of the generated samples against those of the true posterior.

Figures

Figures reproduced from arXiv: 2601.20756 by Alexander Denker, Elizabeth L. Baker, Jes Frellsen.

**Figure 1.** Figure 1: Posterior sampling for the point evaluation. We show the ground truth signal in black, the mean sample in red and individual samples in light blue. Note, that the uncertainty is small on the observed points (green). SGT is evaluated with k = 0 for the preconditioner [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗

**Figure 2.** Figure 2: Posterior sampling for the heat equation. We show the ground truth signal in black, the mean sample in red and individual samples in light blue. Note, that due to the boundary condition f(0) = f(1) = 0 the uncertainty at the boundary of the domain is small. Appendix C.3. EFD provides a finite-dimensional shape representation that is invariant to landmark resolution. We formulate conditional sampling as an … view at source ↗

**Figure 3.** Figure 3: Conditional sampling results on shape data using the first six EFD modes. We show the reconstruction from the first six EFD modes, the ground-truth shape, and conditional samples generated by Conditional Diffusion, FunDPS, and our SGT. Mean predictions are shown in darker colours, with individual samples overlaid in lighter shades [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Posterior sampling for the heat equation. We show the ground truth signal in black, the mean sample in red and individual samples in light blue. Note, that due to the boundary condition f(0) = f(1) = 0 the uncertainty at the boundary of the domain is small. C.3. Shape Inpainting Elliptic Fourier descriptors We make use of elliptic Fourier descriptors (EFD) (Kuhl & Giardina, 1982). The curve (x(t), y(t)), t… view at source ↗

**Figure 5.** Figure 5: The first 8 examples of the MNIST test dataset, parametrised using 64 landmarks. Conditional Sampling We compare against FunDPS (Yao et al., 2025). Note that the standard version of FunDPS trains a joint score model on both the observations and ground truth data and solves the conditional sampling task by masking one of the input channels. Here, we compare against the version in Appendix D of Yao et al. (2… view at source ↗

**Figure 6.** Figure 6: Conditional sampling results for the shape data. We show the ground truth shape and 5 samples per method (overlay). The first column is the reconstruction of the shape from the first (2, 4, 6, 8) coefficients. 24 [PITH_FULL_IMAGE:figures/full_fig_p024_6.png] view at source ↗

read the original abstract

Score-based diffusion models have recently been extended to infinite-dimensional function spaces, with uses such as inverse problems arising from partial differential equations. In the Bayesian formulation of inverse problems, the aim is to sample from a posterior distribution over functions obtained by conditioning a prior on noisy observations. While diffusion models provide expressive priors in function space, the theory of conditioning them to sample from the posterior remains open. We address this, assuming that either the prior lies in the Cameron-Martin space, or is absolutely continuous with respect to a Gaussian measure. We prove that the models can be conditioned using an infinite-dimensional extension of Doob's $h$-transform, and that the conditional score decomposes into an unconditional score and a guidance term. As the guidance term is intractable, we propose a simulation-free score matching objective (called Supervised Guidance Training) enabling efficient and stable posterior sampling. We illustrate the theory with numerical examples on Bayesian inverse problems in function spaces. In summary, our work offers the first function-space method for fine-tuning trained diffusion models to accurately sample from a posterior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives the first explicit function-space fine-tuning procedure for conditioning infinite-dimensional diffusion models via an h-transform extension and a simulation-free supervised guidance objective, but it only works under a strong prior measure assumption.

read the letter

The main takeaway is that this paper supplies the first concrete function-space method for fine-tuning a trained diffusion model to sample from a posterior. They extend Doob's h-transform to infinite dimensions, show that the conditional score decomposes into the unconditional score plus a guidance term, and replace the intractable guidance with a supervised score-matching loss they call Supervised Guidance Training. The numerical examples on Bayesian inverse problems in function space indicate the approach can be run in practice for PDE-related tasks.

Referee Report

2 major / 2 minor

Summary. The manuscript extends score-based diffusion models to infinite-dimensional function spaces for Bayesian inverse problems. Under the assumption that the prior lies in the Cameron-Martin space or is absolutely continuous w.r.t. a Gaussian measure, it proves an infinite-dimensional extension of Doob's h-transform, shows that the conditional score decomposes into an unconditional score plus a guidance term, and introduces a simulation-free Supervised Guidance Training objective to learn the intractable guidance term. Numerical examples on function-space Bayesian inverse problems are provided to illustrate the approach.

Significance. If the central theoretical extension holds, the work is significant because it supplies the first rigorous function-space method for conditioning pre-trained diffusion models to sample from posteriors, directly addressing an open theoretical gap for applications such as PDE inverse problems where function-valued priors are natural. The simulation-free matching loss is a practical advantage over methods requiring posterior sampling during training.

major comments (2)

[Prior measure assumption (§2)] The assumption that the prior lies in the Cameron-Martin space or is absolutely continuous with respect to a Gaussian measure (abstract and §2) is load-bearing for the existence of the Radon-Nikodym derivative and the well-definedness of the infinite-dimensional h-transform. The manuscript should add an explicit discussion of how restrictive this is for typical non-Gaussian priors arising in PDE inverse problems and whether the decomposition fails outside this regime.
[h-transform proof and score decomposition (§3)] The proof of the conditional score decomposition (Eq. (X) in §3) and the claim that Supervised Guidance Training recovers the exact posterior must be presented with full details, including regularity conditions on the guidance term in function space and an error analysis showing that the simulation-free objective does not introduce bias that prevents exact posterior sampling.

minor comments (2)

[Notation and definitions] Clarify the precise definition of the infinite-dimensional score and guidance operators to avoid ambiguity in the notation for function-space objects.
[Numerical experiments] The numerical examples section would benefit from additional quantitative metrics (e.g., posterior mean error, coverage) and explicit comparison against finite-dimensional baselines or existing conditioning methods.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and will revise the manuscript to incorporate the suggested clarifications and expansions.

read point-by-point responses

Referee: [Prior measure assumption (§2)] The assumption that the prior lies in the Cameron-Martin space or is absolutely continuous with respect to a Gaussian measure (abstract and §2) is load-bearing for the existence of the Radon-Nikodym derivative and the well-definedness of the infinite-dimensional h-transform. The manuscript should add an explicit discussion of how restrictive this is for typical non-Gaussian priors arising in PDE inverse problems and whether the decomposition fails outside this regime.

Authors: We agree that the assumption is central to the Radon-Nikodym derivative and the h-transform construction. In the revised manuscript we will add a dedicated paragraph in §2 discussing its scope: many priors used in PDE inverse problems (e.g., Matérn Gaussian processes or log-Gaussian fields) satisfy absolute continuity with respect to a Gaussian reference measure, so the framework applies directly. We will also note that for priors that are mutually singular with every Gaussian measure the decomposition does not hold in the present form and alternative conditioning strategies would be required; this limitation will be stated explicitly. revision: yes
Referee: [h-transform proof and score decomposition (§3)] The proof of the conditional score decomposition (Eq. (X) in §3) and the claim that Supervised Guidance Training recovers the exact posterior must be presented with full details, including regularity conditions on the guidance term in function space and an error analysis showing that the simulation-free objective does not introduce bias that prevents exact posterior sampling.

Authors: We will expand §3 to include the complete proof of the conditional score decomposition together with the required regularity conditions on the guidance term (e.g., square-integrability in the Cameron-Martin space and Lipschitz continuity of the observation map). We will also add an error analysis showing that the simulation-free supervised objective converges to the true guidance term in the appropriate function-space norm, implying that the learned conditional score yields exact posterior sampling in the limit of infinite data and model capacity. These additions will be placed immediately after the statement of the decomposition. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation extends classical Doob h-transform

full rationale

The paper states an explicit assumption (prior in Cameron-Martin space or absolutely continuous w.r.t. Gaussian) and proves an infinite-dimensional extension of Doob's h-transform, yielding the conditional score decomposition into unconditional score plus guidance term. The Supervised Guidance Training objective is then introduced as a simulation-free matching loss to replace the intractable term. No equations reduce the claimed results to fitted parameters by construction, no self-citations are invoked as load-bearing uniqueness theorems, and the derivation chain relies on classical probability results extended to function space rather than renaming or smuggling ansatzes from prior author work. The numerical examples on Bayesian inverse problems validate the method inside the stated assumption regime without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on a domain assumption about the prior measure that enables the h-transform to be well-defined in function space; no free parameters or new entities are introduced in the abstract.

axioms (1)

domain assumption The prior measure either lies in the Cameron-Martin space or is absolutely continuous with respect to a Gaussian measure
This assumption is required for the infinite-dimensional extension of Doob's h-transform to hold and for the conditional score decomposition to be valid.

pith-pipeline@v0.9.0 · 5482 in / 1228 out tokens · 35789 ms · 2026-05-16T10:27:37.451472+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We prove that the models can be conditioned using an infinite-dimensional extension of Doob’s h-transform... assuming that either the prior lies in the Cameron-Martin space, or is absolutely continuous with respect to a Gaussian measure.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the conditional score decomposes into an unconditional score and a guidance term

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

GRIFDIR: Graph Resolution-Invariant FEM Diffusion Models in Function Spaces over Irregular Domains
cs.LG 2026-05 unverdicted novelty 7.0

GRIFDIR proposes graph resolution-invariant FEM diffusion models that maintain resolution invariance and high fidelity on complex irregular domains.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

Conditional image generation with score-based diffusion models.arXiv preprint arXiv:2111.13606,

Batzolis, G., Stanczuk, J., Sch¨onlieb, C.-B., and Etmann, C. Conditional image generation with score-based diffusion models.arXiv preprint arXiv:2111.13606,

work page arXiv
[2]

doi: 10.1007/978-3-319-12385-1

ISBN 978-3-319-12385-1. doi: 10.1007/978-3-319-12385-1

work page doi:10.1007/978-3-319-12385-1
[3]

Neural operator: Graph kernel network for partial differential equations

Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhat- tacharya, K., Stuart, A., and Anandkumar, A. Neural operator: Graph kernel network for partial differential equations. InICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations,

work page 2020
[4]

Learning PDE solution operator for continuous modeling of time-series

Park, Y ., Choi, J., Yoon, C., Kang, M., et al. Learning PDE solution operator for continuous modeling of time-series. arXiv preprint arXiv:2302.00854,

work page arXiv
[5]

Ramachandran, P., Zoph, B., and Le, Q. V . Searching for activation functions.arXiv preprint arXiv:1710.05941,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

L., Tseng, A

Uehara, M., Zhao, Y ., Black, K., Hajiramezanali, E., Scalia, G., Diamant, N. L., Tseng, A. M., Biancalani, T., and Levine, S. Fine-tuning of continuous-time diffusion models as entropy-regularized control.arXiv preprint arXiv:2402.15194,

work page arXiv
[7]

Moreover, writingZ t with respect to the measureQsatisfies(29)

Then we can define another distributionQvia dQ=h y(T, ZT )dP. Moreover, writingZ t with respect to the measureQsatisfies(29). Proof.We will apply Theorem 5.1 from Baker et al. (2024), which says that if 1.h y is twice Fr´echet differentiable with respect to z∈H and once differentiable with respect to t, with continuous derivatives,

work page 2024
[8]

That the time-reversal has a strong solution holds from Theorem 12 and Theorem 13 of Pidstrigach et al

only relies onh y being once Fr´echet differentiable with respect toz, therefore this is what we will use. That the time-reversal has a strong solution holds from Theorem 12 and Theorem 13 of Pidstrigach et al. (2024), for Setting 1 or Setting 2 respectively. Moreover, following Lemma 5.2 of Baker et al. (2024) and noting that the proof does not depend on...

work page 2024
[9]

Then hy(t, x) is Fr´echet differentiable with respect tox. Proof. The proof for this is similar to the proof that s(t, x) is Lipschitz continuous in Theorem 12 by Pidstrigach et al. (2024). Let Xt be a solution to the forward SDE in (6). Then given X0 ∈H , we know that the transition kernel from X0 at time0to timetis given by N(e − t 2 X0,(1−e −t)C).(38) ...

work page 2024
[10]

Proof of Theorem 3.2 We prove Theorem 3.2, in two lemmas for the two settings

exp(−Φ(x0, y))dπ(x0)(44e) = Z P(A|x 0)dπy(x0) =P(A|Y=y).(44f) A.2. Proof of Theorem 3.2 We prove Theorem 3.2, in two lemmas for the two settings. For a proof under Setting 1 see Lemma A.6 and for a proof under Setting 2 see Lemma A.7. Lemma A.6.Let Setting 1 hold. Then C∇logh y(t, x) =s y(t, x)−s(t, x).(45) 14 Supervised Guidance Training for Infinite-Dim...

work page 2010
[11]

LetPbe the path measure of the unconditional reverse SDE (7), which we re-state here dZt = 1 2 Zt +s(T−t, Z t) dt+ √ CdW H t

or inverse problems (Denker et al., 2024). LetPbe the path measure of the unconditional reverse SDE (7), which we re-state here dZt = 1 2 Zt +s(T−t, Z t) dt+ √ CdW H t . Let us define E(Q) :=KL(Q||P) +E Q[Φ(ZT , y)],(83) for allQ≪P. Then, we have the following proposition. Proposition B.1.The conditional path measureP y is the unique minimiser ofF(Q). Pro...

work page 2024
[12]

The original FNO architecture does not incorporate additional time information, while in our case the score depends explicitly on the time

to design network architectures which can be applied to any discretisation. The original FNO architecture does not incorporate additional time information, while in our case the score depends explicitly on the time. For this, we make use of a learnable time-embedding, which was previously used for diffusion bridges (Yang et al., 2025a) and PDE application...

work page 2023
[13]

We use a time-modulated FNO with 4 layers, 16 modes and 64 hidden channels, which results in 1 140 161 trainable parameters

The unconditional diffusion model is trained for 40 000 gradient steps. We use a time-modulated FNO with 4 layers, 16 modes and 64 hidden channels, which results in 1 140 161 trainable parameters. For the conditional diffusion model, we use 8 FNO layers with 16 modes and a hidden channel dimension of 128, which results in 8 737 153trainable parameters. We...

work page 2007
[14]

We train the conditional diffusion model using the loss function in Baldassari et al

In total, the diffusion model has 1 140 161trainable parameters. We train the conditional diffusion model using the loss function in Baldassari et al. (2023). The conditional diffusion models depends explicitly on the observations y. For this experiment both the observation and the ground truth signal are functions overΩ. We discretise both on the same sp...

work page 2023
[15]

We useν= 1for the covariance operator in Equation (26)

to simulate measurements. We useν= 1for the covariance operator in Equation (26). 21 Supervised Guidance Training for Infinite-Dimensional Diffusion Models 0.0 0.2 0.4 0.6 0.8 1.0 1.0 0.5 0.0 0.5 1.0 Ground Truth Mean ±2 0.0 0.2 0.4 0.6 0.8 1.0 1.0 0.5 0.0 0.5 1.0 Ground Truth Mean ±2 0.0 0.2 0.4 0.6 0.8 1.0 1.0 0.5 0.0 0.5 1.0 Ground Truth Mean ±2 0.0 0....

work page 1982