A Tutorial on Diffusion Theory: From Differential Equations to Diffusion Models

Jiayi Fu; Yuxia Wang

arxiv: 2605.22586 · v3 · pith:MF62BIOCnew · submitted 2026-05-21 · 💻 cs.LG · cs.CL


Pith reviewed 2026-05-22 07:09 UTC · model grok-4.3

classification: cs.LG, cs.CL

keywords: diffusion models, score matching, stochastic differential equations, reverse SDE, probability flow ODE, DDPM, DDIM

The pith

The standard noise-prediction objective is equivalent to score matching up to an additive constant independent of the model parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This tutorial develops diffusion models by starting from a conditional Gaussian forward process that adds noise to data points. It demonstrates that this process can be represented as both an ODE and an SDE, and when averaged over the data distribution, these yield marginal dynamics that transport the data distribution to a standard Gaussian. The paper then derives the reverse SDE and reverse probability-flow ODE, both controlled by the marginal score function. A central result is the equivalence between the usual noise-prediction training objective and score matching, differing only by a parameter-independent constant. This matters for readers because it explains why various diffusion sampling methods work and how they relate to score-based generative modeling.

Core claim

The paper shows that marginalizing the conditional Gaussian forward process produces forward ODE and SDE formulations transporting p_data to N(0,I). The reverse-time dynamics consist of a reverse SDE and a probability-flow ODE, both governed by the marginal score grad log p_t(x). This setup yields a score estimation training objective, with the result that the standard noise-prediction objective equals score matching plus a constant independent of model parameters. DDPM and DDIM are shown to share this objective, with their samplers corresponding to discrete versions of the reverse SDE and reverse ODE respectively.
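The forward half of this claim can be illustrated with a minimal sketch of the conditional Gaussian kernel p_t(x | x_0) = N(x; α_t x_0, σ_t² I). The schedule used here (α_t = exp(-βt/2), σ_t² = 1 - α_t², with a constant rate `BETA`) and the bimodal toy data are assumptions chosen for illustration, not the paper's own choices, and the helper names are hypothetical.

```python
import math
import random

BETA = 10.0  # assumed constant noise rate; not taken from the paper


def alpha_sigma(t):
    """Assumed variance-preserving schedule: alpha_t^2 + sigma_t^2 = 1."""
    a = math.exp(-0.5 * BETA * t)
    return a, math.sqrt(1.0 - a * a)


def forward_sample(x0, t, rng):
    """Draw x_t ~ N(alpha_t * x0, sigma_t^2) for a scalar data point x0."""
    a, s = alpha_sigma(t)
    return a * x0 + s * rng.gauss(0.0, 1.0)


def mean_std(xs):
    """Sample mean and (population) standard deviation of a list."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, math.sqrt(v)


rng = random.Random(0)
# Toy bimodal data distribution: points at -3 and +3 with equal probability.
data = [3.0 * rng.choice((-1.0, 1.0)) for _ in range(5000)]

for t in (0.0, 0.1, 1.0):
    noised = [forward_sample(x0, t, rng) for x0 in data]
    m, s = mean_std(noised)
    print(f"t={t:.1f}  mean={m:+.3f}  std={s:.3f}")
```

Under this assumed schedule, the samples at t = 0 are the data points themselves, and by t = 1 the printed mean and standard deviation should sit close to those of a standard Gaussian.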

What carries the argument

The marginal score function grad log p_t(x) that drives both the reverse SDE and the reverse probability-flow ODE.

If this is right

The reverse dynamics can be simulated using numerical integrators to generate samples from the data distribution.
DDPM sampling corresponds to a discrete approximation of the reverse SDE.
DDIM sampling corresponds to a discrete approximation of the reverse probability-flow ODE.
Guided generation is achieved by modifying the score with classifier guidance or classifier-free guidance.
Higher-order solvers such as DPM-Solver can be applied to the reverse ODE for faster sampling.
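The first three consequences can be sketched on a toy problem where the marginal score is known in closed form. The setup is an assumption made for illustration: scalar data drawn from N(MU, S²), and a forward SDE dx = -(β/2) x dt + sqrt(β) dW with constant β, so that p_t is Gaussian. An Euler step of the reverse probability-flow ODE stands in for a DDIM-like deterministic sampler, and an Euler-Maruyama step of the reverse SDE stands in for a DDPM-like stochastic sampler. Neither is the exact DDPM or DDIM update rule, and a trained network would replace `marginal_score` in practice.

```python
import math
import random

BETA = 10.0        # assumed constant noise rate of the forward SDE
MU, S = 2.0, 0.5   # toy data distribution N(MU, S^2), chosen for illustration


def marginal_score(x, t):
    """Exact grad log p_t(x); p_t is Gaussian because the toy data is Gaussian."""
    a = math.exp(-0.5 * BETA * t)          # alpha_t
    var = a * a * S * S + (1.0 - a * a)    # alpha_t^2 S^2 + sigma_t^2
    return -(x - a * MU) / var


def reverse_sample(n_samples, n_steps, stochastic, seed=0):
    """Integrate from t=1 down to t=0, starting from standard Gaussian noise."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    out = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        for k in range(n_steps, 0, -1):
            score = marginal_score(x, k * dt)
            if stochastic:
                # Reverse SDE drift f - g^2 * score, plus fresh noise each step.
                drift = -0.5 * BETA * x - BETA * score
                x = x - drift * dt + math.sqrt(BETA * dt) * rng.gauss(0.0, 1.0)
            else:
                # Probability-flow ODE drift f - (g^2 / 2) * score, no noise.
                drift = -0.5 * BETA * x - 0.5 * BETA * score
                x = x - drift * dt
        out.append(x)
    return out


def mean_std(xs):
    m = sum(xs) / len(xs)
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


ode_samples = reverse_sample(500, 400, stochastic=False)
sde_samples = reverse_sample(500, 400, stochastic=True)
print("ODE sampler mean/std:", mean_std(ode_samples))
print("SDE sampler mean/std:", mean_std(sde_samples))
```

Both samplers consume the same score function and should recover roughly the same mean and spread as the toy data, which is the sense in which the two sampling styles share one training target.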

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The shown equivalence implies that any advance in efficient score estimation can be transferred to improve diffusion model training without changing the objective.
Treating the diffusion process as a deterministic ODE may enable new sampling algorithms that avoid the variance of stochastic paths.
The differential equation perspective could be used to analyze convergence rates or design custom noising schedules beyond the standard ones.

Load-bearing premise

The forward noising process is a Gaussian conditional distribution that admits equivalent ODE and SDE representations with well-defined marginals over the data distribution.

What would settle it

A calculation that shows the noise-prediction loss and the score-matching loss differ by a term that depends on the parameters of the model being trained would disprove the equivalence.
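A small numerical version of such a calculation can be sketched as follows. Everything in it is an illustrative assumption: a single fixed time with α_t = 0.8 and σ_t = 0.6, a data distribution with two equally likely points at -1 and +1, a one-parameter linear score model s_θ(x) = θx, and the parameterization ε_θ(x) = -σ_t s_θ(x). The expectations are evaluated by a Riemann sum on a grid rather than by sampling.

```python
import math

ALPHA, SIGMA = 0.8, 0.6   # assumed alpha_t and sigma_t at one fixed time t
DATA = (-1.0, 1.0)        # toy data: two equally likely points


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def marginal_score(x):
    """grad log p_t(x) for the equal mixture of N(+ALPHA, SIGMA^2) and N(-ALPHA, SIGMA^2)."""
    return (-x + ALPHA * math.tanh(ALPHA * x / SIGMA ** 2)) / SIGMA ** 2


def losses(theta, n_grid=4001, half_width=8.0):
    """Return (noise-prediction loss, explicit score-matching loss) for s_theta(x) = theta * x."""
    dx = 2.0 * half_width / (n_grid - 1)
    noise_loss = 0.0
    score_loss = 0.0
    for i in range(n_grid):
        x = -half_width + i * dx
        for x0 in DATA:
            w = 0.5 * normal_pdf(x, ALPHA * x0, SIGMA) * dx  # weight of (x0, x)
            eps = (x - ALPHA * x0) / SIGMA                   # true noise
            eps_model = -SIGMA * theta * x                   # assumed eps_theta = -sigma * s_theta
            noise_loss += w * (eps - eps_model) ** 2
            score_loss += w * (theta * x - marginal_score(x)) ** 2
    return noise_loss, score_loss


thetas = (-2.0, -1.0, 0.0, 0.5)
gaps = []
for theta in thetas:
    noise_loss, score_loss = losses(theta)
    gaps.append(noise_loss - SIGMA ** 2 * score_loss)
    print(f"theta={theta:+.1f}  noise={noise_loss:.6f}  score={score_loss:.6f}  gap={gaps[-1]:.6f}")
```

If the equivalence holds, the printed gap between the noise-prediction loss and the σ_t²-weighted score-matching loss is the same for every θ, even though both losses change with θ. A gap that moved with θ would be the kind of counterexample described above.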

Figures

Figures reproduced from arXiv: 2605.22586 by Jiayi Fu, Yuxia Wang.

**Figure 2.** Marginalized forward process: the initial state

**Figure 3.** Reverse process: the reverse dynamics start from Gaussian noise and progressively denoise

Original abstract

Diffusion models have emerged as a dominant framework for generative modeling, but their mathematical foundations are often presented separately through diffusion probabilistic models, score-based modeling, stochastic differential equations, and numerical sampling methods. We write this tutorial to provide a unified and self-contained account of these viewpoints from the perspective of differential equations. Starting from a conditional Gaussian noising process, we derive ordinary differential equation (ODE) and stochastic differential equation (SDE) representations, pass to the corresponding marginal forward dynamics, and then obtain the reverse-time SDE and probability-flow ODE that make generation possible. We show that the central unknown quantity in reverse sampling is the marginal score, explain how score matching becomes the standard denoising objective under a noise-prediction parameterization, and discuss practical reverse-time sampling and guidance. We further place DDPM, DDIM, flow matching, and score-based SDEs in a common framework, and conclude with diffusion language models in continuous embedding space together with a brief discussion of discrete masked-token diffusion. The tutorial is intended as a bridge between the analytical foundations of diffusion processes and the modern generative algorithms built upon them.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This tutorial re-derives the standard SDE/ODE connections for diffusion models with clear steps but adds no new results or proofs.

The main point is that this is a tutorial that starts from the conditional Gaussian forward process, shows it admits both ODE and SDE forms, marginalizes to get the data-to-noise transport, and then derives the reverse SDE and probability-flow ODE driven by the score. It correctly notes that the usual noise-prediction loss equals score matching plus a constant term independent of the model parameters, so the training objectives line up. Later parts cover DPM-Solver sampling, classifier and classifier-free guidance, and map DDPM to discrete reverse-SDE steps while DDIM maps to reverse-ODE steps. These links are presented in one connected narrative, which can save readers from chasing the same equivalences across multiple earlier papers.

The derivations follow directly from the forward-process definition without circularity or invented quantities. The central claims line up with the existing literature the authors cite, and the stress-test note on the objective equivalence holds up because the extra term drops out of the gradient. No load-bearing contradictions appear in the outline.

The soft spots are the expected ones for a tutorial: everything here is re-derivation rather than new mathematics or experiments, so the novelty is low and the scope stays explanatory. Without the full manuscript I cannot inspect every intermediate step or edge-case handling in the ODE/SDE transitions, but the abstract gives no sign of gaps or errors. The Gaussian conditional assumption is the standard one and is stated at the outset.

This paper is aimed at readers who already know basic diffusion models and want the differential-equation framing spelled out in one place, or at students who prefer to see the score and noise-prediction views side by side. It is not aimed at experts hunting for advances. I would bring it to a reading group focused on generative-model foundations if the group wants a compact reference for the equivalences. I would not cite it in my own work. It deserves peer review for a venue that publishes tutorials or educational pieces, because the presentation is coherent and the math checks out on the points that are visible.

Referee Report

0 major / 3 minor

Summary. The paper is a tutorial that starts from the conditional Gaussian forward process and derives both its ODE and SDE representations. Marginalizing over the data distribution produces forward ODE/SDE dynamics that transport p_data to a standard Gaussian. The corresponding reverse-time SDE and probability-flow ODE are then obtained, both driven by the marginal score. This leads to a score-estimation training objective whose equivalence to the standard noise-prediction loss (up to a parameter-independent additive constant) is shown. The tutorial continues with sampling algorithms (including DPM-Solver), classifier and classifier-free guidance, and a comparison showing that DDPM corresponds to discrete reverse-SDE sampling while DDIM corresponds to reverse-ODE sampling, all sharing the same training objective.

Significance. If the derivations are accurate and clearly presented, the tutorial supplies a unified differential-equation perspective that connects the continuous SDE/ODE framework to the discrete DDPM/DDIM algorithms. The explicit demonstration that the noise-prediction objective differs from score matching only by an additive constant independent of model parameters is a standard but pedagogically useful result. The manuscript also supplies reproducible derivations and a consistent notation that could serve as a reference for newcomers to the field.

minor comments (3)

§3 (Reverse dynamics): the transition from the reverse SDE to the probability-flow ODE is stated without an explicit intermediate step showing how the diffusion term is removed; adding one line of algebra would improve readability.
§4 (Training objective): the claim that the constant term is independent of model parameters is correct but would benefit from a short parenthetical reminder that it equals E[||true score||²] evaluated under the marginal.
§6 (Comparison with DDPM/DDIM): the statement that both methods share the same training objective is accurate, yet the discrete-time indexing conventions (t = 0 … T versus continuous t ∈ [0,1]) are not aligned in a single equation; a small table mapping the two would eliminate potential confusion.
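
For the §3 comment, the intermediate step is commonly supplied through the Fokker-Planck equation. The sketch below uses generic notation that is assumed here rather than taken from the paper: a forward drift f, a scalar diffusion coefficient g, and a reverse-time Brownian motion \bar{W}. Rewriting the diffusion term as a transport term shows that a deterministic ODE has the same marginals p_t as the forward SDE.

```latex
\begin{align*}
\text{forward SDE:}\quad
  & dx = f(x,t)\,dt + g(t)\,dW_t \\
\text{Fokker--Planck:}\quad
  & \partial_t p_t = -\nabla\cdot\big(f\,p_t\big) + \tfrac{1}{2}\,g^2\,\Delta p_t \\
  & \phantom{\partial_t p_t} = -\nabla\cdot\Big(\big[f - \tfrac{1}{2}\,g^2\,\nabla\log p_t\big]\,p_t\Big)
    \qquad\text{since } \Delta p_t = \nabla\cdot\big(p_t\,\nabla\log p_t\big) \\
\text{probability-flow ODE:}\quad
  & \frac{dx}{dt} = f(x,t) - \tfrac{1}{2}\,g(t)^2\,\nabla\log p_t(x) \\
\text{reverse SDE:}\quad
  & dx = \big[f(x,t) - g(t)^2\,\nabla\log p_t(x)\big]\,dt + g(t)\,d\bar{W}_t
\end{align*}
```

The two reverse-time equations differ only in the score coefficient (g²/2 versus g²) and in whether noise is injected, which is why a single score estimate serves both.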

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their accurate and positive summary of the manuscript, for highlighting its potential utility as a reference for newcomers, and for recommending minor revision. We appreciate the recognition that the explicit equivalence between the noise-prediction objective and score matching (differing only by a parameter-independent constant) is pedagogically useful, and that the connections between continuous SDE/ODE dynamics and discrete DDPM/DDIM sampling are clearly presented.

Circularity Check

0 steps flagged

No significant circularity

The tutorial derives the equivalence of the noise-prediction objective to score matching directly from the Gaussian conditional forward process and its marginalization, with the additive constant term shown to be independent of model parameters via explicit expansion of the loss. All central steps follow from the initial definitions of the forward ODE/SDE and the marginal score without any parameter fitting inside the paper, self-referential definitions, or load-bearing self-citations that reduce the result to its own inputs. The derivation is self-contained against the stated assumptions on the forward process.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The tutorial rests on standard properties of Gaussian processes and Itô calculus rather than introducing new fitted quantities or entities. No free parameters are introduced to support a novel claim.

axioms (2)

domain assumption The conditional forward process is Gaussian and admits both an ODE and an SDE representation.
Stated in the opening paragraph of the abstract as the starting point for all subsequent derivations.
domain assumption Averaging the conditional process over the data distribution yields well-defined marginal forward ODE and SDE that transport p_data to N(0,I).
Invoked immediately after the conditional-process statement to obtain the marginal dynamics.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "shows that the standard noise-prediction objective is equivalent to score matching up to an additive constant independent of the model parameters"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery · unclear
Paper passage: "conditional Gaussian forward kernel p_t(x | x_0) := N(x; α_t x_0, σ_t² I)"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.