pith. machine review for the scientific record.

arxiv: 2605.08392 · v1 · submitted 2026-05-08 · 💻 cs.LG

Recognition: 2 theorem links · Lean Theorem

Geometry-Aware Discretization Error of Diffusion Models

Gabriel Peyré, Samuel Hurault, Thomas Moreau

Pith reviewed 2026-05-12 01:20 UTC · model grok-4.3

classification 💻 cs.LG
keywords diffusion models · discretization error · asymptotic expansion · Euler-Maruyama · exact score · covariance spectrum · reverse diffusion

The pith

Discretization error in exact-score diffusion sampling adapts to data geometry through the covariance spectrum and interacts with diffusion parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Practical diffusion models rely on simulating reverse SDEs or ODEs with few steps, making discretization error the main bottleneck under fixed compute. In the exact-score setting the paper derives first-order asymptotic expansions for the weak and Fréchet errors of the Euler-Maruyama scheme on general smooth reverse diffusions. These expansions become fully explicit when the data are Gaussian, revealing how the error depends on the eigenvalues of the data covariance matrix and on the choice of noise schedule and diffusion coefficient. The resulting formulas supply concrete, geometry-aware objectives for tuning diffusion parameters. The same qualitative behavior is observed to hold for non-Gaussian image data and for posterior sampling tasks.

Core claim

We derive first-order asymptotic expansions of the Euler-Maruyama weak and Fréchet discretization errors for reverse diffusions under the exact-score assumption. The expansions hold for general smooth reverse processes and become fully explicit for Gaussian data, showing that the leading error term depends on the covariance spectrum and couples to the diffusion schedule and diffusion-term coefficient.

What carries the argument

First-order asymptotic expansions of the weak and Fréchet discretization errors of the Euler-Maruyama integrator applied to reverse-time SDEs, made explicit via the spectrum of the data covariance under Gaussian assumptions.
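To make the object under analysis concrete, here is a minimal sketch of an exact-score Euler-Maruyama sampler in the Gaussian case, assuming the standard variance-preserving forward SDE dX_t = −X_t dt + √2 dW_t with data N(0, diag(λ)); the score is then linear and the reverse sampler should approximately recover the data covariance, up to exactly the discretization error the paper expands. All names and parameter values are illustrative, not the paper's code.

```python
import numpy as np

def em_reverse_sampler(lams, T=3.0, K=150, n_samples=20000, seed=0):
    """Euler-Maruyama on the exact-score reverse SDE for Gaussian data.

    Forward process: dX_t = -X_t dt + sqrt(2) dW_t, X_0 ~ N(0, diag(lams)),
    so the per-direction marginal variance is
    v_i(t) = exp(-2t) * lam_i + (1 - exp(-2t)),
    and the exact score of N(0, v_i(t)) is s_i(x) = -x / v_i(t).
    The reverse drift is y + 2 * score = (1 - 2 / v_i) * y.
    """
    rng = np.random.default_rng(seed)
    lams = np.asarray(lams, dtype=float)
    dt = T / K

    def var_at(tau):  # per-direction forward marginal variance at time tau
        return np.exp(-2.0 * tau) * lams + (1.0 - np.exp(-2.0 * tau))

    # Initialize from the exact terminal marginal N(0, v(T)).
    y = rng.standard_normal((n_samples, lams.size)) * np.sqrt(var_at(T))
    for k in range(K):
        tau = T - k * dt                          # remaining forward time
        drift = (1.0 - 2.0 / var_at(tau)) * y     # exact-score reverse drift
        y = y + drift * dt + np.sqrt(2.0 * dt) * rng.standard_normal(y.shape)
    return y

samples = em_reverse_sampler([0.5, 2.0])
cov = np.cov(samples.T)
```

With many samples and small steps, `cov` approaches diag(0.5, 2.0); the residual gap is Monte Carlo noise plus the O(dt) discretization bias whose leading term the paper characterizes.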

If this is right

  • Diffusion schedules and diffusion coefficients can be optimized by minimizing explicit geometry-dependent error expressions instead of coarse worst-case bounds.
  • Sampling error for different data distributions is predicted to vary systematically with the spread of the covariance spectrum.
  • The same expansions supply tractable objectives that can be used even when only approximate score functions are available.
  • Qualitative geometry dependence persists across image-generation and image-posterior-sampling tasks with non-Gaussian geometries.
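As one concrete instance of the first bullet, a sketch that minimizes the magnitude of the leading covariance-error term appearing in the paper's Gaussian analysis, Δ^{Σ,[1]}(λ) = (α² − 1) λ^{α+1} ∫₀^T (σ_s σ̇_s)² / (λ + σ_s²)^{α+2} ds, over the polynomial schedule family σ_t = σ_max (t/T)^β. The spectrum, grid ranges, and parameter values are illustrative choices, not the paper's.

```python
import numpy as np

def cov_error_proxy(beta, lams, alpha=0.25, sigma_max=10.0, T=1.0, n=4000):
    """|leading covariance-error term|, summed over covariance eigenvalues.

    Numerically evaluates |(alpha^2 - 1)| * lam^(alpha+1) *
    int_0^T (sigma_s * dsigma_s)^2 / (lam + sigma_s^2)^(alpha+2) ds
    for the polynomial schedule sigma_s = sigma_max * (s/T)^beta, beta > 1/2.
    """
    s = np.linspace(T / n, T, n)     # skip s=0: the integrand vanishes there
    sig = sigma_max * (s / T) ** beta
    dsig = sigma_max * beta * s ** (beta - 1.0) / T ** beta
    h = s[1] - s[0]
    total = 0.0
    for lam in lams:
        f = (sig * dsig) ** 2 / (lam + sig ** 2) ** (alpha + 2.0)
        integral = h * (f.sum() - 0.5 * (f[0] + f[-1]))   # trapezoid rule
        total += abs(alpha ** 2 - 1.0) * lam ** (alpha + 1.0) * integral
    return total

lams = np.geomspace(0.01, 4.0, 8)    # toy covariance spectrum
betas = np.linspace(0.6, 3.0, 25)
best_beta = min(betas, key=lambda b: cov_error_proxy(b, lams))
```

The point is the shape of the objective, not the numbers: the schedule exponent is scored directly against a geometry-dependent error expression rather than a worst-case rate.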

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • When the score is learned rather than exact, the geometry-dependent discretization term may still dominate total error once the score is sufficiently accurate.
  • Adaptive step-size controllers could be designed that locally rescale steps according to the local curvature or covariance spectrum.
  • Higher-order integrators could be analyzed with the same expansion technique to quantify the reduction in geometry-sensitive error.
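The adaptive-step idea can be sketched for the Gaussian toy case: size steps so that each carries an equal share of the integrated drift stiffness ‖∇v‖, which concentrates steps where the drift Jacobian is large (small covariance eigenvalues near the end of sampling). This is an editorial sketch under the variance-preserving Gaussian setup, not a method from the paper.

```python
import numpy as np

def stiffness_adapted_grid(lams, T=3.0, K=50, n=4000):
    """Nonuniform time grid on [0, T] equalizing integrated drift stiffness.

    For the VP reverse SDE with exact Gaussian score, the drift Jacobian in
    eigen-direction i at sampler time s is 1 - 2 / v_i(T - s), with
    v_i(tau) = exp(-2 tau) * lam_i + (1 - exp(-2 tau)). Steps are chosen so
    each interval carries an equal share of int |Jacobian|_max ds.
    """
    lams = np.asarray(lams, dtype=float)
    s = np.linspace(0.0, T, n)
    tau = T - s
    v = np.exp(-2.0 * tau)[:, None] * lams + (1.0 - np.exp(-2.0 * tau))[:, None]
    stiff = np.abs(1.0 - 2.0 / v).max(axis=1)
    # Cumulative stiffness budget G(s), inverted at K+1 equally spaced levels.
    G = np.concatenate([[0.0], np.cumsum(0.5 * (stiff[1:] + stiff[:-1]) * np.diff(s))])
    return np.interp(np.linspace(0.0, G[-1], K + 1), G, s)

grid = stiffness_adapted_grid([0.05, 1.0, 4.0])
```

With a small eigenvalue in the spectrum the stiffness blows up as sampling time approaches T, so the grid automatically packs its shortest steps at the end of the trajectory.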

Load-bearing premise

The analysis assumes perfect knowledge of the score function together with sufficient smoothness of the reverse diffusions.

What would settle it

Direct numerical computation of the actual weak or Fréchet discretization error for a known Gaussian forward process, using successively smaller step sizes, and checking whether the observed error matches the predicted leading term involving the covariance eigenvalues.
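For a one-dimensional Gaussian this check can be run in closed form: with a linear drift, the Euler-Maruyama iterate stays Gaussian and its variance obeys an exact recursion, so the weak error in the terminal variance can be computed without Monte Carlo and its first-order decay in the step size verified directly. The setup (forward dX = −X dt + √2 dW, data variance 0.25) is an illustrative choice, not the paper's experiment.

```python
import numpy as np

def em_variance_error(K, v0=0.25, T=2.0):
    """Exact EM weak error on the terminal variance for a 1D Gaussian.

    Forward: dX = -X dt + sqrt(2) dW, so Var(X_t) = exp(-2t) v0 + 1 - exp(-2t).
    Reverse drift with exact score: a(s) * y, a(s) = 1 - 2 / Var(X_{T-s}).
    EM variance recursion: V_{k+1} = (1 + a_k dt)^2 V_k + 2 dt,
    started from the exact terminal variance; the exact sampler ends at v0.
    """
    dt = T / K
    var_fwd = lambda t: np.exp(-2.0 * t) * v0 + 1.0 - np.exp(-2.0 * t)
    V = var_fwd(T)
    for k in range(K):
        a = 1.0 - 2.0 / var_fwd(T - k * dt)   # drift coefficient, left endpoint
        V = (1.0 + a * dt) ** 2 * V + 2.0 * dt
    return abs(V - v0)

e1, e2 = em_variance_error(200), em_variance_error(400)
ratio = e1 / e2   # should be near 2 for a first-order, O(dt), weak error
```

Halving the step size should roughly halve the error; a ratio drifting away from 2 as dt shrinks would falsify the first-order expansion in this instance.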

Figures

Figures reproduced from arXiv: 2605.08392 by Gabriel Peyré, Samuel Hurault, Thomas Moreau.

Figure 1. Empirical FID for FFHQ sampling (a) and Gaussian-theory Fréchet Distance (FD) …
Figure 2. Empirical FID and theoretical Fréchet Distance (FD) for image sampling across datasets (top) and …
Figure 3. Fréchet error for sampling a 100-dimensional Gaussian mixture model with 10 centers and covariance rescaled to follow two different power spectra shown in (c). Using α = 0.25 and K = 50. Figures (a) and (b) compare two noise schedule families: the one-dimensional optimal schedule (15) σ*_t(c_σ) as a function of c_σ (top axis), and the polynomial schedules σ_t = σ_max(t/T)^β as a function of β (bottom axis). …
Original abstract

Practical diffusion sampling is a numerical approximation problem: under a fixed inference budget, one must simulate a reverse-time ODE or SDE using only a limited number of denoising steps, so discretization error is often the dominant source of error. Existing non-asymptotic analyses provide convergence guarantees, but are typically too loose and too insensitive to diffusion parameters to guide practical design: broad families of schedules receive the same rates, which depend on coarse worst-case quantities such as the dimension or the drift Lipschitz constant. We take a less ambitious but more informative route. In the exact-score setting, we derive first-order asymptotic expansions of the Euler-Maruyama weak and Fréchet discretization errors. These formulas hold for general smooth reverse diffusions and become fully explicit under Gaussian data. They show how discretization error adapts to the geometry of the data through the covariance spectrum, and how this geometry interacts with key diffusion parameters, including the diffusion schedules and the diffusion-term coefficient. This yields tractable objectives for geometry-aware parameter optimization. Finally, we show that the qualitative predictions of the Gaussian formulas remain robust across diffusion sampling problems with different geometries, including image generation on different datasets and image posterior sampling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 4 minor

Summary. The manuscript derives first-order asymptotic expansions of the Euler-Maruyama weak and Fréchet discretization errors for reverse diffusion processes in the exact-score setting. These expansions are obtained via Itô-Taylor analysis for general smooth diffusions and become fully explicit for Gaussian data by diagonalization in the covariance eigenbasis, revealing the dependence of error on the data covariance spectrum and its interaction with diffusion schedules and the diffusion coefficient. Qualitative robustness of the Gaussian predictions is demonstrated on image generation and posterior sampling tasks across datasets.

Significance. If the expansions are correct, the work supplies a more informative, geometry-aware characterization of discretization error than existing non-asymptotic bounds, which are typically loose and insensitive to specific parameters. The explicit Gaussian formulas and their interaction with covariance eigenvalues could enable tractable optimization of diffusion schedules tailored to data geometry, improving sampling quality under fixed budgets. The robustness experiments on non-Gaussian image data provide useful supporting evidence.

minor comments (4)
  1. §2.2: The definition of the Fréchet discretization error should include an explicit reference to the metric used on the space of measures or paths to avoid ambiguity with other common distances.
  2. Figure 4: The error curves for different covariance spectra are difficult to distinguish when printed in grayscale; consider adding line styles or markers.
  3. The discussion of related work on weak convergence for SDEs (e.g., references to Kloeden & Platen or recent diffusion-specific analyses) could be expanded in §1 to better position the first-order expansions.
  4. Notation: The symbol for the diffusion coefficient in the general case is occasionally overloaded with the schedule β(t); a brief clarification in §3 would help.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work and for recommending minor revision. The report accurately reflects the manuscript's focus on first-order asymptotic expansions of weak and Fréchet discretization errors for reverse diffusion processes, their explicit form under Gaussian data via covariance geometry, and the supporting robustness experiments on image tasks. No specific major comments were provided that require point-by-point rebuttal or changes to the technical content.

Circularity Check

0 steps flagged

No significant circularity; derivations are self-contained SDE analysis

full rationale

The central derivations consist of first-order asymptotic expansions obtained via Itô-Taylor expansion of the error process for general smooth reverse diffusions, followed by explicit specialization to Gaussian data via diagonalization in the covariance eigenbasis. These steps start from the SDE definitions and standard stochastic calculus under the stated exact-score and smoothness assumptions; they do not reduce to fitted parameters renamed as predictions, self-definitional equations, or load-bearing self-citations. The resulting formulas are presented as asymptotic approximations that depend on the data covariance spectrum and diffusion parameters, with no evidence that any key claim is equivalent to its inputs by construction. The geometry-aware optimization objectives are downstream applications, not part of the derivation chain itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard assumptions from stochastic calculus and diffusion model literature; no new free parameters or invented entities are introduced.

axioms (2)
  • domain assumption Reverse diffusions are smooth
    Invoked to justify validity of the first-order asymptotic expansions.
  • domain assumption Exact score function is available
    The analysis is restricted to the exact-score setting.

pith-pipeline@v0.9.0 · 5501 in / 1282 out tokens · 59464 ms · 2026-05-12T01:20:19.499818+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 4 internal anchors

  1. [1] Michael S. Albergo, Nicholas M. Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: a unifying framework for flows and diffusions. arXiv:2303.08797, 2023.
  2. [2] Eliot Beyler and Francis Bach. Convergence of deterministic and stochastic diffusion-model samplers: a simple analysis in Wasserstein distance. arXiv:2508.03210, 2025.
  3. [3] Yifan Chen, Eric Vanden-Eijnden, and Jiawei Xu. Lipschitz-guided design of interpolation schedules in generative models. arXiv:2509.01629, 2025.
  4. [4] Valentin De Bortoli. Convergence of denoising diffusion models under the manifold hypothesis. arXiv:2208.05314, 2022.
  5. [5] Mingyang Deng, He Li, Tianhong Li, Yilun Du, and Kaiming He. Generative modeling via drifting. arXiv:2602.04770, 2026.
  6. [6] Yansong Gao, Zhihong Pan, Xin Zhou, Le Kang, and Pratik Chaudhari. Fast diffusion probabilistic model sampling through the lens of backward error analysis. arXiv:2304.11446, 2023.
  7. [7] Samuel Hurault, Matthieu Terris, Thomas Moreau, and Gabriel Peyré. From score matching to diffusion: a fine-grained error analysis in the Gaussian setting. arXiv:2503.11615, 2025.
  8. [8] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv:2210.02747, 2022.
  9. [9] Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: learning to generate and transfer data with rectified flow. arXiv:2209.03003, 2022.
  10. [10] Luigi Malagò, Luigi Montrucchio, and Giovanni Pistone. Wasserstein Riemannian geometry of positive definite matrices. arXiv:1801.09269, 2018.
  11. [11] Emile Pierret and Bruno Galerne. Diffusion models for Gaussian distributions: exact solutions and Wasserstein errors. arXiv:2405.14250, 2024.
  12. [12] Julián Tachella, Matthieu Terris, Samuel Hurault, Andrew Wang, Dongdong Chen, Minh-Hai Nguyen, Maxime Song, Thomas Davies, Leo Davy, Jonathan Dong, et al. DeepInverse: a Python package for solving imaging inverse problems with deep learning. arXiv:2505.20160, 2025.
  13. [13] Binxu Wang and John J. Vastola. The hidden linear structure in score-based models and its application. arXiv:2311.10892, 2023.
  14. [14] Binxu Wang and John J. Vastola. The unreasonable effectiveness of Gaussian score approximation for diffusion models and its applications. arXiv:2412.09726, 2024.
  15. [15] Mengfei Xia, Yujun Shen, Changsong Lei, Ran Zhou, Ran Yi, Deli Zhao, Wenping Wang, and Yongjin Liu. Towards more accurate diffusion model acceleration with a timestep aligner. arXiv:2310.09469, 2023.

  16. [16] Internal anchor — initialization bias in the Gaussian case. The Jacobian verifies, at time T, J_{T,0} = η_T^{-1} Σ_data^{(1+α)/2} (Σ_data + σ_T² Id)^{-(1+α)/2}. Substituting δm_0 and δΣ_0, the mean and covariance errors at final diffusion time T due to initialization bias are δm_T = Σ_data^{(1+α)/2} (Σ_data + σ_T² Id)^{-(1+α)/2} µ_data and δΣ_T = Σ_data^{2+α} (Σ_data + σ_T² Id)^{-(1+α)}. …
  17. [17] Internal anchor — covariance error for affine drift. … = −(γ/2) ∫_0^{t_k} J_{t_k,s} m̈_s ds + O(γ²), which proves (37). For the covariance error, (31) is specialized to affine drift: by the variation-of-constants formula [Davis, 1977], the solution of the SDE (2) with linear drift takes the form Y_t = J_{t,s} Y_s + b_{t,s}. Thus Cov(Y_s, Y_t) = C_s J_{t,s}^⊤, and using (41): Cov[e_{t_k,s}(Y), Y_{t_k}] = −(1/2) J_{t_k,s} (Ḣ_s + H_s²) C_s J_{t_k,s}^⊤. …
  18. [18] Internal anchor — exponential form of the Jacobian. J_{t,s} is the unique solution of the linear ODE dJ_{t,s}(Y)/dt = ∇v_t(Y_t) J_{t,s}(Y) = H_t J_{t,s}(Y), with J_{s,s} = Id. All the H_s commute in time, so J_{t,s}(Y) = exp(∫_s^t H_τ dτ). Using Σ_t = η_t² (Σ_data + σ_t² Id) gives (d/dt) log Σ_t = Σ_t^{-1} (d/dt) Σ_t = 2 (η̇_t/η_t) Id + 2 η_t² σ_t σ̇_t Σ_t^{-1}, hence H_t = α (η̇_{T−t}/η_{T−t}) Id − ((1+α)/2) (d/du) log Σ_u |_{u=T−t}. …
  19. [19] Internal anchor — polynomial schedules. The first- and second-order mean errors from Proposition 1 vanish: Δ^{µ,[1]}(λ) = Δ^{µ,[2]}(λ) = 0. Moreover, if σ̇_t σ_t → 0 as t → 0, Assumption 1(ii) holds and the covariance error (9) simplifies to Δ^{Σ,[1]}(λ) = (α² − 1) λ^{α+1} ∫_0^T (σ_s σ̇_s)² / (λ + σ_s²)^{α+2} ds + o_{σ_max→∞}(1). Assuming further σ_t = σ_max (t/T)^β with β > 1/2 and α > 0, Δ^{Σ,[1]}(λ) = (α² − 1) (β² σ_max⁴ / T) λ^{α+1} ∫ …
  20. [20] Internal anchor — Lipschitz surrogate for the weak error. Prior work proposed optimizing parameters by minimizing an average squared local Lipschitzness of the drift along the trajectory: min ∫_0^T E[‖∇v_s(Y_s)‖²] ds (79). This objective is a natural surrogate for the weak error expansion (4): defining L_s(Y_s) := ‖∇v_s(Y_s)‖, one has the pointwise bounds ‖(v_s · ∇)v_s(Y_s)‖ ≤ L_s(Y_s) ‖v_s(Y_s)‖ and ‖J_{t,s}(Y)‖ ≤ exp(∫_s^t L_τ(Y_τ) dτ). …
  21. [21] Internal anchor — effect of α. Figure I.3: FFHQ samples for increasing α with VE and K = 100 steps. Small α yields smoother, mean-seeking samples, while larger α increases variability. Variance Exploding with α = 0 is adequate for (posterior) mean estimation, which is particularly insightful in the context of image inverse problems. …