arxiv: 2605.07625 · v1 · submitted 2026-05-08 · 🧮 math.ST · stat.ML· stat.TH

Recognition: 2 theorem links

· Lean Theorem

Statistical Convergence of Spherical First Hitting Diffusion Models

Lukas Trottner, Simon Bienewald

Pith reviewed 2026-05-11 01:49 UTC · model grok-4.3

classification 🧮 math.ST stat.MLstat.TH

keywords first hitting diffusion modelsstatistical convergenceminimax ratestotal variationSobolev smoothnessspherical supportrandom generation timedenoising diffusion

0 comments

The pith

First hitting diffusion models achieve minimax optimal convergence rates in total variation for spherically supported Sobolev distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that first hitting diffusion models reach the best possible rates of convergence when approximating data distributions that are supported on spheres and possess Sobolev smoothness properties. This result holds in total variation distance and includes only logarithmic factors. It matters because these models use random adaptive generation times rather than fixed schedules, providing the first proof of statistical optimality for such denoising diffusion approaches with variable timing. A sympathetic reader would see this as confirming that manifold-aware constructions can match or exceed the performance of standard diffusion models while reducing average simulation time.

Core claim

We show that, up to logarithmic factors, FHDMs achieve the minimax optimal convergence rate in total variation for spherically supported Sobolev smooth data distributions. In particular, this is the first statistical optimality result for denoising diffusion modelling with random generation time.

What carries the argument

The conditioning framework of Doob's h-transform applied to first hitting times, which produces time-homogeneous dynamics tailored to spherical manifolds.

Load-bearing premise

The target data distributions must be supported exactly on a sphere and belong to a Sobolev smoothness class.

What would settle it

A concrete spherically supported distribution in the Sobolev class for which the total variation error of an FHDM exceeds the known minimax rate by more than logarithmic factors.

read the original abstract

Denoising diffusion models have evolved into a state-of-the-art method for tasks in various fields, such as denoising and generation of images, text generation, or generation of synthetic data for training of other machine learning models. First hitting diffusion models (FHDM) are a particular class of denoising diffusion models with \textit{random} adaptive generation time tailored to generate data on a known manifold. Building on the conditioning framework of Doob's $h$-transform these models leverage the given information on the target data manifold to demonstrate strong performance across tasks while offering distinct features such as time-homogeneous dynamics of the generating process and a reduced average simulation time. Even though the theoretical investigation of standard forward-backward diffusion models has attracted much attention in the recent past, the statistical convergence properties of FHDMs are not yet understood. In this work, we show that, up to logarithmic factors, FHDMs achieve the minimax optimal convergence rate in total variation for spherically supported Sobolev smooth data distributions. In particular, this is the first statistical optimality result for denoising diffusion modelling with random generation time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper proves the first minimax optimality result for diffusion models with random hitting times, matching the known lower bound in TV for spherical Sobolev data up to logs.

read the letter

The main point is that the authors establish the first statistical optimality guarantee for first-hitting diffusion models. They show that these models, built with Doob's h-transform for random adaptive generation time, reach the minimax rate in total variation for data supported on the sphere and belonging to a Sobolev smoothness class, up to logarithmic factors. This distinguishes the work from earlier fixed-time analyses of standard diffusion models.

Referee Report

0 major / 3 minor

Summary. The paper claims that first hitting diffusion models (FHDMs) constructed via Doob's h-transform for data distributions supported on the unit sphere and belonging to a Sobolev smoothness class achieve the minimax optimal convergence rate in total variation distance, up to logarithmic factors. This is presented as the first statistical optimality result for any denoising diffusion model that uses a random (adaptive) generation time.

Significance. If the upper-bound analysis is correct, the result is significant because it supplies the first matching of known minimax lower bounds for this function class while controlling the random hitting time without introducing an extra polynomial factor in the rate. The work thereby validates the statistical efficiency of the h-transform conditioning framework for manifold-supported data and adds a concrete optimality benchmark to the theory of diffusion-based generative models.

minor comments (3)

[§2.3] §2.3, Definition 2.4: the precise statement of the spherical Sobolev ball (including the precise norm and the role of the radius) is referenced but not restated; repeating the definition would improve readability for readers who skip the preliminaries.
[Theorem 3.2] Theorem 3.2: the logarithmic factor is stated as O(log n) but the proof sketch does not explicitly track whether the constant in front of the log depends on the dimension d or the smoothness index s; a short remark clarifying this dependence would strengthen the result statement.
[Figure 1] Figure 1: the caption refers to 'empirical TV distance' but the plotted quantity appears to be an averaged Monte-Carlo estimate; adding the precise definition of the plotted error (including number of samples used) would avoid ambiguity.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading and positive evaluation of our manuscript. We are encouraged by the recognition that our result provides the first matching of minimax lower bounds for Sobolev data on the sphere under the random-time FHDM framework, without incurring an extra polynomial penalty from the adaptive hitting time. Since the report raises no specific technical concerns or requests for clarification, we have no major comments to address point by point.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The manuscript derives an upper bound on total variation convergence for FHDMs by applying the standard Doob h-transform conditioning to spherically supported Sobolev densities, controlling the random hitting time directly within the analysis. This produces a rate that matches the known minimax lower bound for the function class up to logarithmic factors, without any fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations. All steps rely on external classical results (Doob h-transform, Sobolev embedding) and standard statistical arguments that remain independent of the target claim.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on standard domain assumptions from diffusion models and functional analysis; no free parameters or invented entities are visible in the abstract.

axioms (2)

domain assumption Target distributions are spherically supported and Sobolev smooth
Explicitly stated as the setting in which the minimax rate holds.
domain assumption Doob's h-transform conditioning framework applies to first hitting diffusion models
Described as the building block for the models.

pith-pipeline@v0.9.0 · 5488 in / 1261 out tokens · 39648 ms · 2026-05-11T01:49:08.793461+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear
we show that, up to logarithmic factors, FHDMs achieve the minimax optimal convergence rate in total variation for spherically supported Sobolev smooth data distributions
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
KL(ℚ^h_ε ‖ ℙ^s_ε) = ½ 𝔼[∫ ‖s(Z^h_t) - ∇log h(Z^h_t)‖² dt]

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages

[1]

Aubin.Some nonlinear problems in Riemannian geometry

T. Aubin.Some nonlinear problems in Riemannian geometry. Springer Monographs in Mathemat- ics. Springer-Verlag, Berlin, 1998, pp. xviii+395.doi:10.1007/978-3-662-13006-3

work page doi:10.1007/978-3-662-13006-3 1998
[2]

Axler, P

S. Axler, P. Bourdon, and W. Ramey.Harmonic function theory. 2nd ed. Vol. 137. Graduate Texts in Mathematics. Springer-Verlag, New York, 2001, pp. xii+259.doi:10.1007/978-1-4757-8137-3

work page doi:10.1007/978-1-4757-8137-3 2001
[3]

Convergence of diffusion models under the manifold hypothesis in high-dimensions

I. Azangulov, G. Deligiannidis, and J. Rousseau.Convergence of Diffusion Models Under the Man- ifold Hypothesis in High-Dimensions. 2025. arXiv:2409.18804 [stat.ML]

work page arXiv 2025
[4]

R. M. Blumenthal and R. K. Getoor.Markov processes and potential theory. Vol. Vol. 29. Pure and Applied Mathematics. Academic Press, New York-London, 1968, pp. x+313

work page 1968
[5]

Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data

M. Chen, K. Huang, T. Zhao, and M. Wang. “Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data”. In:Proceedings of the 40th International Conference on Machine Learning. Vol. 202. 2023, pp. 4672–4712

work page 2023
[6]

Christensen, J

S. Christensen, J. Kallsen, C. Strauch, and L. Trottner.Beyond Fixed Horizons: A Theoretical Frame- work for Adaptive Denoising Diffusions. 2026. arXiv:2501.19373 [stat.ML]

work page arXiv 2026
[7]

K. L. Chung and J. B. Walsh.Markov processes, Brownian motion, and time symmetry. Second. Vol. 249. Grundlehren der mathematischen Wissenschaften. Springer, New York, 2005, pp. xii+431. doi:10.1007/0-387-28696-9

work page doi:10.1007/0-387-28696-9 2005
[8]

K. L. Chung and Z. X. Zhao.From Brownian motion to Schrödinger’s equation. Vol. 312. Grundlehren der mathematischen Wissenschaften. Springer-Verlag, Berlin, 1995, pp. xii+287.doi:10.1007/ 978-3-642-57856-4

work page 1995
[9]

Deep neural network approximation theory

D. Elbrächter, D. Perekrestenko, P. Grohs, and H. Bölcskei. “Deep neural network approximation theory”. In:IEEE Trans. Inform. Theory67.5 (2021), pp. 2581–2623.doi:10 . 1109 / TIT . 2021 . 3062161

work page 2021
[10]

J. Fan, Y. Gu, and X. Li.Optimal estimation of a factorizable density using diffusion models with ReLU neural networks. 2025. arXiv:2510.03994 [math.ST]

work page arXiv 2025
[11]

Testing the manifold hypothesis,

C. Fefferman, S. Mitter, and H. Narayanan. “Testing the manifold hypothesis”. In:J. Amer. Math. Soc.29.4 (2016), pp. 983–1049.doi:10.1090/jams/852

work page doi:10.1090/jams/852 2016
[12]

Gilbarg and N

D. Gilbarg and N. S. Trudinger.Elliptic partial differential equations of second order. Classics in Mathematics. Reprint of the 1998 edition. Springer-Verlag, Berlin, 2001, pp. xiv+517. 46

work page 1998
[13]

A. Holk, C. Strauch, and L. Trottner.Reflected diffusion models adapt to low-dimensional data

work page
[14]

arXiv:2603.24495 [math.ST]

work page arXiv
[15]

Statistical guarantees for denoising reflected diffusion mod- els

A. Holk, C. Strauch, and L. Trottner. “Statistical guarantees for denoising reflected diffusion mod- els”. In:J. Mach. Learn. Res.(to appear). arXiv:2411.01563 [math.ST]

work page arXiv
[16]

Exit time moments, boundary value problems, and the geometry of domains in Euclidean space

K. K. J. Kinateder, P. McDonald, and D. Miller. “Exit time moments, boundary value problems, and the geometry of domains in Euclidean space”. In:Probab. Theory Related Fields111.4 (1998), pp. 469–487.doi:10.1007/s004400050174

work page doi:10.1007/s004400050174 1998
[17]

Calculation and estimation of the Poisson kernel

S. G. Krantz. “Calculation and estimation of the Poisson kernel”. In:J. Math. Anal. Appl.302.1 (2005), pp. 143–148.doi:10.1016/j.jmaa.2004.08.010

work page doi:10.1016/j.jmaa.2004.08.010 2005
[18]

Nonparametric Estimation of a Factorizable Density using Diffusion Models

H. K. Kwon, D. Kim, I. Ohn, and M. Chae. “Nonparametric Estimation of a Factorizable Density using Diffusion Models”. In:J. Mach. Learn. Res.27.22 (2026), pp. 1–125

work page 2026
[19]

Le Gall.Brownian Motion, Martingales, and Stochastic Calculus

J.-F. Le Gall.Brownian Motion, Martingales, and Stochastic Calculus. First. Graduate Texts in Math- ematics. Springer Cham, 2016, pp. xxiii+273.doi:10.1007/978-3-319-31089-3

work page doi:10.1007/978-3-319-31089-3 2016
[20]

J. M. Lee.Introduction to smooth manifolds. Second. Vol. 218. Graduate Texts in Mathematics. Springer, New York, 2013, pp. 28–29

work page 2013
[21]

J. M. Lee.Riemannian manifolds. Vol. 176. Graduate Texts in Mathematics. An introduction to curvature. Springer-Verlag, New York, 1997, pp. 35–36.doi:10.1007/b98852

work page doi:10.1007/b98852 1997
[22]

Reflected Diffusion Models

A. Lou and S. Ermon. “Reflected Diffusion Models”. In:Proceedings of the 40th International Con- ference on Machine Learning. Vol. 202. 2023, pp. 22675–22701

work page 2023
[23]

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

A. Lou, C. Meng, and S. Ermon. “Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution”. In:Proceedings of the 41st International Conference on Machine Learning. Vol. 235. 2024, pp. 32819–32848

work page 2024
[24]

Metsch.Parabolic and Elliptic Schauder Theory on Manifolds for a Fourth-Order Problem with a First- and a Third-Order Boundary Condition

J.-H. Metsch.Parabolic and Elliptic Schauder Theory on Manifolds for a Fourth-Order Problem with a First- and a Third-Order Boundary Condition. 2023. arXiv:2304.04184 [math.AP]

work page arXiv 2023
[25]

Diffusion Models are Minimax Optimal Distribution Estima- tors

K. Oko, S. Akiyama, and T. Suzuki. “Diffusion Models are Minimax Optimal Distribution Estima- tors”. In:International Conference on Machine Learning. 2023

work page 2023
[26]

L. C. G. Rogers and D. Williams.Diffusions, Markov processes, and martingales. Vol. 2. Cambridge Mathematical Library. Cambridge University Press, Cambridge, 2000, pp. xiv+480.doi:10.1017/ CBO9781107590120

work page 2000
[27]

Spherical harmonics

R. T. Seeley. “Spherical harmonics”. In:Amer. Math. Monthly73.4 (1966), pp. 115–121.doi:10. 2307/2313760

work page 1966
[28]

Score-Based Gener- ative Modeling through Stochastic Differential Equations

Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. “Score-Based Gener- ative Modeling through Stochastic Differential Equations”. In:International Conference on Learn- ing Representations. 2021

work page 2021
[29]

Generalization bounds for score-based generative models: a synthetic proof

A. Stéphanovitch, E. Aamari, and C. Levrard.Generalization bounds for score-based generative models: a synthetic proof. 2025. arXiv:2507.04794 [math.ST]

work page arXiv 2025
[30]

Šubin.Pseudodifferential Operators and Spectral Theory

M. Šubin.Pseudodifferential Operators and Spectral Theory. Soviet Mathematics Series. Springer- Verlag, 1987, pp. 167–169

work page 1987
[31]

Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality

T. Suzuki. “Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality”. In:International Conference on Learning Rep- resentations. 2019

work page 2019
[32]

Adaptivity of Diffusion Models to Manifold Structures

R. Tang and Y. Yang. “Adaptivity of Diffusion Models to Manifold Structures”. In:Proceedings of The 27th International Conference on Artificial Intelligence and Statistics. Ed. by S. Dasgupta, S. Mandt, and Y. Li. Vol. 238. Proceedings of Machine Learning Research. 2024, pp. 1648–1656

work page 2024
[33]

Score-based diffusion models via stochastic differential equations

W. Tang and H. Zhao. “Score-based diffusion models via stochastic differential equations”. In: Stat. Surv.19 (2025), pp. 28–64.doi:10.1214/25-ss152. 47

work page doi:10.1214/25-ss152 2025
[34]

L. N. Trefethen.Approximation theory and approximation practice. Extended. Society for Indus- trial and Applied Mathematics (SIAM), Philadelphia, PA, 2020, pp. xi+363

work page 2020
[35]

State Size Independent Statistical Error Bound for Discrete Diffusion Models

S. Wakasugi and T. Suzuki. “State Size Independent Statistical Error Bound for Discrete Diffusion Models”. In:The Thirty-ninth Annual Conference on Neural Information Processing Systems. 2026

work page 2026
[36]

Generalization error bound for denoising score matching under relaxed manifold assumption

K. Yakovlev and N. Puchkin. “Generalization error bound for denoising score matching under relaxed manifold assumption”. In:Proceedings of Thirty Eighth Conference on Learning Theory. Ed. by N. Haghtalab and A. Moitra. Vol. 291. 2025, pp. 5824–5891

work page 2025
[37]

Information-theoretic determination of minimax rates of convergence

Y. Yang and A. Barron. “Information-theoretic determination of minimax rates of convergence”. In:Ann. Statist.27.5 (1999), pp. 1564–1599.doi:10.1214/aos/1017939142

work page doi:10.1214/aos/1017939142 1999
[38]

First Hitting Diffusion Models for Generating Manifold, Graph and Categorical Data

M. Ye, L. Wu, and Q. Liu. “First Hitting Diffusion Models for Generating Manifold, Graph and Categorical Data”. In:Advances in Neural Information Processing Systems. Vol. 35. 2022, pp. 27280– 27292

work page 2022
[39]

Minimax optimality of score-based diffusion models: beyond the density lower bound assumptions

K. Zhang, C. H. Yin, F. Liang, and J. Liu. “Minimax optimality of score-based diffusion models: beyond the density lower bound assumptions”. In:Proceedings of the 41st International Conference on Machine Learning. Vienna, Austria, 2024. 48

work page 2024