arxiv: 2606.03820 · v1 · pith:Q4KOS5N5new · submitted 2026-06-02 · 📊 stat.ML · cs.LG

A Quantitative Approximation Framework for Flow Distillation in Diffusion Models

Pith reviewed 2026-06-28 07:57 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords diffusion distillation · probability flow ODE · residual networks · Lipschitz stability · non-uniform time grid · Gaussian mixture model · Ornstein-Uhlenbeck process · score approximation

The pith

Residual compositions approximate long-horizon transport in diffusion flows with global error controlled by the stability amplification factor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a quantitative framework that treats few-step sampling in diffusion models as error propagation through compositions of learned flow maps for the probability-flow ODE. It separates the task of approximating the time-dependent score field from the task of controlling dynamical amplification that arises when the underlying dynamics become stiff in low-noise multimodal regimes. Within an analytically tractable Gaussian-mixture Ornstein-Uhlenbeck process, explicit L^p guarantees show that ReLU-ReQU networks approximate the score with polylogarithmic dependence on accuracy, while an explicit bound L(t) on the spatial Lipschitz constant converts into a flow-map stability estimate governed by the time integral of L(u). These estimates establish that deep residual compositions efficiently approximate long-horizon transport and that a Lipschitz-mismatch regime renders one-step distillation structurally unfavorable, yielding a non-uniform time grid obtained by uniform partitioning in the cumulative stability coordinate.

Core claim

In an analytically tractable Gaussian-mixture Ornstein-Uhlenbeck setting, deep residual compositions efficiently approximate the long-horizon transport, with global error controlled by the stability amplification factor, and a Lipschitz-mismatch regime makes one-step distillation structurally unfavorable; the resulting theory yields a stability-balanced non-uniform time grid obtained by uniform partitioning in the cumulative stability coordinate.

What carries the argument

The stability amplification factor obtained from the time integral of the spatial Lipschitz constant L(t) of the probability-flow velocity; it governs error propagation across compositions of flow maps.
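
A sketch of how this factor enters, stated in sup norm for simplicity (the paper works in L^p(p_t), so its exact statements may differ). The symbols v, \Phi, \hat{\Phi}_k, \varepsilon_k and the grid t_0, ..., t_N are notation assumed here for illustration. The first display is the standard Grönwall estimate for a velocity field that is L(t)-Lipschitz in space; the second follows by telescoping over the segments.

```latex
% Assumed: \dot{x} = v(x,t), \quad \|v(x,t) - v(y,t)\| \le L(t)\,\|x - y\|.
\[
\|\Phi_{s \to t}(x) - \Phi_{s \to t}(y)\|
  \le \exp\!\Big( \Big| \int_s^t L(u)\,du \Big| \Big)\, \|x - y\| .
\]
% Learned segment maps \hat{\Phi}_k with local errors
% \varepsilon_k = \sup_z \|\hat{\Phi}_k(z) - \Phi_{t_{k-1} \to t_k}(z)\|:
\[
\big\| \hat{\Phi}_N \circ \cdots \circ \hat{\Phi}_1(x) - \Phi_{t_0 \to t_N}(x) \big\|
  \le \sum_{k=1}^{N} \exp\!\Big( \Big| \int_{t_k}^{t_N} L(u)\,du \Big| \Big)\, \varepsilon_k .
\]
```

An error made in an early segment is multiplied by the amplification accumulated over all remaining segments, so the integral of L, and not the step count alone, sets the size of the global error.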

If this is right

  • Global error in residual compositions is controlled by the stability amplification factor, rather than by an uncontrolled accumulation of local errors.
  • One-step distillation is structurally unfavorable whenever the Lipschitz constant grows substantially at late times.
  • Uniform partitioning in the cumulative stability coordinate produces a non-uniform time grid that improves few-step sampling.
  • ReLU-ReQU networks achieve score approximation with depth and width scaling polylogarithmically in target accuracy and mixture geometry.
  • Experiments support the framework's prediction, reducing end-to-end relative MSE by up to 51.9% with eight segments compared with uniform grids.
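
The grid construction in the list above can be sketched numerically. This is a minimal illustration, not the paper's algorithm: the function names, the quadrature resolution, and in particular `toy_lipschitz` are hypothetical, and the toy profile merely assumes that L(t) grows toward the low-noise end at small t. The idea is to integrate L to get the cumulative stability coordinate, split that coordinate into equal parts, and map the split points back to time.

```python
import numpy as np


def stability_balanced_grid(lipschitz, t_start, t_end, n_segments, n_quad=2001):
    """Return n_segments + 1 time points that split the cumulative stability
    coordinate Lambda(t) = int_{t_start}^{t} L(u) du into equal parts.

    `lipschitz` is any vectorised callable returning a positive bound L(t);
    it is a placeholder here, not the bound derived in the paper.
    """
    t_fine = np.linspace(t_start, t_end, n_quad)
    l_vals = lipschitz(t_fine)
    # Cumulative trapezoidal rule for Lambda on the fine grid.
    increments = 0.5 * (l_vals[1:] + l_vals[:-1]) * np.diff(t_fine)
    cum = np.concatenate(([0.0], np.cumsum(increments)))
    # Equally spaced targets in the stability coordinate, mapped back to time.
    targets = np.linspace(0.0, cum[-1], n_segments + 1)
    return np.interp(targets, cum, t_fine)


def toy_lipschitz(t):
    # Hypothetical profile that grows as t -> 0; for illustration only.
    return 1.0 + 1.0 / (t + 0.05)


grid = stability_balanced_grid(toy_lipschitz, 0.0, 2.0, n_segments=8)
print(np.round(grid, 3))
```

With a profile of this shape the segments are short where L is large and long where it is small, so each segment carries the same share of the stability integral.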

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The separation of approximation error from stability amplification could extend to other stiff ODE-based generative models.
  • The stability coordinate might guide adaptive step-size selection in sampling algorithms outside the diffusion setting.
  • Direct numerical verification of the explicit L(t) bound on non-Gaussian multimodal data would test the reach of the analysis.
  • The approach connects to classical numerical methods for integrating stiff dynamical systems.

Load-bearing premise

The Gaussian-mixture Ornstein-Uhlenbeck process is treated as representative of the multimodal low-noise regime where stability amplification occurs in diffusion models.

What would settle it

If the proposed stability-balanced non-uniform time grid fails to reduce end-to-end relative MSE compared with a uniform grid on the Gaussian-mixture Ornstein-Uhlenbeck diffusion model, the central prediction on grid optimality would be falsified.

Original abstract

We develop a quantitative approximation framework for diffusion distillation, viewing few-step sampling as error propagation under compositions of learned flow maps. Focusing on trajectory distillation for the probability-flow ODE, we show that local approximation errors can be strongly amplified in low-noise multimodal regimes, where the underlying dynamics become stiff. In an analytically tractable Gaussian-mixture Ornstein-Uhlenbeck setting, we separate two core difficulties: approximating the time-dependent score field and controlling the dynamical amplification governed by the time-integrated Jacobian bound of the probability-flow ODE. On the approximation side, we prove constructive L^p(p_t) guarantees showing that ReLU-ReQU networks approximate the Gaussian-mixture score uniformly over time, with depth and width scaling polylogarithmically in the target accuracy and explicitly with the mixture geometry. On the stability side, we derive an explicit bound L(t) for the spatial Lipschitz constant of the probability-flow velocity and convert it into a flow map stability estimate governed by ∫_s^t L(u) du, making late-time amplification in stiff regimes computable. Building on these estimates, we prove that deep residual compositions efficiently approximate the long-horizon transport, with global error controlled by the stability amplification factor, and identify a Lipschitz-mismatch regime in which one-step distillation is structurally unfavorable. The resulting theory yields a stability-balanced non-uniform time grid obtained by uniform partitioning in the cumulative stability coordinate. Experiments support the prediction and reduce end-to-end relative MSE by up to 51.9% with 8 segments compared with uniform grids.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a quantitative approximation framework for flow distillation in diffusion models, viewing few-step sampling as error propagation under compositions of learned flow maps for the probability-flow ODE. In an analytically tractable Gaussian-mixture Ornstein-Uhlenbeck process, it proves constructive L^p(p_t) guarantees for ReLU-ReQU networks approximating the time-dependent score field (with depth/width scaling polylogarithmically in accuracy and explicitly with mixture geometry), derives an explicit spatial Lipschitz bound L(t) on the probability-flow velocity, converts it to a flow-map stability estimate governed by ∫_s^t L(u) du, proves that deep residual compositions approximate long-horizon transport with global error controlled by the stability amplification factor, identifies a Lipschitz-mismatch regime in which one-step distillation is structurally unfavorable, and constructs a stability-balanced non-uniform time grid via uniform partitioning in the cumulative stability coordinate. Experiments report up to 51.9% reduction in end-to-end relative MSE with 8 segments versus uniform grids.

Significance. If the separation of approximation versus stability difficulties, the explicit bounds, and the resulting non-uniform grid construction hold and transfer, the work supplies a rigorous, constructive theoretical basis for understanding amplification in stiff multimodal regimes and for designing better distillation schedules. The polylogarithmic network-size guarantees and parameter-free stability integral are particular strengths that could guide practical choices beyond the specific setting analyzed.

major comments (2)
  1. [Abstract and main theoretical sections] All quantitative results (the L^p guarantees, L(t), the ∫ L(u) du stability estimate, the residual-composition error, the Lipschitz-mismatch regime, and the non-uniform grid) are obtained exclusively inside the Gaussian-mixture Ornstein-Uhlenbeck process and presented as representative of the multimodal low-noise regime of interest. No extension argument, robustness check, or counter-example analysis shows that the separation of approximation and stability difficulties survives when the score field or dynamics deviate from this mixture structure. This is load-bearing for the applicability claim to general diffusion models.
  2. [Abstract] The abstract states that the network approximation and stability bounds are proved, but the manuscript does not present the full derivations in a form that permits verifying whether the L^p(p_t) guarantees hold uniformly over time or whether the Lipschitz-mismatch regime is correctly identified. This directly affects the soundness of the central quantitative claims.
minor comments (2)
  1. Notation for the cumulative stability coordinate and the precise definition of the non-uniform grid construction could be clarified with an explicit equation or algorithm box for reproducibility.
  2. The experimental section would benefit from reporting the precise mixture parameters and noise schedule used in the Gaussian-mixture OU simulations to allow direct comparison with the theoretical L(t) bound.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, clarifying the scope of our results while proposing targeted revisions to improve clarity and verifiability.

point-by-point responses
  1. Referee: [Abstract and main theoretical sections] All quantitative results (the L^p guarantees, L(t), the ∫ L(u) du stability estimate, the residual-composition error, the Lipschitz-mismatch regime, and the non-uniform grid) are obtained exclusively inside the Gaussian-mixture Ornstein-Uhlenbeck process and presented as representative of the multimodal low-noise regime of interest. No extension argument, robustness check, or counter-example analysis shows that the separation of approximation and stability difficulties survives when the score field or dynamics deviate from this mixture structure. This is load-bearing for the applicability claim to general diffusion models.

    Authors: The Gaussian-mixture OU process is deliberately selected for analytical tractability to derive explicit, constructive bounds that separate approximation error from dynamical stability amplification. The manuscript frames the contribution as a quantitative case study revealing the Lipschitz-mismatch phenomenon and the utility of stability-balanced discretization, rather than a universal theorem for arbitrary score fields. We will add a dedicated limitations paragraph in the revised manuscript that explicitly states the setting-specific nature of the proofs and discusses how the identified mismatch regime may inform schedule design in broader multimodal regimes, without claiming automatic transfer.
    revision: partial

  2. Referee: [Abstract] The abstract states that the network approximation and stability bounds are proved, but the manuscript does not present the full derivations in a form that permits verifying whether the L^p(p_t) guarantees hold uniformly over time or whether the Lipschitz-mismatch regime is correctly identified. This directly affects the soundness of the central quantitative claims.

    Authors: The complete proofs appear in the appendix. To improve accessibility and allow direct verification of time-uniformity and the mismatch identification, we will insert concise proof sketches (including key intermediate steps for the L^p bounds and the stability integral) into the main theoretical sections of the revised manuscript.
    revision: yes

Circularity Check

0 steps flagged

No circularity: explicit derivations of bounds and guarantees within the model

full rationale

The paper performs constructive mathematical derivations inside the Gaussian-mixture Ornstein-Uhlenbeck process: it derives an explicit spatial Lipschitz bound L(t) on the probability-flow velocity, converts it to a stability estimate via the integral of L(u) du, proves L^p approximation guarantees with polylog network scaling, controls residual composition error by the stability factor, and obtains the non-uniform grid by uniform partitioning in the cumulative stability coordinate. None of these steps reduce to a self-definition, a fitted parameter renamed as a prediction, or a load-bearing self-citation chain; each is obtained by direct analysis of the model dynamics and score field. The limitation to this analytically tractable setting is a question of scope and transfer, not a circular reduction of the claimed results to their inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the choice of the Gaussian-mixture Ornstein-Uhlenbeck process as the analytically tractable model in which both approximation and stability can be controlled explicitly; no additional free parameters or invented entities are introduced beyond standard neural-network approximation theory.

axioms (2)
  • domain assumption: the data distribution is a finite Gaussian mixture evolving under an Ornstein-Uhlenbeck process.
    Invoked to obtain an analytically tractable setting where the score field and the Jacobian of the probability-flow ODE can be written in closed form.
  • domain assumption: ReLU-ReQU networks are used for score approximation.
    The constructive L^p(p_t) guarantees are proved specifically for this network class.
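
To make the first axiom concrete, here is a minimal sketch of the closed-form score it refers to. It assumes isotropic mixture components and the OU normalisation dX = -X dt + sqrt(2) dW, neither of which is confirmed by the text above; the paper's conventions may differ. The function names and the example parameters are hypothetical. Under these assumptions each component stays Gaussian, with mean e^{-t} mu_i and variance e^{-2t} sigma_i^2 + 1 - e^{-2t}, and the score is a responsibility-weighted average of the component scores.

```python
import numpy as np


def gm_ou_score(x, t, weights, means, sigmas):
    """Score grad log p_t(x) for an isotropic Gaussian mixture pushed through
    the OU process dX = -X dt + sqrt(2) dW (an assumed normalisation).

    x: (n, d) points; weights: (K,); means: (K, d); sigmas: (K,) initial stds.
    """
    decay = np.exp(-t)
    m_t = decay * means                              # component means at time t
    var_t = decay**2 * sigmas**2 + 1.0 - decay**2    # component variances at time t
    diff = m_t[None, :, :] - x[:, None, :]           # shape (n, K, d)
    d = x.shape[1]
    # Log of w_i * N(x; m_i, var_i I), up to a constant shared by all components.
    log_comp = (np.log(weights)[None, :]
                - 0.5 * d * np.log(var_t)[None, :]
                - 0.5 * np.sum(diff**2, axis=2) / var_t[None, :])
    log_comp -= log_comp.max(axis=1, keepdims=True)  # stabilise the softmax
    resp = np.exp(log_comp)
    resp /= resp.sum(axis=1, keepdims=True)          # posterior responsibilities
    return np.sum(resp[:, :, None] * diff / var_t[None, :, None], axis=1)


def pf_velocity(x, t, weights, means, sigmas):
    # Probability-flow ODE for this OU normalisation: dx/dt = -x - score(x, t).
    return -x - gm_ou_score(x, t, weights, means, sigmas)


# Hypothetical two-mode example in two dimensions.
w = np.array([0.5, 0.5])
mu = np.array([[2.0, 0.0], [-2.0, 0.0]])
sig = np.array([0.3, 0.3])
pts = np.array([[0.5, 0.1], [-1.5, 0.0]])
print(pf_velocity(pts, 0.1, w, mu, sig))
```

A quick sanity check on the convention: for a single standard Gaussian the marginal is stationary, the score is -x, and the velocity vanishes.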
