A Quantitative Approximation Framework for Flow Distillation in Diffusion Models
Pith reviewed 2026-06-28 07:57 UTC · model grok-4.3
The pith
Residual compositions approximate long-horizon transport in diffusion flows with global error controlled by the stability amplification factor.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In an analytically tractable Gaussian-mixture Ornstein-Uhlenbeck setting, deep residual compositions efficiently approximate the long-horizon transport, with global error controlled by the stability amplification factor, and a Lipschitz-mismatch regime makes one-step distillation structurally unfavorable; the resulting theory yields a stability-balanced non-uniform time grid obtained by uniform partitioning in the cumulative stability coordinate.
What carries the argument
The stability amplification factor obtained from the time integral of the spatial Lipschitz constant L(t) of the probability-flow velocity; it governs error propagation across compositions of flow maps.
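For reference, the standard Grönwall argument behind a stability estimate of this form can be written out as follows. The notation (v for the velocity, \Phi_{s\to t} for the flow map, s \le t along the direction of integration) is assumed here for illustration and may differ from the paper's.

```latex
\frac{d}{dt}\Phi_{s\to t}(x) = v\bigl(\Phi_{s\to t}(x),\, t\bigr),
\qquad
\|v(x,t) - v(y,t)\| \le L(t)\,\|x - y\|
\;\Longrightarrow\;
\|\Phi_{s\to t}(x) - \Phi_{s\to t}(y)\|
\le \exp\!\Bigl(\int_s^t L(u)\,du\Bigr)\,\|x - y\|.
```

The exponential on the right is what the summary calls the stability amplification factor: a perturbation introduced at time s can grow by at most this factor by time t.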
If this is right
- Global error in residual compositions is governed by the stability amplification factor rather than by an uncontrolled accumulation of local errors.
- One-step distillation is structurally unfavorable whenever the Lipschitz constant grows substantially at late times.
- Uniform partitioning in the cumulative stability coordinate produces a non-uniform time grid that improves few-step sampling.
- ReLU-ReQU networks achieve score approximation with depth and width scaling polylogarithmically in target accuracy and mixture geometry.
- Experiments support the predicted benefit of the stability-balanced grid, reducing end-to-end relative MSE by up to 51.9 percent with eight segments compared with uniform grids.
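The grid construction in the list above (uniform partitioning in the cumulative stability coordinate) can be sketched numerically. This is a minimal illustration, not the paper's implementation: the function names, the trapezoid quadrature, and the example Lipschitz profile `example_lipschitz` are all assumptions made here, and the paper's explicit L(t) bound is not reproduced.

```python
from bisect import bisect_left


def stability_balanced_grid(lipschitz, t_start, t_end, num_segments, num_quad=4000):
    """Return num_segments + 1 time points that split the cumulative
    stability coordinate S(t) = integral of lipschitz over [t_start, t]
    into equal pieces. Assumes lipschitz(t) > 0 on the interval."""
    # Fine uniform quadrature mesh on [t_start, t_end].
    h = (t_end - t_start) / num_quad
    ts = [t_start + i * h for i in range(num_quad + 1)]
    # Cumulative trapezoid rule for S(t); S is increasing because lipschitz > 0.
    S = [0.0]
    for i in range(num_quad):
        S.append(S[-1] + 0.5 * h * (lipschitz(ts[i]) + lipschitz(ts[i + 1])))
    grid = [t_start]
    for k in range(1, num_segments):
        target = S[-1] * k / num_segments
        j = bisect_left(S, target)  # first index with S[j] >= target
        # Linear interpolation inside the bracketing quadrature cell.
        w = (target - S[j - 1]) / (S[j] - S[j - 1])
        grid.append(ts[j - 1] + w * h)
    grid.append(t_end)
    return grid


# Hypothetical Lipschitz profile that grows toward t_end (illustration only).
def example_lipschitz(t):
    return 1.0 / (1.05 - t)


grid = stability_balanced_grid(example_lipschitz, 0.0, 1.0, 8)
```

With a profile that grows toward the end of the interval, the resulting segments shrink where L(t) is large, so each segment carries the same share of the total stability integral.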
Where Pith is reading between the lines
- The separation of approximation error from stability amplification could extend to other stiff ODE-based generative models.
- The stability coordinate might guide adaptive step-size selection in sampling algorithms outside the diffusion setting.
- Direct numerical verification of the explicit L(t) bound on non-Gaussian multimodal data would test the reach of the analysis.
- The approach connects to classical numerical methods for integrating stiff dynamical systems.
Load-bearing premise
The Gaussian-mixture Ornstein-Uhlenbeck process is treated as representative of the multimodal low-noise regime where stability amplification occurs in diffusion models.
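To make the setting concrete, the score of a one-dimensional Gaussian mixture evolved under an Ornstein-Uhlenbeck process is available in closed form. The sketch below assumes the normalization dX = -X dt + sqrt(2) dW (stationary law N(0, 1)), which may differ from the paper's noise schedule; the function name and the example parameters are hypothetical.

```python
import math


def gm_ou_score(x, t, weights, means, variances):
    """Score d/dx log p_t(x) of a 1D Gaussian mixture evolved under
    dX = -X dt + sqrt(2) dW (assumed normalization)."""
    decay = math.exp(-t)
    num = 0.0
    den = 0.0
    for w, m, s2 in zip(weights, means, variances):
        # Each component stays Gaussian with a shrunken mean and a
        # variance that relaxes toward 1.
        mean_t = m * decay
        var_t = s2 * decay**2 + 1.0 - decay**2
        dens = w * math.exp(-0.5 * (x - mean_t) ** 2 / var_t) / math.sqrt(2 * math.pi * var_t)
        num += dens * (-(x - mean_t) / var_t)
        den += dens
    return num / den


def score_slope(x, t, weights, means, variances, h=1e-4):
    """Central finite-difference estimate of the spatial derivative of the score."""
    up = gm_ou_score(x + h, t, weights, means, variances)
    down = gm_ou_score(x - h, t, weights, means, variances)
    return (up - down) / (2 * h)


# Hypothetical symmetric two-mode mixture.
w, mu, var = [0.5, 0.5], [-2.0, 2.0], [0.1, 0.1]
slope_low_noise = score_slope(0.0, 0.0, w, mu, var)
slope_high_noise = score_slope(0.0, 3.0, w, mu, var)
```

For this example the slope of the score midway between the modes is large at low noise and close to -1 at high noise, which is one simple way to see why the spatial Lipschitz constant, and hence the stiffness, depends strongly on time in multimodal regimes.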
What would settle it
If the proposed stability-balanced non-uniform time grid fails to reduce end-to-end relative MSE compared with a uniform grid on the Gaussian-mixture Ornstein-Uhlenbeck diffusion model, the central prediction on grid optimality would be falsified.
Original abstract
We develop a quantitative approximation framework for diffusion distillation, viewing few-step sampling as error propagation under compositions of learned flow maps. Focusing on trajectory distillation for the probability-flow ODE, we show that local approximation errors can be strongly amplified in low-noise multimodal regimes, where the underlying dynamics become stiff. In an analytically tractable Gaussian-mixture Ornstein-Uhlenbeck setting, we separate two core difficulties: approximating the time-dependent score field and controlling the dynamical amplification governed by the time-integrated Jacobian bound of the probability-flow ODE. On the approximation side, we prove constructive L^p(p_t) guarantees showing that ReLU-ReQU networks approximate the Gaussian-mixture score uniformly over time, with depth and width scaling polylogarithmically in the target accuracy and explicitly with the mixture geometry. On the stability side, we derive an explicit bound L(t) for the spatial Lipschitz constant of the probability-flow velocity and convert it into a flow map stability estimate governed by ∫_s^t L(u) du, making late-time amplification in stiff regimes computable. Building on these estimates, we prove that deep residual compositions efficiently approximate the long-horizon transport, with global error controlled by the stability amplification factor, and identify a Lipschitz-mismatch regime in which one-step distillation is structurally unfavorable. The resulting theory yields a stability-balanced non-uniform time grid obtained by uniform partitioning in the cumulative stability coordinate. Experiments support the prediction and reduce end-to-end relative MSE by up to 51.9% with 8 segments compared with uniform grids.
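The phrase "error propagation under compositions of learned flow maps" can be illustrated with the textbook telescoping bound. If the k-th learned map has sup-norm error eps_k on its segment and the exact flow over a segment amplifies differences by at most exp of that segment's stability integral, then the end-point error is at most the sum over k of eps_k times exp of the stability integral downstream of segment k. The sketch below is a simplified sup-norm version written for illustration; the paper works with L^p(p_t) errors, and the function name and inputs are assumptions.

```python
import math


def global_error_bound(local_errors, stability_increments):
    """Upper bound on the end-point error of a composition of K learned maps.

    local_errors[k]: sup-norm error of the k-th learned map on its segment.
    stability_increments[k]: integral of L(u) over the k-th segment.
    """
    total = 0.0
    downstream = 0.0  # stability integral from the end of segment k to the final time
    # Walk the segments from last to first, accumulating the downstream integral.
    for eps, inc in zip(reversed(local_errors), reversed(stability_increments)):
        total += eps * math.exp(downstream)
        downstream += inc
    return total


# Two segments with equal local error: the first segment's error is
# amplified by the second segment's stability integral.
bound = global_error_bound([0.1, 0.1], [1.0, 2.0])
```

The bound makes explicit that an error committed early is multiplied by everything downstream of it, so where the grid points fall relative to the growth of L(t) matters as much as the size of the local errors.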
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a quantitative approximation framework for flow distillation in diffusion models, viewing few-step sampling as error propagation under compositions of learned flow maps for the probability-flow ODE. In an analytically tractable Gaussian-mixture Ornstein-Uhlenbeck process, it proves constructive L^p(p_t) guarantees for ReLU-ReQU networks approximating the time-dependent score field (with depth/width scaling polylogarithmically in accuracy and explicitly with mixture geometry), derives an explicit spatial Lipschitz bound L(t) on the probability-flow velocity, converts it to a flow-map stability estimate governed by ∫_s^t L(u) du, proves that deep residual compositions approximate long-horizon transport with global error controlled by the stability amplification factor, identifies a Lipschitz-mismatch regime in which one-step distillation is structurally unfavorable, and constructs a stability-balanced non-uniform time grid via uniform partitioning in the cumulative stability coordinate. Experiments report up to 51.9% reduction in end-to-end relative MSE with 8 segments versus uniform grids.
Significance. If the separation of approximation versus stability difficulties, the explicit bounds, and the resulting non-uniform grid construction hold and transfer, the work supplies a rigorous, constructive theoretical basis for understanding amplification in stiff multimodal regimes and for designing better distillation schedules. The polylogarithmic network-size guarantees and parameter-free stability integral are particular strengths that could guide practical choices beyond the specific setting analyzed.
major comments (2)
- [Abstract and main theoretical sections] Abstract and theoretical development (all quantitative results on L^p guarantees, L(t), ∫ L(u) du stability, residual-composition error, Lipschitz-mismatch regime, and non-uniform grid): these are obtained exclusively inside the Gaussian-mixture Ornstein-Uhlenbeck process and presented as representative of the multimodal low-noise regime of interest, yet no extension argument, robustness check, or counter-example analysis is supplied showing that the separation of approximation and stability difficulties survives when the score field or dynamics deviate from this mixture structure. This is load-bearing for the applicability claim to general diffusion models.
- [Abstract] Abstract: the claims that proofs exist for the network approximation and stability bound are stated, but the manuscript does not include the full derivations in a form that permits verification of whether the L^p(p_t) guarantees hold uniformly over time or whether the Lipschitz-mismatch regime is correctly identified; this directly affects soundness of the central quantitative claims.
minor comments (2)
- Notation for the cumulative stability coordinate and the precise definition of the non-uniform grid construction could be clarified with an explicit equation or algorithm box for reproducibility.
- The experimental section would benefit from reporting the precise mixture parameters and noise schedule used in the Gaussian-mixture OU simulations to allow direct comparison with the theoretical L(t) bound.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below, clarifying the scope of our results while proposing targeted revisions to improve clarity and verifiability.
Point-by-point responses
- Referee: [Abstract and main theoretical sections] Abstract and theoretical development (all quantitative results on L^p guarantees, L(t), ∫ L(u) du stability, residual-composition error, Lipschitz-mismatch regime, and non-uniform grid): these are obtained exclusively inside the Gaussian-mixture Ornstein-Uhlenbeck process and presented as representative of the multimodal low-noise regime of interest, yet no extension argument, robustness check, or counter-example analysis is supplied showing that the separation of approximation and stability difficulties survives when the score field or dynamics deviate from this mixture structure. This is load-bearing for the applicability claim to general diffusion models.
Authors: The Gaussian-mixture OU process is deliberately selected for analytical tractability to derive explicit, constructive bounds that separate approximation error from dynamical stability amplification. The manuscript frames the contribution as a quantitative case study revealing the Lipschitz-mismatch phenomenon and the utility of stability-balanced discretization, rather than a universal theorem for arbitrary score fields. We will add a dedicated limitations paragraph in the revised manuscript that explicitly states the setting-specific nature of the proofs and discusses how the identified mismatch regime may inform schedule design in broader multimodal regimes, without claiming automatic transfer.
Revision: partial
- Referee: [Abstract] Abstract: the claims that proofs exist for the network approximation and stability bound are stated, but the manuscript does not include the full derivations in a form that permits verification of whether the L^p(p_t) guarantees hold uniformly over time or whether the Lipschitz-mismatch regime is correctly identified; this directly affects soundness of the central quantitative claims.
Authors: The complete proofs appear in the appendix. To improve accessibility and allow direct verification of time-uniformity and the mismatch identification, we will insert concise proof sketches (including key intermediate steps for the L^p bounds and the stability integral) into the main theoretical sections of the revised manuscript.
Revision: yes
Circularity Check
No circularity: explicit derivations of bounds and guarantees within the model
Full rationale
The paper performs constructive mathematical derivations inside the Gaussian-mixture Ornstein-Uhlenbeck process: it derives an explicit spatial Lipschitz bound L(t) on the probability-flow velocity, converts it to a stability estimate via the integral of L(u) du, proves L^p approximation guarantees with polylog network scaling, controls residual composition error by the stability factor, and obtains the non-uniform grid by uniform partitioning in the cumulative stability coordinate. None of these steps reduce to a self-definition, a fitted parameter renamed as a prediction, or a load-bearing self-citation chain; each is obtained by direct analysis of the model dynamics and score field. The limitation to this analytically tractable setting is a question of scope and transfer, not a circular reduction of the claimed results to their inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: the data distribution is a finite Gaussian mixture evolving under an Ornstein-Uhlenbeck process.
- Domain assumption: ReLU-ReQU networks are used for score approximation.