
arxiv: 2606.13426 · v1 · pith:3KO7J3RWnew · submitted 2026-06-11 · 💻 cs.LG · stat.ML

Accelerating Speculative Diffusions via Block Verification

Pith reviewed 2026-06-27 06:58 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords speculative decoding · diffusion models · block verification · residual sampling · inference acceleration · generative models · self-speculation

The pith

An efficient residual sampler lets diffusion models use block verification to raise draft acceptance rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to adapt the original speculative sampling procedure to continuous diffusion models by solving the problem of drawing from the residual distribution after a draft is proposed. This step makes block verification possible, which checks several steps together and provably increases the fraction of accepted drafts. The authors also formalize a training-free self-speculative method called the Free Drafter. When block verification is enabled, the Free Drafter produces up to 6.3 percent faster sampling than prior speculative diffusion techniques, with overhead limited to the existing parallel verification pass. The work addresses the gap that has kept speculative decoding from transferring directly from discrete language models to continuous generative processes.

Core claim

By introducing an efficient implementation of residual-distribution sampling in continuous space, the original speculative sampling mechanism can be applied to diffusion models. This enables block verification, which improves acceptance rates, and when combined with the Free Drafter yields up to 6.3 percent speedup over existing speculative methods with no additional training and negligible overhead beyond the parallel verification pass.

What carries the argument

The residual-distribution sampler. It draws the correction term needed to match the target distribution after a draft is rejected in continuous space, which restores the full speculative sampling algorithm and permits block verification.
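
For orientation, the standard speculative sampling construction from the LLM setting can be written as below. The symbols p (target density), q (draft density), α (acceptance probability) and r (residual density) are notation assumed for this sketch and are not taken from the paper; the paper's own continuous-space construction may differ in detail.

```latex
% A draft x ~ q is accepted with probability
\alpha(x) = \min\left\{ 1, \frac{p(x)}{q(x)} \right\}

% On rejection, the replacement is drawn from the residual distribution
r(x) = \frac{\max\{0,\; p(x) - q(x)\}}{\int \max\{0,\; p(y) - q(y)\}\, \mathrm{d}y}

% The rejection probability equals the residual's normalizer:
% 1 - \int \min\{p, q\} = \int \max\{0, p - q\},
% so the output density is exactly the target:
q(x)\,\alpha(x) + \Big(1 - \int q(y)\,\alpha(y)\, \mathrm{d}y\Big)\, r(x)
  = \min\{p(x), q(x)\} + \max\{0,\; p(x) - q(x)\}
  = p(x)
```

The difficulty the paper targets is the second line: in a finite vocabulary r can be normalized by a sum, whereas in continuous space the integral and the sampling step both need a dedicated method.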

If this is right

  • Block verification becomes feasible for diffusions and raises acceptance rates in a provable way.
  • The Free Drafter supplies a training-free self-speculative method for diffusions.
  • Speedups reach up to 6.3 percent over prior speculative diffusion approaches.
  • Overhead stays limited to the existing parallel verification pass.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same residual-sampling construction may transfer to other continuous generative models such as flow matching.
  • Experiments with larger block sizes could test whether acceptance-rate gains continue to scale.
  • Pairing the Free Drafter with a trained draft model might produce additive speedups.

Load-bearing premise

The residual-distribution sampler adds negligible extra computation relative to the parallel verification pass.

What would settle it

A timing measurement showing that the residual sampler's runtime is more than a small fraction of the verification pass time would indicate that the theoretical acceptance-rate gain does not translate into wall-clock speedup.
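
A minimal sketch of such a measurement is shown below. It only illustrates the ratio being discussed: `sampler_fn` and `verify_fn` are hypothetical zero-argument callables standing in for the residual sampler and the parallel verification pass, and nothing here reflects the paper's actual benchmarking code.

```python
import statistics
import time


def median_runtime(fn, repeats=20):
    """Median wall-clock time in seconds of calling fn() `repeats` times."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def overhead_fraction(sampler_fn, verify_fn, repeats=20):
    """Sampler runtime as a fraction of verification-pass runtime.

    Both arguments are placeholder callables (assumptions for this sketch).
    On an accelerator, each callable should block until its work has
    finished, otherwise asynchronous kernel launches make the timings
    meaningless.
    """
    return median_runtime(sampler_fn, repeats) / median_runtime(verify_fn, repeats)
```

A value close to zero would support the load-bearing premise; a value that is not small relative to the acceptance-rate gain would undercut it.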

Figures

Figures reproduced from arXiv: 2606.13426 by Alexander Soen, Arnaud Doucet, Hisham Husain, Valentin De Bortoli.

Figure 1: Plot of the p.d.f. f(u) ∝ max{0, c N(u; 0, 1) − N(u − v; 0, 1)} and its c.d.f. for fixed scale c = 1 over various shifts v. Normalization approximated via trapezoid integration.
Figure 2: Plot of the p.d.f. f(u) ∝ max{0, c N(u; 0, 1) − N(u − v; 0, 1)} and its c.d.f. for fixed shift v = 1 over various scales c. Normalization approximated via trapezoid integration.
Figure 3: Sample verification acceptance probability on the ImageNet LDM dataset. This plot uses …
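
The one-dimensional residual density shown in Figures 1 and 2 can be tabulated and sampled numerically. The sketch below follows the trapezoid normalization mentioned in the captions and then draws samples by inverse transform on the tabulated c.d.f. It is a grid-based illustration only, not the paper's efficient sampler; the function names, grid bounds and grid size are assumptions made for this example.

```python
import numpy as np


def normal_pdf(u, mean=0.0):
    """Density of N(mean, 1) evaluated at u."""
    return np.exp(-0.5 * (u - mean) ** 2) / np.sqrt(2.0 * np.pi)


def residual_grid(c, v, lo=-8.0, hi=8.0, n=4001):
    """Tabulate f(u) ∝ max{0, c N(u; 0, 1) − N(u − v; 0, 1)} and its c.d.f.

    Returns (u, pdf, cdf) on a uniform grid; lo, hi and n are
    illustrative choices, not values from the paper.
    """
    u = np.linspace(lo, hi, n)
    unnorm = np.maximum(0.0, c * normal_pdf(u) - normal_pdf(u, mean=v))
    # Cumulative trapezoid rule: area of each grid cell, then a running sum.
    cell_area = 0.5 * (unnorm[1:] + unnorm[:-1]) * np.diff(u)
    cum = np.concatenate([[0.0], np.cumsum(cell_area)])
    mass = cum[-1]
    if mass <= 0.0:
        raise ValueError("residual has no mass on this grid")
    return u, unnorm / mass, cum / mass


def sample_residual(c, v, size, rng):
    """Inverse-transform sampling from the tabulated residual c.d.f."""
    u, _, cdf = residual_grid(c, v)
    return np.interp(rng.uniform(size=size), cdf, u)
```

With c = 1 and v = 1 the residual is positive only where N(u; 0, 1) exceeds N(u; 1, 1), that is for u < 0.5, so samples from `sample_residual(1.0, 1.0, ...)` should fall at or below that point up to grid resolution.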
Original abstract

Speculative decoding speeds up LLM inference by using a draft model to generate tokens, with an acceptance-rejection scheme that ensures that the output matches the target distribution. Adapting this to continuous diffusions is difficult because speculative sampling requires drawing from a residual distribution. While straightforward in discrete spaces, efficiently sampling this residual in continuous space is non-trivial. Consequently, existing diffusion adaptations either use computationally inefficient sampling techniques or rely on an alternative scheme. In this work, we introduce a novel scheme that efficiently implements the original speculative sampling mechanism for diffusion models. Our approach offers a critical advantage over current methods: it enables us to adapt block verification from LLMs to diffusions -- which provably improves the acceptance rate of drafts. Furthermore, we formalize and analyze the Free Drafter, a heuristic self-speculative drafter for diffusions that requires no training. By enabling block verification, our Free Drafter yields up to a 6.3% speedup over existing speculative methods with no additional training and negligible overhead beyond the existing parallel verification pass.
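
The discrete-space case that the abstract describes as straightforward can be sketched as follows. This is the well-known accept-or-resample step of speculative sampling over a finite vocabulary, written here as a minimal illustration; the function name and the (token, accepted) return convention are assumptions for this example, and the paper's continuous-space scheme is not reproduced.

```python
import numpy as np


def speculative_step(p, q, rng):
    """One draft-then-verify step over a finite vocabulary.

    p: target probabilities, q: draft probabilities (same length).
    Returns (token, accepted). The output token is distributed as p.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    x = rng.choice(len(q), p=q)  # draft proposal, so q[x] > 0
    if rng.uniform() < min(1.0, p[x] / q[x]):
        return int(x), True
    # Rejected: resample from the residual, proportional to max(0, p - q).
    # A rejection implies p[x] < q[x], so the residual has positive mass.
    residual = np.maximum(0.0, p - q)
    residual /= residual.sum()
    return int(rng.choice(len(p), p=residual)), False
```

In a finite vocabulary the residual is normalized by a plain sum. In continuous space the corresponding normalizer is an integral and the resampling step has no such direct form, which is the gap the abstract refers to.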

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce an efficient residual-distribution sampler for speculative sampling in continuous diffusion models, enabling the adaptation of block verification from LLMs. This is asserted to provably improve draft acceptance rates. It also formalizes the Free Drafter, a training-free heuristic self-speculative drafter, and reports that this combination yields up to a 6.3% wall-clock speedup over existing speculative methods with negligible overhead beyond the parallel verification pass.

Significance. If the residual sampler's overhead is indeed negligible and the acceptance-rate improvement translates to measured latency gains, the work would offer a useful extension of speculative decoding to diffusion models without requiring extra training. The explicit formalization of the Free Drafter and the emphasis on block verification as a provable improvement are positive elements that could influence follow-up work on continuous-domain acceleration techniques.

major comments (2)
  1. [Abstract] The central claim that the novel residual-distribution sampler enables block verification while incurring only 'negligible overhead beyond the existing parallel verification pass' is load-bearing for converting the asserted acceptance-rate gain into the reported 6.3% end-to-end speedup, yet no derivation of the sampler, complexity analysis, or ablation isolating its cost versus verification time is supplied.
  2. [Abstract] The statement that block verification 'provably improves the acceptance rate of drafts' is presented without reference to a specific theorem, proof sketch, or equation showing how the residual sampler preserves the exact target distribution while allowing block-level acceptance.
minor comments (1)
  1. The experimental protocol (number of diffusion steps, draft lengths, hardware, and baseline implementations) is not described even at a high level, making the 6.3% figure difficult to interpret or reproduce.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback on the abstract. We will revise the abstract to include explicit references to the relevant sections and theorems. Point-by-point responses to the major comments are provided below.

  1. Referee: [Abstract] The central claim that the novel residual-distribution sampler enables block verification while incurring only 'negligible overhead beyond the existing parallel verification pass' is load-bearing for converting the asserted acceptance-rate gain into the reported 6.3% end-to-end speedup, yet no derivation of the sampler, complexity analysis, or ablation isolating its cost versus verification time is supplied.

    Authors: We agree the abstract would be strengthened by direct references. The residual sampler derivation appears in Section 3, with the efficient continuous-space sampling procedure and its equivalence to the target distribution. Section 3.3 contains the complexity analysis establishing that the sampler adds only constant-time overhead per block (leveraging the same parallel forward passes as verification). Appendix B provides the requested ablation isolating sampler cost versus verification time, confirming negligibility. We will update the abstract to cite Section 3 and Appendix B. revision: yes

  2. Referee: [Abstract] The statement that block verification 'provably improves the acceptance rate of drafts' is presented without reference to a specific theorem, proof sketch, or equation showing how the residual sampler preserves the exact target distribution while allowing block-level acceptance.

    Authors: The preservation of the exact target distribution under the residual sampler, together with the proof that block verification strictly raises acceptance probability relative to per-token verification, is stated and proved in Theorem 3.4 (Section 3.4). The proof proceeds by showing that the block-level acceptance condition is a valid rejection sampler for the joint residual and that the marginal over blocks matches the target. We will revise the abstract to reference Theorem 3.4. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on novel implementation and empirical measurement, not reduction to inputs


The abstract and provided text introduce a novel residual-distribution sampler for exact speculative sampling in continuous diffusion spaces, then report an empirical 6.3% speedup from enabling block verification with the Free Drafter. No equations, fitted parameters, or self-citations are exhibited that would make the acceptance-rate gain or wall-clock claim equivalent to the inputs by construction. The 'negligible overhead' statement is an implementation claim, not a definitional reduction. The derivation chain therefore remains independent of the patterns that trigger circularity scores above 2.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are identifiable from the provided text.

