pith. machine review for the scientific record.

arxiv: 2605.06376 · v1 · submitted 2026-05-07 · 💻 cs.CV · cs.AI

Recognition: unknown

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

Authors on Pith · no claims yet

Pith reviewed 2026-05-08 13:24 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords diffusion distillation · few-step generation · distribution matching · continuous-time optimization · image synthesis · step acceleration · generative models

The pith

Migrating distribution matching from discrete timesteps to continuous random-length schedules enables few-step diffusion models to reach high visual fidelity without auxiliary objectives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard distribution matching distillation supervises only at sparse fixed timesteps and uses mode-seeking reverse KL, which produces artifacts and over-smoothing in few-step sampling. CDM replaces the fixed schedule with a dynamic continuous one of random length so matching occurs at arbitrary points along trajectories, and adds an off-trajectory alignment term that matches latents extrapolated by the student’s own velocity field. This continuous formulation improves generalization and detail preservation. Readers care because the approach removes the need for extra GANs or reward models that complicate training. Experiments on SD3-Medium and Longcat-Image show competitive results in few steps.
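The reverse-KL machinery the summary refers to can be sketched in a few lines. This is an editorial illustration, not the paper's implementation: `score_fake` and `score_teacher` are hypothetical stand-ins for the score of the student's output distribution and the frozen teacher's score, and the sign and weighting conventions follow the DMD formulation only approximately.

```python
import numpy as np

def dmd_update_direction(x_t, score_fake, score_teacher):
    """Reverse-KL distribution-matching direction at a noised latent x_t.

    In DMD-style distillation the generator is pushed along the difference
    between the score of its own output distribution ("fake") and the
    teacher's score; both callables here are toy stand-ins.
    """
    return score_fake(x_t) - score_teacher(x_t)

# Toy 1-D check with two zero-mean Gaussian scores, s(x) = -x / sigma^2:
# the direction is nonzero exactly where the two distributions disagree.
x = np.array([2.0])
direction = dmd_update_direction(
    x,
    score_fake=lambda z: -z / 4.0,  # wide fake Gaussian, sigma^2 = 4
    score_teacher=lambda z: -z,     # narrow teacher Gaussian, sigma^2 = 1
)
```

At x = 2 this yields -0.5 - (-2.0) = 1.5, a nonzero correction; the paper's objection to vanilla DMD is not this gradient itself but that it is only ever evaluated at a few fixed discrete timesteps.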

Core claim

CDM migrates the DMD framework from discrete anchoring to continuous optimization through two designs: a dynamic continuous schedule of random length that enforces distribution matching at arbitrary points along sampling trajectories, and a continuous-time alignment objective that performs active off-trajectory matching on latents extrapolated via the student’s velocity field. Together these achieve highly competitive visual fidelity for few-step image generation without complex auxiliary objectives.

What carries the argument

Dynamic continuous schedule of random length combined with continuous-time alignment objective for off-trajectory matching using the student’s velocity field.
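A minimal sketch of what a "dynamic continuous schedule of random length" could look like, assuming uniform sampling over (0, 1] as the paper's Figure 2 description suggests; the length range and ordering here are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def sample_dynamic_schedule(rng, min_len=1, max_len=8):
    """Draw a random-length schedule of continuous timestep anchors.

    Anchors are sampled uniformly from (0, 1] and sorted from high noise
    (t near 1) down to low noise, so each training iteration supervises a
    different set of trajectory points. min_len and max_len are
    hypothetical hyperparameters, not values from the paper.
    """
    length = int(rng.integers(min_len, max_len + 1))
    anchors = 1.0 - rng.random(length)  # maps [0, 1) onto (0, 1]
    return np.sort(anchors)[::-1]

schedule = sample_dynamic_schedule(np.random.default_rng(0))
```

Each call returns a fresh schedule, which is the contrast with vanilla DMD: there, the same few anchors are reused at every iteration.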

If this is right

  • Distribution matching is enforced at arbitrary points along sampling trajectories rather than only at fixed discrete anchors.
  • Off-trajectory alignment using the student velocity field improves generalization and preserves fine visual details.
  • Competitive visual fidelity is achieved on SD3-Medium and Longcat-Image architectures without auxiliary GANs or reward models.
  • The continuous formulation avoids the restricted supervision and mode-seeking artifacts of vanilla discrete DMD.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same random-length continuous matching idea could be applied to consistency distillation to reduce reliance on full-trajectory self-consistency losses.
  • Variable inference step counts might become possible without retraining by sampling the random schedule range at test time.
  • Continuous supervision may lower sensitivity to the choice of specific timestep anchors that plague discrete methods.
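The second bullet can be made concrete. If matching really was enforced over all of (0, 1] during training, a deployed student could in principle be handed a uniform anchor grid of any length at test time. This is an editorial extrapolation, not a procedure the paper specifies:

```python
import numpy as np

def inference_schedule(n_steps):
    """Uniform test-time anchor grid over (0, 1] for an arbitrary number
    of function evaluations (NFE). Endpoints are chosen so every anchor
    stays strictly inside (0, 1]; this grid spacing is an assumption.
    """
    return np.linspace(1.0, 1.0 / n_steps, n_steps)

# The same checkpoint could then be queried at different step counts:
four_step = inference_schedule(4)   # [1.0, 0.75, 0.5, 0.25]
eight_step = inference_schedule(8)  # finer grid, no retraining
```

Whether quality degrades gracefully as n_steps shrinks is exactly the kind of question Figure 7 (varying NFE from one checkpoint) appears to probe.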

Load-bearing premise

Enforcing distribution matching at arbitrary continuous points along student trajectories via random-length schedules and off-trajectory alignment will generalize across architectures and preserve fine details without introducing new instabilities.
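The off-trajectory step named in this premise amounts, in flow-matching terms, to one explicit Euler move along the student's own velocity field. A hedged sketch, with `velocity` as a toy stand-in for the student network (the paper's exact parameterization is not specified in this summary):

```python
import numpy as np

def extrapolate_off_trajectory(x_t, t, s, velocity):
    """Extrapolate a latent from time t to time s along a velocity field,
    x_s ≈ x_t + (s - t) * v(x_t, t). CDM reportedly applies this with the
    student's field to generate off-trajectory points for alignment.
    """
    return x_t + (s - t) * velocity(x_t, t)

# Toy check with v(x, t) = 2 * x, stepping from t = 0.5 to s = 0.7:
x_s = extrapolate_off_trajectory(np.array([1.0]), 0.5, 0.7,
                                 lambda x, t: 2.0 * x)
```

The referee's stability worry lives in this one line: early in distillation the student's `velocity` is inaccurate, so the extrapolated x_s lands at points the supervision may not handle well.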

What would settle it

Training CDM on a new architecture such as a different Stable Diffusion variant and measuring whether 1-4 step outputs exhibit more visual artifacts or loss of fine detail than discrete DMD with auxiliary modules.

Figures

Figures reproduced from arXiv: 2605.06376 by Bo Zheng, Hao Yan, Jinsong Lan, Mengting Chen, Ming-Ming Cheng, Taihang Hu, Tao Liu, Xiaoyong Zhu, Yaxing Wang, Zhengrong Yue, Zihao Pan.

Figure 1: CDM enables high-fidelity few-step text-to-image generation. We compare our Continuous-Time Distribution Matching (CDM) against DMD2, both distilled from Longcat-Image (1024×1024) and evaluated at 4 NFE with identical prompts and seeds. Without relying on any GAN or reward-model auxiliary objectives, CDM produces sharper textures, richer fine-grained details, and overall higher visual fidelity, while DMD2 …
Figure 2: Empirical evidence of schedule decoupling. (a) Conventional distillation strictly anchors backward simulation to predefined discrete inference timesteps. In contrast, our dynamic scheduling optimizes over uniformly sampled continuous timesteps t ∈ (0, 1] at each iteration. (b) Visually, the dynamically scheduled model produces finer details and fewer artifacts than the strictly aligned baseline. (c) Quanti…
Figure 3: Visual evidence on the role of the DM loss. Samples from teacher models (SD3-Medium and Longcat-Image) with and without CFG, compared against student models distilled with the DM loss alone. Students distilled with the DM loss alone closely match their teachers’ CFG-free samples, indicating that the DM loss is not a mere stabilizer but the key driver that aligns the student to the teacher’s CFG-free distri…
Figure 4: Overview of Continuous-Time Distribution Matching (CDM). Top: Our approach employs a dynamic continuous time schedule during backward simulation, sampling intermediate anchors uniformly from (0, 1]. Bottom Left: CFG augmentation (CA) and distribution matching (DM) operate on this dynamic schedule to align text-image conditions and data distributions at on-trajectory anchors. Bottom Right: To address inter-…
Figure 5: Qualitative comparison on SD3-Medium. CDM produces more photorealistic results with richer details than competing methods. All results are generated using the same initial noise and random seed for fair comparison.
Figure 6: Qualitative ablation of loss components across training steps. Left: Individual losses (CA, DM, CDM) in isolation. Right: Pairwise and full combinations. Partial combinations suffer from brightness collapse or degraded local fidelity at later stages, whereas our full objective (CA+DM+CDM) effectively preserves both global semantic coherence and local details.
Figure 7: Generations from the same CDM checkpoint under varying NFE.
Figure 8: Additional qualitative results of CDM on SD3-Medium at …
Figure 9: Additional qualitative results of CDM on Longcat-Image at …
read the original abstract

Step distillation has become a leading technique for accelerating diffusion models, among which Distribution Matching Distillation (DMD) and Consistency Distillation are two representative paradigms. While consistency methods enforce self-consistency along the full PF-ODE trajectory to steer it toward the clean data manifold, vanilla DMD relies on sparse supervision at a few predefined discrete timesteps. This restricted discrete-time formulation and mode-seeking nature of the reverse KL divergence tends to exhibit visual artifacts and over-smoothed outputs, often necessitating complex auxiliary modules -- such as GANs or reward models -- to restore visual fidelity. In this work, we introduce Continuous-Time Distribution Matching (CDM), migrating the DMD framework from discrete anchoring to continuous optimization for the first time. CDM achieves this through two continuous-time designs. First, we replace the fixed discrete schedule with a dynamic continuous schedule of random length, so that distribution matching is enforced at arbitrary points along sampling trajectories rather than only at a few fixed anchors. Second, we propose a continuous-time alignment objective that performs active off-trajectory matching on latents extrapolated via the student's velocity field, improving generalization and preserving fine visual details. Extensive experiments on different architectures, including SD3-Medium and Longcat-Image, demonstrate that CDM provides highly competitive visual fidelity for few-step image generation without relying on complex auxiliary objectives. Code is available at https://github.com/byliutao/cdm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Continuous-Time Distribution Matching (CDM) to extend Distribution Matching Distillation (DMD) for few-step diffusion model acceleration. It replaces fixed discrete timesteps with a dynamic continuous schedule of random length to enforce distribution matching at arbitrary trajectory points, and adds a continuous-time alignment objective that performs off-trajectory matching by extrapolating latents using the student's velocity field. Experiments on SD3-Medium and Longcat-Image are reported to yield highly competitive visual fidelity for few-step generation without auxiliary objectives such as GANs or reward models.

Significance. If the results hold under rigorous verification, CDM could streamline few-step distillation by removing reliance on complex auxiliary modules, addressing discrete DMD's tendencies toward artifacts and over-smoothing through continuous optimization. This would represent a meaningful simplification in the field if the off-trajectory alignment proves stable across training stages and architectures.

major comments (2)
  1. [continuous-time alignment objective (as described in the method)] The continuous-time alignment objective extrapolates latents via the student's own velocity field rather than the teacher's. No analysis or ablation is provided to test stability when this field is inaccurate (as is likely early in distillation), which directly bears on whether the method avoids introducing instabilities or artifacts instead of improving fidelity. This is load-bearing for the central claim of reliable generalization without auxiliaries.
  2. [experiments section] Experiments on SD3-Medium and Longcat-Image claim competitive results, but lack controls that isolate the off-trajectory alignment (e.g., ablating it or substituting the teacher's velocity field). Without such tests, it is unclear whether fidelity gains survive the skeptic's concern about early-training velocity inaccuracy or are attributable to the random-length schedule alone.
minor comments (1)
  1. [method] The manuscript could clarify the exact sampling procedure for the random-length continuous schedule and any associated hyperparameters to aid reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We have carefully reviewed the concerns about the stability of the continuous-time alignment objective and the need for isolating controls in the experiments. We address each point below and commit to revisions that strengthen the manuscript.

read point-by-point responses
  1. Referee: The continuous-time alignment objective extrapolates latents via the student's own velocity field rather than the teacher's. No analysis or ablation is provided to test stability when this field is inaccurate (as is likely early in distillation), which directly bears on whether the method avoids introducing instabilities or artifacts instead of improving fidelity. This is load-bearing for the central claim of reliable generalization without auxiliaries.

    Authors: We agree that analyzing the stability of the student's velocity field, particularly in early training, is important for validating the method. The choice to use the student's field enables active, on-policy off-trajectory matching that aligns with the student's evolving distribution, which we argue is central to avoiding auxiliary modules. The dynamic random-length schedule further supports this by varying supervision points and reducing dependence on any single inaccurate prediction. In the revised manuscript, we will add a new subsection in the method section providing analysis of velocity field evolution during training, along with discussion of observed stability and mitigation via the schedule. We will also include an ablation comparing student-field extrapolation to a teacher-field variant. revision: yes

  2. Referee: Experiments on SD3-Medium and Longcat-Image claim competitive results, but lack controls that isolate the off-trajectory alignment (e.g., ablating it or substituting the teacher's velocity field). Without such tests, it is unclear whether fidelity gains survive the skeptic's concern about early-training velocity inaccuracy or are attributable to the random-length schedule alone.

    Authors: We concur that explicit isolation of the off-trajectory alignment's contribution is necessary to address concerns about early-training inaccuracies. While the random-length schedule provides continuous distribution matching, the alignment objective extends this to extrapolated points for better generalization. In the revised version, we will add ablation experiments to the experiments section: one removing the alignment objective entirely (relying solely on random-length matching) and another substituting the teacher's velocity field for extrapolation. These will quantify the incremental gains from the alignment and demonstrate that fidelity improvements persist beyond the schedule alone, with results showing degradation in detail preservation when alignment is ablated. revision: yes

Circularity Check

0 steps flagged

No significant circularity: CDM designs are independent extensions of DMD.

full rationale

The paper's central contribution consists of two explicit continuous-time modifications to the DMD framework: a random-length dynamic schedule and an off-trajectory alignment term using the student's velocity field. Neither reduces by construction to fitted inputs, self-definitions, or prior self-citations. The abstract and described objectives treat these as new design choices whose validity is assessed empirically on SD3-Medium and Longcat-Image, without invoking uniqueness theorems or renaming known results. This is the expected non-circular outcome for an engineering extension paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are detailed; the method introduces new objectives but their precise parameterization is not specified here.

pith-pipeline@v0.9.0 · 5577 in / 937 out tokens · 40360 ms · 2026-05-08T13:24:11.418363+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

63 extracted references · 22 canonical work pages · 6 internal anchors

  1. [1]

Flash diffusion: Accelerating any conditional diffusion model for few steps image generation

    Clement Chadebec, Onur Tasar, Eyal Benaroche, and Benjamin Aubin. Flash diffusion: Accelerating any conditional diffusion model for few steps image generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 15686–15695, 2025

  2. [2]

Cross-resolution distribution matching for diffusion distillation

    Feiyang Chen, Hongpeng Pan, Haonan Xu, Xinyu Duan, Yang Yang, and Zhefeng Wang. Cross-resolution distribution matching for diffusion distillation. arXiv preprint arXiv:2603.06136, 2026

  3. [3]

    Sana-sprint: One-step diffusion with continuous-time consistency distillation, 2025

    Junsong Chen, Shuchen Xue, Yuyang Zhao, Jincheng Yu, Sayak Paul, Junyu Chen, Han Cai, Song Han, and Enze Xie. Sana-sprint: One-step diffusion with continuous-time consistency distillation, 2025

  4. [4]

    Sharegpt-4o-image: Aligning multimodal models with gpt-4o-level image generation, 2025

    Junying Chen, Zhenyang Cai, Pengcheng Chen, Shunian Chen, Ke Ji, Xidong Wang, Yunjin Yang, and Benyou Wang. Sharegpt-4o-image: Aligning multimodal models with gpt-4o-level image generation, 2025

  5. [5]

Twinflow: Realizing one-step generation on large models with self-adversarial flows

    Zhenglin Cheng, Peng Sun, Jianguo Li, and Tao Lin. Twinflow: Realizing one-step generation on large models with self-adversarial flows. arXiv preprint arXiv:2512.05150, 2025

  6. [6]

    PaddleOCR 3.0 Technical Report

Cheng Cui, Ting Sun, Manhui Lin, Tingquan Gao, Yubo Zhang, Jiaxuan Liu, Xueqing Wang, Zelun Zhang, Changda Zhou, Hongen Liu, et al. PaddleOCR 3.0 technical report. arXiv preprint arXiv:2507.05595, 2025

  7. [7]

Tweedie’s formula and selection bias

    Bradley Efron. Tweedie’s formula and selection bias. Journal of the American Statistical Association, 106(496):1602–1614, 2011

  8. [8]

Scaling rectified flow transformers for high-resolution image synthesis

    Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learning, 2024

  9. [9]

    Phased dmd: Few-step distribution matching distillation via score matching within subintervals

Xiangyu Fan, Zesong Qiu, Zhuguanyu Wu, Fanzhou Wang, Zhiqian Lin, Tianxiang Ren, Dahua Lin, Ruihao Gong, and Lei Yang. Phased dmd: Few-step distribution matching distillation via score matching within subintervals. arXiv preprint arXiv:2510.27684, 2025

  10. [10]

Senseflow: Scaling distribution matching for flow-based text-to-image distillation

    Xingtong Ge, Xin Zhang, Tongda Xu, Yi Zhang, Xinjie Zhang, Yan Wang, and Jun Zhang. Senseflow: Scaling distribution matching for flow-based text-to-image distillation. arXiv preprint arXiv:2506.00523, 2025

  11. [11]

    Clipscore: A reference-free evaluation metric for image captioning

Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. Clipscore: A reference-free evaluation metric for image captioning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7514–7528, 2021

  12. [12]

Gans trained by a two time-scale update rule converge to a local nash equilibrium

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017

  13. [13]

Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020

  14. [14]

Lora: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022

  15. [15]

    ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, and Gang Yu. Ella: Equip diffusion models with llm for enhanced semantic alignment. arXiv preprint arXiv:2403.05135, 2024

  16. [16]

Distribution matching distillation meets reinforcement learning

    Dengyang Jiang, Dongyang Liu, Zanyi Wang, Qilong Wu, Liuzhuozheng Li, Hengzhuang Li, Xin Jin, David Liu, Zhen Li, Bo Zhang, et al. Distribution matching distillation meets reinforcement learning. arXiv preprint arXiv:2511.13649, 2025

  17. [17]

Elucidating the design space of diffusion-based generative models

    Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35:26565–26577, 2022

  18. [18]

    Consistency trajectory models: Learning probability flow ode trajectory of diffusion

Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, and Stefano Ermon. Consistency trajectory models: Learning probability flow ode trajectory of diffusion. In The Twelfth International Conference on Learning Representations, 2024

  19. [19]

Pick-a-pic: An open dataset of user preferences for text-to-image generation

    Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-pic: An open dataset of user preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36:36652–36663, 2023

  20. [20]

X-distill: Breaking the diversity, quality, and efficiency barrier in distribution matching distillation

    Haoyu Li, Tingyan Wen, Lin Qi, Zhe Wu, Yihuang Chen, Xing Zhou, Lifei Zhu, Xueqian Wang, and Kai Zhang. X-distill: Breaking the diversity, quality, and efficiency barrier in distribution matching distillation. arXiv preprint arXiv:2604.04018, 2026

  21. [21]

Sdxl-lightning: Progressive adversarial diffusion distillation

    Shanchuan Lin, Anran Wang, and Xiao Yang. Sdxl-lightning: Progressive adversarial diffusion distillation. arXiv preprint arXiv:2402.13929, 2024

  22. [22]

    Microsoft coco: Common objects in context

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014

  23. [23]

    Flow matching for generative modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2022

  24. [24]

Decoupled dmd: Cfg augmentation as the spear, distribution matching as the shield

    Dongyang Liu, Peng Gao, David Liu, Ruoyi Du, Zhen Li, Qilong Wu, Xin Jin, Sihan Cao, Shifeng Zhang, Hongsheng Li, et al. Decoupled dmd: Cfg augmentation as the spear, distribution matching as the shield. arXiv preprint arXiv:2511.22677, 2025

  25. [25]

    Flow-GRPO: Training Flow Matching Models via Online RL

Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-grpo: Training flow matching models via online rl. arXiv preprint arXiv:2505.05470, 2025

  26. [26]

    Flow straight and fast: Learning to generate and transfer data with rectified flow

Xingchao Liu, Chengyue Gong, et al. Flow straight and fast: Learning to generate and transfer data with rectified flow. In The Eleventh International Conference on Learning Representations, 2022

  27. [27]

    Instaflow: One step is enough for high-quality diffusion-based text-to-image generation

Xingchao Liu, Xiwen Zhang, Jianzhu Ma, Jian Peng, et al. Instaflow: One step is enough for high-quality diffusion-based text-to-image generation. In The Twelfth International Conference on Learning Representations, 2023

  28. [28]

    Simplifying, stabilizing and scaling continuous-time consistency models

Cheng Lu and Yang Song. Simplifying, stabilizing and scaling continuous-time consistency models. In The Thirteenth International Conference on Learning Representations, 2025

  29. [29]

    Adversarial distribution matching for diffusion distillation towards efficient image and video synthesis

Yanzuo Lu, Yuxi Ren, Xin Xia, Shanchuan Lin, Xing Wang, Xuefeng Xiao, Andy J Ma, Xiaohua Xie, and Jian-Huang Lai. Adversarial distribution matching for diffusion distillation towards efficient image and video synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16818–16829, 2025

  30. [30]

    Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao. Latent consistency models: Synthesizing high-resolution images with few-step inference. arXiv preprint arXiv:2310.04378, 2023

  31. [31]

Diff-instruct: A universal approach for transferring knowledge from pre-trained diffusion models

    Weijian Luo, Tianyang Hu, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, and Zhihua Zhang. Diff-instruct: A universal approach for transferring knowledge from pre-trained diffusion models. Advances in Neural Information Processing Systems, 36:76525–76546, 2023

  32. [32]

    Learning few-step diffusion models by trajectory distribution matching

Yihong Luo, Tianyang Hu, Jiacheng Sun, Yujun Cai, and Jing Tang. Learning few-step diffusion models by trajectory distribution matching. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17719–17728, 2025

  33. [33]

Hpsv3: Towards wide-spectrum human preference score

    Yuhang Ma, Xiaoshi Wu, Keqiang Sun, and Hongsheng Li. Hpsv3: Towards wide-spectrum human preference score. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15086–15095, 2025

  34. [34]

    On distillation of guided diffusion models

Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik Kingma, Stefano Ermon, Jonathan Ho, and Tim Salimans. On distillation of guided diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14297–14306, 2023

  35. [35]

Transition matching distillation for fast video generation

    Weili Nie, Julius Berner, Nanye Ma, Chao Liu, Saining Xie, and Arash Vahdat. Transition matching distillation for fast video generation. arXiv preprint arXiv:2601.09881, 2026

  36. [36]

    Elucidating the exposure bias in diffusion models

Mang Ning, Mingxiao Li, Jianlin Su, Albert Ali Salah, and Itir Onal Ertugrul. Elucidating the exposure bias in diffusion models. In The Twelfth International Conference on Learning Representations, 2024

  37. [37]

    Input perturbation reduces exposure bias in diffusion models

Mang Ning, Enver Sangineto, Angelo Porrello, Simone Calderara, and Rita Cucchiara. Input perturbation reduces exposure bias in diffusion models. In International Conference on Machine Learning, pages 26245–26265. PMLR, 2023

  38. [38]

Facm: Flow-anchored consistency models

    Yansong Peng, Kai Zhu, Yu Liu, Pingyu Wu, Hebei Li, Xiaoyan Sun, and Feng Wu. Facm: Flow-anchored consistency models. arXiv preprint arXiv:2507.03738, 2025

  39. [39]

    Dreamfusion: Text-to-3d using 2d diffusion

Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. In The Eleventh International Conference on Learning Representations, 2023

  40. [40]

    Soar: Self-correction for optimal alignment and refinement in diffusion models, 2026

    You Qin, Linqing Wang, Hao Fei, Roger Zimmermann, Liefeng Bo, Qinglin Lu, and Chunyu Wang. Soar: Self-correction for optimal alignment and refinement in diffusion models, 2026

  41. [41]

Hyper-sd: Trajectory segmented consistency model for efficient image synthesis

    Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, and Xuefeng Xiao. Hyper-sd: Trajectory segmented consistency model for efficient image synthesis. Advances in Neural Information Processing Systems, 37:117340–117362, 2024

  42. [42]

High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022

  43. [43]

    Align your flow: Scaling continuous-time flow map distillation

Amirmojtaba Sabour, Sanja Fidler, and Karsten Kreis. Align your flow: Scaling continuous-time flow map distillation. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  44. [44]

    Progressive Distillation for Fast Sampling of Diffusion Models

    Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512, 2022

  45. [45]

    Adversarial diffusion distillation

Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion distillation. In European Conference on Computer Vision, pages 87–103. Springer, 2024

  46. [46]

    LAION-Aesthetics

Christoph Schuhmann. LAION-Aesthetics. https://laion.ai/blog/laion-aesthetics/, Aug 2022. Blog post

  47. [47]

Laion-5b: An open large-scale dataset for training next generation image-text models

    Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35:25278–25294, 2022

  48. [48]

    Denoising diffusion implicit models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021

  49. [49]

    Consistency models

Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Proceedings of the 40th International Conference on Machine Learning, pages 32211–32252, 2023

  50. [50]

    Score-based generative modeling through stochastic differential equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021

  51. [51]

Scale-wise distillation of diffusion models

    Nikita Starodubcev, Ilya Drobyshevskiy, Denis Kuznedelev, Artem Babenko, and Dmitry Baranchuk. Scale-wise distillation of diffusion models. arXiv preprint arXiv:2503.16397, 2025

  52. [52]

    Swiftvideo: A unified framework for few-step video generation through trajectory-distribution alignment

Yanxiao Sun, Jiafu Wu, Yun Cao, Chengming Xu, Yabiao Wang, Weijian Cao, Donghao Luo, Chengjie Wang, and Yanwei Fu. Swiftvideo: A unified framework for few-step video generation through trajectory-distribution alignment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 9233–9241, 2026

  53. [53]

    Longcat-image technical report

    Meituan LongCat Team, Hanghang Ma, Haoxian Tan, Jiale Huang, Junqiang Wu, Jun-Yan He, Lishuai Gao, Songlin Xiao, Xiaoming Wei, Xiaoqi Ma, et al. Longcat-image technical report. arXiv preprint arXiv:2512.07584, 2025

  54. [54]

Phased consistency models

    Fu-Yun Wang, Zhaoyang Huang, Alexander W Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, et al. Phased consistency models. Advances in Neural Information Processing Systems, 37:83951–84009, 2024

  55. [55]

Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation

    Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. Advances in Neural Information Processing Systems, 36:8406–8441, 2023

  56. [56]

Skywork unipic 3.0: Unified multi-image composition via sequence modeling

    Hongyang Wei, Hongbo Liu, Zidong Wang, Yi Peng, Baixin Xu, Size Wu, Xuying Zhang, Xianglong He, Zexiang Liu, Peiyu Wang, et al. Skywork unipic 3.0: Unified multi-image composition via sequence modeling. arXiv preprint arXiv:2601.15664, 2026

  57. [57]

Em distillation for one-step diffusion models

    Sirui Xie, Zhisheng Xiao, Diederik P Kingma, Tingbo Hou, Ying N Wu, Kevin Murphy, Tim Salimans, Ben Poole, and Ruiqi Gao. Em distillation for one-step diffusion models. Advances in Neural Information Processing Systems, 37:45073–45104, 2024

  58. [58]

Improved distribution matching distillation for fast image synthesis

    Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and Bill Freeman. Improved distribution matching distillation for fast image synthesis. Advances in Neural Information Processing Systems, 37:47455–47487, 2024

  59. [59]

    One-step diffusion with distribution matching distillation

    Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6613–6623, 2024

  60. [60]

Text-to-3d with classifier score distillation

    Xin Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Song-Hai Zhang, and Xiaojuan Qi. Text-to-3d with classifier score distillation. arXiv preprint arXiv:2310.19415, 2023

  61. [61]

Trajectory consistency distillation: Improved latent consistency distillation by semi-linear consistency function with trajectory mapping

    Jianbin Zheng, Minghui Hu, Zhongyi Fan, Chaoyue Wang, Changxing Ding, Dacheng Tao, and Tat-Jen Cham. Trajectory consistency distillation: Improved latent consistency distillation by semi-linear consistency function with trajectory mapping. arXiv preprint arXiv:2402.19159, 2024

  62. [62]

    Few-step diffusion via score identity distillation

    Mingyuan Zhou, Yi Gu, and Zhendong Wang. Few-step diffusion via score identity distillation. arXiv preprint arXiv:2505.12674, 2025

  63. [63]

    text-to-image-2m: A high-quality, diverse text–image training dataset, 2024

Kai Zou. text-to-image-2m: A high-quality, diverse text–image training dataset, 2024