Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
Pith reviewed 2026-05-08 13:24 UTC · model grok-4.3
The pith
Migrating distribution matching from discrete timesteps to continuous random-length schedules enables few-step diffusion models to reach high visual fidelity without auxiliary objectives.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CDM migrates the DMD framework from discrete anchoring to continuous optimization through two designs: a dynamic continuous schedule of random length, which enforces distribution matching at arbitrary points along sampling trajectories, and a continuous-time alignment objective, which performs active off-trajectory matching on latents extrapolated via the student's velocity field. Together, these yield highly competitive visual fidelity for few-step image generation without complex auxiliary objectives.
What carries the argument
Dynamic continuous schedule of random length combined with continuous-time alignment objective for off-trajectory matching using the student’s velocity field.
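The two mechanisms can be sketched in a few lines. Everything below (function names, the schedule bounds, the toy linear field standing in for the student network) is illustrative, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_random_length_schedule(rng, max_len=8, t_max=1.0):
    """Draw a continuous schedule of random length: first pick how many
    matching points to use, then place them at arbitrary (sorted, descending)
    continuous times in (0, t_max] rather than at fixed discrete anchors."""
    length = rng.integers(1, max_len + 1)
    return np.sort(rng.uniform(0.0, t_max, size=length))[::-1]

def extrapolate_off_trajectory(x_t, t, s, velocity):
    """Off-trajectory latent: one Euler step of the (student) velocity field
    from time t to an arbitrary continuous time s."""
    return x_t + (s - t) * velocity(x_t, t)

# toy linear velocity field standing in for the student network
velocity = lambda x, t: -x

schedule = sample_random_length_schedule(rng)
x = rng.standard_normal(4)  # toy latent
for t, s in zip(schedule[:-1], schedule[1:]):
    x = extrapolate_off_trajectory(x, t, s, velocity)
    # distribution matching would be enforced on x at continuous time s here
```

The key contrast with discrete DMD is that both the number of matching points and their positions are resampled per iteration instead of being fixed hyperparameters.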
If this is right
- Distribution matching is enforced at arbitrary points along sampling trajectories rather than only at fixed discrete anchors.
- Off-trajectory alignment using the student velocity field improves generalization and preserves fine visual details.
- Competitive visual fidelity is achieved on SD3-Medium and Longcat-Image architectures without auxiliary GANs or reward models.
- The continuous formulation avoids the restricted supervision and mode-seeking artifacts of vanilla discrete DMD.
Where Pith is reading between the lines
- The same random-length continuous matching idea could be applied to consistency distillation to reduce reliance on full-trajectory self-consistency losses.
- Variable inference step counts might become possible without retraining by sampling the random schedule range at test time.
- Continuous supervision may lower sensitivity to the choice of specific timestep anchors that plague discrete methods.
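If the distilled student is a velocity field, the variable-step-count speculation above amounts to re-gridding a plain Euler ODE sampler at test time. A minimal sketch with a toy field standing in for the network (all names hypothetical):

```python
import numpy as np

def euler_flow_sampler(velocity, x_T, n_steps, t_start=1.0, t_end=0.0):
    """Integrate dx/dt = v(x, t) from t_start down to t_end on a uniform grid
    of n_steps Euler steps; n_steps can be chosen freely at inference time."""
    ts = np.linspace(t_start, t_end, n_steps + 1)
    x = x_T
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * velocity(x, t_cur)
    return x

velocity = lambda x, t: -x  # toy stand-in for a distilled student network
x_T = np.ones(3)
x_1 = euler_flow_sampler(velocity, x_T, n_steps=1)  # one-step generation
x_4 = euler_flow_sampler(velocity, x_T, n_steps=4)  # four-step generation
```

Whether a CDM-trained student actually tolerates arbitrary grids without retraining is exactly the open question this bullet raises; the sketch only shows that nothing in the sampler itself fixes the step count.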
Load-bearing premise
Enforcing distribution matching at arbitrary continuous points along student trajectories via random-length schedules and off-trajectory alignment will generalize across architectures and preserve fine details without introducing new instabilities.
What would settle it
Training CDM on a new architecture such as a different Stable Diffusion variant and measuring whether 1-4 step outputs exhibit more visual artifacts or loss of fine detail than discrete DMD with auxiliary modules.
read the original abstract
Step distillation has become a leading technique for accelerating diffusion models, among which Distribution Matching Distillation (DMD) and Consistency Distillation are two representative paradigms. While consistency methods enforce self-consistency along the full PF-ODE trajectory to steer it toward the clean data manifold, vanilla DMD relies on sparse supervision at a few predefined discrete timesteps. This restricted discrete-time formulation and the mode-seeking nature of the reverse KL divergence tend to produce visual artifacts and over-smoothed outputs, often necessitating complex auxiliary modules -- such as GANs or reward models -- to restore visual fidelity. In this work, we introduce Continuous-Time Distribution Matching (CDM), migrating the DMD framework from discrete anchoring to continuous optimization for the first time. CDM achieves this through two continuous-time designs. First, we replace the fixed discrete schedule with a dynamic continuous schedule of random length, so that distribution matching is enforced at arbitrary points along sampling trajectories rather than only at a few fixed anchors. Second, we propose a continuous-time alignment objective that performs active off-trajectory matching on latents extrapolated via the student's velocity field, improving generalization and preserving fine visual details. Extensive experiments on different architectures, including SD3-Medium and Longcat-Image, demonstrate that CDM provides highly competitive visual fidelity for few-step image generation without relying on complex auxiliary objectives. Code is available at https://github.com/byliutao/cdm.
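The reverse-KL machinery the abstract refers to can be illustrated with analytic 1-D Gaussian scores standing in for the real (teacher) and fake (student) score networks. This is the textbook DMD-style update direction, not the paper's exact loss; all values are toy:

```python
import numpy as np

def reverse_kl_grad(x, score_real, score_fake):
    """DMD-style update direction: the reverse-KL gradient w.r.t. a generated
    sample x is, up to weighting, the gap between the fake and real scores
    evaluated at x. Descending this gradient moves samples toward the real
    distribution's high-density regions (hence the mode-seeking behavior)."""
    return score_fake(x) - score_real(x)

# analytic 1-D Gaussian scores as stand-ins for the two score networks
score_real = lambda x: -(x - 2.0)  # teacher samples ~ N(2, 1)
score_fake = lambda x: -(x - 0.0)  # current student samples ~ N(0, 1)

x = 0.5
g = reverse_kl_grad(x, score_real, score_fake)
x_new = x - 0.1 * g  # gradient step pulls the sample toward the real mode
```

In 1-D this simply drags samples toward the teacher mean; with a multimodal teacher, the same update concentrates samples on a subset of modes, which is the over-smoothing/artifact failure mode the abstract describes.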
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Continuous-Time Distribution Matching (CDM) to extend Distribution Matching Distillation (DMD) for few-step diffusion model acceleration. It replaces fixed discrete timesteps with a dynamic continuous schedule of random length to enforce distribution matching at arbitrary trajectory points, and adds a continuous-time alignment objective that performs off-trajectory matching by extrapolating latents using the student's velocity field. Experiments on SD3-Medium and Longcat-Image are reported to yield highly competitive visual fidelity for few-step generation without auxiliary objectives such as GANs or reward models.
Significance. If the results hold under rigorous verification, CDM could streamline few-step distillation by removing reliance on complex auxiliary modules, addressing discrete DMD's tendencies toward artifacts and over-smoothing through continuous optimization. This would represent a meaningful simplification in the field if the off-trajectory alignment proves stable across training stages and architectures.
major comments (2)
- [continuous-time alignment objective (as described in the method)] The continuous-time alignment objective extrapolates latents via the student's own velocity field rather than the teacher's. No analysis or ablation is provided to test stability when this field is inaccurate (as is likely early in distillation), which directly bears on whether the method avoids introducing instabilities or artifacts instead of improving fidelity. This is load-bearing for the central claim of reliable generalization without auxiliaries.
- [experiments section] Experiments on SD3-Medium and Longcat-Image claim competitive results, but lack controls that isolate the off-trajectory alignment (e.g., ablating it or substituting the teacher's velocity field). Without such tests, it is unclear whether fidelity gains survive the skeptic's concern about early-training velocity inaccuracy or are attributable to the random-length schedule alone.
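One way to frame the requested control is to apply the same one-step extrapolation under both velocity fields to identical latents and measure the gap. A toy sketch in which the fields and the error magnitude are purely hypothetical:

```python
import numpy as np

def extrapolate(x, t, s, velocity):
    # one Euler step from time t to time s under a given velocity field
    return x + (s - t) * velocity(x, t)

# hypothetical stand-ins: an accurate "teacher" field and a perturbed
# "early-training student" field with a constant systematic error
teacher = lambda x, t: -x
student = lambda x, t: -x + 0.3

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
target = extrapolate(x, 1.0, 0.5, teacher)  # reference extrapolation
off = extrapolate(x, 1.0, 0.5, student)     # ablation variant under test
drift = np.abs(off - target).mean()         # gap the ablation would measure
```

In a real ablation, `drift` would be tracked over training iterations: if it shrinks fast enough, student-field extrapolation is plausibly safe; if it stays large early on, the referee's instability concern bites.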
minor comments (1)
- [method] The manuscript could clarify the exact sampling procedure for the random-length continuous schedule and any associated hyperparameters to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We have carefully reviewed the concerns about the stability of the continuous-time alignment objective and the need for isolating controls in the experiments. We address each point below and commit to revisions that strengthen the manuscript.
read point-by-point responses
-
Referee: The continuous-time alignment objective extrapolates latents via the student's own velocity field rather than the teacher's. No analysis or ablation is provided to test stability when this field is inaccurate (as is likely early in distillation), which directly bears on whether the method avoids introducing instabilities or artifacts instead of improving fidelity. This is load-bearing for the central claim of reliable generalization without auxiliaries.
Authors: We agree that analyzing the stability of the student's velocity field, particularly in early training, is important for validating the method. The choice to use the student's field enables active, on-policy off-trajectory matching that aligns with the student's evolving distribution, which we argue is central to avoiding auxiliary modules. The dynamic random-length schedule further supports this by varying supervision points and reducing dependence on any single inaccurate prediction. In the revised manuscript, we will add a new subsection in the method section providing analysis of velocity field evolution during training, along with discussion of observed stability and mitigation via the schedule. We will also include an ablation comparing student-field extrapolation to a teacher-field variant. revision: yes
-
Referee: Experiments on SD3-Medium and Longcat-Image claim competitive results, but lack controls that isolate the off-trajectory alignment (e.g., ablating it or substituting the teacher's velocity field). Without such tests, it is unclear whether fidelity gains survive the skeptic's concern about early-training velocity inaccuracy or are attributable to the random-length schedule alone.
Authors: We concur that explicit isolation of the off-trajectory alignment's contribution is necessary to address concerns about early-training inaccuracies. While the random-length schedule provides continuous distribution matching, the alignment objective extends this to extrapolated points for better generalization. In the revised version, we will add ablation experiments to the experiments section: one removing the alignment objective entirely (relying solely on random-length matching) and another substituting the teacher's velocity field for extrapolation. These will quantify the incremental gains from the alignment and test whether fidelity improvements persist beyond the schedule alone; we expect detail preservation to degrade when the alignment is ablated. revision: yes
Circularity Check
No significant circularity: CDM designs are independent extensions of DMD.
full rationale
The paper's central contribution consists of two explicit continuous-time modifications to the DMD framework: a random-length dynamic schedule and an off-trajectory alignment term using the student's velocity field. Neither reduces by construction to fitted inputs, self-definitions, or prior self-citations. The abstract and described objectives treat these as new design choices whose validity is assessed empirically on SD3-Medium and Longcat-Image, without invoking uniqueness theorems or renaming known results. This is the expected non-circular outcome for an engineering extension paper.
Reference graph
Works this paper leans on
- [1] Clement Chadebec, Onur Tasar, Eyal Benaroche, and Benjamin Aubin. Flash diffusion: Accelerating any conditional diffusion model for few steps image generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 15686–15695, 2025.
- [2] Feiyang Chen, Hongpeng Pan, Haonan Xu, Xinyu Duan, Yang Yang, and Zhefeng Wang. Cross-resolution distribution matching for diffusion distillation. arXiv preprint arXiv:2603.06136, 2026.
- [3] Junsong Chen, Shuchen Xue, Yuyang Zhao, Jincheng Yu, Sayak Paul, Junyu Chen, Han Cai, Song Han, and Enze Xie. Sana-sprint: One-step diffusion with continuous-time consistency distillation, 2025.
- [4] Junying Chen, Zhenyang Cai, Pengcheng Chen, Shunian Chen, Ke Ji, Xidong Wang, Yunjin Yang, and Benyou Wang. Sharegpt-4o-image: Aligning multimodal models with gpt-4o-level image generation, 2025.
- [5] Zhenglin Cheng, Peng Sun, Jianguo Li, and Tao Lin. Twinflow: Realizing one-step generation on large models with self-adversarial flows. arXiv preprint arXiv:2512.05150, 2025.
- [6] Cheng Cui, Ting Sun, Manhui Lin, Tingquan Gao, Yubo Zhang, Jiaxuan Liu, Xueqing Wang, Zelun Zhang, Changda Zhou, Hongen Liu, et al. PaddleOCR 3.0 technical report. arXiv preprint arXiv:2507.05595, 2025.
- [7] Bradley Efron. Tweedie's formula and selection bias. Journal of the American Statistical Association, 106(496):1602–1614, 2011.
- [8] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learning, 2024.
- [9] Xiangyu Fan, Zesong Qiu, Zhuguanyu Wu, Fanzhou Wang, Zhiqian Lin, Tianxiang Ren, Dahua Lin, Ruihao Gong, and Lei Yang. Phased dmd: Few-step distribution matching distillation via score matching within subintervals. arXiv preprint arXiv:2510.27684, 2025.
- [10] Xingtong Ge, Xin Zhang, Tongda Xu, Yi Zhang, Xinjie Zhang, Yan Wang, and Jun Zhang. Senseflow: Scaling distribution matching for flow-based text-to-image distillation. arXiv preprint arXiv:2506.00523, 2025.
- [11] Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. Clipscore: A reference-free evaluation metric for image captioning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7514–7528, 2021.
- [12] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017.
- [13] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- [14] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.
- [15] Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, and Gang Yu. Ella: Equip diffusion models with llm for enhanced semantic alignment. arXiv preprint arXiv:2403.05135, 2024.
- [16] Dengyang Jiang, Dongyang Liu, Zanyi Wang, Qilong Wu, Liuzhuozheng Li, Hengzhuang Li, Xin Jin, David Liu, Zhen Li, Bo Zhang, et al. Distribution matching distillation meets reinforcement learning. arXiv preprint arXiv:2511.13649, 2025.
- [17] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35:26565–26577, 2022.
- [18] Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, and Stefano Ermon. Consistency trajectory models: Learning probability flow ode trajectory of diffusion. In The Twelfth International Conference on Learning Representations, 2024.
- [19] Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-pic: An open dataset of user preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36:36652–36663, 2023.
- [20] Haoyu Li, Tingyan Wen, Lin Qi, Zhe Wu, Yihuang Chen, Xing Zhou, Lifei Zhu, Xueqian Wang, and Kai Zhang. 1. x-distill: Breaking the diversity, quality, and efficiency barrier in distribution matching distillation. arXiv preprint arXiv:2604.04018, 2026.
- [21] Shanchuan Lin, Anran Wang, and Xiao Yang. Sdxl-lightning: Progressive adversarial diffusion distillation. arXiv preprint arXiv:2402.13929, 2024.
- [22] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
- [23] Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2022.
- [24] Dongyang Liu, Peng Gao, David Liu, Ruoyi Du, Zhen Li, Qilong Wu, Xin Jin, Sihan Cao, Shifeng Zhang, Hongsheng Li, et al. Decoupled dmd: Cfg augmentation as the spear, distribution matching as the shield. arXiv preprint arXiv:2511.22677, 2025.
- [25] Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-grpo: Training flow matching models via online rl. arXiv preprint arXiv:2505.05470, 2025.
- [26] Xingchao Liu, Chengyue Gong, et al. Flow straight and fast: Learning to generate and transfer data with rectified flow. In The Eleventh International Conference on Learning Representations, 2022.
- [27] Xingchao Liu, Xiwen Zhang, Jianzhu Ma, Jian Peng, et al. Instaflow: One step is enough for high-quality diffusion-based text-to-image generation. In The Twelfth International Conference on Learning Representations, 2023.
- [28] Cheng Lu and Yang Song. Simplifying, stabilizing and scaling continuous-time consistency models. In The Thirteenth International Conference on Learning Representations, 2025.
- [29] Yanzuo Lu, Yuxi Ren, Xin Xia, Shanchuan Lin, Xing Wang, Xuefeng Xiao, Andy J Ma, Xiaohua Xie, and Jian-Huang Lai. Adversarial distribution matching for diffusion distillation towards efficient image and video synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16818–16829, 2025.
- [30] Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao. Latent consistency models: Synthesizing high-resolution images with few-step inference. arXiv preprint arXiv:2310.04378, 2023.
- [31] Weijian Luo, Tianyang Hu, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, and Zhihua Zhang. Diff-instruct: A universal approach for transferring knowledge from pre-trained diffusion models. Advances in Neural Information Processing Systems, 36:76525–76546, 2023.
- [32] Yihong Luo, Tianyang Hu, Jiacheng Sun, Yujun Cai, and Jing Tang. Learning few-step diffusion models by trajectory distribution matching. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17719–17728, 2025.
- [33] Yuhang Ma, Xiaoshi Wu, Keqiang Sun, and Hongsheng Li. Hpsv3: Towards wide-spectrum human preference score. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15086–15095, 2025.
- [34] Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik Kingma, Stefano Ermon, Jonathan Ho, and Tim Salimans. On distillation of guided diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14297–14306, 2023.
- [35] Weili Nie, Julius Berner, Nanye Ma, Chao Liu, Saining Xie, and Arash Vahdat. Transition matching distillation for fast video generation. arXiv preprint arXiv:2601.09881, 2026.
- [36] Mang Ning, Mingxiao Li, Jianlin Su, Albert Ali Salah, and Itir Onal Ertugrul. Elucidating the exposure bias in diffusion models. In The Twelfth International Conference on Learning Representations, 2024.
- [37] Mang Ning, Enver Sangineto, Angelo Porrello, Simone Calderara, and Rita Cucchiara. Input perturbation reduces exposure bias in diffusion models. In International Conference on Machine Learning, pages 26245–26265. PMLR, 2023.
- [38] Yansong Peng, Kai Zhu, Yu Liu, Pingyu Wu, Hebei Li, Xiaoyan Sun, and Feng Wu. Facm: Flow-anchored consistency models. arXiv preprint arXiv:2507.03738, 2025.
- [39] Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. In The Eleventh International Conference on Learning Representations, 2023.
- [40] You Qin, Linqing Wang, Hao Fei, Roger Zimmermann, Liefeng Bo, Qinglin Lu, and Chunyu Wang. Soar: Self-correction for optimal alignment and refinement in diffusion models, 2026.
- [41] Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, and Xuefeng Xiao. Hyper-sd: Trajectory segmented consistency model for efficient image synthesis. Advances in Neural Information Processing Systems, 37:117340–117362, 2024.
- [42] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
- [43] Amirmojtaba Sabour, Sanja Fidler, and Karsten Kreis. Align your flow: Scaling continuous-time flow map distillation. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
- [44] Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512, 2022.
- [45] Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion distillation. In European Conference on Computer Vision, pages 87–103. Springer, 2024.
- [46] Christoph Schuhmann. LAION-Aesthetics. https://laion.ai/blog/laion-aesthetics/, Aug 2022. Blog post.
- [47] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35:25278–25294, 2022.
- [48] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021.
- [49] Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Proceedings of the 40th International Conference on Machine Learning, pages 32211–32252, 2023.
- [50] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
- [51] Nikita Starodubcev, Ilya Drobyshevskiy, Denis Kuznedelev, Artem Babenko, and Dmitry Baranchuk. Scale-wise distillation of diffusion models. arXiv preprint arXiv:2503.16397, 2025.
- [52] Yanxiao Sun, Jiafu Wu, Yun Cao, Chengming Xu, Yabiao Wang, Weijian Cao, Donghao Luo, Chengjie Wang, and Yanwei Fu. Swiftvideo: A unified framework for few-step video generation through trajectory-distribution alignment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 9233–9241, 2026.
- [53] Meituan LongCat Team, Hanghang Ma, Haoxian Tan, Jiale Huang, Junqiang Wu, Jun-Yan He, Lishuai Gao, Songlin Xiao, Xiaoming Wei, Xiaoqi Ma, et al. Longcat-image technical report. arXiv preprint arXiv:2512.07584, 2025.
- [54] Fu-Yun Wang, Zhaoyang Huang, Alexander W Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, et al. Phased consistency models. Advances in Neural Information Processing Systems, 37:83951–84009, 2024.
- [55] Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. Advances in Neural Information Processing Systems, 36:8406–8441, 2023.
- [56] Hongyang Wei, Hongbo Liu, Zidong Wang, Yi Peng, Baixin Xu, Size Wu, Xuying Zhang, Xianglong He, Zexiang Liu, Peiyu Wang, et al. Skywork unipic 3.0: Unified multi-image composition via sequence modeling. arXiv preprint arXiv:2601.15664, 2026.
- [57] Sirui Xie, Zhisheng Xiao, Diederik P Kingma, Tingbo Hou, Ying N Wu, Kevin Murphy, Tim Salimans, Ben Poole, and Ruiqi Gao. Em distillation for one-step diffusion models. Advances in Neural Information Processing Systems, 37:45073–45104, 2024.
- [58] Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and Bill Freeman. Improved distribution matching distillation for fast image synthesis. Advances in Neural Information Processing Systems, 37:47455–47487, 2024.
- [59] Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6613–6623, 2024.
- [60] Xin Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Song-Hai Zhang, and Xiaojuan Qi. Text-to-3d with classifier score distillation. arXiv preprint arXiv:2310.19415, 2023.
- [61] Jianbin Zheng, Minghui Hu, Zhongyi Fan, Chaoyue Wang, Changxing Ding, Dacheng Tao, and Tat-Jen Cham. Trajectory consistency distillation: Improved latent consistency distillation by semi-linear consistency function with trajectory mapping. arXiv preprint arXiv:2402.19159, 2024.
- [62] Mingyuan Zhou, Yi Gu, and Zhendong Wang. Few-step diffusion via score identity distillation. arXiv preprint arXiv:2505.12674, 2025.
- [63] Kai Zou. text-to-image-2m: A high-quality, diverse text–image training dataset, 2024.