pith · machine review for the scientific record

arxiv: 2605.06124 · v1 · submitted 2026-05-07 · 💻 cs.AI

Recognition: unknown

P-Guide: Parameter-Efficient Prior Steering for Single-Pass CFG Inference

Xin Peng, Ang Gao


Pith reviewed 2026-05-08 10:24 UTC · model grok-4.3

classification 💻 cs.AI
keywords: classifier-free guidance · single-pass inference · latent state modulation · flow matching · prior steering · inference efficiency · heteroscedastic priors

The pith

P-Guide achieves high-quality classifier-free guidance in a single inference pass by modulating only the initial latent state.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces P-Guide to remove the need for dual forward passes in classifier-free guidance during flow matching sampling. It does so by adjusting only the initial latent state to steer generation toward the conditioned output. Under a first-order approximation this prior-space adjustment matches the effect of standard CFG without any velocity field extrapolation at later steps. The method works for both fixed-variance and variable-variance priors and yields roughly half the usual inference time while keeping sample fidelity and prompt alignment close to dual-pass baselines.
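
The single-pass idea can be made concrete on a toy problem. The sketch below is illustrative only: the velocity fields, the class mean MU, and the closed-form initial shift are our constructions (for these linear fields the shift can be derived exactly; the paper instead learns a small steering module). It shows the structural difference: two field evaluations per step for dual-pass CFG versus one for prior-shifted sampling.

```python
import numpy as np

MU = np.array([2.0, 0.0])  # toy class mean (illustrative, not from the paper)

def v_uncond(z):
    """Toy unconditional velocity field: drift toward the origin."""
    return -z

def v_cond(z):
    """Toy conditional velocity field: drift toward the class mean MU."""
    return MU - z

def sample_cfg_dual_pass(z0, w, steps=1000):
    """Standard CFG: TWO field evaluations per Euler step,
    combined as v_u + w * (v_c - v_u)."""
    z, dt = z0.astype(float).copy(), 1.0 / steps
    for _ in range(steps):
        z = z + dt * (v_uncond(z) + w * (v_cond(z) - v_uncond(z)))
    return z

def sample_single_pass_shifted(z0, w, steps=1000):
    """P-Guide-style sketch: shift only the initial state, then run ONE
    conditional evaluation per step. For these linear toy fields the shift
    (w - 1)(e - 1) * MU reproduces the CFG endpoint in the continuous limit;
    in the paper a small learned module plays this role."""
    z = z0.astype(float) + (w - 1.0) * (np.e - 1.0) * MU
    dt = 1.0 / steps
    for _ in range(steps):
        z = z + dt * v_cond(z)
    return z
```

With 1,000 Euler steps the two samplers agree to within about 1e-2 on this toy, while the shifted sampler uses half the function evaluations per step.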

Core claim

P-Guide demonstrates that modulating the initial latent state alone reproduces the guidance effect of classifier-free guidance. Under a first-order approximation this steers the process from the prior distribution without requiring ongoing velocity adjustments during sampling, and joint mean-variance modeling in heteroscedastic priors adds adaptive loss attenuation.
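
The first-order argument can be sketched as follows (a reconstruction from the abstract's description, not the paper's verbatim derivation; v_u, v_c, w, and Δ are our notation):

```latex
% Dual-pass CFG integrates the extrapolated field at every step:
\dot z_t = v_u(z_t, t) + w\,\bigl(v_c(z_t, t) - v_u(z_t, t)\bigr),
\qquad z_0 \sim p_{\mathrm{prior}} .

% P-Guide instead shifts only the initial state and integrates the
% conditional field once per step:
\dot{\tilde z}_t = v_c(\tilde z_t, t), \qquad \tilde z_0 = z_0 + \Delta .

% Linearizing v_c around the reference path, with \delta_t = \tilde z_t - z_t:
v_c(z_t + \delta_t, t) \approx v_c(z_t, t)
  + \nabla_z v_c(z_t, t)\,\delta_t + O\!\bigl(\lVert \delta_t \rVert^2\bigr) .
```

Choosing Δ so that the propagated perturbation accumulates the same net correction as the integrated guidance term ∫ w (v_c − v_u) dt yields equivalence to first order; everything hinges on the neglected quadratic remainder staying small over the whole trajectory.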

What carries the argument

Modulation of the initial latent state to steer generation from the prior space in a single forward pass.

Load-bearing premise

A first-order approximation is enough for initial latent modulation to capture the full guidance effect without needing velocity field changes at later sampling steps.

What would settle it

Generate outputs from the same prompts with both P-Guide and standard dual-pass CFG, then compare quantitative metrics such as FID scores and CLIP alignment; a clear gap in quality or conditioning strength would falsify the claimed equivalence.
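
A minimal version of that comparison can be scripted. The helper below computes FID under a diagonal-covariance simplification (the full FID uses a matrix square root of the covariance product; everything here, including the toy feature statistics, is our sketch, not the paper's evaluation code):

```python
import numpy as np

def gaussian_stats(features):
    """Per-dimension mean and variance of a (n_samples, dim) feature matrix."""
    f = np.asarray(features, float)
    return f.mean(axis=0), f.var(axis=0)

def fid_diagonal(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with DIAGONAL covariances
    (a simplification of the full FID formula):
        ||mu1 - mu2||^2 + sum(var1 + var2 - 2*sqrt(var1*var2)),
    where the second term equals sum((sqrt(var1) - sqrt(var2))**2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    var1, var2 = np.asarray(var1, float), np.asarray(var2, float)
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)
    return mean_term + cov_term
```

In practice one would extract Inception or CLIP features for P-Guide and dual-pass samples generated from the same prompts, fit statistics per method, and compare both against the reference set; a persistent FID or CLIP-alignment gap would falsify the claimed equivalence.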

Figures

Figures reproduced from arXiv: 2605.06124 by Ang Gao, Xin Peng.

Figure 1. Conceptual overview of the P-Guide framework. Standard CFG requires evaluating the velocity field twice (v_cond, v_uncond) at each integration step. P-Guide relocates this guidance to the origin by modulating the initial noise state using a data-dependent prior. Based on our trajectory-level approximation, this initial shift anchors the global ODE path, enabling high-fidelity conditional generation with only…
Figure 2. Visualizing trajectory steering in a 2D toy setup. Standard CFG requires per-step corrections in velocity space (dual-pass). P-Guide utilizes the prior shift z_c − z_u as a directional anchor at t = 0. As shown, increasing w from 1.0 to 2.0 causes trajectories to concentrate and separate precisely toward target modes using only a single inference pass. Experimental Setup. We construct a 2D dataset where the…
Figure 3. Visual comparison of P-Guide generation trajectories on MNIST. Each row shows the evolution from prior (t = 0) to data (t = 1). Increasing the guidance scale (w = 1.0 → 1.5) sharpens semantic structure from early steps, confirming effective trajectory-level control from the origin.
Figure 4. Visual comparison of P-Guide trajectories on ImageNet. Each row shows the evolution from prior (t = 0) to data (t = 1). Increasing the guidance scale (w = 1.0 → 1.1) sharpens semantic structure from early steps, confirming effective trajectory-level control from the origin. …range (e.g., 1.1 for U-Net and 1.2 for DiT-B/2) leads to a clear reduction in FID, indicating that prior-space steering effectively sh…
Figure 5. Comparison between trajectory-level guidance (P-Guide) and distribution-level CFG on…
Figure 6. Architecture of the default P-Guide prior steering module (1.247 MB). The model consists…
Figure 7. Visual comparison of P-Guide generation trajectories on CIFAR-10. Each row shows the evolution of a sample from its initial latent state (t = 0) to the final generated image (t = 1). Comparing w = 1.0 and w = 1.2, increasing the guidance scale leads to more semantically coherent structures emerging from the earliest stages of generation. This supports our hypothesis that modulating the initial latent state…
Figure 8. Trajectory comparison under different initial latent distributions on CIFAR-10. Each row visualizes the evolution of a sample from its initial state (t = 0) to the final generated image (t = 1). Standard CFM starts from a fixed Gaussian prior that is independent of class labels, resulting in less structured early-stage trajectories. In contrast, P-Guide initializes from a class-conditioned latent distribut…
Figure 9. Trajectory comparison under different initial latent distributions on ImageNet-1k. Each row shows the evolution of a sample from its initial state (t = 0) to the final generated image (t = 1). As in CIFAR-10, standard CFM starts from a class-agnostic Gaussian prior, leading to less structured trajectories at early stages. In contrast, P-Guide initializes from a class-conditioned latent distribution, where…
read the original abstract

Classifier-Free Guidance (CFG) is essential for high-fidelity conditional generation in flow matching, yet it imposes significant computational overhead by requiring dual forward passes at each sampling step. In this work, we address this bottleneck by introducing P-Guide, a framework that achieves high-quality guidance through a single inference pass by modulating only the initial latent state. We further show that, under a first-order approximation, P-Guide is equivalent to CFG in the sense that it steers generation from the prior space, without requiring explicit velocity field extrapolation during sampling. We consider both homoscedastic and heteroscedastic priors, and find that jointly modeling the mean and variance enables adaptive loss attenuation and improved robustness to data uncertainty. Extensive experiments demonstrate that P-Guide reduces inference latency by approximately 50% while maintaining fidelity and prompt alignment competitive with standard dual-pass CFG baselines.
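
The "adaptive loss attenuation" described for heteroscedastic priors matches the standard heteroscedastic Gaussian NLL (Kendall & Gal, 2017); whether P-Guide uses exactly this parameterization is an assumption here. A minimal sketch:

```python
import numpy as np

def heteroscedastic_nll(residual, log_var):
    """Gaussian negative log-likelihood with a predicted per-element
    log-variance s, up to an additive constant:
        0.5 * exp(-s) * r^2 + 0.5 * s
    A large predicted variance DOWN-weights the squared residual
    (adaptive loss attenuation), at the price of the 0.5 * s penalty,
    which keeps the model from inflating variance everywhere."""
    r = np.asarray(residual, float)
    s = np.asarray(log_var, float)
    return np.mean(0.5 * np.exp(-s) * r ** 2 + 0.5 * s)
```

For a fixed residual, raising the predicted log-variance strictly reduces the residual term's contribution, which is the robustness-to-uncertainty mechanism the abstract describes.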

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes P-Guide, a method for single-pass classifier-free guidance (CFG) in flow-matching models. It achieves guidance by modulating only the initial latent state rather than performing dual forward passes at each sampling step. The central claim is that, under a first-order approximation of the velocity field, this initial-state steering is equivalent to standard CFG (i.e., it steers generation from the prior space without explicit velocity extrapolation during sampling). The work also examines both homoscedastic and heteroscedastic priors, claiming that joint mean-variance modeling enables adaptive loss attenuation and better robustness. Experiments are said to show ~50% latency reduction while preserving fidelity and prompt alignment comparable to dual-pass CFG baselines.

Significance. If the first-order equivalence can be made rigorous with explicit remainder bounds and the empirical results are reproducible, P-Guide would offer a practical efficiency gain for conditional generation in flow-matching and related models. The heteroscedastic prior extension is a constructive addition for handling data uncertainty. No machine-checked proofs or fully parameter-free derivations are present, but the single-pass formulation, if validated, would be a useful engineering contribution.

major comments (2)
  1. [Abstract and equivalence derivation, first-order approximation section] The claim that modulating only the initial latent state z0 produces a trajectory whose integrated effect matches dual-pass CFG rests on a first-order Taylor expansion of the velocity field around the unguided path. However, because the sampling trajectory in flow matching is itself a function of the initial condition, shifting z0 moves every point (z_t, t) at which v_t is evaluated. The linearization therefore does not automatically recover the guided velocity v_uncond + s(v_cond - v_uncond) along the new path. The manuscript supplies no explicit bound on the remainder term of this expansion, nor any analysis of how the discrepancy grows with guidance scale s or the number of sampling steps.
  2. [Experimental evaluation section] The abstract asserts competitive fidelity and prompt alignment with ~50% latency reduction, yet provides no quantitative metrics, error bars, number of runs, or details on the baselines and sampling schedules used. Without these, it is impossible to determine whether the first-order approximation introduces visible artifacts or whether the reported gains are robust.
minor comments (2)
  1. [Prior modeling] Notation for the heteroscedastic prior should be introduced with an explicit equation showing how the variance term enters the loss and the sampling update.
  2. [Figures] All figures comparing P-Guide to CFG should include error bars and state the exact number of sampling steps and guidance scales tested.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which highlights important aspects of both the theoretical justification and experimental reporting in our work. We address each major comment below and indicate the changes planned for the revised manuscript.

read point-by-point responses
  1. Referee: [Abstract and equivalence derivation, first-order approximation section] The claim that modulating only the initial latent state z0 produces a trajectory whose integrated effect matches dual-pass CFG rests on a first-order Taylor expansion of the velocity field around the unguided path. However, because the sampling trajectory in flow matching is itself a function of the initial condition, shifting z0 moves every point (z_t, t) at which v_t is evaluated. The linearization therefore does not automatically recover the guided velocity v_uncond + s(v_cond - v_uncond) along the new path. The manuscript supplies no explicit bound on the remainder term of this expansion, nor any analysis of how the discrepancy grows with guidance scale s or the number of sampling steps.

    Authors: We appreciate the referee's precise observation regarding the path dependence induced by the initial-state modulation. The first-order Taylor expansion is performed around the unguided trajectory, and we acknowledge that higher-order terms arise from the fact that the evaluation points (z_t, t) themselves shift. The original derivation was intended to provide an intuitive motivation rather than a fully rigorous equivalence. In the revision we will expand the first-order approximation section to explicitly discuss this path dependence, include an empirical quantification of the approximation error (measured as the integrated difference between the steered trajectory and standard CFG) across a range of guidance scales s and step counts, and add a limitations paragraph noting that a closed-form remainder bound is left for future work. This will clarify the scope of the claimed equivalence without overstating its rigor. revision: partial

  2. Referee: [Experimental evaluation section] The abstract asserts competitive fidelity and prompt alignment with ~50% latency reduction, yet provides no quantitative metrics, error bars, number of runs, or details on the baselines and sampling schedules used. Without these, it is impossible to determine whether the first-order approximation introduces visible artifacts or whether the reported gains are robust.

    Authors: We agree that greater transparency in the experimental section is necessary for assessing robustness. Although the manuscript contains tables reporting FID, CLIP similarity, and wall-clock latency on standard benchmarks, we will revise the experimental evaluation section to add error bars from at least five independent runs with different random seeds, explicitly state the sampling schedule (number of steps, ODE solver, and discretization), and provide precise descriptions of the dual-pass CFG baselines including the guidance scales employed. These additions will allow readers to directly evaluate whether any artifacts from the first-order approximation are visible and to verify the reported latency reduction. revision: yes
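
The approximation-error quantification promised in response 1 can be prototyped in a few lines. The protocol below is our toy construction (the mildly nonlinear fields, the first-order shift rule, and the Euler discretization are all assumptions, not the authors' code): it integrates the dual-pass CFG trajectory and a single-pass trajectory started from a shifted prior, and reports the integrated gap as a function of the guidance scale w.

```python
import numpy as np

MU = np.array([2.0, 0.0])  # hypothetical class mean for the toy fields

def v_uncond(z):
    """Mildly nonlinear unconditional field (the sin term breaks linearity)."""
    return -z + 0.1 * np.sin(z)

def v_cond(z):
    """Conditional field: same nonlinearity, shifted toward MU."""
    return MU - z + 0.1 * np.sin(z)

def trajectory_gap(z0, w, steps=500):
    """Integrated L2 distance between the dual-pass CFG trajectory and a
    single-pass trajectory started from a first-order prior shift.
    The shift (w - 1)(e - 1) * MU is exact only for the linear part of the
    field, so the gap measures what the first-order approximation misses,
    and how that grows with the guidance scale w."""
    dt = 1.0 / steps
    z = z0.astype(float).copy()                            # dual-pass CFG state
    y = z0.astype(float) + (w - 1.0) * (np.e - 1.0) * MU   # shifted single-pass
    gap = 0.0
    for _ in range(steps):
        z = z + dt * (v_uncond(z) + w * (v_cond(z) - v_uncond(z)))
        y = y + dt * v_cond(y)
        gap += np.linalg.norm(z - y) * dt
    return gap
```

At w = 1 the gap vanishes by construction; for this toy it then grows roughly linearly in (w − 1), which is the scale-dependence the referee asks the authors to report.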

Circularity Check

0 steps flagged

No significant circularity; the equivalence is argued via a standard approximation applied to an independent method.

full rationale

The paper introduces P-Guide as modulating only the initial latent state for single-pass guidance, then applies a first-order Taylor expansion to relate the resulting trajectory to dual-pass CFG. This is a conventional linearization step in dynamical systems analysis and does not reduce the central claim to its own inputs by construction, nor does it rely on fitted parameters renamed as predictions, self-citations for uniqueness, or ansatz smuggling. The derivation remains self-contained against external flow-matching benchmarks, with the approximation serving as an explanatory bridge rather than a definitional equivalence. No load-bearing step collapses to a tautology or prior self-result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on an unverified first-order approximation for equivalence and on the assumption that initial-latent modulation suffices for guidance; no explicit free parameters or invented entities are stated in the abstract.

axioms (1)
  • domain assumption First-order approximation suffices to equate P-Guide to standard CFG without velocity extrapolation.
    Stated in abstract as the basis for equivalence.

pith-pipeline@v0.9.0 · 5444 in / 1192 out tokens · 45444 ms · 2026-05-08T10:24:45.323172+00:00 · methodology

discussion (0)

