C²FG: Control Classifier-Free Guidance via Score Discrepancy Analysis

Bo Li; Fengxiang Yang; Hao Zhang; Jia Wang; Jiayang Gao; Jiayang Zou; Jinwei Chen; Luyao Fan; Peng-Tao Jiang; Shice Liu

arxiv: 2603.08155 · v3 · pith:TXHVVOCUnew · submitted 2026-03-09 · 💻 cs.LG

Jiayang Gao, Tianyi Zheng, Jiayang Zou, Fengxiang Yang, Shice Liu, Luyao Fan, Zheyu Zhang, Hao Zhang, Jinwei Chen, Peng-Tao Jiang, Bo Li, Jia Wang

Pith reviewed 2026-05-21 11:59 UTC · model grok-4.3

keywords classifier-free guidance · diffusion models · score discrepancy · time-dependent guidance · conditional generation · control function

The pith

Time-dependent guidance derived from score bounds improves conditional diffusion models without retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives strict upper bounds on how much the score functions of conditional and unconditional distributions can differ at each timestep during the diffusion process. This analysis reveals why fixed guidance weights in classifier-free guidance fail to match the changing dynamics and supports the use of a time-varying control. The proposed C²FG method uses an exponential decay function to adjust guidance strength accordingly, offering a training-free enhancement applicable to various generative tasks.

Core claim

We establish strict upper bounds on the score discrepancy between conditional and unconditional distributions at different timesteps based on the diffusion process. This finding explains the limitations of fixed-weight strategies and establishes a principled foundation for time-dependent guidance. Motivated by this insight, we introduce Control Classifier-Free Guidance (C²FG), a novel, training-free, and plug-in method that aligns the guidance strength with the diffusion dynamics via an exponential decay control function.

What carries the argument

The exponential decay control function that modulates guidance weight to match the derived upper bounds on score discrepancy across timesteps.
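
As a rough illustration of how such a control function could plug into a sampler, the sketch below implements the schedule quoted further down this page, ω(t) = ω₀ exp(λ(1 − t/t_max)), together with a standard classifier-free guidance mix. The function names, the list-of-floats representation, the convention that a weight of 1 returns the conditional prediction, and the assumption that t runs from t_max (noise) down to 0 (data) are illustrative choices, not details confirmed by the paper.

```python
import math

def c2fg_weight(t, t_max, omega0, lam):
    """Time-dependent guidance weight: omega0 * exp(lam * (1 - t / t_max))."""
    return omega0 * math.exp(lam * (1.0 - t / t_max))

def guided_prediction(pred_uncond, pred_cond, weight):
    """Classifier-free guidance mix of two predictions (lists of floats).

    Assumed convention: weight = 1 returns the conditional prediction unchanged.
    """
    return [u + weight * (c - u) for u, c in zip(pred_uncond, pred_cond)]

# Usage with made-up numbers (hypothetical omega0 and lam).
w_start = c2fg_weight(t=1000, t_max=1000, omega0=1.5, lam=1.0)  # equals omega0
w_end = c2fg_weight(t=0, t_max=1000, omega0=1.5, lam=1.0)       # equals omega0 * e
mixed = guided_prediction([0.0, 0.0], [1.0, 2.0], w_start)
```

Because only the scalar weight changes per step, a fixed-weight CFG sampler would in principle need a one-line change to use a schedule like this.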

If this is right

C²FG serves as a plug-in replacement for standard CFG in any diffusion model.
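The guidance strength follows the timestep: with ω(t) = ω₀ exp(λ(1 − t/t_max)) and λ > 0, it decays exponentially in the forward diffusion time t, mirroring the upper bound on score discrepancy.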
Experimental results show effectiveness across diverse conditional generation tasks.
The method remains orthogonal to other existing guidance strategies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The score discrepancy bounds could be extended to other forward processes or variance schedules.
Similar analysis might apply to guidance in non-diffusion generative models.
Further optimization of the decay rate could be explored without task-specific tuning.

Load-bearing premise

That choosing an exponential decay function to follow the upper bounds will deliver consistent quality gains without new artifacts or needing per-task adjustments.

What would settle it

Running the C²FG method on a standard benchmark and observing either no quality improvement over fixed-weight CFG or the introduction of new generation artifacts.

Figures

Figures reproduced from arXiv: 2603.08155 by Bo Li, Fengxiang Yang, Hao Zhang, Jia Wang, Jiayang Gao, Jiayang Zou, Jinwei Chen, Luyao Fan, Peng-Tao Jiang, Shice Liu, Tianyi Zheng, Zheyu Zhang.

**Figure 1.** Following [47], (a) and (b) present results for t ≥ t₀ > 0. (a) shows that the MSE of the conditional score and the unconditional score can be bounded by a function which tends to 0 when t → +∞; (b) shows that the normalized cosine similarity between the two vectors decreases over reverse time, indicating that their directions gradually diverge in the reasoning process. [Figure labels: t = T, t = 0, Diffusion Dynamics.] For VP and VE…

**Figure 2.** Noise to Image Process of C²FG: Dynamic guidance weight ω(t) adaptively balances conditional and unconditional outputs at each timestep t during generation, guided by theoretical bounds on the score function. Moreover, we can choose to add the method of [26], where we fix ω(t) = 1 at the beginning of generation or when t tends to 0. Furthermore, our framework also provides a theoretical interpretation…

**Figure 3.** A two-dimensional distribution featuring two classes represented by gray and orange regions. Approximately 99% of the probability…

**Figure 4.** Qualitative comparison on Class-Conditional ImageNet datasets with different architectures and samplers. The sampler used and the number of inference steps are indicated in parentheses.

**Figure 5.** Comparison between reverse diffusion process by CFG and C…

**Figure 6.** Heatmaps of the logarithmic ratio (log₂) between conditional and unconditional predictions at selected timesteps. White indicates no difference (ratio = 1), while red and blue highlight amplification and suppression, respectively. Stronger colors denote larger deviations between the two predictions. Comparisons of various forms of ω(t). As shown in…

**Figure 7.** Impact of the initial schedule weight ω₀ on IS–FID performance (with fixed λ = 1.0, 250 inference steps). [Plot residue: panel (a) "Trend of various ω(t)", x-axis t/tm, curves sin t, 1 − t, and ours; a further panel "IS–FID Comparison Across Schedules (FID-10K)", axes IS (Inception Score) and FID, series baseline, sine, ours, and 1 − t.] …

**Figure 8.** Comparison of IS–FID performance under different hyperparameter settings on DiT-XL/2 and ImageNet-256.

**Figure 9.** (a) demonstrates the impact of initial weight…

**Figure 10.** Comparison between results during the denoising process of C…

**Figure 12.** Additional results for C²FG.
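
The Figure 2 caption mentions optionally combining the schedule with the limited-interval method of [26], fixing ω(t) = 1 at the beginning of generation or as t tends to 0. A minimal sketch of one way to express that gating is given below; the function name, the `t_low`/`t_high` thresholds, and the convention that ω = 1 means "no extra guidance" are assumptions made for illustration, not values or interfaces taken from the paper.

```python
import math

def gated_c2fg_weight(t, t_max, omega0, lam, t_low=0.0, t_high=None):
    """Exponential schedule inside [t_low, t_high]; weight 1 outside it.

    t_low and t_high are hypothetical cut-offs: t > t_high covers the start
    of generation (high noise) and t < t_low covers the final steps near t = 0.
    """
    if t_high is None:
        t_high = t_max
    if t < t_low or t > t_high:
        return 1.0
    return omega0 * math.exp(lam * (1.0 - t / t_max))

# Usage with made-up thresholds on a 1000-step time axis.
early = gated_c2fg_weight(950, 1000, omega0=1.5, lam=1.0, t_low=50, t_high=900)
middle = gated_c2fg_weight(500, 1000, omega0=1.5, lam=1.0, t_low=50, t_high=900)
late = gated_c2fg_weight(10, 1000, omega0=1.5, lam=1.0, t_low=50, t_high=900)
```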

Original abstract

Classifier-Free Guidance (CFG) is a cornerstone of modern conditional diffusion models, yet its reliance on the fixed or heuristic dynamic guidance weight is predominantly empirical and overlooks the inherent dynamics of the diffusion process. In this paper, we provide a rigorous theoretical analysis of the Classifier-Free Guidance. Specifically, we establish strict upper bounds on the score discrepancy between conditional and unconditional distributions at different timesteps based on the diffusion process. This finding explains the limitations of fixed-weight strategies and establishes a principled foundation for time-dependent guidance. Motivated by this insight, we introduce Control Classifier-Free Guidance (C²FG), a novel, training-free, and plug-in method that aligns the guidance strength with the diffusion dynamics via an exponential decay control function. Extensive experiments demonstrate that C²FG is effective and broadly applicable across diverse generative tasks, while also exhibiting orthogonality to existing strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper derives upper bounds on score discrepancy to motivate time-dependent CFG, but the specific exponential controller is motivated by rather than strictly forced by those bounds.

The main thing here is that they analyze the diffusion process to put strict upper bounds on how much the conditional and unconditional scores can differ at each timestep. This is used to explain why a fixed guidance weight is suboptimal and to motivate a time-dependent schedule instead. They then introduce C²FG as a plug-in exponential decay controller that aligns with those bounds. It's training-free and presented as orthogonal to other tricks, which is practical for people already running diffusion models.

Referee Report

2 major / 2 minor

Summary. The manuscript claims to establish strict upper bounds on the score discrepancy between conditional and unconditional distributions in diffusion processes at different timesteps. This analysis is used to motivate and justify a time-dependent guidance strategy in Classifier-Free Guidance (CFG). The authors propose C²FG, which employs an exponential decay control function to adjust the guidance weight dynamically in a training-free, plug-in manner. Extensive experiments across diverse generative tasks demonstrate its effectiveness and orthogonality to existing methods.

Significance. If the derived upper bounds hold and the exponential control function is rigorously linked to them, this could offer a theoretically grounded improvement over fixed or heuristic CFG weights, potentially enhancing sample quality in conditional diffusion models without requiring retraining. The work highlights the importance of diffusion dynamics in guidance strategies.

major comments (2)

[§3] §3 (Theoretical Analysis): The derivation of strict upper bounds on score discrepancy is presented as the foundation for time-dependent guidance, yet the manuscript does not demonstrate that these bounds uniquely imply or force the exponential decay form adopted in C²FG. The bounds appear to constrain the discrepancy to be monotonically decreasing, but multiple functional forms (e.g., linear or 1/t schedules) could respect the same envelope without violating the stated inequalities.
[§4.2] §4.2 (Control Function Definition): The exponential decay parameter is chosen to 'align' with the upper bounds, but no explicit mapping or optimization step is shown that derives the decay rate directly from the discrepancy analysis (e.g., no equation linking bound tightness at timestep t to the specific exponential coefficient). This leaves open whether performance gains stem from the discrepancy insight or from generic time-variation.
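
To make the first major comment concrete, the sketch below builds three monotonically decreasing schedules (linear, inverse-time, and exponential) that share the same endpoints on a normalized time axis s = t/t_max. The parameterizations and the endpoint values are hypothetical and chosen only to show that several functional forms can decrease between the same two weights; they are not the schedules evaluated in the paper.

```python
import math

def linear_schedule(s, w_hi, w_lo):
    """Straight line from w_hi at s = 0 to w_lo at s = 1."""
    return w_hi + (w_lo - w_hi) * s

def inverse_schedule(s, w_hi, w_lo):
    """a / (s + b) with a, b chosen to hit both endpoints (needs w_hi > w_lo > 0)."""
    b = w_lo / (w_hi - w_lo)
    a = w_hi * b
    return a / (s + b)

def exponential_schedule(s, w_hi, w_lo):
    """w_lo * exp(lam * (1 - s)) with lam = log(w_hi / w_lo)."""
    lam = math.log(w_hi / w_lo)
    return w_lo * math.exp(lam * (1.0 - s))

grid = [i / 10 for i in range(11)]
schedules = {
    "linear": linear_schedule,
    "inverse": inverse_schedule,
    "exponential": exponential_schedule,
}
# Hypothetical endpoint weights 4.0 (s = 0) and 1.5 (s = 1).
curves = {name: [f(s, 4.0, 1.5) for s in grid] for name, f in schedules.items()}
```

The exponential variant has the same shape as ω(t) = ω₀ exp(λ(1 − t/t_max)) with ω₀ = w_lo, which is why an ablation against the other two forms would help separate the effect of the functional form from the effect of time variation itself.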

minor comments (2)

[Figure 2] Figure 2: The plot of score discrepancy vs. timestep would benefit from an overlay of the proposed control function to visually confirm alignment with the derived bounds.
[§3.1] Notation: Ensure that the symbols for conditional score s_θ(x_t | y) and unconditional score s_θ(x_t) are used consistently when stating the discrepancy bounds in §3.1.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We appreciate the opportunity to clarify the links between our theoretical bounds and the design of C²FG. Below we respond point by point to the major comments. We will revise the manuscript to address the identified gaps in explicitness and uniqueness.

Referee: [§3] §3 (Theoretical Analysis): The derivation of strict upper bounds on score discrepancy is presented as the foundation for time-dependent guidance, yet the manuscript does not demonstrate that these bounds uniquely imply or force the exponential decay form adopted in C²FG. The bounds appear to constrain the discrepancy to be monotonically decreasing, but multiple functional forms (e.g., linear or 1/t schedules) could respect the same envelope without violating the stated inequalities.

Authors: We agree that the derived bounds establish a monotonically decreasing envelope but do not uniquely dictate the exponential form; other schedules could satisfy the inequalities. The exponential was selected because it provides a smooth, continuous decay that matches the observed rapid early-timestep drop in score discrepancy followed by stabilization, consistent with diffusion dynamics. We do not claim uniqueness in the current manuscript. In revision we will add an explicit statement in §3 acknowledging alternative forms and include a short comparison (linear and 1/t) in the experiments to illustrate that time-dependence informed by the bounds is the primary driver of gains, while exponential yields practical advantages in stability and quality. revision: yes
Referee: [§4.2] §4.2 (Control Function Definition): The exponential decay parameter is chosen to 'align' with the upper bounds, but no explicit mapping or optimization step is shown that derives the decay rate directly from the discrepancy analysis (e.g., no equation linking bound tightness at timestep t to the specific exponential coefficient). This leaves open whether performance gains stem from the discrepancy insight or from generic time-variation.

Authors: We acknowledge that the current presentation leaves the choice of decay rate somewhat implicit. The parameter is currently set by visual and quantitative alignment with the discrepancy upper-bound curves computed from the diffusion process. For the revision we will add an explicit mapping in §4.2 (and an appendix derivation): the coefficient is obtained by minimizing the L2 distance between the exponential schedule and the normalized upper-bound tightness across a discrete set of timesteps, yielding a closed-form relation to the variance schedule β_t. This will make clear that the functional choice is directly informed by the bound analysis rather than generic time variation. revision: yes
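
The fitting step described in this response could look roughly like the sketch below. It is not the authors' derivation: it swaps in a log-space least-squares fit, which has a simple closed form when ω₀ is held fixed, and it uses a synthetic target curve in place of the normalized upper bound, which this page does not provide. The function name and all numbers are hypothetical.

```python
import math

def fit_decay_rate(s_values, target, omega0):
    """Least-squares lam for log(target) ~ log(omega0) + lam * (1 - s).

    s_values are normalized timesteps t / t_max; target holds positive values
    of the curve the schedule should follow (a stand-in for the bound).
    """
    xs = [1.0 - s for s in s_values]
    ys = [math.log(b / omega0) for b in target]
    # Minimizing sum (y - lam * x)^2 over lam gives lam = sum(x*y) / sum(x*x).
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Synthetic check: a target that is exactly exponential with rate 0.8.
s_grid = [i / 20 for i in range(21)]
synthetic_target = [1.5 * math.exp(0.8 * (1.0 - s)) for s in s_grid]
lam_hat = fit_decay_rate(s_grid, synthetic_target, omega0=1.5)
```

On a real bound curve the fit would not be exact, and the residual would give one measure of how well an exponential form matches the envelope.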

Circularity Check

0 steps flagged

No significant circularity; bounds derived independently and control function presented as motivated ansatz.

The paper first derives strict upper bounds on score discrepancy directly from the diffusion process equations, which constitutes an independent first-principles step not presupposing the C²FG method or the exponential schedule. The exponential decay control function is explicitly introduced as 'motivated by this insight' rather than shown to be the unique or forced functional form satisfying the bounds at every timestep. No equation reduces to another by construction, no self-citation chain carries the central claim, and no fitted parameter is relabeled as a prediction. The overall derivation chain remains self-contained against the diffusion process assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the exponential decay schedule itself may function as an implicit modeling choice whose justification rests on the unshown bounds.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean and Foundation/AlphaCoordinateFixation.lean J_uniquely_calibrated_via_higher_derivative; costAlphaLog_fourth_deriv_at_zero echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Theorem 1 (VP-SDE Score MSE Bound): ∥∇log p(x,t)−∇log p̃(x,t)∥ ≤ α(t)/σ²(t) C with α(t)=exp(−½∫β), yielding O(e^{-t}) decay after reparameterization t′=½∫β.
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration; CostAlphaLog echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

ω(t)=ω₀ exp(λ(1−t/t_max)) chosen to align with the exponential upper bound on score discrepancy during reverse process.
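
The decay rate in the Theorem 1 passage above can be worked out explicitly under one extra assumption. The page states α(t) = exp(−½∫β) but does not define σ²(t); the sketch below assumes the standard variance-preserving kernel of [47], σ²(t) = 1 − α²(t), and takes the integral from 0 to t. Both are assumptions, not statements from the paper.

```latex
t' = \tfrac{1}{2}\int_0^t \beta(s)\,ds, \qquad
\alpha(t) = e^{-t'}, \qquad
\sigma^2(t) = 1 - \alpha^2(t) = 1 - e^{-2t'} \quad \text{(assumed VP kernel)}

\frac{\alpha(t)}{\sigma^2(t)}\,C
  = \frac{e^{-t'}}{1 - e^{-2t'}}\,C
  = \frac{C}{2\sinh t'}
  \;\sim\; C\,e^{-t'} \quad (t' \to \infty)
```

Under these assumptions the bound behaves like e^{-t'} for large t', matching the quoted O(e^{-t}) rate, and it grows without limit as t' → 0, which is consistent with Figure 1 restricting attention to t ≥ t₀ > 0.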

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

P-Guide: Parameter-Efficient Prior Steering for Single-Pass CFG Inference
cs.AI · 2026-05 · unverdicted · novelty 6.0

P-Guide achieves single-pass classifier-free guidance in flow matching by modulating the initial latent state and is equivalent to standard CFG under a first-order approximation while cutting latency by half.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · cited by 1 Pith paper · 13 internal anchors

[1] Junyeong Ahn. FD-DINOv2: FD Score via DINOv2. https://github.com/justin4ai/FD-DINOv2, 2024. Version 0.1.0.
[2] Brian D.O. Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982.
[3] Dominique Bakry and Michel Émery. Diffusions hypercontractives. In Séminaire de Probabilités XIX 1983/84: Proceedings, pages 177–206. Springer, 2006.
[4] Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, et al. ediff-i: Text-to-image diffusion models with an ensemble of expert denoisers. arXiv preprint arXiv:2211.01324, 2022.
[5] Fan Bao, Shen Nie, Kaiwen Xue, Yue Cao, Chongxuan Li, Hang Su, and Jun Zhu. All are worth words: A vit backbone for diffusion models. In CVPR, 2023.
[6] Arwen Bradley and Preetum Nakkiran. Classifier-free guidance is a predictor-corrector. Transactions on Machine Learning Research, 2025.
[7] Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the International Conference on Computer Vision (ICCV), 2021.
[8] Ao Chen, Lihe Ding, and Tianfan Xue. Diffier: Optimizing diffusion models with iterative error reduction, 2025.
[9] Chubin Chen, Jiashu Zhu, Xiaokun Feng, Nisha Huang, Meiqi Wu, Fangyuan Mao, Jiahong Wu, Xiangxiang Chu, and Xiu Li. S²-guidance: Stochastic self guidance for training-free enhancement of diffusion models, 2025.
[10] Hyungjin Chung, Jeongsol Kim, Geon Yeong Park, Hyelin Nam, and Jong Chul Ye. CFG++: Manifold-constrained classifier free guidance for diffusion models. In The Thirteenth International Conference on Learning Representations, 2025.
[11] Danilo de Oliveira, Julius Richter, Tal Peer, and Timo Gerkmann. Lipdiffuser: Lip-to-speech generation with conditional diffusion models. arXiv preprint arXiv:2505.11391, 2025.
[12] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
[13] Prafulla Dhariwal and Alex Nichol. Diffusion models beat gans on image synthesis. ArXiv, abs/2105.05233, 2021.
[14] M. D. Donsker and S. R. S. Varadhan. Asymptotic evaluation of certain markov process expectations for large time, i. Communications on Pure and Applied Mathematics, 28(1):1–47,
[15] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach. Scaling rectified flow transformers for high-resolution image synthesis, 2024.
[16] L.C. Evans. Partial Differential Equations. American Mathematical Society, 1998.
[17] Crispin W Gardiner. Handbook of stochastic methods for physics, chemistry and the natural sciences. Springer series in synergetics, 1985.
[18] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. arXiv:2111.06377, 2021.
[19] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30, 2017.
[20] Jonathan Ho. Classifier-free diffusion guidance. ArXiv, abs/2207.12598, 2022.
[21] Jonathan Ho, Ajay Jain, and P. Abbeel. Denoising diffusion probabilistic models. ArXiv, abs/2006.11239, 2020.
[22] Cheng Jin, Qitan Shi, and Yuantao Gu. Stage-wise dynamics of classifier-free guidance in diffusion models, 2025.
[23] Tero Karras, Miika Aittala, Tuomas Kynkäänniemi, Jaakko Lehtinen, Timo Aila, and Samuli Laine. Guiding a diffusion model with a bad version of itself. In Proc. NeurIPS, 2024.
[24] Tero Karras, Miika Aittala, Jaakko Lehtinen, Janne Hellsten, Timo Aila, and Samuli Laine. Analyzing and improving the training dynamics of diffusion models. In Proc. CVPR, 2024.
[25] Tuomas Kynkäänniemi, Tero Karras, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Improved precision and recall metric for assessing generative models. Advances in neural information processing systems, 32, 2019.
[26] Tuomas Kynkäänniemi, Miika Aittala, Tero Karras, Samuli Laine, Timo Aila, and Jaakko Lehtinen. Applying guidance in a limited interval improves sample and distribution quality in diffusion models. Advances in Neural Information Processing Systems, 37:122458–122483, 2024.
[27] Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Müller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. Flux.1 kontext: Flow matching for in-context image generation and editing in latent space, 2025.
[28] Shanchuan Lin, Bingchen Liu, Jiashi Li, and Xiao Yang. Common diffusion noise schedules and sample steps are flawed. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 5404–5411, 2024.
[29] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.
[30] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. ArXiv, abs/2210.02747, 2022.
[31] Qiang Liu. Rectified flow: A marginal preserving approach to optimal transport. ArXiv, abs/2209.14577, 2022.
[32] Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. ArXiv, abs/2209.03003, 2022.
[33] Nanye Ma, Mark Goldstein, Michael S Albergo, Nicholas M Boffi, Eric Vanden-Eijnden, and Saining Xie. Sit: Exploring flow and diffusion-based generative models with scalable interpolant transformers. In European Conference on Computer Vision, pages 23–40. Springer, 2024.
[34] Dawid Malarz, Artur Kasymov, Maciej Zięba, Jacek Tabor, and Przemysław Spurek. Classifier-free guidance with adaptive scaling. arXiv preprint arXiv:2502.10574, 2025.
[35] William Peebles and Saining Xie. Scalable diffusion models with transformers. arXiv preprint arXiv:2212.09748, 2022.
[36] Mateusz Poleski, Jacek Tabor, and Przemysław Spurek. Geoguide: Geometric guidance of diffusion models. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 297–305. IEEE, 2025.
[37] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 1(2):3, 2022.
[38] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.
[39] Seyedmorteza Sadat, Manuel Kansy, Otmar Hilliges, and Romann M. Weber. No training, no problem: Rethinking classifier-free guidance for diffusion models, 2025.
[40] Seyedmorteza Sadat, Tobias Vontobel, Farnood Salehi, and Romann M. Weber. Guidance in the frequency domain enables high-fidelity sampling at low cfg scales. ArXiv, abs/2506.19713, 2025.
[41] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. Advances in neural information processing systems, 29, 2016.
[42] Dazhong Shen, Guanglu Song, Zeyue Xue, Fu-Yun Wang, and Yu Liu. Rethinking the spatial inconsistency in classifier-free diffusion guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9370–9379, 2024.
[43] Yirong Shen, Lu GAN, and Cong Ling. Information theoretic learning for diffusion models with warm start. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
[44] Jascha Narain Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. ArXiv, abs/1503.03585, 2015.
[45] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. ArXiv, abs/2010.02502, 2020.
[46] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Neural Information Processing Systems, 2019.
[47]

Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Narain Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differ- ential equations.ArXiv, abs/2011.13456, 2020. 1, 2, 3, 4, 6

work page internal anchor Pith review Pith/arXiv arXiv 2011
[48]

Gradcheck: Analyzing classifier guid- ance gradients for conditional diffusion sampling.arXiv preprint arXiv:2406.17399, 2024

Philipp Vaeth, Alexander M Fruehwald, Benjamin Paassen, and Magda Gregorova. Gradcheck: Analyzing classifier guid- ance gradients for conditional diffusion sampling.arXiv preprint arXiv:2406.17399, 2024. 1

work page arXiv 2024
[49]

Rectified diffusion: Straightness is not your need in rectified flow.arXiv preprint arXiv:2410.07303,

Fu-Yun Wang, Ling Yang, Zhaoyang Huang, Mengdi Wang, and Hongsheng Li. Rectified diffusion: Straightness is not your need in rectified flow.arXiv preprint arXiv:2410.07303,

work page arXiv
[50]

Diffusion-npo: Negative preference optimiza- tion for better preference aligned generation of diffusion mod- els.arXiv preprint arXiv:2505.11245, 2025

Fu-Yun Wang, Yunhao Shui, Jingtan Piao, Keqiang Sun, and Hongsheng Li. Diffusion-npo: Negative preference optimiza- tion for better preference aligned generation of diffusion mod- els.arXiv preprint arXiv:2505.11245, 2025. 11

[51]

Xi Wang, Nicolas Dufour, Nefeli Andreou, Marie-Paule Cani, Victoria Fernández Abrevaya, David Picard, and Vicky Kalogeiton. Analysis of classifier-free guidance weight schedulers. arXiv preprint arXiv:2404.13040, 2024. 3, 11, 26

[52]

Sangmin Woo, Byeongjun Park, Hyojun Go, Jin-Young Kim, and Changick Kim. Harmonyview: Harmonizing consistency and diversity in one-image-to-3d. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10574–10584, 2024. 1

[53]

Zhen Yang, Guibao Shen, Minyang Li, Liang Hou, Mushui Liu, Luozhou Wang, Xin Tao, Pengfei Wan, Di Zhang, and Ying-Cong Chen. Efficient training-free high-resolution synthesis with energy rectification in diffusion models, 2025. 12

[54]

Haotian Ye, Haowei Lin, Jiaqi Han, Minkai Xu, Sheng Liu, Yitao Liang, Jianzhu Ma, James Zou, and Stefano Ermon. Tfg: Unified training-free guidance for diffusion models. 2024. 1, 12

[55]

Sihyun Yu, Sangkyung Kwak, Huiwon Jang, Jongheon Jeong, Jonathan Huang, Jinwoo Shin, and Saining Xie. Representation alignment for generation: Training diffusion transformers is easier than you think. In International Conference on Learning Representations, 2025. 6, 29

[56]

Tianyi Zheng, Jiayang Zou, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Jia Wang, and Bo Li. Bidirectional beta-tuned diffusion model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 48(1):359–373, 2026. 1

[57]

Shangwen Zhu, Qianyu Peng, Yuting Hu, Zhantao Yang, Han Zhang, Zhao Pu, Ruili Feng, and Fan Cheng. Raag: Ratio aware adaptive guidance, 2025. 1, 3, 12, 27

Appendix Overview

This appendix provides additional details and supplementary results to support the main paper. In Section A, we review related literature to place our work in a broader context. Sectio...

