pith. machine review for the scientific record. sign in

arxiv: 2605.07950 · v1 · submitted 2026-05-08 · 💻 cs.LG

Recognition: no theorem link

Slowly Annealed Langevin Dynamics: Theory and Applications to Training-Free Guided Generation

Atsushi Nitanda, Dake Bu, Tanya Veeravalli, Yueming Lyu

Authors on Pith no claims yet

Pith reviewed 2026-05-11 02:49 UTC · model grok-4.3

classification 💻 cs.LG
keywords Langevin dynamicsannealed samplingguided generationdiffusion modelsconvergence guaranteestraining-free methodsKL divergence
0
0 comments X

The pith

Slow annealing of Langevin dynamics improves tracking of evolving targets with provable convergence

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that by introducing a slowdown in the annealing process of Langevin dynamics, the sampler can more accurately follow a sequence of changing target distributions, leading to better approximation of the final target. This is shown through non-asymptotic bounds derived from a KL differential inequality that highlights how slowdown contracts the distance to intermediate targets and simplifies the path complexity. The approach is particularly useful for training-free guided generation in diffusion models, where an extension called Velocity-Aware SALD incorporates the pretrained model's marginals to correct for guidance-induced deviations. A reader would care because it provides theoretical grounding for why and how to adjust sampling speed in generative models without needing to retrain them.

Core claim

Slowly Annealed Langevin Dynamics tracks moving target distributions more effectively by time slowdown, establishing non-asymptotic convergence guarantees via a KL differential inequality; Velocity-Aware SALD applies this to correct additional bias from guidance in pretrained score-based generative models, yielding a principled training-free framework with convergence analysis.

What carries the argument

The slowdown mechanism in SALD, which contracts intermediate targets and reduces the complexity of the annealing path to improve tracking accuracy.

If this is right

  • Non-asymptotic convergence guarantees are provided for the slowed dynamics through the KL inequality.
  • Slowdown improves tracking by contracting intermediate targets and lowering path complexity.
  • VA-SALD corrects guidance deviations using the underlying marginal distributions of the pretrained model.
  • The framework applies to diffusion-based and related generative models with clarified roles for functional inequalities and bias.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The theory could inspire adaptive slowdown schedules based on estimated path complexity in other sampling algorithms.
  • It suggests that similar slowing techniques might enhance performance in non-Langevin stochastic processes for distribution tracking.
  • Practical implementations may allow for more efficient conditional sampling in large-scale generative tasks without additional model training.

Load-bearing premise

The KL differential inequality and contraction properties require the score functions and guidance terms to meet certain smoothness and Lipschitz conditions.

What would settle it

Observing no reduction in the KL divergence or no improvement in generated sample quality when increasing the slowdown factor on a guided diffusion task would challenge the central claim.

Figures

Figures reproduced from arXiv: 2605.07950 by Atsushi Nitanda, Dake Bu, Tanya Veeravalli, Yueming Lyu.

Figure 1
Figure 1. Figure 1: Synthetic guided VP experiments. We compare SALD, VA-SALD, and DOIT on two guided VP [PITH_FULL_IMAGE:figures/full_fig_p009_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Guided generation on "lion" with guidance of black-box reward functions based on neural Aesthetic [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Reverse-indexed VP marginal family for the unguided two-Gaussian experiment. The forward VP diffusion contracts the mixture means toward the origin, so the reverse initialization is close to a centered Gaussian while the terminal law remains bimodal. 11.1.1 Two-moons guided two-Gaussian VP diffusion. The first guided task starts from a two-component Gaussian data distribution and uses a two-moons guide. Th… view at source ↗
Figure 4
Figure 4. Figure 4: Unguided SALD overview. The KL trajectory and terminal KL confirm that increasing the slow-down budget reduces the mismatch to the terminal two-Gaussian target until the finite-particle and discretization floor dominates. DOIT [35] adaptation and budget matching. We adapt DOIT [35] (Opensource at DOIT) as a training-free local approximation to the Doob h-transform. For a base reverse transition density ϕθ(… view at source ↗
Figure 5
Figure 5. Figure 5: Unguided SALD terminal samples. Larger budgets produce visibly sharper recovery of the two target modes. The corresponding local Doob direction is gbDoob = X M m=1 wm p zm ηDOITβ(T − tk) , (85) and the adapted update is Xk+1 = Xk + ηDOIT (bbase(Xk, tk) + γβ(T − tk)gbDoob) + p ηDOITβ(T − tk) ξk. (86) Since DOIT spends M proposal evaluations per particle and step, a fair particle-step comparison uses approxi… view at source ↗
Figure 6
Figure 6. Figure 6: SALD on the two-moons guided two-Gaussian task. SALD follows the expected budget trend: larger r improves the terminal-target KL, with the curve eventually flattening near the Monte Carlo floor. The guide penalizes a chosen set P of modes. We use the four left-half modes, i.e. those with negative first coordinate, and define fP (x) = λ X j∈P exp − ∥x − cj∥ 2 2ℓ 2 f ! , (91) with gradient ∇fP (x) = − λ ℓ 2 … view at source ↗
Figure 7
Figure 7. Figure 7: SALD samples for the two-moons guided task. Terminal samples increasingly align with the two-moons guided target as the budget increases [PITH_FULL_IMAGE:figures/full_fig_p041_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: VA-SALD on the two-moons guided task. Incorporating the VP velocity field yields low terminal KL across the tested budgets and a stable guidance-objective trajectory [PITH_FULL_IMAGE:figures/full_fig_p042_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: VA-SALD samples for the two-moons guided task. The sample panels show that VA-SALD reaches the guided terminal geometry at small and large budgets. 42 [PITH_FULL_IMAGE:figures/full_fig_p042_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: DOIT on the two-moons guided task. DOIT uses the same VP reverse SDE base sampler and the same total proposal budget, but its local Doob correction does not close the terminal KL gap as effectively as SALD or VA-SALD. 43 [PITH_FULL_IMAGE:figures/full_fig_p043_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Eight-Gaussian guided terminal target. Blue markers denote unpenalized modes, and orange crosses denote penalized modes. Contours show the base and guided terminal densities. 44 [PITH_FULL_IMAGE:figures/full_fig_p044_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: SALD on the eight-Gaussian mode-penalty task. SALD reduces the terminal KL as r increases and then approaches a plateau. 45 [PITH_FULL_IMAGE:figures/full_fig_p045_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: VA-SALD on the eight-Gaussian mode-penalty task. The terminal KL is low and nearly budget-insensitive, indicating that the velocity-aware reverse dynamics already align well with this guided target. 46 [PITH_FULL_IMAGE:figures/full_fig_p046_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: DOIT on the eight-Gaussian mode-penalty task. The local proposal-based Doob correction improves the guide-related statistic but leaves a larger terminal KL than SALD and VA-SALD at the same computational budgets. 47 [PITH_FULL_IMAGE:figures/full_fig_p047_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Generated Images on prompt "A glowing dragon soaring through floating islands, leaving behind a trail [PITH_FULL_IMAGE:figures/full_fig_p052_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Generated Images on prompt "In the ink painting style, a naive giant panda is sitting on the majestic [PITH_FULL_IMAGE:figures/full_fig_p053_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Generated Images on prompt "The art deco music festival poster has a fox made of polished brass in [PITH_FULL_IMAGE:figures/full_fig_p054_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Generated Images on prompt "quick doodle of a guy, medium hair with long bangs, hd detailed detailed" [PITH_FULL_IMAGE:figures/full_fig_p055_18.png] view at source ↗
Figure 19
Figure 19. Figure 19: Guided generation on "bear" with guidance parameter [PITH_FULL_IMAGE:figures/full_fig_p056_19.png] view at source ↗
Figure 20
Figure 20. Figure 20: Guided generation on "bear" with guidance parameter [PITH_FULL_IMAGE:figures/full_fig_p057_20.png] view at source ↗
Figure 21
Figure 21. Figure 21: Guided generation on "wolf" with guidance parameter [PITH_FULL_IMAGE:figures/full_fig_p057_21.png] view at source ↗
Figure 22
Figure 22. Figure 22: Guided generation on "wolf" with guidance parameter [PITH_FULL_IMAGE:figures/full_fig_p058_22.png] view at source ↗
Figure 23
Figure 23. Figure 23: Guided generation on "hippo" with guidance parameter [PITH_FULL_IMAGE:figures/full_fig_p058_23.png] view at source ↗
Figure 24
Figure 24. Figure 24: Guided generation on "hippo" with guidance parameter [PITH_FULL_IMAGE:figures/full_fig_p059_24.png] view at source ↗
Figure 25
Figure 25. Figure 25: Guided generation on "lion" with guidance parameter [PITH_FULL_IMAGE:figures/full_fig_p059_25.png] view at source ↗
Figure 26
Figure 26. Figure 26: Guided generation on "lion" with guidance parameter [PITH_FULL_IMAGE:figures/full_fig_p060_26.png] view at source ↗
read the original abstract

We study Slowly Annealed Langevin Dynamics (SALD), a sampler for tracking a path of moving target distributions and approximating the terminal target through time slowdown. We establish non-asymptotic convergence guarantees via a KL differential inequality, showing that slowdown improves tracking through contraction of intermediate targets and the complexity of the path. Motivated by training-free guided generation with pretrained score-based generative models, we further introduce Velocity-Aware SALD (VA-SALD), which explicitly incorporates the underlying marginal distributions of the pretrained model and uses slowdown to correct the additional deviation induced by guidance. This yields a principled framework for training-free guided generation for diffusion-based and related generative model families, together with convergence guarantees that clarify the roles of intermediate functional inequalities and guidance bias. Code is available at https://github.com/anitan0925/sald.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes Slowly Annealed Langevin Dynamics (SALD) to track a path of moving target distributions by rescaling time to slow the dynamics, and derives non-asymptotic convergence guarantees via a KL differential inequality showing that slowdown contracts the effective path complexity and improves tracking of intermediate targets. It introduces Velocity-Aware SALD (VA-SALD) that incorporates the marginal distributions of a pretrained score-based model to offset the deviation from guidance, yielding a training-free framework for guided generation together with convergence guarantees that clarify the roles of functional inequalities and guidance bias.

Significance. If the non-asymptotic bounds hold under the stated conditions, the work supplies a principled, training-free mechanism for correcting guidance bias in diffusion models and related families, together with explicit dependence of the tracking error on the slowdown schedule. The open-source code supports reproducibility of the empirical claims.

major comments (3)
  1. [§3 (KL differential inequality)] The KL differential inequality (abstract and §3) is stated as d/dt KL(ρ_t || π_t) ≤ -c(t) KL(ρ_t || π_t) + error(t). For slowdown to guarantee contraction, the integrated error must be controlled uniformly in the slowed time variable; the derivation must therefore bound the Lipschitz constant of the guidance drift (and of the time-dependent score) independently of the slowdown factor. The manuscript does not exhibit this uniform bound for the VA-SALD case where the guidance weight can induce time-dependent growth near t=0.
  2. [§4 (VA-SALD definition and guarantees)] In the VA-SALD construction (§4), the velocity-aware correction term is added to the drift. The proof that this term does not inflate the Lipschitz constant of the overall vector field beyond what the slowdown schedule can compensate is missing; without an explicit estimate relating the guidance weight, the marginal velocity, and the resulting Lip constant, the claimed improvement in tracking error cannot be verified.
  3. [Theorem 3.1 / Assumption list] The non-asymptotic guarantee is obtained by integrating the differential inequality along the slowed path. The manuscript must state the precise smoothness/Lipschitz assumptions on the pretrained score and on the guidance function that make the error term integrable; these assumptions are only implicit in the abstract and are not listed as a numbered assumption set before the main theorem.
minor comments (2)
  1. [§2] Notation for the time-dependent target π_t and the slowed process should be introduced once in §2 and used consistently; the current alternation between “intermediate targets” and “moving targets” is occasionally ambiguous.
  2. [§3] The statement that slowdown “contracts the complexity of the path” would benefit from a short remark clarifying whether this contraction is measured in Wasserstein distance, in total variation of the score, or solely through the integrated Lip constant appearing in the error term.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We sincerely thank the referee for the thorough review and valuable feedback on our work. The comments have helped us identify areas where the presentation and proofs can be strengthened. We address each major comment below and outline the revisions we will make to the manuscript.

read point-by-point responses
  1. Referee: [§3 (KL differential inequality)] The KL differential inequality (abstract and §3) is stated as d/dt KL(ρ_t || π_t) ≤ -c(t) KL(ρ_t || π_t) + error(t). For slowdown to guarantee contraction, the integrated error must be controlled uniformly in the slowed time variable; the derivation must therefore bound the Lipschitz constant of the guidance drift (and of the time-dependent score) independently of the slowdown factor. The manuscript does not exhibit this uniform bound for the VA-SALD case where the guidance weight can induce time-dependent growth near t=0.

    Authors: We agree with the referee that a uniform bound on the Lipschitz constants, independent of the slowdown factor, is essential to ensure the error term integrates properly in the slowed time. While the original derivation relied on the regularity of the score and guidance to control growth, we did not explicitly demonstrate the uniformity for VA-SALD near t=0. In the revised manuscript, we will introduce an additional lemma establishing that the Lipschitz constant of the combined drift (including the velocity-aware correction) is bounded uniformly with respect to the slowdown parameter. This bound will leverage the properties of the pretrained marginals and ensure the contraction holds as claimed. revision: yes

  2. Referee: [§4 (VA-SALD definition and guarantees)] In the VA-SALD construction (§4), the velocity-aware correction term is added to the drift. The proof that this term does not inflate the Lipschitz constant of the overall vector field beyond what the slowdown schedule can compensate is missing; without an explicit estimate relating the guidance weight, the marginal velocity, and the resulting Lip constant, the claimed improvement in tracking error cannot be verified.

    Authors: Thank you for highlighting this gap. The original manuscript did not provide the explicit estimate for how the velocity-aware correction affects the overall Lipschitz constant. We will add this estimate in the revised version, showing that the Lipschitz constant of the correction term is proportional to the guidance weight and the supremum norm of the marginal velocity. We will then demonstrate that the slowdown schedule can be selected to ensure the contraction dominates any such inflation, thereby verifying the improvement in tracking error. revision: yes

  3. Referee: [Theorem 3.1 / Assumption list] The non-asymptotic guarantee is obtained by integrating the differential inequality along the slowed path. The manuscript must state the precise smoothness/Lipschitz assumptions on the pretrained score and on the guidance function that make the error term integrable; these assumptions are only implicit in the abstract and are not listed as a numbered assumption set before the main theorem.

    Authors: We acknowledge that the assumptions were presented implicitly rather than as an explicit numbered list. In the revised manuscript, we will add a dedicated subsection or paragraph immediately preceding Theorem 3.1 that lists the precise assumptions on the smoothness and Lipschitz continuity of the pretrained score function and the guidance function. This will clarify the conditions under which the error term is integrable and the non-asymptotic bound holds. revision: yes

Circularity Check

0 steps flagged

No significant circularity: KL differential inequality derivation is analytically self-contained

full rationale

The paper's core claim derives non-asymptotic convergence for SALD and VA-SALD from a KL differential inequality d/dt KL(ρ_t || π_t) ≤ -c(t) KL + error, with slowdown rescaling time to contract the path. This is a standard functional-inequality argument under smoothness/Lipschitz assumptions on scores and guidance; the contraction and error bounds are obtained from the dynamics rather than by redefining targets in terms of the inequality itself or by fitting parameters to the same data. No self-citation chains, ansatzes smuggled via prior work, or renaming of known results appear as load-bearing steps. The VA-SALD correction for guidance bias is explicitly constructed from pretrained marginals and does not reduce the guarantee to a tautology. The derivation therefore remains independent of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; full manuscript would be required to audit the KL inequality assumptions and any Lipschitz or smoothness conditions.

pith-pipeline@v0.9.0 · 5444 in / 1078 out tokens · 30501 ms · 2026-05-11T02:49:29.175899+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 5 internal anchors

  1. [1]

    Springer, 2005

    Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré.Gradient flows: in metric spaces and in the space of probability measures. Springer, 2005

  2. [2]

    Dimension-free multimodal sampling via preconditioned annealed langevin dynamics.arXiv preprint arXiv:2602.01449, 2026

    Lorenzo Baldassari, Josselin Garnier, Knut Solna, and Maarten V de Hoop. Dimension-free multimodal sampling via preconditioned annealed langevin dynamics.arXiv preprint arXiv:2602.01449, 2026

  3. [3]

    Oxford University Press, 2013

    Stéphane Boucheron, Gábor Lugosi, and Pascal Massart.Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013

  4. [4]

    Classifier-free guidance is a predictor-corrector.Transactions on Machine Learning Research, 2025

    Arwen Bradley and Preetum Nakkiran. Classifier-free guidance is a predictor-corrector.Transactions on Machine Learning Research, 2025

  5. [5]

    Diffusion annealed langevin dynamics: a theoretical study.arXiv preprint arXiv:2511.10406, 2025

    Patrick Cattiaux, Paula Cordero-Encinar, and Arnaud Guillin. Diffusion annealed langevin dynamics: a theoretical study.arXiv preprint arXiv:2511.10406, 2025

  6. [6]

    Inference-time alignment for diffusion models via variationally stable doob’s matching.arXiv preprint arXiv:2601.06514, 2026

    Jinyuan Chang, Chenguang Duan, Yuling Jiao, Yi Xu, and Jerry Zhijian Yang. Inference-time alignment for diffusion models via variationally stable doob’s matching.arXiv preprint arXiv:2601.06514, 2026. 10

  7. [7]

    Provable convergence and limitations of geometric tempering for langevin dynamics

    Omar Chehab, Anna Korba, Austin Stromme, and Adrien Vacher. Provable convergence and limitations of geometric tempering for langevin dynamics. InThe Thirteenth International Conference on Learning Representations, 2025

  8. [8]

    An overview of diffusion models: Ap- plications, guided generation, statistical rates and optimization.arXiv preprint arXiv:2404.07771,

    Minshuo Chen, Song Mei, Jianqing Fan, and Mengdi Wang. An overview of diffusion models: Applications, guided generation, statistical rates and optimization.arXiv preprint arXiv:2404.07771, 2024

  9. [9]

    Non-asymptotic analysis of diffusion annealed langevin monte carlo for generative modelling.arXiv preprint arXiv:2502.09306, 2025

    Paula Cordero-Encinar, O Deniz Akyildiz, and Andrew B Duncan. Non-asymptotic analysis of diffusion annealed langevin monte carlo for generative modelling.arXiv preprint arXiv:2502.09306, 2025

  10. [10]

    Scaling rectified flow transformers for high-resolution image synthesis

    Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. InForty-first international conference on machine learning, 2024

  11. [11]

    arXiv preprint arXiv:1812.00793 , year=

    Rong Ge, Holden Lee, and Andrej Risteski. Simulated tempering langevin monte carlo ii: An improved proof using soft markov chain decomposition.arXiv preprint arXiv:1812.00793, 2018

  12. [12]

    Mean Flows for One-step Generative Modeling

    Zhengyang Geng, Mingyang Deng, Xingjian Bai, J Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling.arXiv preprint arXiv:2505.13447, 2025

  13. [13]

    Provable benefit of annealed langevin monte carlo for non-log-concave sampling

    Wei Guo, Molei Tao, and Yongxin Chen. Provable benefit of annealed langevin monte carlo for non-log-concave sampling. InThe Thirteenth International Conference on Learning Representations, 2025

  14. [14]

    Training-free guidance beyond differentiability: Scalable path steering with tree search in diffusion and flow models.arXiv preprint arXiv:2502.11420, 2025

    Yingqing Guo, Yukang Yang, Hui Yuan, and Mengdi Wang. Training-free guidance beyond differentiability: Scalable path steering with tree search in diffusion and flow models.arXiv preprint arXiv:2502.11420, 2025

  15. [15]

    Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

  16. [16]

    Classifier-Free Diffusion Guidance

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance.arXiv preprint arXiv:2207.12598, 2022

  17. [17]

    arXiv preprint arXiv:2512.04985 , year =

    Yuchen Jiao, Yuxin Chen, and Gen Li. Towards a unified framework for guided diffusion models.arXiv preprint arXiv:2512.04985, 2025

  18. [18]

    Direct distributional optimization for provablealignmentofdiffusionmodels

    Ryotaro Kawata, Kazusato Oko, Atsushi Nitanda, and Taiji Suzuki. Direct distributional optimization for provablealignmentofdiffusionmodels. InThe Thirteenth International Conference on Learning Representations, 2025

  19. [19]

    Test-time alignment of diffusion models without reward over-optimization

    Sunwoo Kim, Minkyu Kim, and Dongmin Park. Test-time alignment of diffusion models without reward over-optimization. InThe Thirteenth International Conference on Learning Representations, 2025

  20. [20]

    Convergence for score-based generative modeling with polynomial complexity.Advances in Neural Information Processing Systems 35, 35:22870–22882, 2022

    Holden Lee, Jianfeng Lu, and Yixin Tan. Convergence for score-based generative modeling with polynomial complexity.Advances in Neural Information Processing Systems 35, 35:22870–22882, 2022

  21. [21]

    Dynamic search for inference-time alignment in diffusion models, 2025

    Xiner Li, Masatoshi Uehara, Xingyu Su, Gabriele Scalia, Tommaso Biancalani, Aviv Regev, Sergey Levine, and Shuiwang Ji. Dynamic search for inference-time alignment in diffusion models.arXiv preprint arXiv:2503.02039, 2025

  22. [22]

    Flow Matching for Generative Modeling

    Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling.arXiv preprint arXiv:2210.02747, 2022

  23. [23]

    Alignment of diffusion models: Fundamentals, challenges, and future.ACM Computing Surveys, 58(9):1–37, 2026

    Buhua Liu, Shitong Shao, Bao Li, Lichen Bai, Zhiqiang Xu, Haoyi Xiong, James T Kwok, Sumi Helal, and Zeke Xie. Alignment of diffusion models: Fundamentals, challenges, and future.ACM Computing Surveys, 58(9):1–37, 2026

  24. [24]

    Flow-GRPO: Training Flow Matching Models via Online RL

    Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-grpo: Training flow matching models via online rl.arXiv preprint arXiv:2505.05470, 2025

  25. [25]

    Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

    Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022. 11

  26. [26]

    Covariance-adaptive sequential black-box optimization for diffusion targeted generation.arXiv preprint arXiv:2406.00812, 2024

    Yueming Lyu, Kim Yong Tan, Yew Soon Ong, and Ivor W Tsang. Covariance-adaptive sequential black-box optimization for diffusion targeted generation.arXiv preprint arXiv:2406.00812, 2024

  27. [27]

    Black-box optimizer with implicit natural gradient.arXiv preprint arXiv:1910.04301, 2019

    Yueming Lyu and Ivor W Tsang. Black-box optimizer with implicit natural gradient.arXiv preprint arXiv:1910.04301, 2019

  28. [28]

    Random gradient-free minimization of convex functions.Foundations of Computational Mathematics, 17(2):527–566, 2017

    Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions.Foundations of Computational Mathematics, 17(2):527–566, 2017

  29. [29]

    Generative modeling by estimating gradients of the data distribution.Advances in neural information processing systems, 32, 2019

    Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution.Advances in neural information processing systems, 32, 2019

  30. [30]

    Score- based generative modeling through stochastic differential equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score- based generative modeling through stochastic differential equations. InThe Ninth International Conference on Learning Representations, 2021

  31. [31]

    A stochastic analysis approach to conditional diffusion guidance.Columbia University Preprint, 2024

    Wenpin Tang and Renyuan Xu. A stochastic analysis approach to conditional diffusion guidance.Columbia University Preprint, 2024

  32. [32]

    Evolvable conditional diffusion.arXiv preprint arXiv:2506.13834, 2025

    Zhao Wei, Chin Chun Ooi, Abhishek Gupta, Jian Cheng Wong, Pao-Hsiung Chiu, Sheares Xue Wen Toh, and Yew-Soon Ong. Evolvable conditional diffusion.arXiv preprint arXiv:2506.13834, 2025

  33. [33]

    Posterior sampling by combining diffusion models with annealed langevin dynamics.Advances in Neural Information Processing Systems 38, 2025

    Zhiyang Xun, Shivam Gupta, and Eric Price. Posterior sampling by combining diffusion models with annealed langevin dynamics.Advances in Neural Information Processing Systems 38, 2025

  34. [34]

    TFG: Unified training-free guidance for diffusion models

    Haotian Ye, Haowei Lin, Jiaqi Han, Minkai Xu, Sheng Liu, Yitao Liang, Jianzhu Ma, James Zou, and Stefano Ermon. TFG: Unified training-free guidance for diffusion models. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

  35. [35]

    Training-free adaptation of diffusion models via doob’sh-transform.arXiv preprint arXiv:2602.16198, 2026

    Qijie Zhu, Zeqi Ye, Han Liu, Zhaoran Wang, and Minshuo Chen. Training-free adaptation of diffusion models via doob’sh-transform.arXiv preprint arXiv:2602.16198, 2026. 12 Table of Contents 1 Introduction 1 2 Problem Setup and Methods 2 2.1 Slowly Annealed Langevin Dynamics [13] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2.2 Velocity-...

  36. [36]

    tackles the problem of supervised fine-tuning with endogenous conditioning, in which they learn theh-function through a martingale-covariation loss. In the alignment for diffusion models in the Direct Preference Optimization (DPO) setting, a recent work [18] formulates a distributional optimization problem and subsequently enable sampling from the learned...

  37. [37]

    There exist constantLπ,space >0such that for allx, y∈R d,t∈[0, T] ∥∇logπ t(x)− ∇logπ t(y)∥ ≤L π,space∥x−y∥

  38. [38]

    There exist a measurable functionM and constants Lπ,time > 0such that for every s < s ′, (s, s′ ∈ [0, S]), and everyx∈R d, ∥∇logπ t(s′)(x)− ∇logπ t(s)(x)∥ ≤L π,time(s′ −s)(1 +M(x))

  39. [39]

    There existsα′ 0 >0such that Eα′ 0(πt,∇logπ t)<+∞,E α′ 0(πt,1 +M)<+∞,∀t∈[0, T]

  40. [40]

    We assumeη2L2 π,space < 1 8. Then, for everyα′ ∈(0, α ′ 0]and everys∈[s k, sk+1], − Z ˆρs⟨δπt(s)(x), As⟩dx≤ 1 4FI(ˆρs∥˜πs) + 2η2α′−1Γ(t(s))KL(ˆρs∥˜πs) + 2η∆(t(s)).(21) Here, Γ(t) := 2L2 π,time + 64L2 π,space ˙s(t)−2 + 1 +η 2L2 π,time , ∆(t) := 64ηL2 π,spaceEα′(πt,∇logπ t) + 2ηL2 π,time(1 + 32η2L2 π,space)Eα′(πt,1 +M) + 16dL 2 π,space. We here present the ...

  41. [41]

    There exist constantsLc,space, Lπ,space >0such that for allx, y∈R d,t∈[0, T] ∥ct(x)−c t(y)∥ ≤L c,space∥x−y∥,(45) ∥∇logπ t(x)− ∇logπ t(y)∥ ≤L π,space∥x−y∥.(46)

  42. [42]

    There exist a measurable functionM and constants Lc,time, Lπ,time > 0such that for every s < s ′, (s, s′ ∈ [0, S]), and everyx∈R d, ∥ct(s′)(x)−c t(s)(x)∥ ≤L c,time(s′ −s)(1 +M(x)),(47) ∥∇logπ t(s′)(x)− ∇logπ t(s)(x)∥ ≤L π,time(s′ −s)(1 +M(x)).(48)

  43. [43]

    There existsα′ 0 >0such that Eα′ 0(πt, ct)<+∞,E α′ 0(πt,∇logπ t)<+∞,E α′ 0(πt,1 +M)<+∞,∀t∈[0, T]

  44. [44]

    The art deco music festival poster has a fox made of polished brass in the center of the picture, with smooth lines and elegant posture

    We assume 4η2 ˙t(s)Lc,space + ση(t(s))2 2 Lπ,space 2 < 1 2 . Then, for everyα′ ∈(0, α ′ 0]and everys∈[s k, sk+1], − Z ˆρs⟨δVA πt(s)(x), As⟩dx≤ ση(t(s))2 8 FI(ˆρs∥˜πs) + 2η2α′−1Γ(t(s))KL(ˆρs∥˜πs) + 2η∆(t(s)). (49) Here Γ(t) :=σ η(t)−2 ( 4 ˙s(t)−2L2 c,time +σ η(t)4L2 π,time + 8 4 ˙s(t)−2L2 c,space +σ η(t)4L2 π,space 4 ˙s(t)−2 +σ η(t)4 +η 2 4 ˙s(t)−2L2 c,tim...