pith. machine review for the scientific record.

arxiv: 2605.12231 · v1 · submitted 2026-05-12 · 🧮 math.OC

Recognition: 2 theorem links


Geometric Asymptotics of Score Mixing and Guidance in Diffusion Models

Enrique Zuazua, Kang Liu

Pith reviewed 2026-05-13 03:57 UTC · model grok-4.3

classification 🧮 math.OC
keywords score mixing · diffusion models · geometric potential · Laplace-Varadhan principle · subgradient inclusion · guidance · heat flow · Voronoi partition

The pith

Mixed-score guidance in diffusion models reduces asymptotically to dynamics on a geometric potential of squared distances to data supports.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that mixing two score fields with weight λ produces small-time generation trajectories whose effective driving force is the explicit potential Φ_λ, equal to λ times the squared distance to the first support plus (1-λ) times the squared distance to the second support. This geometric object is independent of the detailed shape of the measures and governs the flow uniformly in both the mixture-of-experts range 0 ≤ λ ≤ 1 and the amplified-guidance range λ > 1. By rescaling time in a similarity-invariant manner and applying the Laplace-Varadhan principle, the original singular non-autonomous equation is replaced by an autonomous Clarke subgradient inclusion driven solely by this potential. In the special case of empirical Dirac mixtures the potential becomes piecewise quadratic on a Voronoi partition, which immediately yields convergence of all limiting paths to critical points together with an explicit convergence rate for the original flow.
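To make the finite-Dirac picture concrete, here is a minimal numerical sketch (not the authors' code; the point clouds A1, A2 and all function names are illustrative): it evaluates Φ_λ for two synthetic planar supports and runs explicit Euler on the almost-everywhere gradient, which stands in for the subgradient inclusion away from Voronoi boundaries.

import numpy as np

rng = np.random.default_rng(0)
A1 = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(5, 2))   # empirical support of mu_1
A2 = rng.normal(loc=[+2.0, 0.0], scale=0.5, size=(5, 2))   # empirical support of mu_2

def nearest(A, x):
    # Euclidean projection of x onto the finite set A (unique a.e.)
    return A[np.argmin(np.sum((A - x) ** 2, axis=1))]

def phi(x, lam):
    # Phi_lambda(x) = lam * d1(x)^2 + (1 - lam) * d2(x)^2
    d1 = np.linalg.norm(x - nearest(A1, x))
    d2 = np.linalg.norm(x - nearest(A2, x))
    return lam * d1 ** 2 + (1 - lam) * d2 ** 2

def grad_phi(x, lam):
    # Gradient away from Voronoi boundaries; on a boundary this selects one
    # element of the Clarke subdifferential, which suffices for a descent sketch.
    return 2 * lam * (x - nearest(A1, x)) + 2 * (1 - lam) * (x - nearest(A2, x))

def descend(x0, lam, steps=2000, h=1e-3):
    # Explicit Euler on the (sub)gradient flow x' = -grad Phi_lambda(x)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= h * grad_phi(x, lam)
    return x

for lam in (0.3, 0.7, 1.5):   # MoE regime (lam <= 1) and CFG regime (lam > 1)
    x_star = descend([0.0, 1.0], lam)
    print(f"lam={lam}: limit point {x_star}, Phi = {phi(x_star, lam):.4f}")

In this toy setup one expects the descent to settle between the two clouds for λ in (0, 1), while for λ > 1 the negative coefficient on d₂² pushes the limit point toward A1, mirroring the extrapolative CFG regime described in the paper.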

Core claim

Exploiting a Laplace-Varadhan principle under a similarity-time rescaling, the small-time generation dynamics driven by the mixed score s = λ ∇log u1 + (1-λ) ∇log u2 is governed by the explicit geometric potential Φ_λ = λ d1² + (1-λ) d2², which depends only on the supports of the initial measures and on the mixing parameter. This gives a rigorous reduction from a singular, non-autonomous score-driven dynamics to autonomous Clarke-type subgradient inclusions. In the empirical setting of finite Dirac mixtures, the limiting potential is piecewise quadratic with a Voronoi-type structure, yielding convergence of all autonomous limiting trajectories to critical points and a conditional convergence criterion for the original generation flow toward local minimizers of the potential, with rate O(√t) in the smooth stable case.

What carries the argument

The geometric potential Φ_λ = λ d₁² + (1-λ) d₂², which encodes the limiting small-time behavior of the mixed-score flow and permits its reduction to an autonomous subgradient inclusion.
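Spelled out, the limiting object is the autonomous inclusion below. The gradient formula is the standard identity ∇d_A²(x) = 2(x − P_A(x)) for squared distance functions, valid wherever the Euclidean projection P_A(x) onto the support is single-valued; it is a reconstruction from the definitions above, not a formula quoted from the paper.

$$ \dot x(\tau) \in -\partial^{C}\Phi_\lambda\bigl(x(\tau)\bigr), \qquad \nabla\Phi_\lambda(x) = 2\lambda\,\bigl(x - P_{A_1}(x)\bigr) + 2(1-\lambda)\,\bigl(x - P_{A_2}(x)\bigr) \quad \text{a.e.} $$

Wherever Φ_λ is differentiable the inclusion collapses to the plain gradient flow, which is exactly the regime addressed by the O(√t) rate.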

If this is right

  • The mixed-score dynamics reduces to an autonomous Clarke subgradient inclusion driven by Φ_λ.
  • In the finite-Dirac case all limiting trajectories converge to critical points of the piecewise-quadratic potential.
  • The original generation flow converges to local minimizers of Φ_λ at rate O(√t) whenever the minimizer is smooth and stable.
  • The same geometric description covers both the mixture-of-experts regime 0 ≤ λ ≤ 1 and the classifier-free guidance regime λ > 1.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Choosing λ to sculpt the landscape of Φ_λ offers a direct way to steer sampling toward or away from particular supports without retraining scores.
  • The Voronoi structure of the empirical case suggests that guidance with more than two measures would induce a multi-class Voronoi partition whose geometry controls the flow.
  • Because the reduction is local in time, early-stopping schemes or hybrid integrators that switch to the geometric flow after a short burn-in become feasible (see the sketch after this list).
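A hedged sketch of the last point, continuing the snippet above (it reuses A1, A2, and grad_phi defined there; this is an editorial extrapolation, not a method from the paper). For heat evolutions of Dirac mixtures the score is available in closed form as a softmax barycenter, so a hybrid integrator can follow the exact mixed score for a short burn-in and then hand off to the cheap geometric flow:

import numpy as np

def score(A, x, t):
    # Exact score of u(t,.) = heat evolution (du/dt = Laplacian u) of the
    # empirical measure on A: grad log u(t,x) = -(x - ybar) / (2t), where
    # ybar is a softmax-weighted barycenter of the Dirac locations.
    q = -np.sum((A - x) ** 2, axis=1) / (4 * t)
    w = np.exp(q - q.max())            # numerically stable softmax weights
    w /= w.sum()
    return -(x - w @ A) / (2 * t)

def hybrid_sample(x0, lam, t0=0.5, burn_in=20, geo_steps=500, h=1e-2):
    x, t = np.array(x0, dtype=float), t0
    for _ in range(burn_in):           # phase 1: exact mixed-score steps
        s = lam * score(A1, x, t) + (1 - lam) * score(A2, x, t)
        x += h * (4 * t) * s           # Varadhan scaling: -4t grad log u_i -> grad d_i^2
        t *= 0.8                       # shrink the time (noise) level
    for _ in range(geo_steps):         # phase 2: geometric flow of Phi_lambda
        x -= h * grad_phi(x, lam)
    return x

print(hybrid_sample([0.0, 1.5], lam=0.7))

The 4t factor is the Varadhan normalization under which −4t log u_i → d_i², so both phases drive x with a drift of the same order; the schedule t ← 0.8 t is an arbitrary illustrative choice.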

Load-bearing premise

The two measures are heat evolutions of compactly supported probability measures, so the Laplace-Varadhan principle applies directly to the rescaled process.

What would settle it

Numerical trajectories generated by the mixed score for small times that deviate systematically from the gradient flow of Φ_λ, or an analytic counter-example in which the supports are compact yet the limiting dynamics is not captured by that potential.
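The first test is directly runnable in the toy setting above (again reusing score and grad_phi from the earlier sketches; the step sizes and the 4t normalization are illustrative choices): integrate the mixed-score dynamics at a fixed small time scale t and the Φ_λ flow from the same start, and check whether the endpoint gap shrinks as t → 0.

import numpy as np

def mixed_score_flow(x0, lam, t, steps=2000, h=1e-3):
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        s = lam * score(A1, x, t) + (1 - lam) * score(A2, x, t)
        x += h * (4 * t) * s          # -4t grad log u_i -> grad d_i^2, so this tracks -grad Phi_lambda
    return x

def phi_flow(x0, lam, steps=2000, h=1e-3):
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= h * grad_phi(x, lam)     # limiting geometric flow
    return x

lam, x0 = 0.5, [0.0, 1.0]
for t in (1e-1, 1e-2, 1e-3):
    gap = np.linalg.norm(mixed_score_flow(x0, lam, t) - phi_flow(x0, lam))
    print(f"t={t:.0e}: endpoint gap {gap:.4f}")   # should shrink as t -> 0 if the reduction holds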

Figures

Figures reproduced from arXiv: 2605.12231 by Enrique Zuazua, Kang Liu.

Figure 1. Non-differentiability sets associated with the empirical supports.
Figure 2. Vector fields of the limiting dynamics and the associated potential landscape.
Figure 3. Local minimizers of Φ_λ in the MoE regime. As λ increases from 0 to 1, the minimizers trace a geometric interpolation from A₂ toward A₁. In the CFG regime (λ > 1), Φ_λ(x) = λ d₁(x)² − (λ−1) d₂(x)² = min_{y₁∈A₁} max_{y₂∈A₂} [λ‖x−y₁‖² − (λ−1)‖x−y₂‖²]; the potential is no longer a cooperative average but has an extrapolative, competitive structure.
Figure 4. Local minimizers of Φ_λ in the CFG regime for the same finite Dirac-mixture example. As λ increases, minimizers may approach the interface ND(A₁, A₂) and then evolve along it, reflecting the stronger role of nonsmooth geometry in the extrapolative regime.
Figure 5. Li–Yau, semiconcavity, and Hamilton–Jacobi structure for the rescaled logarithmic potential.
Figure 6. 1D case. Top row: stacked visualization of the log-product potential.
Figure 7. 2D case. Backward generation trajectories.
Figure 8. Backward deterministic generation trajectories driven by pure or mixed scores.
Figure 9. Real-data experiment on CIFAR-10 in pixel space.
read the original abstract

Diffusion models are routinely guided in practice by combining multiple score fields, yet the mathematical structure of score mixing is still poorly understood. We study the small-time generation dynamics driven by mixed scores $$ s=\lambda\,\nabla\log u_1+(1-\lambda)\,\nabla\log u_2,\qquad \lambda\ge 0, $$ in the heat-flow framework, where $u_1,u_2$ are heat evolutions of two compactly supported probability measures. This single formulation covers both the mixture-of-experts regime $(0\leq \lambda\leq 1)$ and the classifier-free guidance regime $(\lambda>1)$. Exploiting a Laplace-Varadhan principle under a similarity-time rescaling, we show that the small-time generation dynamics is governed by the explicit geometric potential $$ \Phi_\lambda=\lambda d_1^2+(1-\lambda)d_2^2, $$ which depends only on the supports of the initial measures and on the mixing parameter. This gives a rigorous reduction from a singular, non-autonomous score-driven dynamics to autonomous Clarke-type subgradient inclusions. In the empirical setting of finite Dirac mixtures, the limiting potential is piecewise quadratic with a Voronoi-type structure; this rigidity yields convergence of all autonomous limiting trajectories to critical points and a conditional convergence criterion for the original generation flow toward local minimizers of the potential, with rate $\mathcal O(\sqrt t)$ in the smooth stable case.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript analyzes small-time generation dynamics in diffusion models driven by mixed scores s = λ ∇log u₁ + (1-λ) ∇log u₂ (λ ≥ 0), where u₁ and u₂ are heat evolutions of compactly supported probability measures. This covers both mixture-of-experts (0 ≤ λ ≤ 1) and classifier-free guidance (λ > 1). Exploiting a Laplace-Varadhan principle under similarity-time rescaling, the authors show that the dynamics reduce to an autonomous Clarke subgradient flow on the explicit geometric potential Φ_λ = λ d₁² + (1-λ) d₂², which depends only on the supports of the initial measures. For finite Dirac mixtures the limiting potential is piecewise quadratic with Voronoi structure; this yields convergence of all limiting trajectories to critical points and a conditional convergence criterion for the original flow toward local minimizers, with rate O(√t) in the smooth stable case.

Significance. If the central reduction holds, the work supplies a rigorous geometric foundation for score mixing and guidance, two techniques central to practical diffusion models. The explicit, support-dependent potential Φ_λ together with the reduction from a singular non-autonomous SDE to an autonomous subgradient inclusion is a clear strength; the compact-support hypothesis is used effectively to obtain coercivity and compact sublevel sets without extra tail conditions. The piecewise-quadratic Voronoi analysis for Dirac mixtures and the explicit convergence rate in the stable case are concrete, falsifiable contributions that could guide both theoretical analysis and empirical design of guided samplers.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (Laplace-Varadhan reduction): the claim of a rigorous passage from the non-autonomous score-driven SDE to the limiting Clarke inclusion is load-bearing, yet the manuscript provides no explicit error estimates or quantitative rate for the rescaled large-deviation approximation in the singular regime (λ > 1). Without these bounds it is difficult to confirm the precise scope of the compact-support assumption.
  2. [§4] §4 (Dirac-mixture case): the stated O(√t) convergence rate to local minimizers is given only for the smooth stable case; the manuscript should clarify whether this rate survives when the limiting potential Φ_λ has non-differentiable Voronoi edges, as these are the generic points for finite mixtures.
minor comments (3)
  1. [Abstract] The distance functions d₁ and d₂ (distances to the supports) are used throughout but never defined explicitly in the abstract or early sections; a short sentence recalling d_i(x) = inf_{y ∈ supp(μ_i)} |x - y| would improve readability.
  2. [§2] The term “similarity-time rescaling” is introduced without the precise change-of-variable formula; adding the explicit time substitution (e.g., τ = -log t or equivalent) in the first paragraph of §2 would help readers track the large-deviation scaling.
  3. [§3] Notation for the Clarke subdifferential ∂^C Φ_λ is used in the limiting inclusion but never contrasted with the classical gradient; a one-sentence reminder that the inclusion reduces to the gradient flow wherever Φ_λ is differentiable would clarify the statement.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and recommendation for minor revision. We respond point-by-point to the major comments below.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (Laplace-Varadhan reduction): the claim of a rigorous passage from the non-autonomous score-driven SDE to the limiting Clarke inclusion is load-bearing, yet the manuscript provides no explicit error estimates or quantitative rate for the rescaled large-deviation approximation in the singular regime (λ > 1). Without these bounds it is difficult to confirm the precise scope of the compact-support assumption.

    Authors: The Laplace-Varadhan principle under similarity-time rescaling yields a rigorous asymptotic reduction of the rescaled process to the autonomous Clarke subgradient inclusion as the time parameter tends to zero. The compact-support hypothesis is used precisely to guarantee the large-deviation principle and the coercivity of Φ_λ, without requiring extra tail conditions. Explicit quantitative error bounds for the approximation in the singular regime λ > 1 are not needed to establish the limiting geometric structure or the convergence statements for Dirac mixtures. We will add a clarifying paragraph in Section 3 on the sense of convergence (viscosity solutions) and note that rate estimates constitute an open direction. revision: partial

  2. Referee: [§4] §4 (Dirac-mixture case): the stated O(√t) convergence rate to local minimizers is given only for the smooth stable case; the manuscript should clarify whether this rate survives when the limiting potential Φ_λ has non-differentiable Voronoi edges, as these are the generic points for finite mixtures.

    Authors: The O(√t) rate is derived only when the trajectory approaches a point at which Φ_λ is locally C² with positive-definite Hessian (the smooth stable case). At generic non-differentiable Voronoi edges the subgradient inclusion still forces convergence to critical points, but the rate typically slows (e.g., O(t^{1/3}) near codimension-1 edges). We will revise Section 4 to restrict the stated rate explicitly to the smooth stable case and add a short remark on the non-smooth regime. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper derives the governing potential Φ_λ via direct application of the standard Laplace-Varadhan large-deviation principle to a similarity-time rescaling of the heat kernels generated by compactly supported initial measures. This produces an explicit expression Φ_λ = λ d₁² + (1-λ) d₂² constructed solely from the distance functions to the supports, without parameter fitting, self-referential definitions, or load-bearing self-citations. The reduction from the non-autonomous score-driven SDE to the autonomous Clarke subgradient inclusion follows from the asymptotic analysis and the coercivity ensured by compact support; all steps remain independent of the target result and rely on externally verifiable principles rather than circular inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The claim rests on the applicability of the Laplace-Varadhan principle to the rescaled mixed-score process and on the compact support of the initial measures; no free parameters are fitted and no new entities are postulated beyond the derived potential.

axioms (2)
  • domain assumption u1 and u2 are heat evolutions of compactly supported probability measures
    Stated explicitly in the abstract as the setting for the mixed score.
  • standard math Laplace-Varadhan principle applies under the similarity-time rescaling
    Invoked to obtain the explicit geometric potential from the large-deviation rate function.
invented entities (1)
  • geometric potential Φ_λ = λ d₁² + (1-λ) d₂² · no independent evidence
    purpose: Governs the limiting small-time dynamics
    Derived quantity that reduces the original non-autonomous flow to an autonomous subgradient inclusion; no independent empirical evidence supplied.

pith-pipeline@v0.9.0 · 5552 in / 1525 out tokens · 56242 ms · 2026-05-13T03:57:22.063736+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 1 internal anchor

  1. [1] B. D. O. Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982.
  2. [2] H. Attouch and J. Bolte. On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Mathematical Programming, 116(1–2):5–16, 2009.
  3. [3] J.-P. Aubin and H. Frankowska. Set-Valued Analysis. Systems & Control: Foundations & Applications. Birkhäuser Boston, 1990.
  4. [4] M. Bardi and I. Capuzzo-Dolcetta. Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations. Systems & Control: Foundations & Applications. Birkhäuser Boston, Boston, MA, 1997.
  5. [5] M. Benaïm and M. W. Hirsch. Asymptotic pseudotrajectories and chain recurrent flows, with applications. Journal of Dynamics and Differential Equations, 8(1):141–176, 1996.
  6. [6] M. Benaïm, J. Hofbauer, and S. Sorin. Stochastic approximations and differential inclusions. SIAM Journal on Control and Optimization, 44(1):328–348, 2005.
  7. [7] C. M. Bender and S. A. Orszag. Advanced Mathematical Methods for Scientists and Engineers I: Asymptotic Methods and Perturbation Theory. Springer, 2013.
  8. [8] J. Benton, V. De Bortoli, A. Doucet, and G. Deligiannidis. Nearly d-linear convergence bounds for diffusion models via stochastic localization. In International Conference on Learning Representations, 2024.
  9. [9] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
  10. [10] S. Chen, S. Chewi, J. Li, Y. Li, A. Salim, and A. R. Zhang. Sampling is as easy as learning the score: Theory for diffusion models with minimal data assumptions. In International Conference on Learning Representations, 2023.
  11. [11] M. Chidambaram, K. Gatmiry, S. Chen, H. Lee, and J. Lu. What does guidance do? A fine-grained analysis in a simple setting. In Advances in Neural Information Processing Systems, volume 37, pages 84968–85005, 2024.
  12. [12] F. H. Clarke. Optimization and Nonsmooth Analysis. Wiley, 1989.
  13. [13] G. Conforti, A. Durmus, and M. Gentiloni Silveri. KL convergence guarantees for score diffusion models under minimal data assumptions. SIAM Journal on Mathematics of Data Science, 7(1):86–109, 2025.
  14. [14] J. Cortés. Discontinuous dynamical systems: A tutorial on solutions, nonsmooth analysis, and stability. IEEE Control Systems Magazine, 28(3):36–73, 2008.
  15. [15] M. G. Crandall, H. Ishii, and P.-L. Lions. User's guide to viscosity solutions of second order partial differential equations. Bulletin of the American Mathematical Society, 27(1):1–67, 1992.
  16. [16] A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications, volume 38 of Applications of Mathematics. Springer, New York, 2nd edition, 1998.
  17. [17] P. Dhariwal and A. Q. Nichol. Diffusion models beat GANs on image synthesis. In Advances in Neural Information Processing Systems, volume 34, pages 8780–8794, 2021.
  18. [18] K. Fukunaga and L. D. Hostetler. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory, 21(1):32–40, 1975.
  19. [19] V. A. Galaktionov and J. L. Vázquez. Asymptotic behaviour of nonlinear parabolic equations with critical exponents: A dynamical systems approach. Journal of Functional Analysis, 100(2):435–462, 1991.
  20. [20] V. A. Galaktionov and J. L. Vázquez. A Stability Technique for Evolution Partial Differential Equations: A Dynamical Systems Approach, volume 56 of Progress in Nonlinear Differential Equations and Their Applications. Birkhäuser, 2004.
  21. [21] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, volume 27, 2014.
  22. [22] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840–6851, 2020.
  23. [23] J. Ho and T. Salimans. Classifier-free diffusion guidance. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications. OpenReview.net, 2021.
  24. [24] A. Hyvärinen and P. Dayan. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(4):695–709, 2005.
  25. [25] I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus, volume 113 of Graduate Texts in Mathematics. Springer, 2nd edition, 1991.
  26. [26] T. Karras, M. Aittala, T. Kynkäänniemi, J. Lehtinen, T. Aila, and S. Laine. Guiding a diffusion model with a bad version of itself. In Advances in Neural Information Processing Systems, volume 37, 2024.
  27. [27] D. P. Kingma and Y. LeCun. Regularized estimation of image statistics by score matching. In Advances in Neural Information Processing Systems, volume 23, 2010.
  28. [28] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. In International Conference on Learning Representations, 2014.
  29. [29] B. Klartag and O. Ordentlich. The strong data processing inequality under the heat flow. IEEE Transactions on Information Theory, 2025.
  30. [30] H. Lee, J. Lu, and Y. Tan. Convergence for score-based generative modeling with polynomial complexity. In Advances in Neural Information Processing Systems, volume 35, pages 22870–22882, 2022.
  31. [31] P. Li and S.-T. Yau. On the parabolic kernel of the Schrödinger operator. Acta Mathematica, 156:153–201, 1986.
  32. [32] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. M. Stuart, and A. Anandkumar. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations, 2021.
  33. [33] Z. Li, K. Liu, L. Liverani, and E. Zuazua. Universal approximation of dynamical systems by semiautonomous neural ODEs and applications. SIAM Journal on Numerical Analysis, 64(1):193–223, 2026.
  34. [34] Z. Li, K. Liu, Y. Song, H. Yue, and E. Zuazua. Deep neural ODE operator networks for PDEs. Mathematical Models and Methods in Applied Sciences, 2026.
  35. [35] K. Liu and E. Zuazua. A PDE perspective on generative diffusion models, 2025. Preprint at https://arxiv.org/abs/2511.05940.
  36. [36] S. Łojasiewicz. Ensembles semi-analytiques, 1965. Lecture notes, Institut des Hautes Études Scientifiques.
  37. [37] C. Lu, Y. Zhou, F. Bao, J. Chen, C. Li, and J. Zhu. DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. In Advances in Neural Information Processing Systems, volume 35, pages 5775–5787, 2022.
  38. [38] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021.
  39. [39] L. Markus. Asymptotically autonomous differential systems. In Contributions to the Theory of Nonlinear Oscillations, Vol. III, volume 36 of Annals of Mathematics Studies, pages 17–29. Princeton University Press, Princeton, NJ, 1956.
  40. [40] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017.
  41. [41] W. Peebles and S. Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205, 2023.
  42. [42] P. Rahimi and S. Marcel. ScoreMix: Improving face recognition via score composition in diffusion generators, 2025. Preprint at https://arxiv.org/abs/2506.10226.
  43. [43] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  44. [44] H. Robbins and S. Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22(3):400–407, 1951.
  45. [45] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis, volume 317 of Grundlehren der Mathematischen Wissenschaften. Springer, 1998.
  46. [46] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, volume 9351 of Lecture Notes in Computer Science, pages 234–241. Springer, 2015.
  47. [47] S. Sadat, M. Kansy, O. Hilliges, and R. M. Weber. No training, no problem: Rethinking classifier-free guidance for diffusion models. In International Conference on Learning Representations. OpenReview.net, 2025.
  48. [48] L. Simon. Asymptotics for a class of nonlinear evolution equations, with applications to geometric problems. Annals of Mathematics, 118(3):525–571, 1983.
  49. [49] Y. Song and S. Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, volume 32, 2019.
  50. [50] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
  51. [51] C. Villani. Hypocoercivity. American Mathematical Society, 2009.
  52. [52] P. Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011.
  53. [53] X. Wang, N. Dufour, N. Andreou, M.-P. Cani, V. Fernández Abrevaya, D. Picard, and V. Kalogeiton. Analysis of classifier-free guidance weight schedulers. Transactions on Machine Learning Research, 2024.
  54. [54] E. Zuazua. Asymptotic behavior of scalar convection–diffusion equations, 2020. Preprint at https://arxiv.org/abs/2003.11834.