Geometric Asymptotics of Score Mixing and Guidance in Diffusion Models
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-13 03:57 UTC · model grok-4.3
The pith
Mixed-score guidance in diffusion models reduces asymptotically to dynamics on a geometric potential of squared distances to data supports.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Exploiting a Laplace-Varadhan principle under a similarity-time rescaling, the small-time generation dynamics driven by the mixed score s = λ ∇log u₁ + (1-λ) ∇log u₂ is governed by the explicit geometric potential Φ_λ = λ d₁² + (1-λ) d₂², which depends only on the supports of the initial measures and on the mixing parameter. This gives a rigorous reduction from a singular, non-autonomous score-driven dynamics to autonomous Clarke-type subgradient inclusions. In the empirical setting of finite Dirac mixtures, the limiting potential is piecewise quadratic with a Voronoi-type structure, yielding convergence of all autonomous limiting trajectories to critical points and a conditional convergence criterion for the original generation flow toward local minimizers of the potential, with rate O(√t) in the smooth stable case.
What carries the argument
The geometric potential Φ_λ = λ d₁² + (1-λ) d₂², which encodes the limiting small-time behavior of the mixed-score flow and permits its reduction to an autonomous subgradient inclusion.
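To make this object concrete, here is a minimal numerical sketch, not taken from the paper: Φ_λ for two hypothetical finite Dirac supports in the plane, with explicit-Euler descent along a selection from the Clarke subdifferential. The supports S1 and S2, the weight λ = 0.7, and the step size are illustrative assumptions.

```python
import numpy as np

# Hedged sketch (not from the paper) of the limiting potential
# Phi_lam = lam * d1^2 + (1 - lam) * d2^2 for two hypothetical finite
# Dirac supports in R^2, with d_i the Euclidean distance to S_i.

S1 = np.array([[0.0, 0.0], [2.0, 0.0]])   # hypothetical support of mu_1
S2 = np.array([[1.0, 1.5]])               # hypothetical support of mu_2

def nearest(S, x):
    """Nearest point of the finite set S to x."""
    return S[np.argmin(np.sum((S - x) ** 2, axis=1))]

def phi(x, lam):
    d1sq = np.sum((nearest(S1, x) - x) ** 2)
    d2sq = np.sum((nearest(S2, x) - x) ** 2)
    return lam * d1sq + (1 - lam) * d2sq

def grad_phi(x, lam):
    # A selection from the Clarke subdifferential: away from Voronoi
    # edges, grad d_i^2(x) = 2 * (x - nearest point of S_i).
    return 2 * (lam * (x - nearest(S1, x)) + (1 - lam) * (x - nearest(S2, x)))

# Explicit-Euler subgradient descent on Phi_lam from a generic start.
x, lam, h = np.array([3.0, 2.0]), 0.7, 0.05
for _ in range(400):
    x = x - h * grad_phi(x, lam)
print("critical point:", x.round(3), " Phi =", round(phi(x, lam), 4))
```

Away from Voronoi edges the inclusion is a plain gradient flow; with these assumed supports the trajectory settles at the λ-weighted point λ·(2,0) + (1-λ)·(1,1.5) = (1.7, 0.45).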
If this is right
- The mixed-score dynamics reduces to an autonomous Clarke subgradient inclusion driven by Φ_λ.
- In the finite-Dirac case all limiting trajectories converge to critical points of the piecewise-quadratic potential.
- Subject to the paper's conditional criterion, the original generation flow converges to local minimizers of Φ_λ, with rate O(√t) when the minimizer is smooth and stable.
- The same geometric description covers both the mixture-of-experts regime 0 ≤ λ ≤ 1 and the classifier-free guidance regime λ > 1.
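The regime claim in the last bullet can be checked by hand in one dimension. Under the assumption of hypothetical single-point supports S1 = {0} and S2 = {1}, the potential Φ_λ(x) = λx² + (1-λ)(x-1)² is smooth with Φ_λ'' = 2 and unique critical point x* = 1 - λ, so λ ∈ (0, 1) interpolates between the supports while λ > 1 drives the flow past S1, away from S2:

```python
# Hedged 1-D sketch of the two mixing regimes (hypothetical supports
# S1 = {0}, S2 = {1}; not the paper's experiments). Phi_lam is smooth
# here, so the Clarke inclusion is the classical gradient flow.

def flow(lam, x0=0.5, h=0.01, steps=2000):
    """Explicit-Euler descent on Phi_lam(x) = lam*x^2 + (1-lam)*(x-1)^2."""
    x = x0
    for _ in range(steps):
        x -= h * (2 * lam * x + 2 * (1 - lam) * (x - 1))  # Phi_lam'(x)
    return x

for lam in (0.3, 0.7, 1.5):   # mixture-of-experts vs classifier-free guidance
    print(f"lam={lam}: flow settles at {flow(lam):+.3f} "
          f"(predicted 1-lam = {1 - lam:+.3f})")
```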
Where Pith is reading between the lines
- Choosing λ to sculpt the landscape of Φ_λ offers a direct way to steer sampling toward or away from particular supports without retraining scores.
- The Voronoi structure of the empirical case suggests that guidance with more than two measures would induce a multi-class Voronoi partition whose geometry controls the flow (see the sketch after this list).
- Because the reduction is local in time, early-stopping or hybrid integrators that switch to the geometric flow after a short burn-in period become feasible.
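Extrapolating the second bullet, and this is speculation in the same spirit as Pith's rather than a result of the paper: with K guidance measures and weights λ_k summing to one, the natural candidate potential is Φ = Σ_k λ_k d_k², whose subgradient flow is steered by a multi-class Voronoi-type partition. A toy sketch with three hypothetical single-point supports, where the flow settles at the λ-weighted barycenter:

```python
import numpy as np

# Speculative multi-measure extension (NOT in the paper):
# Phi = sum_k lam_k * d_k^2 over K hypothetical supports.
supports = [np.array([[0.0, 0.0]]),
            np.array([[2.0, 0.0]]),
            np.array([[1.0, 2.0]])]          # three hypothetical supports
lam = np.array([0.5, 0.3, 0.2])              # guidance weights, sum to 1

def grad_phi(x):
    g = np.zeros_like(x)
    for w, S in zip(lam, supports):
        p = S[np.argmin(np.sum((S - x) ** 2, axis=1))]  # nearest support point
        g += 2 * w * (x - p)                            # grad of w * d^2
    return g

x = np.array([5.0, 5.0])
for _ in range(400):
    x = x - 0.05 * grad_phi(x)
# With single-point supports the minimizer is the lam-weighted barycenter:
print("limit:", x.round(3),
      " barycenter:", sum(w * S[0] for w, S in zip(lam, supports)))
```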
Load-bearing premise
The two measures are heat evolutions of compactly supported probability measures, so the Laplace-Varadhan principle applies directly to the rescaled process.
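This premise can be sanity-checked numerically in one dimension. A hedged sketch under an assumed normalization (u = G_t * μ with G_t the kernel of ∂_t u = ½Δu, for which Varadhan's lemma gives 2t log u(t, x) → -d(x)² as t → 0; the paper's convention may differ by a constant factor). The mixture below is hypothetical, and log-sum-exp keeps the small-t evaluation stable:

```python
import numpy as np

# Hedged Varadhan check for a hypothetical 1-D Dirac mixture mu,
# heat-evolved under dt u = (1/2) Laplacian u: expect 2t log u -> -d^2.
support = np.array([-1.0, 0.5])
weights = np.array([0.3, 0.7])

def log_u(t, x):
    """log of (G_t * mu)(x) via log-sum-exp for numerical stability."""
    a = np.log(weights) - (x - support) ** 2 / (2 * t) \
        - 0.5 * np.log(2 * np.pi * t)
    m = a.max()
    return m + np.log(np.sum(np.exp(a - m)))

x = 2.0
d2 = np.min((x - support) ** 2)   # squared distance to supp(mu): here 2.25
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"t={t:.0e}: 2t log u = {2 * t * log_u(t, x):+.4f}  (target {-d2:+.4f})")
```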
What would settle it
Numerical trajectories generated by the mixed score for small times that deviate systematically from the gradient flow of Φ_λ, or an analytic counter-example in which the supports are compact yet the limiting dynamics is not captured by that potential.
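Half of that test is cheap to run, because for finite Dirac mixtures the heat-evolved scores are available in closed form. Under the same assumed ½Δ normalization as above, the rescaled drift t·s(t, x) should approach -½∇Φ_λ(x) pointwise as t → 0 away from Voronoi edges; systematic deviation from this would be exactly the counter-evidence described. A hedged 1-D sketch with hypothetical supports:

```python
import numpy as np

# Hedged check: t * s(t, x) -> -(1/2) Phi_lam'(x) as t -> 0 (assumed
# 1/2-Laplacian convention; hypothetical supports, x off the Voronoi edges).
S1, S2, lam, x = np.array([0.0, 2.0]), np.array([1.0]), 0.7, 3.0

def grad_log_u(t, x, S):
    """Closed-form score of a uniform Dirac mixture smoothed by G_t."""
    r = np.exp(-((x - S) ** 2 - np.min((x - S) ** 2)) / (2 * t))
    r /= r.sum()                          # softmax responsibilities
    return -np.sum(r * (x - S)) / t

def half_grad_phi(x):
    p1, p2 = S1[np.argmin((x - S1) ** 2)], S2[np.argmin((x - S2) ** 2)]
    return lam * (x - p1) + (1 - lam) * (x - p2)

for t in (1e-1, 1e-2, 1e-3):
    s = lam * grad_log_u(t, x, S1) + (1 - lam) * grad_log_u(t, x, S2)
    print(f"t={t:.0e}: t*s = {t * s:+.5f}   -(1/2) Phi' = {-half_grad_phi(x):+.5f}")
```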
Original abstract
Diffusion models are routinely guided in practice by combining multiple score fields, yet the mathematical structure of score mixing is still poorly understood. We study the small-time generation dynamics driven by mixed scores $$ s=\lambda\,\nabla\log u_1+(1-\lambda)\,\nabla\log u_2,\qquad \lambda\ge 0, $$ in the heat-flow framework, where $u_1,u_2$ are heat evolutions of two compactly supported probability measures. This single formulation covers both the mixture-of-experts regime $(0\leq \lambda\leq 1)$ and the classifier-free guidance regime $(\lambda>1)$. Exploiting a Laplace-Varadhan principle under a similarity-time rescaling, we show that the small-time generation dynamics is governed by the explicit geometric potential $$ \Phi_\lambda=\lambda d_1^2+(1-\lambda)d_2^2, $$ which depends only on the supports of the initial measures and on the mixing parameter. This gives a rigorous reduction from a singular, non-autonomous score-driven dynamics to autonomous Clarke-type subgradient inclusions. In the empirical setting of finite Dirac mixtures, the limiting potential is piecewise quadratic with a Voronoi-type structure; this rigidity yields convergence of all autonomous limiting trajectories to critical points and a conditional convergence criterion for the original generation flow toward local minimizers of the potential, with rate $\mathcal O(\sqrt t)$ in the smooth stable case.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes small-time generation dynamics in diffusion models driven by mixed scores s = λ ∇log u₁ + (1-λ) ∇log u₂ (λ ≥ 0), where u₁ and u₂ are heat evolutions of compactly supported probability measures. This covers both mixture-of-experts (0 ≤ λ ≤ 1) and classifier-free guidance (λ > 1). Exploiting a Laplace-Varadhan principle under similarity-time rescaling, the authors show that the dynamics reduce to an autonomous Clarke subgradient flow on the explicit geometric potential Φ_λ = λ d₁² + (1-λ) d₂², which depends only on the supports of the initial measures. For finite Dirac mixtures the limiting potential is piecewise quadratic with Voronoi structure; this yields convergence of all limiting trajectories to critical points and a conditional convergence criterion for the original flow toward local minimizers, with rate O(√t) in the smooth stable case.
Significance. If the central reduction holds, the work supplies a rigorous geometric foundation for score mixing and guidance, two techniques central to practical diffusion models. The explicit, support-dependent potential Φ_λ together with the reduction from a singular non-autonomous SDE to an autonomous subgradient inclusion is a clear strength; the compact-support hypothesis is used effectively to obtain coercivity and compact sublevel sets without extra tail conditions. The piecewise-quadratic Voronoi analysis for Dirac mixtures and the explicit convergence rate in the stable case are concrete, falsifiable contributions that could guide both theoretical analysis and empirical design of guided samplers.
major comments (2)
- [Abstract and §3] Laplace-Varadhan reduction: the claim of a rigorous passage from the non-autonomous score-driven SDE to the limiting Clarke inclusion is load-bearing, yet the manuscript provides no explicit error estimates or quantitative rate for the rescaled large-deviation approximation in the singular regime (λ > 1). Without these bounds it is difficult to confirm the precise scope of the compact-support assumption.
- [§4] Dirac-mixture case: the stated O(√t) convergence rate to local minimizers is given only for the smooth stable case; the manuscript should clarify whether this rate survives when the limiting potential Φ_λ has non-differentiable Voronoi edges, as these are the generic points for finite mixtures.
minor comments (3)
- [Abstract] The distance functions d₁ and d₂ (distances to the supports) are used throughout but never defined explicitly in the abstract or early sections; a short sentence recalling d_i(x) = inf_{y ∈ supp(μ_i)} |x - y| would improve readability.
- [§2] The term “similarity-time rescaling” is introduced without the precise change-of-variable formula; adding the explicit time substitution (e.g., τ = -log t or equivalent) in the first paragraph of §2 would help readers track the large-deviation scaling.
- [§3] Notation for the Clarke subdifferential ∂^C Φ_λ is used in the limiting inclusion but never contrasted with the classical gradient; a one-sentence reminder that the inclusion reduces to the gradient flow wherever Φ_λ is differentiable would clarify the statement.
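For concreteness, the definitions these three comments ask for can be collected in one display, a hedged reconstruction under assumed conventions rather than a quotation from the manuscript (the time substitution in particular is one standard choice): $$ d_i(x)=\inf_{y\in\operatorname{supp}(\mu_i)}|x-y|,\qquad \tau=-\log t, $$ together with the limiting inclusion $$ \dot x(\tau)\in -\,\partial^{C}\Phi_\lambda\bigl(x(\tau)\bigr),\qquad \partial^{C}\Phi_\lambda(x)=\{\nabla\Phi_\lambda(x)\}\ \text{wherever } \Phi_\lambda \text{ is differentiable}. $$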
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment, and recommendation for minor revision. We respond point-by-point to the major comments below.
Point-by-point responses
- Referee: [Abstract and §3] Laplace-Varadhan reduction: the claim of a rigorous passage from the non-autonomous score-driven SDE to the limiting Clarke inclusion is load-bearing, yet the manuscript provides no explicit error estimates or quantitative rate for the rescaled large-deviation approximation in the singular regime (λ > 1). Without these bounds it is difficult to confirm the precise scope of the compact-support assumption.
Authors: The Laplace-Varadhan principle under similarity-time rescaling yields a rigorous asymptotic reduction of the rescaled process to the autonomous Clarke subgradient inclusion as the time parameter tends to zero. The compact-support hypothesis is used precisely to guarantee the large-deviation principle and the coercivity of Φ_λ, without requiring extra tail conditions. Explicit quantitative error bounds for the approximation in the singular regime λ > 1 are not needed to establish the limiting geometric structure or the convergence statements for Dirac mixtures. We will add a clarifying paragraph in Section 3 on the sense of convergence (viscosity solutions) and note that rate estimates constitute an open direction. (revision: partial)
- Referee: [§4] Dirac-mixture case: the stated O(√t) convergence rate to local minimizers is given only for the smooth stable case; the manuscript should clarify whether this rate survives when the limiting potential Φ_λ has non-differentiable Voronoi edges, as these are the generic points for finite mixtures.
Authors: The O(√t) rate is derived only when the trajectory approaches a point at which Φ_λ is locally C² with positive-definite Hessian (the smooth stable case). At generic non-differentiable Voronoi edges the subgradient inclusion still forces convergence to critical points, but the rate typically slows (e.g., O(t^{1/3}) near codimension-1 edges). We will revise Section 4 to restrict the stated rate explicitly to the smooth stable case and add a short remark on the non-smooth regime. (revision: yes)
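The edge behavior invoked here is visible in the simplest possible example. At a Voronoi edge of the hypothetical support S1 = {-1, +1}, the map d₁² is nondifferentiable and the Clarke subdifferential is the convex hull of the two one-sided derivatives, so edge points can be Clarke-critical without a classical gradient. A minimal sketch, with finite differences standing in for the one-sided limits:

```python
import numpy as np

# Minimal sketch (hypothetical 1-D support): at the Voronoi edge x = 0 of
# S1 = {-1, +1}, d1^2 is nondifferentiable; Clarke's subdifferential there
# is the convex hull [-2, +2] of the two one-sided derivatives.
S1 = np.array([-1.0, 1.0])

def d_sq(x):
    return np.min((x - S1) ** 2)

eps = 1e-6
left = (d_sq(0.0) - d_sq(-eps)) / eps    # one-sided derivative from the left
right = (d_sq(eps) - d_sq(0.0)) / eps    # one-sided derivative from the right
print(f"one-sided derivatives at the edge: {left:+.3f} and {right:+.3f}")
print(f"Clarke subdifferential at 0 = "
      f"[{min(left, right):+.3f}, {max(left, right):+.3f}]")
```

Since 0 lies in [-2, +2], the edge is a critical point of the inclusion even though no classical gradient exists there, which is why the rate analysis must treat edges separately.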
Circularity Check
No significant circularity detected
Full rationale
The paper derives the governing potential Φ_λ via direct application of the standard Laplace-Varadhan large-deviation principle to a similarity-time rescaling of the heat kernels generated by compactly supported initial measures. This produces an explicit expression Φ_λ = λ d₁² + (1-λ) d₂² constructed solely from the distance functions to the supports, without parameter fitting, self-referential definitions, or load-bearing self-citations. The reduction from the non-autonomous score-driven SDE to the autonomous Clarke subgradient inclusion follows from the asymptotic analysis and the coercivity ensured by compact support; all steps remain independent of the target result and rely on externally verifiable principles rather than circular inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: u₁ and u₂ are heat evolutions of compactly supported probability measures
- standard math: the Laplace-Varadhan principle applies under the similarity-time rescaling
invented entities (1)
- geometric potential Φ_λ = λ d₁² + (1-λ) d₂² · no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · match: unclear · "Exploiting a Laplace-Varadhan principle under a similarity-time rescaling, we show that the small-time generation dynamics is governed by the explicit geometric potential Φ_λ = λ d₁² + (1-λ) d₂²"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · match: unclear · "the limiting potential is piecewise quadratic with a Voronoi-type structure"
Reference graph
Works this paper leans on
- [1] B. D. O. Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982.
- [2] H. Attouch and J. Bolte. On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Mathematical Programming, 116(1–2):5–16, 2009.
- [3] J.-P. Aubin and H. Frankowska. Set-Valued Analysis. Systems & Control: Foundations & Applications. Birkhäuser Boston, 1990.
- [4] M. Bardi and I. Capuzzo-Dolcetta. Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations. Systems & Control: Foundations & Applications. Birkhäuser Boston, Boston, MA, 1997.
- [5] M. Benaïm and M. W. Hirsch. Asymptotic pseudotrajectories and chain recurrent flows, with applications. Journal of Dynamics and Differential Equations, 8(1):141–176, 1996.
- [6]
- [7] C. M. Bender and S. A. Orszag. Advanced Mathematical Methods for Scientists and Engineers I: Asymptotic Methods and Perturbation Theory. Springer, 2013.
- [8]
- [9] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
- [10] S. Chen, S. Chewi, J. Li, Y. Li, A. Salim, and A. R. Zhang. Sampling is as easy as learning the score: Theory for diffusion models with minimal data assumptions. In International Conference on Learning Representations, 2023.
- [11] M. Chidambaram, K. Gatmiry, S. Chen, H. Lee, and J. Lu. What does guidance do? A fine-grained analysis in a simple setting. In Advances in Neural Information Processing Systems, volume 37, pages 84968–85005, 2024.
- [12] F. H. Clarke. Optimization and Nonsmooth Analysis. Wiley, 1989.
- [13] G. Conforti, A. Durmus, and M. Gentiloni Silveri. KL convergence guarantees for score diffusion models under minimal data assumptions. SIAM Journal on Mathematics of Data Science, 7(1):86–109, 2025.
- [14] J. Cortés. Discontinuous dynamical systems: A tutorial on solutions, nonsmooth analysis, and stability. IEEE Control Systems Magazine, 28(3):36–73, 2008.
- [15] M. G. Crandall, H. Ishii, and P.-L. Lions. User's guide to viscosity solutions of second order partial differential equations. Bulletin of the American Mathematical Society, 27(1):1–67, 1992.
- [16] A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications, volume 38 of Applications of Mathematics. Springer, New York, 2nd edition, 1998.
- [17] P. Dhariwal and A. Q. Nichol. Diffusion models beat GANs on image synthesis. In Advances in Neural Information Processing Systems, volume 34, pages 8780–8794, 2021.
- [18] K. Fukunaga and L. D. Hostetler. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory, 21(1):32–40, 1975.
- [19] V. A. Galaktionov and J. L. Vázquez. Asymptotic behaviour of nonlinear parabolic equations with critical exponents: A dynamical systems approach. Journal of Functional Analysis, 100(2):435–462, 1991.
- [20] V. A. Galaktionov and J. L. Vázquez. A Stability Technique for Evolution Partial Differential Equations: A Dynamical Systems Approach, volume 56 of Progress in Nonlinear Differential Equations and Their Applications. Birkhäuser, 2004.
- [21] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, volume 27, 2014.
- [22] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840–6851, 2020.
- [23]
- [24] A. Hyvärinen and P. Dayan. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(4):695–709, 2005.
- [25] I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus, volume 113 of Graduate Texts in Mathematics. Springer, 2nd edition, 1991.
- [26]
- [27] D. P. Kingma and Y. LeCun. Regularized estimation of image statistics by score matching. In Advances in Neural Information Processing Systems, volume 23, 2010.
- [28] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. In International Conference on Learning Representations, 2014.
- [29] B. Klartag and O. Ordentlich. The strong data processing inequality under the heat flow. IEEE Transactions on Information Theory, 2025.
- [30] H. Lee, J. Lu, and Y. Tan. Convergence for score-based generative modeling with polynomial complexity. In Advances in Neural Information Processing Systems, volume 35, pages 22870–22882, 2022.
- [31] P. Li and S.-T. Yau. On the parabolic kernel of the Schrödinger operator. Acta Mathematica, 156:153–201, 1986.
- [32] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. M. Stuart, and A. Anandkumar. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations, 2021.
- [33] Z. Li, K. Liu, L. Liverani, and E. Zuazua. Universal approximation of dynamical systems by semiautonomous neural ODEs and applications. SIAM Journal on Numerical Analysis, 64(1):193–223, 2026.
- [34] Z. Li, K. Liu, Y. Song, H. Yue, and E. Zuazua. Deep neural ODE operator networks for PDEs. Mathematical Models and Methods in Applied Sciences, 2026.
- [35] K. Liu and E. Zuazua. A PDE perspective on generative diffusion models, 2025. Preprint at https://arxiv.org/abs/2511.05940.
- [36] S. Łojasiewicz. Ensembles semi-analytiques, 1965. Lecture notes, Institut des Hautes Études Scientifiques.
- [37] C. Lu, Y. Zhou, F. Bao, J. Chen, C. Li, and J. Zhu. DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. In Advances in Neural Information Processing Systems, volume 35, pages 5775–5787, 2022.
- [38] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021.
- [39] L. Markus. Asymptotically autonomous differential systems. In Contributions to the Theory of Nonlinear Oscillations, Vol. III, volume 36 of Annals of Mathematics Studies, pages 17–29. Princeton University Press, Princeton, NJ, 1956.
- [40] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017.
- [41] W. Peebles and S. Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205, 2023.
- [42] P. Rahimi and S. Marcel. ScoreMix: Improving face recognition via score composition in diffusion generators, 2025. Preprint at https://arxiv.org/abs/2506.10226.
- [43]
- [44] H. Robbins and S. Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22(3):400–407, 1951.
- [45] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis, volume 317 of Grundlehren der Mathematischen Wissenschaften. Springer, 1998.
- [46] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, volume 9351 of Lecture Notes in Computer Science, pages 234–241. Springer, 2015.
- [47]
- [48] L. Simon. Asymptotics for a class of nonlinear evolution equations, with applications to geometric problems. Annals of Mathematics, 118(3):525–571, 1983.
- [49] Y. Song and S. Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, volume 32, 2019.
- [50] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
- [51] C. Villani. Hypocoercivity. American Mathematical Society, 2009.
- [52] P. Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011.
- [53] X. Wang, N. Dufour, N. Andreou, M.-P. Cani, V. Fernández Abrevaya, D. Picard, and V. Kalogeiton. Analysis of classifier-free guidance weight schedulers. Transactions on Machine Learning Research, 2024.
- [54]