Generative Drifting is Secretly Score Matching: a Spectral and Variational Perspective

Erkan Turan; Maks Ovsjanikov; Nicolas Dufour

arXiv: 2603.09936 · v2 · pith:272X56QO · submitted 2026-03-10 · 💻 cs.LG


Classification: 💻 cs.LG

Keywords: drift, drifting, operator, kernel, convergence, distributions, divergence


Abstract

Generative Modeling via Drifting [deng2026drifting] has recently achieved state-of-the-art one-step image generation through a kernel-based drift operator, yet its success is largely empirical and its theoretical foundations remain poorly understood. We observe that under a Gaussian kernel, the drift operator is exactly a score difference on smoothed distributions. This observation answers three questions left open in the original work: (1) whether a vanishing drift guarantees equality of distributions ($V_{p,q}=0\Rightarrow p=q$), (2) how to choose between kernels, and (3) why the stop-gradient operator is indispensable for stable training. It also positions drifting within the score-matching family. By linearizing the McKean-Vlasov dynamics and analyzing them in Fourier space, we uncover frequency-dependent convergence timescales analogous to Landau damping in plasma kinetic theory: the Gaussian kernel suffers an exponential high-frequency bottleneck, which may explain the empirical preference for the Laplacian kernel. This analysis suggests a fix: an exponential bandwidth annealing schedule $\sigma(t)=\sigma_0 e^{-rt}$ that reduces the convergence time from $\exp(O(K_{\max}^2))$ to $O(\log K_{\max})$. Finally, by formalizing drifting as a Wasserstein gradient flow of the smoothed KL divergence, we prove that the stop-gradient operator is not a heuristic: it follows from the frozen-field discretization mandated by the Jordan-Kinderlehrer-Otto (JKO) scheme, and removing it severs training from any gradient-flow guarantee. This variational perspective further provides a general template for constructing novel drift operators, which we demonstrate with a Sinkhorn divergence drift. We validate our analysis on toy datasets and scale it up to ImageNet.
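The central identity can be checked numerically. The sketch below assumes the mean-shift form of the drift, $V_{p,q}(x) = m_p(x) - m_q(x)$ with $m_p(x) = \mathbb{E}_{y\sim p}[k(x,y)(y-x)] / \mathbb{E}_{y\sim p}[k(x,y)]$, and a Gaussian kernel $k(x,y) = \exp(-\|x-y\|^2/(2\sigma^2))$. This form is an assumption about the original work, not a quote from it. All function names, sample distributions, and schedule constants are hypothetical. Under these assumptions, differentiating $\log p_\sigma$, where $p_\sigma$ is $p$ convolved with $\mathcal{N}(0,\sigma^2 I)$, gives $m_p(x) = \sigma^2 \nabla \log p_\sigma(x)$. It follows that $V_{p,q}(x) = \sigma^2(\nabla \log p_\sigma(x) - \nabla \log q_\sigma(x))$. The code compares the kernel drift against a finite-difference estimate of that score difference at several bandwidths along the schedule $\sigma(t)=\sigma_0 e^{-rt}$.

```python
import numpy as np


def mean_shift(x, ys, sigma):
    """Gaussian-kernel weighted mean displacement E[k(x,y)(y-x)] / E[k(x,y)] over the rows of ys."""
    sq = np.sum((ys - x) ** 2, axis=1) / (2.0 * sigma**2)
    # Subtracting the minimum exponent rescales every weight equally, so the ratio is unchanged
    # but the weights cannot all underflow to zero at small bandwidths.
    w = np.exp(-(sq - sq.min()))
    return (w[:, None] * (ys - x)).sum(axis=0) / w.sum()


def drift(x, p_samples, q_samples, sigma):
    """Assumed mean-shift form of the drift: V_{p,q}(x) = m_p(x) - m_q(x)."""
    return mean_shift(x, p_samples, sigma) - mean_shift(x, q_samples, sigma)


def log_smoothed_density(x, ys, sigma):
    """log p_sigma(x) for the Gaussian-smoothed empirical measure, dropping x-independent constants."""
    sq = np.sum((ys - x) ** 2, axis=1) / (2.0 * sigma**2)
    m = sq.min()
    # Numerically stable log-mean-exp of -sq.
    return -m + np.log(np.mean(np.exp(-(sq - m))))


def smoothed_score_fd(x, ys, sigma, h=1e-5):
    """Central finite-difference estimate of grad_x log p_sigma(x); uses no kernel identity."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (log_smoothed_density(x + e, ys, sigma)
                - log_smoothed_density(x - e, ys, sigma)) / (2.0 * h)
    return g


def annealed_sigma(t, sigma0, r):
    """Exponential bandwidth schedule sigma(t) = sigma0 * exp(-r t)."""
    return sigma0 * np.exp(-r * t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = rng.normal(0.0, 1.0, size=(200, 2))  # hypothetical stand-in for "data" samples
    q = rng.normal(1.0, 0.5, size=(200, 2))  # hypothetical stand-in for "generated" samples
    x = np.array([0.3, -0.2])
    for t in [0.0, 1.0, 2.0, 3.0]:
        s = annealed_sigma(t, sigma0=1.0, r=0.5)
        v = drift(x, p, q, s)
        score_diff = s**2 * (smoothed_score_fd(x, p, s) - smoothed_score_fd(x, q, s))
        print(f"sigma={s:.3f}  max |V - sigma^2 (s_p - s_q)| = {np.abs(v - score_diff).max():.2e}")
```

The same normalized kernel weights appear in the mean shift and in the gradient of the log kernel density estimate. Agreement should therefore hold up to finite-difference error at every bandwidth. The check covers only the score-difference identity, not the Fourier-space or JKO arguments.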



Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Uniform-in-time Propagation-of-Chaos for Stein Variational Gradient Descent
math.PR 2026-06 unverdicted novelty 7.0

Uniform-in-time propagation-of-chaos bounds for SVGD are obtained via cutoff for distributional metrics (logarithmic rates) and via finite-dimensional closure plus conjugacy for Gaussian targets (parametric N^{-1/2} rates).
Drifting Preference Optimization for One-Step Generative Models
cs.LG 2026-06 unverdicted novelty 7.0

DrPO enables online preference optimization for deterministic one-step generators via non-parametric dipole updates from ranked samples plus base-model drift, without reward backpropagation.
DriftXpress: Faster Drifting Models via Projected RKHS Fields
cs.LG 2026-05 unverdicted novelty 7.0

DriftXpress approximates drifting kernels via projected RKHS fields to lower the training cost of one-step generative models while matching the original FID scores.
One-Step Generative Modeling via Wasserstein Gradient Flows
cs.LG 2026-05 conditional novelty 7.0

W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...
Kernel-Gradient Drifting Models
cs.LG 2026-05 unverdicted novelty 7.0

Kernel-gradient drifting reformulates drifting models via kernel gradients to yield identifiable one-step generation with smoothed score matching and KL descent on Euclidean, Riemannian, and discrete spaces.
Identifiability and Stability of Generative Drifting with Companion-Elliptic Kernel Families
stat.ML 2026-04 unverdicted novelty 7.0

Companion-elliptic kernels (exactly the Gaussians and Matérn kernels with ν ≥ 1/2) ensure drifting-field identifiability for equal measures and restore stability via an asymptotic lower bound on the intrinsic overlap scalar.
Identifiability and Stability of Generative Drifting with Companion-Elliptic Kernel Families
stat.ML 2026-04 conditional novelty 7.0

For companion-elliptic kernels, vanishing drifting fields identify target measures exactly, and field convergence yields weak convergence once mass escape to infinity is detected by a single C0 scalar.
Difference of Convex Programming in the Wasserstein Space with Applications to MMD Optimization
cs.LG 2026-06 unverdicted novelty 6.0

Lifts CCCP to Wasserstein space for DC functionals on measures, proves almost stationarity under smoothness/strong-convexity assumptions, and applies to MMD/ED with local convergence and faster empirical runs.
Smoothed-KL Reweighting: A Principled Account and Matching Rule for SNR-Based Diffusion Training
stat.ME 2026-06 unverdicted novelty 6.0

Derives Soft-Min-SNR weight from spread divergence on local Gaussian surrogates, yielding closed-form w = sigma^2/(sigma^2+lambda) that matches Min-SNR at leading order.
Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models
stat.ML 2026-05 unverdicted novelty 6.0

Establishes finite-particle convergence rates for a conservative KDE-gradient drifting method in one-step generative modeling on R^d along with analysis of a non-conservative Laplace kernel variant, yielding explicit ...
Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models
stat.ML 2026-05 unverdicted novelty 6.0

Derives continuous-time finite-particle convergence rates for a new conservative KDE-gradient drifting method and the non-conservative Laplace kernel method in one-step generative modeling.
One-Step Generative Modeling via Wasserstein Gradient Flows
cs.LG 2026-05 unverdicted novelty 6.0

W-Flow compresses a Wasserstein gradient flow defined via Sinkhorn divergence into a single-step neural generator, reporting 1.29 FID on ImageNet 256x256 with improved mode coverage.
Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow
cs.LG 2026-05 unverdicted novelty 6.0

DFP is a one-step generative policy using Wasserstein gradient flow on a drifting model backbone, with a top-K behavior cloning surrogate, that reaches SOTA on Robomimic and OGBench manipulation tasks.
On the Wasserstein Gradient Flow Interpretation of Drifting Models
cs.LG 2026-05 unverdicted novelty 6.0

The paper interprets GMD algorithms as limiting points of Wasserstein gradient flows on KL divergence with Parzen smoothing and on Sinkhorn divergence, while extending the approach to MMD, sliced Wasserstein, and GAN critics.
Identifiability and Stability of Generative Drifting with Companion-Elliptic Kernel Families
stat.ML 2026-04 unverdicted novelty 6.0

The paper proves identifiability of drifting fields for companion-elliptic kernels and shows that field convergence plus a C0 observable recovers weak convergence even without tightness.
Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations
cs.CV 2026-05 unverdicted novelty 5.0

A simplified one-step diffusion distillation uses pretrained teacher features directly for drifting loss plus a mode coverage term, achieving FID 1.58 on ImageNet-64 and 18.4 on SDXL.
On the Wasserstein Gradient Flow Interpretation of Drifting Models
cs.LG 2026-05 unverdicted novelty 5.0

GMD algorithms correspond to limiting points of Wasserstein gradient flows on the KL divergence with Parzen smoothing and bear resemblance to Sinkhorn divergence fixed points, with extensions to MMD and other divergences.