Stable Velocity: A Variance Perspective on Flow Matching

Donglin Yang; Liang Hou; Pengfei Wan; Renjie Liao; Xiaojuan Qi; Xin Tao; Xin Yu; Yongxing Zhang

arxiv: 2602.05435 · v2 · pith:JW775INEnew · submitted 2026-02-05 · 💻 cs.CV

Donglin Yang, Yongxing Zhang, Xin Yu, Liang Hou, Xin Tao, Pengfei Wan, Xiaojuan Qi, Renjie Liao

keywords regime, low-variance, stable, training, velocity, matching, sampling, conditional

While flow matching is elegant, its reliance on single-sample conditional velocities leads to high-variance training targets that destabilize optimization and slow convergence. By explicitly characterizing this variance, we identify (1) a high-variance regime near the prior, where optimization is challenging, and (2) a low-variance regime near the data distribution, where conditional and marginal velocities nearly coincide. Leveraging this insight, we propose Stable Velocity, a unified framework that improves both training and sampling. For training, we introduce Stable Velocity Matching (StableVM), an unbiased variance-reduction objective, along with Variance-Aware Representation Alignment (VA-REPA), which adaptively strengthens auxiliary supervision in the low-variance regime. For inference, we show that dynamics in the low-variance regime admit closed-form simplifications, enabling Stable Velocity Sampling (StableVS), a finetuning-free acceleration. Extensive experiments on ImageNet $256\times256$ and large pretrained text-to-image and text-to-video models, including SD3.5, Flux, Qwen-Image, and Wan2.2, demonstrate consistent improvements in training efficiency and more than $2\times$ faster sampling within the low-variance regime without degrading sample quality. Our code is available at https://github.com/linYDTHU/StableVelocity.
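The two variance regimes described in the abstract can be illustrated with a small toy calculation. The sketch below is not the authors' code and does not implement StableVM or StableVS. It assumes a 1-D problem with a standard normal prior at t = 0, a two-point data distribution on {-1, +1} at t = 1, and the linear path x_t = (1 - t) * eps + t * x1. The time convention, the data distribution, and the function name `conditional_velocity_variance` are all assumptions made for illustration. Under these assumptions, the single-sample target u = x1 - eps has a posterior variance given x_t that can be written in closed form and averaged by Monte Carlo.

```python
import math
import random


def conditional_velocity_variance(t, n_samples=20_000, seed=0):
    """Monte Carlo estimate of E_{x_t}[Var(u | x_t)] for a 1-D toy problem.

    Assumed setup (illustrative only, not taken from the paper):
      prior   eps ~ N(0, 1)              at t = 0
      data    x1 uniform on {-1, +1}     at t = 1
      path    x_t = (1 - t) * eps + t * x1
      target  u = x1 - eps               (single-sample conditional velocity)
    """
    if not 0.0 < t < 1.0:
        raise ValueError("t must lie strictly between 0 and 1")
    rng = random.Random(seed)
    scale = (1.0 - t) ** 2
    total = 0.0
    for _ in range(n_samples):
        x1 = rng.choice((-1.0, 1.0))
        eps = rng.gauss(0.0, 1.0)
        x_t = (1.0 - t) * eps + t * x1
        # Given x_t, u = (x1 - x_t) / (1 - t), so
        # Var(u | x_t) = Var(x1 | x_t) / (1 - t)^2.
        # For two-point data, E[x1 | x_t] = tanh(t * x_t / (1 - t)^2),
        # and since x1^2 = 1, Var(x1 | x_t) = 1 - E[x1 | x_t]^2.
        posterior_mean = math.tanh(t * x_t / scale)
        total += (1.0 - posterior_mean ** 2) / scale
    return total / n_samples


if __name__ == "__main__":
    for t in (0.1, 0.5, 0.9):
        value = conditional_velocity_variance(t)
        print(f"t = {t:.1f}  average target variance = {value:.4f}")
```

In this toy setting the averaged variance is of order one near the prior and collapses toward zero near the data, because x_t then pins down which data point generated it. That matches the qualitative picture in the abstract, though the paper's own analysis and thresholds may differ.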

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Hyper-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control
cs.RO 2026-05 unverdicted novelty 7.0

HDP3 is a pocket-scale 3D diffusion policy with a Diffusion Mixer decoder that achieves state-of-the-art visuomotor control using two-step DDIM inference and under 1% of the parameters of prior 3D diffusion policies.

Hyper-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control
cs.RO 2026-05 conditional novelty 7.0

Frequency analysis of smooth robot actions bounds denoising error to low-frequency modes, enabling a sub-1% parameter 3D diffusion policy with two-step inference that reaches SOTA on manipulation benchmarks.

StreamEdit: Training-Free Video Editing via Few-Step Streaming Video Generation
cs.CV 2026-05 unverdicted novelty 6.0

StreamGVE enables high-quality training-free video editing by converting the task to noise-to-data streaming generation with dual-branch fast sampling, self-attention bridges, cross-attention grounding, source-oriente...

StreamEdit: Training-Free Video Editing via Few-Step Streaming Video Generation
cs.CV 2026-05 unverdicted novelty 6.0

StreamEdit enables high-quality training-free video editing by adapting streaming video generation models with dual-branch fast sampling, self-attention bridge, cross-attention grounding, source-oriented guidance, and...

Hyper-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control
cs.RO 2026-05 unverdicted novelty 6.0

Hydra-DP3 achieves SOTA visuomotor performance with under 1% of prior 3D diffusion policy parameters by using frequency analysis to justify a lightweight decoder and two-step DDIM inference.

Hyper-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control
cs.RO 2026-05 unverdicted novelty 6.0

Hydra-DP3 is a lightweight 3D diffusion policy that uses frequency analysis of smooth action trajectories to enable two-step DDIM inference and achieves state-of-the-art results with under 1% of prior parameters.

NeuroSonic: Conditional Flow Matching for EEG-to-Speech Reconstruction
cs.LG 2026-06 unverdicted novelty 5.0

NeuroSonic introduces a conditional flow-matching framework that learns a deterministic transport from noise to speech conditioned on EEG, reporting up to 26.3% gains in perceptual quality over GAN, diffusion, and mea...