AnyFlow enables any-step video diffusion by distilling flow-map transitions over arbitrary time intervals with on-policy backward simulation.
super hub
Flow Matching for Generative Modeling
267 Pith papers cite this work. Polarity classification is still indexing.
abstract
We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples -- which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more robust and stable alternative for training diffusion models. Furthermore, Flow Matching opens the door to training CNFs with other, non-diffusion probability paths. An instance of particular interest is using Optimal Transport (OT) displacement interpolation to define the conditional probability paths. These paths are more efficient than diffusion paths, provide faster training and sampling, and result in better generalization. Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality, and allows fast and reliable sample generation using off-the-shelf numerical ODE solvers.
hub tools
claims ledger
- abstract We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples -- which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more
authors
co-cited works
representative citing papers
Data geometry makes time identifiable from noisy interpolants at rate O(1/sqrt(d-k)), rendering the time-blindness gap asymptotically negligible relative to coupling variance.
Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices beyond traditional score matching.
A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain on hard binder-design tasks.
In flow matching, the uncertainty of the clean data given the current state is exactly the divergence of the velocity field (up to a known scalar).
ReConText3D is the first replay-memory framework for continual text-to-3D generation that prevents catastrophic forgetting on new textual categories while preserving quality on previously seen classes.
Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.
OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.
Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
Normalizing flows are constructed by learning the velocity of a stochastic interpolant via a quadratic loss derived from its probability current, yielding an efficient ODE-based alternative to diffusion models.
A new speculative inference system speeds up diffusion VLAs to 19.1 ms average latency (3.04x faster) on LIBERO by replacing most full 58 ms inferences with 7.8 ms draft rounds while preserving task performance.
Marginal-conditioned bridges enable training-free sampling from Flow Language Models by drawing clean one-hot endpoints from factorized posteriors and using Ornstein-Uhlenbeck bridges, preserving token marginals and reducing denoising error versus conditional-mean bridges.
OP4KSR enables efficient one-step 4K super-resolution without patches by adapting Flux with RoPE rescaling and periodicity loss to suppress artifacts.
OmniNFT introduces modality-wise advantage routing, layer-wise gradient surgery, and region-wise loss reweighting in an online diffusion RL framework to improve audio-video quality, alignment, and synchronization.
Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.
A morphologically equivariant flow matching policy for bimanual robots enforces reflective symmetry to improve sample efficiency and enable zero-shot generalization to mirrored task configurations.
A generative transfer framework using iterative path-wise tilting integrated with conditional flow matching recovers target entropic optimal transport couplings from reference samples, achieving O(δ) convergence in Wasserstein-1 distance.
h-control introduces block-conditional pseudo-Gibbs refinement for training-free camera control in flow-matching video generators, achieving superior FVD scores on RealEstate10K and DAVIS benchmarks.
W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x faster sampling than comparable multi-step models.
HorizonDrive enables stable long-horizon autoregressive driving simulation via anti-drifting teacher training with scheduled rollout recovery and teacher rollout distillation.
Existence and uniqueness of cyclically monotone zero-couplings are established for arbitrary pairs of infinite measures in M_0(R^d) under a Hausdorff-dimension condition, with the tail limit of such couplings for regularly varying distributions coinciding with the unique proper zero-coupling of the
SABER provides 44.8K multi-representation action samples from unscripted retail environments that raise a VLA model's mean success rate on ten manipulation tasks from 13.4% to 29.3%.
PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models.
citing papers explorer
-
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
AnyFlow enables any-step video diffusion by distilling flow-map transitions over arbitrary time intervals with on-policy backward simulation.
-
What Time Is It? How Data Geometry Makes Time Conditioning Optional for Flow Matching
Data geometry makes time identifiable from noisy interpolants at rate O(1/sqrt(d-k)), rendering the time-blindness gap asymptotically negligible relative to coupling variance.
-
Generative Modeling with Flux Matching
Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices beyond traditional score matching.
-
A-CODE: Fully Atomic Protein Co-Design with Unified Multimodal Diffusion
A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain on hard binder-design tasks.
-
Divergence is Uncertainty: A Closed-Form Posterior Covariance for Flow Matching
In flow matching, the uncertainty of the clean data given the current state is exactly the divergence of the velocity field (up to a known scalar).
-
ReConText3D: Replay-based Continual Text-to-3D Generation
ReConText3D is the first replay-memory framework for continual text-to-3D generation that prevents catastrophic forgetting on new textual categories while preserving quality on previously seen classes.
-
Query Lower Bounds for Diffusion Sampling
Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.
-
OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models
OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
-
Generative models on phase space
Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.
-
Flow-GRPO: Training Flow Matching Models via Online RL
Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
-
Building Normalizing Flows with Stochastic Interpolants
Normalizing flows are constructed by learning the velocity of a stochastic interpolant via a quadratic loss derived from its probability current, yielding an efficient ODE-based alternative to diffusion models.
-
Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs
A new speculative inference system speeds up diffusion VLAs to 19.1 ms average latency (3.04x faster) on LIBERO by replacing most full 58 ms inferences with 7.8 ms draft rounds while preserving task performance.
-
Sampling from Flow Language Models via Marginal-Conditioned Bridges
Marginal-conditioned bridges enable training-free sampling from Flow Language Models by drawing clean one-hot endpoints from factorized posteriors and using Ornstein-Uhlenbeck bridges, preserving token marginals and reducing denoising error versus conditional-mean bridges.
-
OP4KSR: One-Step Patch-Free 4K Super-Resolution with Periodic Artifact Suppression
OP4KSR enables efficient one-step 4K super-resolution without patches by adapting Flux with RoPE rescaling and periodicity loss to suppress artifacts.
-
OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation
OmniNFT introduces modality-wise advantage routing, layer-wise gradient surgery, and region-wise loss reweighting in an online diffusion RL framework to improve audio-video quality, alignment, and synchronization.
-
Aligning Flow Map Policies with Optimal Q-Guidance
Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.
-
Morphologically Equivariant Flow Matching for Bimanual Mobile Manipulation
A morphologically equivariant flow matching policy for bimanual robots enforces reflective symmetry to improve sample efficiency and enable zero-shot generalization to mirrored task configurations.
-
Generative Transfer for Entropic Optimal Transport with Unknown Costs
A generative transfer framework using iterative path-wise tilting integrated with conditional flow matching recovers target entropic optimal transport couplings from reference samples, achieving O(δ) convergence in Wasserstein-1 distance.
-
$h$-control: Training-Free Camera Control via Block-Conditional Gibbs Refinement
h-control introduces block-conditional pseudo-Gibbs refinement for training-free camera control in flow-matching video generators, achieving superior FVD scores on RealEstate10K and DAVIS benchmarks.
-
One-Step Generative Modeling via Wasserstein Gradient Flows
W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x faster sampling than comparable multi-step models.
-
HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation
HorizonDrive enables stable long-horizon autoregressive driving simulation via anti-drifting teacher training with scheduled rollout recovery and teacher rollout distillation.
-
Zero-couplings of infinite measures with cyclically monotone support and multivariate regular variation
Existence and uniqueness of cyclically monotone zero-couplings are established for arbitrary pairs of infinite measures in M_0(R^d) under a Hausdorff-dimension condition, with the tail limit of such couplings for regularly varying distributions coinciding with the unique proper zero-coupling of the
-
SABER: A Scalable Action-Based Embodied Dataset for Real-World VLA Adaptation
SABER provides 44.8K multi-representation action samples from unscripted retail environments that raise a VLA model's mean success rate on ten manipulation tasks from 13.4% to 29.3%.
-
Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs
PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models.
-
Physics-Informed Neural PDE Solvers via Spatio-Temporal MeanFlow
Spatio-Temporal MeanFlow adapts MeanFlow to PDEs by replacing the generative velocity field with the physical operator and extending the integral constraint to the spatio-temporal domain, yielding a unified solver for time-dependent and stationary equations with improved accuracy and generalization.
-
Generative Actor-Critic with Soft Bridge Policies
SoftGAC defines a stochastic bridge from base to action latent that converts the MaxEnt objective into a tractable relative-entropy term reducible to control energy, achieving competitive returns with one-pass sampling.
-
From Articulated Kinematics to Routed Visual Control for Action-Conditioned Surgical Video Generation
A kinematic-to-visual lifting paradigm combined with hierarchically routed control generates action-conditioned surgical videos with better faithfulness, fidelity, and efficiency.
-
Geometry-Aware Discretization Error of Diffusion Models
First-order asymptotic expansions of weak and Fréchet discretization errors in diffusion sampling are derived, explicit under Gaussian data through covariance geometry and robust to other data geometries.
-
Curated Synthetic Data Doesn't Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences
Recursive generative retraining with pluralistic preferences converges to a stable diverse distribution that satisfies a weighted Nash bargaining solution.
-
Path-Coupled Bellman Flows for Distributional Reinforcement Learning
Path-Coupled Bellman Flows use source-consistent Bellman-coupled paths and a lambda-parameterized control-variate to learn return distributions via flow matching, improving fidelity and stability over prior DRL approaches.
-
OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation
OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.
-
Arena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Models
ArenaPO infers Gaussian capability distributions from pairwise preferences and applies truncated-normal latent inference to derive fine-grained offline rewards for preference optimization of text-to-image diffusion models.
-
Bayesian Rain Field Reconstruction using Commercial Microwave Links and Diffusion Model Priors
Diffusion model priors enable training-free Bayesian sampling for more accurate rain field reconstruction from path-integrated commercial microwave link measurements than Gaussian process baselines.
-
PerFlow: Physics-Embedded Rectified Flow for Efficient Reconstruction and Uncertainty Quantification of Spatiotemporal Dynamics
PerFlow embeds physics constraints into rectified flow sampling through guidance-free conditioning and constraint-preserving projections, achieving efficient sparse reconstruction and uncertainty quantification for spatiotemporal dynamics.
-
Generative Modeling with Orbit-Space Particle Flow Matching
OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.
-
Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning
FAN achieves state-of-the-art offline RL performance on robotic tasks by anchoring flow policies and using single-sample noise-conditioned Q-learning, with proven convergence and reduced runtimes.
-
Arbitrarily Conditioned Hierarchical Flows for Spatiotemporal Events
ARCH is a hierarchical flow-based generative model that enables tractable conditional intensity computation and arbitrary conditioning for spatiotemporal event distributions.
-
Being-H0.7: A Latent World-Action Model from Egocentric Videos
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
-
ABC: Any-Subset Autoregression via Non-Markovian Diffusion Bridges in Continuous Time and Space
ABC enables any-subset autoregressive generation of continuous stochastic processes via non-Markovian diffusion bridges that track physical time and allow path-dependent conditioning.
-
How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance
FMRG is a training-free, single-trajectory guidance method for flow models derived from optimal control that achieves strong reward alignment with only 3 NFEs.
-
DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors
Discrete diffusion policies support native asynchronous execution via unmasking for real-time chunking, delivering higher success rates and 0.7x inference cost versus flow-matching RTC on dynamic robotics benchmarks and real pick tasks.
-
CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies
CF-VLA uses a coarse initialization over endpoint velocity followed by single-step refinement to achieve strong performance with low inference steps on CALVIN, LIBERO, and real-robot tasks.
-
Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling
Talker-T2AV achieves better lip-sync accuracy, video quality, and audio quality than dual-branch baselines by separating high-level shared autoregressive modeling from modality-specific low-level diffusion refinement in a joint audio-video generation framework.
-
Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization
Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.
-
$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models
Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.
-
Beyond Expected Information Gain: Stable Bayesian Optimal Experimental Design with Integral Probability Metrics and Plug-and-Play Extensions
An IPM-based framework for Bayesian optimal experimental design is proposed that replaces KL-based expected information gain with Wasserstein, MMD, and energy distances, delivering stronger stability guarantees and plug-and-play extensions.
-
Annealed Langevin Monte Carlo for Flow ODE Sampling
ALMC-ODE uses annealed Langevin Monte Carlo with Jarzynski reweighting to produce a low-variance velocity estimator for flow ODE sampling, with an O(1/n) MSE bound and superior performance on multimodal benchmarks.
-
ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
ReImagine decouples human appearance from temporal consistency via pretrained image backbones, SMPL-X motion guidance, and training-free video diffusion refinement to generate high-quality controllable videos.
-
Mask World Model: Predicting What Matters for Robust Robot Policy Learning
Mask World Model predicts semantic mask dynamics with video diffusion and integrates it with a diffusion policy head, outperforming RGB world models on LIBERO and RLBench while showing better real-world generalization and texture robustness.
-
Frequency-Forcing: From Scaling-as-Time to Soft Frequency Guidance
Frequency-Forcing guides pixel flow-matching with a data-derived low-frequency auxiliary stream to softly enforce scale-ordered generation, improving FID on ImageNet-256 over baselines.