
Flow Matching for Generative Modeling

Heli Ben-Hamu, Maximilian Nickel, Ricky TQ Chen, Yaron Lipman, Matt Le · 2022 · cs.LG · arXiv 2210.02747

Mixed citation behavior. Most common role is method (47%).

750 Pith papers citing it

Method 47% of classified citations


abstract

We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples -- which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more robust and stable alternative for training diffusion models. Furthermore, Flow Matching opens the door to training CNFs with other, non-diffusion probability paths. An instance of particular interest is using Optimal Transport (OT) displacement interpolation to define the conditional probability paths. These paths are more efficient than diffusion paths, provide faster training and sampling, and result in better generalization. Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality, and allows fast and reliable sample generation using off-the-shelf numerical ODE solvers.
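
As a rough illustration of the regression the abstract describes, the sketch below samples a point on a straight-line (OT displacement) conditional path between a noise sample and a data sample, and scores a candidate vector field against the path's velocity. The abstract gives no formulas, so the interpolation x_t = (1 - (1 - sigma_min) t) x0 + t x1 and the target x1 - (1 - sigma_min) x0 are written here from the OT path as it is commonly stated for this method. The names `ot_path_sample`, `cfm_loss`, `velocity_fn`, and the value of `SIGMA_MIN` are assumptions made for this example, not the authors' code.

```python
import random

SIGMA_MIN = 1e-4  # assumed small terminal noise scale (illustrative value)


def ot_path_sample(x0, x1, t, sigma_min=SIGMA_MIN):
    """Return a point on the straight-line conditional path and its velocity.

    x0 is a noise sample, x1 a data sample, t a time in [0, 1].
    """
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * a + t * b for a, b in zip(x0, x1)]
    # The velocity of this path is constant in t.
    u_t = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, u_t


def cfm_loss(velocity_fn, data, rng, sigma_min=SIGMA_MIN):
    """Monte Carlo estimate of the squared-error regression loss.

    velocity_fn(x_t, t) is a hypothetical model returning a list of floats.
    No ODE is simulated: each term needs only one (t, x0, x1) draw.
    """
    total = 0.0
    for x1 in data:
        t = rng.random()                            # t ~ Uniform[0, 1)
        x0 = [rng.gauss(0.0, 1.0) for _ in x1]      # standard normal noise
        x_t, u_t = ot_path_sample(x0, x1, t, sigma_min)
        v = velocity_fn(x_t, t)
        total += sum((vi - ui) ** 2 for vi, ui in zip(v, u_t))
    return total / len(data)
```

In practice `velocity_fn` would be a neural network and the loss would be minimized by gradient descent; the pure-Python version only shows that each training term is computed from a single sampled triple, with no simulation of the flow.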

citation-role summary

background 79 · method 74 · baseline 5

citation-polarity summary

use method 74 · background 71 · unclear 6 · baseline 5 · support 2

authors

Heli Ben-Hamu, Maximilian Nickel, Ricky TQ Chen, Yaron Lipman, and Matt Le

representative citing papers

Test-time Adversarial Takeover: A Real-time Hijacking Interface against Robotic Diffusion Policies

cs.RO · 2026-06-09 · unverdicted · novelty 8.0

TAKO demonstrates real-time adversarial takeover of robotic diffusion policies via reusable universal patches on visual inputs, achieving 100% success in steering attacker-chosen trajectories across multiple tasks, encoders, and diffusion methods.

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

eess.AS · 2026-06-02 · unverdicted · novelty 8.0

WavTTS is the first raw-waveform diffusion TTS model using DiT flow matching and multi-scale mel supervision that approaches SOTA latent zero-shot performance while beating prior end-to-end models.

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

cs.CV · 2026-05-13 · unverdicted · novelty 8.0

AnyFlow enables any-step video diffusion by distilling flow-map transitions over arbitrary time intervals with on-policy backward simulation.

TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking

cs.CV · 2026-05-12 · unverdicted · novelty 8.0

TrackCraft3R is the first method to repurpose a video diffusion transformer as a feed-forward dense 3D tracker via dual-latent representations and temporal RoPE alignment, achieving SOTA performance with lower compute.

What Time Is It? How Data Geometry Makes Time Conditioning Optional for Flow Matching

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Data geometry makes time identifiable from noisy interpolants at rate O(1/sqrt(d-k)), rendering the time-blindness gap asymptotically negligible relative to coupling variance.

Generative Modeling with Flux Matching

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices beyond traditional score matching.

A-CODE: Fully Atomic Protein Co-Design with Unified Multimodal Diffusion

q-bio.QM · 2026-05-05 · unverdicted · novelty 8.0

A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain on hard binder-design tasks.

Divergence is Uncertainty: A Closed-Form Posterior Covariance for Flow Matching

cs.LG · 2026-05-01 · unverdicted · novelty 8.0 · 3 refs

Derives closed-form posterior covariance for flow matching from divergence of velocity field, enabling post-hoc uncertainty on pre-trained models including one-step generators.

How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance

cs.LG · 2026-04-29 · unverdicted · novelty 8.0 · 3 refs

FMRG reformulates guidance as deterministic optimal control, deriving a single-trajectory method using the flow map that matches or exceeds baselines on reward-guided generation and inverse problems with 3 NFEs at text-to-image scale.

ReConText3D: Replay-based Continual Text-to-3D Generation

cs.CV · 2026-04-15 · conditional · novelty 8.0

ReConText3D is the first replay-memory framework for continual text-to-3D generation that prevents catastrophic forgetting on new textual categories while preserving quality on previously seen classes.

Query Lower Bounds for Diffusion Sampling

cs.LG · 2026-04-12 · unverdicted · novelty 8.0

Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.

OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models

cs.CV · 2026-04-05 · unverdicted · novelty 8.0

OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.

Generative models on phase space

hep-ph · 2026-04-02 · unverdicted · novelty 8.0

Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.

FlowHijack: A Dynamics-Aware Backdoor Attack on Flow-Matching Vision-Language-Action Models

cs.CV · 2026-03-30 · unverdicted · novelty 8.0

FlowHijack is the first dynamics-aware backdoor attack on flow-matching VLAs that achieves high success rates with stealthy triggers while preserving benign performance and making malicious actions kinematically indistinguishable from normal ones.

Flow-GRPO: Training Flow Matching Models via Online RL

cs.CV · 2025-05-08 · unverdicted · novelty 8.0

Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.

Building Normalizing Flows with Stochastic Interpolants

cs.LG · 2022-09-30 · conditional · novelty 8.0 · 2 refs

Normalizing flows are constructed by learning the velocity of a stochastic interpolant via a quadratic loss derived from its probability current, yielding an efficient ODE-based alternative to diffusion models.

QWERTY: Training-Free Motion Control via Query-Warped Video Diffusion Transformers

cs.CV · 2026-07-02 · unverdicted · novelty 7.0

QWERTY enables training-free motion control in pretrained image-to-video DiTs by warping the frame-invariant semantic subspace of queries in 3D full attention and using the predicted noise as self-guidance for latent optimization.

Diffeomorphic Optimization

cs.LG · 2026-07-01 · unverdicted · novelty 7.0

Proposes diffeomorphic optimization for manifold-constrained problems in generative models via flow maps, with Lie-group extensions for protein design showing metric improvements.

Self-conditioned Flow Map Language Models via Fixed-point Flows

cs.CL · 2026-07-01 · unverdicted · novelty 7.0

Self-conditioned flow language models solve fixed-point iterations, enabling fixed-point flow maps that distill into FMLM* which outperforms SOTA in few-step generation on OpenWebText.

Flow-Map GRPO: Reinforcement Learning for Few-Step Flow-Map Generators via Anchored Stochastic Composition

cs.LG · 2026-07-01 · unverdicted · novelty 7.0

Flow-Map GRPO uses anchored stochastic flow map composition to enable GRPO-based RL alignment of deterministic few-step flow-map generators while preserving their marginal paths.

Cross-Space Distillation: Teaching One-Step Students with Modern Diffusion Teachers

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

Introduces a Bridge latent interface that maps mismatched student latents into teacher space, enabling distillation from modern diffusion teachers to compact one-step students and raising SD 1.5 HPSv3 from 5.4 to 9.4 while keeping one-step speed.

OopsieVerse: A Safety Benchmark with Damage-Aware Simulation for Robot Manipulation

cs.RO · 2026-06-30 · unverdicted · novelty 7.0

OOPSIEVERSE is a new damage-aware simulation benchmark for household robot manipulation that converts contact, thermal, and fluid signals into task-agnostic damage metrics and demonstrates uses in safer policy learning and benchmarking.

FlexiSLM: A Dynamic and Controllable Frame Rate Spoken Language Model

cs.SD · 2026-06-30 · unverdicted · novelty 7.0

FlexiSLM is the first spoken language model supporting dynamic and controllable frame rates on speech input and output, outperforming fixed-rate 7B models at high quality and enabling faster inference at lower rates like 6.25 Hz.

MUSE: Unlocking Timestep as Native Task Steering for One-Step Dense Prediction

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

MUSE shows that the native timestep embedding in diffusion models acts as a parameter-free steering signal for multi-task monocular depth and normal estimation via manifold decoupling in latent space.

citing papers explorer


WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

eess.AS · 2026-06-02 · unverdicted · none · ref 51 · internal anchor

WavTTS is the first raw-waveform diffusion TTS model using DiT flow matching and multi-scale mel supervision that approaches SOTA latent zero-shot performance while beating prior end-to-end models.

A Survey of Full-Duplex Spoken Dialogue Systems: Architectural Hierarchy, Interaction Ontology, and Decision State Machine

eess.AS · 2026-06-17 · accept · none · ref 34 · internal anchor

A survey proposing an L0-L3 architectural hierarchy, T×I×R interaction ontology, and IDLE/LISTEN/SPEAK/WAIT/DUAL decision state machine for full-duplex spoken dialogue systems, documenting a realization gap between architectural potential and observed behavior due to training data limits.

HoliDubber: Holistic Video Dubbing for Complex Acoustic Scenes via Text-Guided Audio Synthesis

eess.AS · 2026-06-08 · unverdicted · none · ref 41 · internal anchor

HoliDubber introduces a patch-based autoregressive diffusion transformer for joint text-guided synthesis of speech and ambient audio in video dubbing, with a new benchmark showing outperformance over prior speech-only methods.

BareWave: Waveform-Native Flow-Matching Text-to-Speech

eess.AS · 2026-06-08 · unverdicted · none · ref 20 · internal anchor

BareWave develops a waveform-native flow-matching framework for direct text-to-waveform TTS using representation alignment, staged noise scheduling, and velocity-aware perceptual alignment to achieve strong zero-shot voice cloning results.

SenSE: Semantic-Aware High-Fidelity Universal Speech Enhancement

eess.AS · 2025-09-29 · unverdicted · none · ref 17 · internal anchor

SenSE adds language-model semantic guidance to flow-matching generative speech enhancement via a dual-path masked conditioning strategy and reports SOTA results on distorted speech.

Local Diagnostics of Continuous Normalizing Flow for Out-of-Distribution Detection

eess.AS · 2026-05-30 · unverdicted · none · ref 16 · 2 links · internal anchor

Lagrangian sub-flows and velocity-based geometric diagnostics in CNFs outperform likelihood for zero-shot phoneme mispronunciation detection in speech models.

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

eess.AS · 2024-10-09 · unverdicted · none · ref 118 · internal anchor

F5-TTS generates natural speech from text via flow matching on DiT with simple text padding, ConvNeXt refinement, and sway sampling, trained on 100K hours multilingual data.

MeloDISinger: Melody-Aware & Duration-Preserving Singing Voice Editing with Audio Infilling

eess.AS · 2026-06-29 · unverdicted · none · ref 30 · internal anchor

Proposes MeloDISinger, a flow-matching SVE model with MeloDRP for melody-aware duration-preserving editing and audio infilling, claiming SOTA results.

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

eess.AS · 2026-05-29 · unverdicted · none · ref 27 · internal anchor

SwanSphere introduces a causal autoregressive diffusion transformer architecture with SVAC contrastive learning and ODPO optimization for streaming spatial audio generation from video and text.

A Survey of Advancing Audio Super-Resolution and Bandwidth Extension from Discriminative to Generative Models

eess.AS · 2026-05-15 · unverdicted · none · ref 38 · internal anchor

A structured survey of audio bandwidth extension that organizes the transition from deterministic discriminative DNNs to generative approaches including GANs, diffusion models, and flow-based methods.
