Flow Matching for Generative Modeling

Heli Ben-Hamu; Matt Le; Maximilian Nickel; Ricky T. Q. Chen; Yaron Lipman

arXiv: 2210.02747 · v2 · submitted 2022-10-06 · 💻 cs.LG · cs.AI · stat.ML

Pith reviewed 2026-05-24 10:59 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.ML

keywords: flow matching · continuous normalizing flows · generative modeling · optimal transport · diffusion models · vector field regression · ImageNet

The pith

Flow Matching trains Continuous Normalizing Flows by regressing vector fields of fixed conditional probability paths.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Flow Matching as a simulation-free objective for training CNFs at large scale. It regresses a neural network directly onto the vector fields of chosen conditional paths that connect noise to data, without needing to simulate trajectories during training. The framework includes diffusion paths as one option but highlights Optimal Transport displacement interpolation as a more efficient alternative that yields faster training, faster sampling, and stronger results on ImageNet. The learned marginal vector field can then be integrated with standard ODE solvers to generate samples.
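The training objective described above can be sketched in a few lines. The following is a minimal, illustrative Python sketch, not the authors' code: it assumes 1D data, a standard-normal source distribution, and the OT conditional path with a small σ_min, and the function names (`cfm_loss`, `ot_conditional_sample`, `ot_conditional_target`) are hypothetical. In practice this loss would be minimized over network parameters with a stochastic optimizer; here `model` is any callable standing in for the network.

```python
import random


def ot_conditional_sample(x0, x1, t, sigma_min):
    """Point on the OT conditional path: psi_t(x0) = (1 - (1 - sigma_min) t) x0 + t x1."""
    return (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1


def ot_conditional_target(x0, x1, sigma_min):
    """Regression target d/dt psi_t(x0) = x1 - (1 - sigma_min) x0 (constant in t)."""
    return x1 - (1.0 - sigma_min) * x0


def cfm_loss(model, data, sigma_min=1e-4, n_samples=10_000, seed=0):
    """Monte Carlo estimate of the conditional flow matching loss in 1D.

    model(x, t) -> float plays the role of the network v_theta; data is a list of
    samples x1 from the data distribution q; the source is p_0 = N(0, 1).
    No ODE is simulated: every term costs one model evaluation.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        t = rng.random()               # t ~ U[0, 1)
        x1 = rng.choice(data)          # x1 ~ q (empirical)
        x0 = rng.gauss(0.0, 1.0)       # x0 ~ N(0, 1)
        xt = ot_conditional_sample(x0, x1, t, sigma_min)
        err = model(xt, t) - ot_conditional_target(x0, x1, sigma_min)
        total += err * err
    return total / n_samples
```

With a single data point the conditional and marginal fields coincide, so the exact OT field drives this loss to numerical zero, while a field that always returns 0 pays E[(x_1 − (1 − σ_min)x_0)²].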

Core claim

Flow Matching defines a training objective that regresses a neural network to match the conditional vector field of a fixed probability path; integrating the resulting marginal vector field transports samples from the base distribution to the data distribution.

What carries the argument

Flow Matching objective, which regresses on conditional vector fields of Gaussian probability paths (including OT interpolation) to recover the marginal flow.
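For reference, the Gaussian conditional paths and their vector fields can be written as follows. This is our transcription of the paper's general construction (notation μ_t, σ_t, ψ_t as in the paper, with μ_0 = 0, σ_0 = 1 for a standard-normal source and μ_1 = x_1, σ_1 = σ_min); the second line is the conditional field that generates the path, and the last line specializes it to OT displacement interpolation.

```latex
p_t(x \mid x_1) = \mathcal{N}\big(x;\ \mu_t(x_1),\ \sigma_t(x_1)^2 I\big),
\qquad
\psi_t(x) = \sigma_t(x_1)\,x + \mu_t(x_1),

u_t(x \mid x_1) = \frac{\sigma_t'(x_1)}{\sigma_t(x_1)}\big(x - \mu_t(x_1)\big) + \mu_t'(x_1),

\text{OT: } \mu_t(x_1) = t\,x_1,\quad \sigma_t(x_1) = 1 - (1-\sigma_{\min})\,t
\;\Longrightarrow\;
u_t(x \mid x_1) = \frac{x_1 - (1-\sigma_{\min})\,x}{1 - (1-\sigma_{\min})\,t}.
```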

If this is right

CNF training becomes simulation-free and scales to ImageNet-sized data.
OT-based paths converge faster and generate samples more quickly than diffusion paths.
Generation works reliably with off-the-shelf numerical ODE solvers.
Likelihood and sample quality improve over diffusion-based baselines.
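The sampling step can be illustrated with fixed-step solvers. This is a minimal sketch under stated assumptions: 1D, a hypothetical vector-field callable `field(x, t)`, and two hand-rolled integrators (explicit Euler and classic RK4) standing in for off-the-shelf solvers. For the OT conditional field of a single data point x_1, trajectories are straight lines traversed at constant speed, so even one Euler step lands exactly on σ_min·x_0 + x_1; a learned marginal field on real data is only approximately straight, so this is a best case rather than a guarantee.

```python
def euler_sample(field, x0, n_steps):
    """Integrate dx/dt = field(x, t) from t = 0 to t = 1 with fixed-step explicit Euler."""
    h = 1.0 / n_steps
    x = x0
    for k in range(n_steps):
        x = x + h * field(x, k * h)
    return x


def rk4_sample(field, x0, n_steps):
    """Same ODE with classic fixed-step fourth-order Runge-Kutta."""
    h = 1.0 / n_steps
    x = x0
    for k in range(n_steps):
        t = k * h
        k1 = field(x, t)
        k2 = field(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = field(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = field(x + h * k3, t + h)
        x = x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x
```

For example, with `field = lambda x, t: (c - a * x) / (1 - a * t)` and `a = 1 - sigma_min`, both solvers return `sigma_min * x0 + c` regardless of the step count, up to rounding.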

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same regression approach could be applied to non-Gaussian paths that further reduce the number of integration steps needed.
Stability gains from conditional regression might allow CNFs to be combined with discrete architectures without custom stabilization tricks.
The separation between path choice and regression opens a route to optimize the probability path itself for a given data domain.

Load-bearing premise

Regressing a network on the conditional vector fields produces a vector field whose flow matches the desired marginal data distribution.
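The premise can be checked on a toy problem where the marginal field is available in closed form. Assuming an empirical data distribution (a uniform mixture over a few points), the OT conditional path, and σ_min = 0.01, the marginal field is the posterior-weighted average of the conditional fields, and integrating it from a standard-normal sample should land near a data point. The helper names are hypothetical; a trained network only approximates this field, which is exactly where the premise could fail.

```python
import math


def ot_conditional_field(x, t, x1, sigma_min):
    """u_t(x | x1) for the OT path: (x1 - (1 - s) x) / (1 - (1 - s) t)."""
    a = 1.0 - sigma_min
    return (x1 - a * x) / (1.0 - a * t)


def marginal_field(x, t, data, sigma_min=1e-2):
    """u_t(x) = sum_i w_i u_t(x | x1_i), with posterior weights
    w_i proportional to p_t(x | x1_i) = N(x; t x1_i, sigma_t^2) (uniform prior over data)."""
    sigma_t = 1.0 - (1.0 - sigma_min) * t
    logits = [-((x - t * x1) ** 2) / (2.0 * sigma_t ** 2) for x1 in data]
    m = max(logits)                                # log-sum-exp shift for stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return sum(w * ot_conditional_field(x, t, x1, sigma_min)
               for w, x1 in zip(weights, data)) / z


def rk4_flow(field, x0, n_steps=1000):
    """Push x0 from t = 0 to t = 1 along dx/dt = field(x, t) with fixed-step RK4."""
    h = 1.0 / n_steps
    x = x0
    for k in range(n_steps):
        t = k * h
        k1 = field(x, t)
        k2 = field(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = field(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = field(x + h * k3, t + h)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x
```

For the two-point data set [-2, 2], the field vanishes at x = 0 by symmetry, and because 1D flows preserve ordering, a source sample at x_0 = 0.3 ends in the narrow Gaussian around +2.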

What would settle it

Integrating the trained vector field from noise samples fails to produce outputs whose distribution matches the data under standard metrics such as negative log-likelihood or FID.
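Negative log-likelihood for a CNF comes from the instantaneous change-of-variables formula, log p_1(x_1) = log p_0(x_0) − ∫_0^1 div u_t(x_t) dt, integrated backward from the data point. The sketch below is a minimal 1D illustration, assuming a standard-normal source and an analytically known divergence supplied through hypothetical `field`/`div_field` callables; in higher dimensions the divergence is typically computed with automatic differentiation or estimated with Hutchinson's trace estimator.

```python
import math


def log_likelihood_1d(field, div_field, x1, n_steps=2000):
    """log p_1(x1) for a 1D CNF whose source is p_0 = N(0, 1).

    Integrates the augmented ODE backward from t = 1 to t = 0 with fixed-step RK4:
        dx/dt = field(x, t),  and accumulates the integral of div_field along the path,
    then returns log p_0(x(0)) - integral_0^1 div field(x(t), t) dt.
    """
    h = 1.0 / n_steps
    x, div_integral = x1, 0.0
    for k in range(n_steps):
        t = 1.0 - k * h                      # step from t down to t - h
        k1, d1 = field(x, t), div_field(x, t)
        xm = x - 0.5 * h * k1
        k2, d2 = field(xm, t - 0.5 * h), div_field(xm, t - 0.5 * h)
        xm = x - 0.5 * h * k2
        k3, d3 = field(xm, t - 0.5 * h), div_field(xm, t - 0.5 * h)
        xe = x - h * k3
        k4, d4 = field(xe, t - h), div_field(xe, t - h)
        x -= h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        div_integral += h * (d1 + 2 * d2 + 2 * d3 + d4) / 6.0
    log_p0 = -0.5 * math.log(2 * math.pi) - 0.5 * x * x
    return log_p0 - div_integral
```

For the OT field of a single point c, u_t(x) = (c − (1 − σ_min)x)/(1 − (1 − σ_min)t) has divergence −(1 − σ_min)/(1 − (1 − σ_min)t), and the result should reproduce log N(x_1; c, σ_min²).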

Original abstract

We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples -- which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more robust and stable alternative for training diffusion models. Furthermore, Flow Matching opens the door to training CNFs with other, non-diffusion probability paths. An instance of particular interest is using Optimal Transport (OT) displacement interpolation to define the conditional probability paths. These paths are more efficient than diffusion paths, provide faster training and sampling, and result in better generalization. Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality, and allows fast and reliable sample generation using off-the-shelf numerical ODE solvers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Flow Matching gives a simulation-free way to train CNFs that holds by construction via conditional regression, and OT paths look practically better than diffusion ones.

The main point is that this paper shows how to train continuous normalizing flows without any ODE simulation, by regressing a network on the vector fields of fixed conditional probability paths. The key derivation establishes that the conditional flow matching loss has the same gradients as the marginal loss, so the trained field transports the right distribution. That equivalence is parameter-free and follows directly from the setup, which removes the usual simulation burden in CNF training. They also generalize the paths beyond diffusion to include optimal transport displacement interpolation, which produces straighter trajectories and faster sampling with off-the-shelf solvers. The OT version is presented as more efficient for both training and generation while still fitting inside the same framework.

On the positive side, the math is clean, and the justification for why conditional regression suffices is explicit and needs only a mild positivity condition on the marginal path (p_t(x) > 0). The paper is therefore a direct, usable alternative for anyone already working with CNFs or diffusion models.

The main soft spot is the empirical section. The abstract claims consistent gains on ImageNet in both likelihood and sample quality, but without the exact baselines, run counts, or variance numbers it is hard to gauge how large or robust those improvements are. If the full experiments include proper controls and multiple seeds, the practical case strengthens; otherwise the gains could be narrower than stated.

This work is aimed at people building or improving continuous-time generative models. It is worth sending to peer review because the central theoretical step is solid and the OT extension is a concrete, testable addition that others can build on.

Referee Report

1 major / 0 minor

Summary. The paper introduces Flow Matching (FM), a simulation-free paradigm for training Continuous Normalizing Flows (CNFs) by regressing vector fields defined on fixed conditional probability paths. It shows compatibility with a family of Gaussian paths (including diffusion paths as special cases) and Optimal Transport displacement interpolation paths, derives that the conditional FM objective has identical gradients to the marginal objective, and claims that FM-trained CNFs achieve better likelihood and sample quality than diffusion-based methods on ImageNet while enabling fast, reliable sampling via off-the-shelf ODE solvers.

Significance. If the empirical claims hold, the work is significant for providing a theoretically clean, parameter-free unification of diffusion and flow-based generative modeling that enables scalable CNF training without simulation. A notable strength is the derivation that the conditional flow-matching loss differs from the marginal loss only by a model-independent constant, so their gradients are identical; this directly justifies the approach under only a mild positivity condition on the marginal path (p_t(x) > 0).

major comments (1)

[Experimental evaluation] The manuscript states that 'Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality' but provides no details on experimental setup, baselines, number of runs, statistical significance, or error bars, making the central performance claim unverifiable.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their review and constructive feedback. We address the single major comment below.

Referee: The manuscript states that 'Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality' but provides no details on experimental setup, baselines, number of runs, statistical significance, or error bars, making the central performance claim unverifiable.

Authors: We agree that the experimental claims require fuller documentation to be verifiable. The revised manuscript will expand the experimental section (currently Section 4) with complete details on the ImageNet training setup, the exact diffusion baselines and their configurations, the number of independent runs, error bars, and any statistical tests performed. This addresses the concern directly without altering the reported results. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The central derivation in Section 3 establishes that the conditional flow-matching loss has identical gradients w.r.t. network parameters to the marginal loss (difference is a model-independent constant). This equivalence follows directly from the definitions of the conditional probability paths and the regression objective; it is parameter-free and does not rely on fitted quantities, self-citations, or ansatzes imported from prior work. No step reduces the claimed result to its own inputs by construction. Empirical ImageNet results remain an external question.
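The gradient equivalence can be written out in a few lines. This is a compressed transcription of the argument as we read it (notation v_θ for the network and q for the data distribution, as in the paper; expectations over t ~ U[0, 1] are left implicit in the expansion); the only place an assumption enters is the division by p_t(x) in the marginal field, which is why the paper requires p_t(x) > 0.

```latex
\mathcal{L}_{\mathrm{FM}}(\theta) = \mathbb{E}_{t,\,p_t(x)}\,\big\|v_\theta(x,t) - u_t(x)\big\|^2,
\qquad
\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t,\,q(x_1),\,p_t(x \mid x_1)}\,\big\|v_\theta(x,t) - u_t(x \mid x_1)\big\|^2,

u_t(x) = \int u_t(x \mid x_1)\,\frac{p_t(x \mid x_1)\,q(x_1)}{p_t(x)}\,dx_1,
\qquad p_t(x) = \int p_t(x \mid x_1)\,q(x_1)\,dx_1 > 0.

\text{Expanding } \|v - u\|^2 = \|v\|^2 - 2\langle v, u\rangle + \|u\|^2:

\mathbb{E}_{p_t(x)}\|v_\theta(x,t)\|^2 = \mathbb{E}_{q(x_1)\,p_t(x \mid x_1)}\|v_\theta(x,t)\|^2,

\mathbb{E}_{p_t(x)}\langle v_\theta(x,t), u_t(x)\rangle
= \int \Big\langle v_\theta(x,t), \int u_t(x \mid x_1)\,p_t(x \mid x_1)\,q(x_1)\,dx_1 \Big\rangle\,dx
= \mathbb{E}_{q(x_1)\,p_t(x \mid x_1)}\langle v_\theta(x,t), u_t(x \mid x_1)\rangle,

\text{and the } \|u\|^2 \text{ terms do not depend on } \theta, \text{ so }
\nabla_\theta \mathcal{L}_{\mathrm{FM}}(\theta) = \nabla_\theta \mathcal{L}_{\mathrm{CFM}}(\theta).
```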

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that conditional vector-field regression yields the correct marginal flow; this is a domain assumption in continuous normalizing flows rather than a new invented entity or fitted parameter.

axioms (1)

domain assumption: The marginal vector field of the unconditional probability path is the average of the conditional vector fields, weighted by the posterior of the data point given x: $u_t(x) = \int u_t(x \mid x_1)\,\frac{p_t(x \mid x_1)\,q(x_1)}{p_t(x)}\,dx_1$.
This is the mathematical justification for the Flow Matching regression objective stated in the abstract.

pith-pipeline@v0.9.0 · 5731 in / 1207 out tokens · 31963 ms · 2026-05-24T10:59:08.401215+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Theorem 2. ... $\mathcal{L}_{\mathrm{CFM}}$ and $\mathcal{L}_{\mathrm{FM}}$ are equal. Hence $\nabla_\theta \mathcal{L}_{\mathrm{FM}}(\theta) = \nabla_\theta \mathcal{L}_{\mathrm{CFM}}(\theta)$.
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Example II: Optimal Transport conditional VFs. ... $u_t(x \mid x_1) = \dfrac{x_1 - (1-\sigma_{\min})\,x}{1 - (1-\sigma_{\min})\,t}$
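A quick numerical sanity check of the quoted formula: the conditional field should equal the time derivative of the conditional OT flow ψ_t(x_0) = (1 − (1 − σ_min)t)x_0 + t x_1, evaluated at x = ψ_t(x_0). The sketch below (hypothetical helper names, 1D, central finite differences) compares the two; because ψ_t is affine in t, the finite difference is exact up to rounding.

```python
def ot_flow(x0, x1, t, sigma_min):
    """Conditional OT flow psi_t(x0) = (1 - (1 - sigma_min) t) x0 + t x1."""
    return (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1


def ot_field(x, x1, t, sigma_min):
    """Quoted conditional field u_t(x | x1) = (x1 - (1 - sigma_min) x) / (1 - (1 - sigma_min) t)."""
    return (x1 - (1.0 - sigma_min) * x) / (1.0 - (1.0 - sigma_min) * t)


def check_field_matches_flow(x0, x1, t, sigma_min, eps=1e-6):
    """Absolute gap between u_t(psi_t(x0) | x1) and a central difference of psi_t in t."""
    dpsi_dt = (ot_flow(x0, x1, t + eps, sigma_min)
               - ot_flow(x0, x1, t - eps, sigma_min)) / (2 * eps)
    return abs(ot_field(ot_flow(x0, x1, t, sigma_min), x1, t, sigma_min) - dpsi_dt)
```

Both sides reduce to the constant velocity x_1 − (1 − σ_min)x_0, which is the sense in which OT conditional trajectories are straight lines.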

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
cs.CV 2026-05 unverdicted novelty 8.0

AnyFlow enables any-step video diffusion by distilling flow-map transitions over arbitrary time intervals with on-policy backward simulation.
TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking
cs.CV 2026-05 unverdicted novelty 8.0

TrackCraft3R is the first method to repurpose a video diffusion transformer as a feed-forward dense 3D tracker via dual-latent representations and temporal RoPE alignment, achieving SOTA performance with lower compute.
What Time Is It? How Data Geometry Makes Time Conditioning Optional for Flow Matching
cs.LG 2026-05 unverdicted novelty 8.0

Data geometry makes time identifiable from noisy interpolants at rate O(1/sqrt(d-k)), rendering the time-blindness gap asymptotically negligible relative to coupling variance.
Generative Modeling with Flux Matching
cs.LG 2026-05 unverdicted novelty 8.0

Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices be...
A-CODE: Fully Atomic Protein Co-Design with Unified Multimodal Diffusion
q-bio.QM 2026-05 unverdicted novelty 8.0

A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain...
Divergence is Uncertainty: A Closed-Form Posterior Covariance for Flow Matching
cs.LG 2026-05 unverdicted novelty 8.0

In flow matching, the uncertainty of the clean data given the current state is exactly the divergence of the velocity field (up to a known scalar).
ReConText3D: Replay-based Continual Text-to-3D Generation
cs.CV 2026-04 conditional novelty 8.0

ReConText3D is the first replay-memory framework for continual text-to-3D generation that prevents catastrophic forgetting on new textual categories while preserving quality on previously seen classes.
Query Lower Bounds for Diffusion Sampling
cs.LG 2026-04 unverdicted novelty 8.0

Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.
OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models
cs.CV 2026-04 unverdicted novelty 8.0

OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
Generative models on phase space
hep-ph 2026-04 unverdicted novelty 8.0

Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.
FlowHijack: A Dynamics-Aware Backdoor Attack on Flow-Matching Vision-Language-Action Models
cs.CV 2026-03 unverdicted novelty 8.0

FlowHijack is the first dynamics-aware backdoor attack on flow-matching VLAs that achieves high success rates with stealthy triggers while preserving benign performance and making malicious actions kinematically indis...
Flow-GRPO: Training Flow Matching Models via Online RL
cs.CV 2025-05 unverdicted novelty 8.0

Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
Building Normalizing Flows with Stochastic Interpolants
cs.LG 2022-09 conditional novelty 8.0

Normalizing flows are constructed by learning the velocity of a stochastic interpolant via a quadratic loss derived from its probability current, yielding an efficient ODE-based alternative to diffusion models.
Geo-Align: Video Generation Alignment via Metric Geometry Reward
cs.CV 2026-05 unverdicted novelty 7.0

Geo-Align applies RL with a perceptual reward derived from 3D camera trajectory estimation to improve controllability and fidelity in video generation without paired training data.
GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction
cs.CV 2026-05 unverdicted novelty 7.0

GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.
VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset
cs.CV 2026-05 unverdicted novelty 7.0

VINS-120K supplies the first large-scale set of instruction-image-edited-image triplets at ultra-high resolution together with an adaptation strategy that improves detail synthesis.
VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation
cs.CV 2026-05 unverdicted novelty 7.0

VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.
Increasing the Precision of Surrogate Models for Weak Lensing Mass Maps with Flow Matching
astro-ph.CO 2026-05 unverdicted novelty 7.0

A flow matching generative model produces weak lensing mass maps with fidelity improved to below 1% and 5% on basic and higher-order statistics relative to GAN benchmarks.
Let EEG Models Learn EEG
cs.CV 2026-05 unverdicted novelty 7.0

JET is a conditional flow matching framework that generates EEG as continuous raw sequences with added constraints for spectral and temporal properties, achieving over 40% lower TS-FID than prior discrete denoising me...
Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models
cs.CV 2026-05 unverdicted novelty 7.0

Linear-DPO replaces sigmoid utility with linear utility and adds EMA reference to improve preference alignment in diffusion and flow-matching text-to-image models.
Beyond the Bellman Recursion: A Pontryagin-Guided Framework for Non-Exponential Discounting
cs.LG 2026-05 unverdicted novelty 7.0

PG-DPO is a new variational framework that replaces Bellman recursion with a Pontryagin-guided adjoint-MC projection for RL under non-exponential discounting and shows gains on hyperbolic and survival benchmarks.
CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation
cs.LG 2026-05 unverdicted novelty 7.0

CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.
MetaEarth-MM: Unified Multimodal Remote Sensing Image Generation with Scene-centered Joint Modeling
cs.CV 2026-05 conditional novelty 7.0

MetaEarth-MM unifies multi-modal remote sensing image generation and any-to-any translation across five modalities via scene-centered joint modeling on the new EarthMM dataset.
Probability-Conserving Flow Guidance
cs.CV 2026-05 unverdicted novelty 7.0

AdaMaG is a guidance rule for generative models derived from decomposing continuity-equation effects into divergence and score-parallel terms, with a proof that divergence diverges near the manifold and a time-depende...
Inference-Time Scaling in Diffusion Models through Iterative Partial Refinement
cs.LG 2026-05 unverdicted novelty 7.0

IPR improves valid solution rates on MNIST Sudoku from 55.8% to 75.0% by iteratively refining partial regions in sequential diffusion models without external verifiers or reward models.
Matérn Noise for Triangulation-Agnostic Flow Matching on Meshes
cs.GR 2026-05 unverdicted novelty 7.0

Proposes discretized Matérn process noise for triangulation-agnostic flow matching on meshes with PoissonNet denoiser, tested on elastic states and humanoid poses for meshes exceeding one million triangles.
Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR
cs.LG 2026-05 conditional novelty 7.0

Pion modifies Muon's Newton-Schulz iterations into a controllable high-pass filter that anchors dominant singular values at 1 while suppressing noisy tails, outperforming Muon and AdamW in VLA and RLVR regimes.
SURGE: Approximation and Training Free Particle Filter for Diffusion Surrogate
stat.ML 2026-05 unverdicted novelty 7.0

SURGE performs unbiased path-wise importance reweighting via Girsanov estimation for derivative-free inference-time scaling in diffusion models, proving equivalence to particle-wise SMC and outperforming baselines empirically.
StableHand: Quality-Aware Flow Matching for World-Space Dual-Hand Motion Estimation from Egocentric Video
cs.CV 2026-05 unverdicted novelty 7.0

StableHand introduces a quality-aware flow matching framework conditioned on predicted four-channel per-frame hand observation quality to estimate dual-hand world-space motion from egocentric video, achieving SOTA res...
Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms
hep-ph 2026-05 unverdicted novelty 7.0

Nested-GPT is an autoregressive Transformer surrogate that generates variable-multiplicity parton showers while enforcing ordered Markovian branching and matches reference Monte Carlo results for leading-log non-globa...
Learning Unbiased Permutations via Flow Matching
cs.LG 2026-05 unverdicted novelty 7.0

PermFlow applies conditional flow matching on the affine subspace of doubly stochastic matrices with a closed-form tangent projector and nearest-target coupling to capture multimodal permutation distributions.
LiWi: Layering in the Wild
cs.CV 2026-05 unverdicted novelty 7.0

Introduces LiWi-100k dataset via agent-orchestrated synthesis and a decomposition model with shadow-guided learning and boundary correction that claims state-of-the-art RGB L1 and Alpha IoU on natural images.
HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention
cs.CV 2026-05 unverdicted novelty 7.0

HASTE delivers up to 1.93x speedup on Wan2.1 video DiTs via head-wise adaptive sparse attention using temporal mask reuse and error-guided per-head calibration while preserving video quality.
TeDiO: Temporal Diagonal Optimization for Training-Free Coherent Video Diffusion
cs.CV 2026-05 unverdicted novelty 7.0

TeDiO regularizes temporal diagonals in diffusion transformer attention maps to produce smoother video motion while keeping per-frame quality intact.
Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs
cs.RO 2026-05 unverdicted novelty 7.0

A new speculative inference system speeds up diffusion VLAs to 19.1 ms average latency (3.04x faster) on LIBERO by replacing most full 58 ms inferences with 7.8 ms draft rounds while preserving task performance.
Sampling from Flow Language Models via Marginal-Conditioned Bridges
cs.LG 2026-05 unverdicted novelty 7.0

Marginal-conditioned bridges enable training-free sampling from Flow Language Models by drawing clean one-hot endpoints from factorized posteriors and using Ornstein-Uhlenbeck bridges, preserving token marginals and r...
OP4KSR: One-Step Patch-Free 4K Super-Resolution with Periodic Artifact Suppression
cs.CV 2026-05 unverdicted novelty 7.0

OP4KSR enables efficient one-step 4K super-resolution without patches by adapting Flux with RoPE rescaling and periodicity loss to suppress artifacts.
Constraint-Aware Flow Matching: Decision Aligned End-to-End Training for Constrained Sampling
cs.LG 2026-05 unverdicted novelty 7.0

Constraint-Aware Flow Matching integrates constraint projections into the flow matching training objective to align model dynamics with constrained sampling and reduce distributional shift.
OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation
cs.CV 2026-05 unverdicted novelty 7.0

OmniNFT introduces modality-wise advantage routing, layer-wise gradient surgery, and region-wise loss reweighting in an online diffusion RL framework to improve audio-video quality, alignment, and synchronization.
Aligning Flow Map Policies with Optimal Q-Guidance
cs.LG 2026-05 unverdicted novelty 7.0

Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.
Morphologically Equivariant Flow Matching for Bimanual Mobile Manipulation
cs.RO 2026-05 conditional novelty 7.0

A morphologically equivariant flow matching policy for bimanual robots enforces reflective symmetry to improve sample efficiency and enable zero-shot generalization to mirrored task configurations.
Generative Transfer for Entropic Optimal Transport with Unknown Costs
math.OC 2026-05 unverdicted novelty 7.0

A generative transfer framework using iterative path-wise tilting integrated with conditional flow matching recovers target entropic optimal transport couplings from reference samples, achieving O(δ) convergence in Wa...
h-control: Training-Free Camera Control via Block-Conditional Gibbs Refinement
cs.CV 2026-05 unverdicted novelty 7.0

h-control introduces block-conditional pseudo-Gibbs refinement for training-free camera control in flow-matching video generators, achieving superior FVD scores on RealEstate10K and DAVIS benchmarks.
One-Step Generative Modeling via Wasserstein Gradient Flows
cs.LG 2026-05 conditional novelty 7.0

W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...
HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation
cs.CV 2026-05 conditional novelty 7.0

HorizonDrive enables stable long-horizon autoregressive driving simulation via anti-drifting teacher training with scheduled rollout recovery and teacher rollout distillation.
Zero-couplings of infinite measures with cyclically monotone support and multivariate regular variation
math.PR 2026-05 unverdicted novelty 7.0

Existence and uniqueness of cyclically monotone zero-couplings are established for arbitrary pairs of infinite measures in M_0(R^d) under a Hausdorff-dimension condition, with the tail limit of such couplings for regu...
SABER: A Scalable Action-Based Embodied Dataset for Real-World VLA Adaptation
cs.RO 2026-05 unverdicted novelty 7.0

SABER provides 44.8K multi-representation action samples from unscripted retail environments that raise a VLA model's mean success rate on ten manipulation tasks from 13.4% to 29.3%.
Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs
cs.CV 2026-05 unverdicted novelty 7.0

PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models.
Physics-Informed Neural PDE Solvers via Spatio-Temporal MeanFlow
cs.LG 2026-05 unverdicted novelty 7.0

Spatio-Temporal MeanFlow adapts MeanFlow to PDEs by replacing the generative velocity field with the physical operator and extending the integral constraint to the spatio-temporal domain, yielding a unified solver for...
Generative Actor-Critic with Soft Bridge Policies
cs.LG 2026-05 unverdicted novelty 7.0

SoftGAC defines a stochastic bridge from base to action latent that converts the MaxEnt objective into a tractable relative-entropy term reducible to control energy, achieving competitive returns with one-pass sampling.
From Articulated Kinematics to Routed Visual Control for Action-Conditioned Surgical Video Generation
cs.CV 2026-05 unverdicted novelty 7.0

A kinematic-to-visual lifting paradigm combined with hierarchically routed control generates action-conditioned surgical videos with better faithfulness, fidelity, and efficiency.
ACWM-Phys: Investigating Generalized Physical Interaction in Action-Conditioned Video World Models
cs.CV 2026-05 unverdicted novelty 7.0

ACWM-Phys is a controllable simulator benchmark with in- and out-of-distribution protocols for evaluating action-conditioned world models across rigid, kinematic, deformable, and particle dynamics.
A Call to Lagrangian Action: Learning Population Mechanics from Temporal Snapshots
cs.LG 2026-05 unverdicted novelty 7.0

Wasserstein Lagrangian Mechanics formalizes second-order dynamics in Wasserstein space and provides an algorithm to learn them from observed marginals without specifying the Lagrangian, outperforming gradient flows on...
Geometry-Aware Discretization Error of Diffusion Models
cs.LG 2026-05 unverdicted novelty 7.0

First-order asymptotic expansions of weak and Fréchet discretization errors in diffusion sampling are derived, explicit under Gaussian data through covariance geometry and robust to other data geometries.
Flow-OPD: On-Policy Distillation for Flow Matching Models
cs.CV 2026-05 conditional novelty 7.0

Flow-OPD applies on-policy distillation to flow matching models via specialized teachers, cold-start initialization, and manifold anchor regularization, lifting GenEval from 63 to 92 and OCR from 59 to 94 on Stable Di...
One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy
cs.CV 2026-05 conditional novelty 7.0

Reducing visual input to one token per frame in VLA world models maintains or improves long-horizon performance on MetaWorld, LIBERO, and real-robot tasks.