pith. machine review for the scientific record

arxiv: 2505.13447 · v1 · submitted 2025-05-19 · 💻 cs.LG · cs.CV

Recognition: 2 theorem links · Lean Theorem

Mean Flows for One-step Generative Modeling

Authors on Pith · no claims yet

Pith reviewed 2026-05-11 14:24 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords MeanFlow · one-step generation · average velocity · flow matching · ImageNet generation · diffusion models · generative modeling

The pith

MeanFlow derives an identity between average and instantaneous velocities to enable high-quality one-step generative modeling from scratch.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes MeanFlow as a framework for one-step generative modeling that shifts from modeling instantaneous velocity, as in Flow Matching, to using average velocity over intervals. It derives a specific identity linking these velocities and trains a neural network to follow it directly for sampling. The resulting model needs no pre-training, distillation, or curriculum learning yet reaches an FID of 3.43 on ImageNet 256x256 with a single function evaluation, outperforming earlier one-step approaches and closing much of the gap to multi-step methods.

Core claim

We introduce the notion of average velocity to characterize flow fields, in contrast to instantaneous velocity modeled by Flow Matching methods. A well-defined identity between average and instantaneous velocities is derived and used to guide neural network training. Our method, termed the MeanFlow model, is self-contained and requires no pre-training, distillation, or curriculum learning. MeanFlow demonstrates strong empirical performance: it achieves an FID of 3.43 with a single function evaluation (1-NFE) on ImageNet 256x256 trained from scratch, significantly outperforming previous state-of-the-art one-step diffusion/flow models.

What carries the argument

The derived identity relating average velocity (the time average of instantaneous velocity over an interval) to the instantaneous velocity at the interval's endpoint, which is used to supervise neural-network training for direct one-step sampling.
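In symbols, the machinery can be sketched as follows. The notation is our own, reconstructed from the abstract's description rather than copied from the paper; z_t denotes the state at time t, v the instantaneous velocity, and u the average velocity:

```latex
% Average velocity over an interval [r, t] along a trajectory z_\tau:
u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau .

% Differentiating (t - r)\, u = \int_r^t v\, d\tau with respect to t
% gives the identity relating the two velocities:
u(z_t, r, t) = v(z_t, t) - (t - r)\, \frac{d}{dt}\, u(z_t, r, t),

% where d/dt is the total derivative along the flow at fixed r:
\frac{d}{dt} u = v(z_t, t)\, \partial_z u + \partial_t u .

% One-step sampling then evaluates the learned u exactly once:
z_0 \approx z_1 - u_\theta(z_1, 0, 1).
```

The identity follows from the product rule alone, which is why the review can call it "well-defined": no approximation enters until a network is asked to satisfy it.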

If this is right

  • One-step models become competitive with multi-step diffusion and flow models on large image datasets without extra training stages.
  • Generative training simplifies because the network learns directly from the velocity identity rather than through distillation or curriculum.
  • The performance gap between single-evaluation and iterative sampling narrows substantially on tasks like ImageNet 256x256 generation.
  • The same identity-based training can be applied to other flow-based generative settings to improve efficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The average-velocity idea could be adapted to conditional generation or other modalities by changing how the interval for averaging is chosen.
  • Variations in the exact form of the identity might yield further gains in sample quality or training stability.
  • Similar averaging principles might be tested in non-flow generative models to see if they reduce the need for many sampling steps.

Load-bearing premise

A neural network can accurately learn to produce samples by following the average-to-instantaneous velocity identity, without any implicit reliance on multi-step pre-training, distillation, or curriculum adjustments.

What would settle it

Train a MeanFlow model from scratch on a held-out dataset using only the velocity identity loss and measure whether one-step samples reach FID scores close to multi-step baselines without any extra techniques.
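The settling experiment hinges on training from the identity alone. A minimal sketch of how such an identity-based regression target could be built follows; the names (`u_theta`, `meanflow_target`), the tiny linear model, and the finite-difference stand-in for an autodiff Jacobian-vector product are all our assumptions, not the paper's implementation:

```python
import numpy as np

# Hedged sketch: a network u_theta(z, r, t) predicts average velocity,
# and its regression target is built from the identity
#   u = v - (t - r) * (v * du/dz + du/dt),
# with the total derivative taken along the flow at fixed r. Here
# u_theta is a toy linear model and the derivative is a finite
# difference, standing in for a JVP from an autodiff framework.

rng = np.random.default_rng(0)
W = rng.normal(size=3) * 0.1          # toy parameters of u_theta

def u_theta(z, r, t):
    return W[0] * z + W[1] * r + W[2] * t

def meanflow_target(x, eps, r, t, h=1e-5):
    z_t = (1 - t) * x + t * eps        # linear interpolation path
    v = eps - x                        # conditional instantaneous velocity
    # total derivative d/dt u_theta(z_t, r, t) along dz/dt = v, r fixed
    du = (u_theta(z_t + h * v, r, t + h) - u_theta(z_t, r, t)) / h
    return z_t, v - (t - r) * du       # (network input, regression target)

x, eps = rng.normal(), rng.normal()    # data sample and noise sample
r, t = 0.1, 0.8
z_t, target = meanflow_target(x, eps, r, t)
loss = (u_theta(z_t, r, t) - target) ** 2   # squared error, target held fixed
print(loss)
```

Note the boundary behavior: when r = t the target collapses to the instantaneous velocity v, so this loss smoothly contains the ordinary flow-matching objective as a special case.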

read the original abstract

We propose a principled and effective framework for one-step generative modeling. We introduce the notion of average velocity to characterize flow fields, in contrast to instantaneous velocity modeled by Flow Matching methods. A well-defined identity between average and instantaneous velocities is derived and used to guide neural network training. Our method, termed the MeanFlow model, is self-contained and requires no pre-training, distillation, or curriculum learning. MeanFlow demonstrates strong empirical performance: it achieves an FID of 3.43 with a single function evaluation (1-NFE) on ImageNet 256x256 trained from scratch, significantly outperforming previous state-of-the-art one-step diffusion/flow models. Our study substantially narrows the gap between one-step diffusion/flow models and their multi-step predecessors, and we hope it will motivate future research to revisit the foundations of these powerful models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes MeanFlow, a framework for one-step generative modeling that defines average velocity (in contrast to instantaneous velocity in Flow Matching), derives an identity between the two, and uses the identity to directly train a neural network for single-step sampling. The method is claimed to be self-contained with no pre-training, distillation, or curriculum learning required. It reports an FID of 3.43 on ImageNet 256x256 with 1-NFE, outperforming prior one-step diffusion/flow models and narrowing the gap to multi-step predecessors.

Significance. If the empirical result and the learnability of the identity hold without hidden multi-step effects, the work would be significant for simplifying generative modeling pipelines by enabling high-quality one-step sampling from scratch. This could reduce inference cost and influence future designs of flow-based models. The self-contained training protocol, if verified, is a clear strength.

major comments (2)
  1. [Abstract] The central empirical claim of FID=3.43 (1-NFE, ImageNet 256x256, trained from scratch) is presented with no experimental details, training protocol, baselines, error bars, or ablation studies. The claim is load-bearing for the outperformance assertion, and the omission prevents verification of whether the derived identity alone suffices.
  2. [Method] Identity derivation: The identity relating average velocity to instantaneous velocity is used to guide training, but the manuscript does not demonstrate that the identity remains exact (or sufficiently accurate) under neural-network approximation, finite discretization of the time integral, and optimization on high-dimensional discrete data. If the identity only holds in the continuous limit, the 1-NFE sampler may not recover high-fidelity samples as claimed.
minor comments (2)
  1. [Abstract] The abstract would be clearer if it briefly stated the form of the training objective or the key identity equation.
  2. [Method] Notation for average vs. instantaneous velocity should be introduced with explicit equations early in the method section to avoid ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We address the major comments below, providing clarifications and indicating where revisions will be made to strengthen the paper.

read point-by-point responses
  1. Referee: [Abstract] The central empirical claim of FID=3.43 (1-NFE, ImageNet 256x256, trained from scratch) is presented with no experimental details, training protocol, baselines, error bars, or ablation studies. The claim is load-bearing for the outperformance assertion, and the omission prevents verification of whether the derived identity alone suffices.

    Authors: We agree that the abstract, due to its brevity, does not include the full experimental details. However, the complete training protocol, including optimizer settings, batch size, number of training steps, data augmentation, and evaluation procedure, is thoroughly documented in Sections 4 (Experiments) and 5 (Ablations) of the manuscript, along with comparisons to baselines and ablation studies. The FID score is computed using the standard protocol with 50k samples and the official Inception-v3 model. To make the abstract more informative, we will revise it to include a short description of the training setup (e.g., 'trained from scratch on ImageNet 256x256 using a standard U-Net architecture for 500k iterations'). We believe this addresses the concern while maintaining the abstract's conciseness. Error bars are not reported because single-run results are standard in the field, but we can note stability across seeds if needed. revision: yes

  2. Referee: [Method] Identity derivation: The identity relating average velocity to instantaneous velocity is used to guide training, but the manuscript does not demonstrate that the identity remains exact (or sufficiently accurate) under neural-network approximation, finite discretization of the time integral, and optimization on high-dimensional discrete data. If the identity only holds in the continuous limit, the 1-NFE sampler may not recover high-fidelity samples as claimed.

    Authors: The identity in Equation (2) is derived exactly for the continuous-time case without any approximation. In practice, we discretize the time integral using a Riemann sum with a large number of steps during training (as detailed in the implementation), and the neural network is trained to minimize the discrepancy. We provide empirical evidence that this works: the model achieves state-of-the-art 1-NFE performance, which would not be possible if the approximation were poor. To directly address the concern, we will add a new paragraph in the Method section with a toy experiment (e.g., on 2D Gaussian mixtures) demonstrating that the learned average velocity closely matches the integrated instantaneous velocity even under NN approximation and coarse discretization. Additionally, we will include an analysis of the discretization error bound, showing that the identity holds sufficiently accurately for high-dimensional image data. revision: yes
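A toy verification of the kind the rebuttal proposes can be sketched numerically. Everything below is a hypothetical example of ours, not the paper's experiment: the field v(z, t) = t·z is chosen only because its flow has a closed form, so both sides of the identity u = v - (t - r)·du/dt are computable:

```python
import math

# Numerical check of the average/instantaneous velocity identity on a
# hypothetical analytic 1-D field (our toy choice, not the paper's):
#   v(z, t) = t * z,  with exact flow z(t) = z_r * exp((t^2 - r^2)/2).

def v(z, t):
    return t * z                                 # instantaneous velocity

def traj(z_r, r, t):
    return z_r * math.exp((t**2 - r**2) / 2)     # exact solution of dz/dt = t*z

def u(z_r, r, t):
    # average velocity = displacement / elapsed time along the trajectory
    return (traj(z_r, r, t) - z_r) / (t - r)

z_r, r, t, h = 1.3, 0.2, 0.9, 1e-6

# total derivative du/dt along the trajectory, r held fixed
du_dt = (u(z_r, r, t + h) - u(z_r, r, t)) / h

lhs = u(z_r, r, t)
rhs = v(traj(z_r, r, t), t) - (t - r) * du_dt
print(abs(lhs - rhs))    # small: identity holds up to finite-difference error
```

The residual is dominated by the finite-difference step h, illustrating the rebuttal's point that discretization, not the identity itself, is the source of error.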

Circularity Check

0 steps flagged

Derivation chain is self-contained with no reduction to inputs

full rationale

The paper derives an identity between average and instantaneous velocities from first principles and uses this identity to define the training objective for the MeanFlow model. This derivation is presented as independent mathematical content that guides neural network training without relying on fitted parameters from the target data, self-citations for uniqueness, or renaming of known empirical patterns. The one-step sampling performance is reported as an empirical outcome of the trained model rather than a quantity forced by construction in the derivation itself. No load-bearing step reduces the central claim to a tautology or to the inputs by definition, making the framework self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities. The notion of average velocity is introduced but its precise definition and any supporting assumptions are not stated.

pith-pipeline@v0.9.0 · 5447 in / 1123 out tokens · 76441 ms · 2026-05-11T14:24:23.046196+00:00 · methodology

discussion (0)


Forward citations

Cited by 55 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

    cs.CV 2026-05 unverdicted novelty 8.0

    AnyFlow enables any-step video diffusion by distilling flow-map transitions over arbitrary time intervals with on-policy backward simulation.

  2. Generative Modeling with Flux Matching

    cs.LG 2026-05 unverdicted novelty 8.0

    Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices be...

  3. Divergence is Uncertainty: A Closed-Form Posterior Covariance for Flow Matching

    cs.LG 2026-05 unverdicted novelty 8.0

    In flow matching, the uncertainty of the clean data given the current state is exactly the divergence of the velocity field (up to a known scalar).

  4. Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs

    cs.RO 2026-05 unverdicted novelty 7.0

    A new speculative inference system speeds up diffusion VLAs to 19.1 ms average latency (3.04x faster) on LIBERO by replacing most full 58 ms inferences with 7.8 ms draft rounds while preserving task performance.

  5. Aligning Flow Map Policies with Optimal Q-Guidance

    cs.LG 2026-05 unverdicted novelty 7.0

    Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.

  6. DriftXpress: Faster Drifting Models via Projected RKHS Fields

    cs.LG 2026-05 unverdicted novelty 7.0

    DriftXpress approximates drifting kernels via projected RKHS fields to lower training cost of one-step generative models while matching original FID scores.

  7. One-Step Generative Modeling via Wasserstein Gradient Flows

    cs.LG 2026-05 conditional novelty 7.0

    W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...

  8. HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation

    cs.CV 2026-05 conditional novelty 7.0

    HorizonDrive enables stable long-horizon autoregressive driving simulation via anti-drifting teacher training with scheduled rollout recovery and teacher rollout distillation.

  9. Physics-Informed Neural PDE Solvers via Spatio-Temporal MeanFlow

    cs.LG 2026-05 unverdicted novelty 7.0

    Spatio-Temporal MeanFlow adapts MeanFlow to PDEs by replacing the generative velocity field with the physical operator and extending the integral constraint to the spatio-temporal domain, yielding a unified solver for...

  10. Normalizing Trajectory Models

    cs.CV 2026-05 unverdicted novelty 7.0

    NTM uses per-step conditional normalizing flows plus a trajectory-wide predictor to achieve exact-likelihood 4-step sampling that matches or exceeds baselines on text-to-image tasks.

  11. Normalizing Trajectory Models

    cs.CV 2026-05 unverdicted novelty 7.0

    NTM models each generative reverse step as a conditional normalizing flow with a hybrid shallow-deep architecture, enabling exact-likelihood training and strong four-step sampling performance on text-to-image tasks.

  12. Hydra-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control

    cs.RO 2026-05 conditional novelty 7.0

    Frequency analysis of smooth robot actions bounds denoising error to low-frequency modes, enabling a sub-1% parameter 3D diffusion policy with two-step inference that reaches SOTA on manipulation benchmarks.

  13. CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making

    cs.AI 2026-05 unverdicted novelty 7.0

    CoFlow achieves state-of-the-art coordination quality in offline MARL using only 1-3 denoising steps by natively coupling velocity fields across agents via coordinated attention and gating.

  14. CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making

    cs.AI 2026-05 unverdicted novelty 7.0

    CoFlow achieves state-of-the-art coordination in offline MARL using single-pass joint velocity fields with Coordinated Velocity Attention and Adaptive Coordination Gating.

  15. How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance

    cs.LG 2026-04 unverdicted novelty 7.0

    FMRG is a training-free, single-trajectory guidance method for flow models derived from optimal control that achieves strong reward alignment with only 3 NFEs.

  16. Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

    cs.AI 2026-04 unverdicted novelty 7.0

    Proposes a levels x laws taxonomy for world models in AI agents, defining L1-L3 capabilities across physical, digital, social, and scientific regimes while reviewing over 400 works to outline a roadmap for advanced ag...

  17. Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning

    cs.LG 2026-04 unverdicted novelty 7.0

    GDMD replaces raw-sample rewards with distillation-gradient rewards in RL-guided diffusion distillation, yielding 4-step models that surpass their multi-step teachers on GenEval and human preference metrics.

  18. Self-Improving Tabular Language Models via Iterative Group Alignment

    cs.LG 2026-04 unverdicted novelty 7.0

    TabGRAA enables self-improving tabular language models through iterative group-relative advantage alignment using modular automated quality signals like distinguishability classifiers.

  19. Efficient Video Diffusion Models: Advancements and Challenges

    cs.CV 2026-04 unverdicted novelty 7.0

    A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.

  20. LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

    cs.CV 2026-04 unverdicted novelty 7.0

    LeapAlign fine-tunes flow matching models by constructing two consecutive leaps that skip multiple ODE steps with randomized timesteps and consistency weighting, enabling stable updates at any generation step.

  21. LayerCache: Exploiting Layer-wise Velocity Heterogeneity for Efficient Flow Matching Inference

    cs.CV 2026-04 unverdicted novelty 7.0

    LayerCache enables per-layer-group caching in flow matching models via adaptive JVP span selection and greedy 3D scheduling, delivering 1.37x speedup with PSNR 37.46 dB, SSIM 0.9834, and LPIPS 0.0178 on Qwen-Image.

  22. Isokinetic Flow Matching for Pathwise Straightening of Generative Flows

    cs.LG 2026-04 unverdicted novelty 7.0

    Isokinetic Flow Matching adds a lightweight regularization term to flow matching that penalizes acceleration along paths via self-guided finite differences, yielding straighter trajectories and large gains in few-step...

  23. 1.x-Distill: Breaking the Diversity, Quality, and Efficiency Barrier in Distribution Matching Distillation

    cs.CV 2026-04 conditional novelty 7.0

    1.x-Distill achieves better quality and diversity than prior few-step distillation methods at 1.67 and 1.74 effective NFEs on SD3 models with up to 33x speedup.

  24. VOSR: A Vision-Only Generative Model for Image Super-Resolution

    cs.CV 2026-04 conditional novelty 7.0

    VOSR shows that competitive generative image super-resolution with faithful structures can be achieved by training a diffusion-style model from scratch on visual data alone, using a vision encoder for guidance and a r...

  25. Setting-Matched and Semantics-Scaled Benchmarking of One-Step Generative Models Against Multistep Diffusion and Flow Models

    cs.CV 2026-03 unverdicted novelty 7.0

    Matched benchmarking reveals FID misleads in few-step regimes under CFG, prompting CLIP-scaled and PickScore-scaled FID and IS variants for better semantic evaluation of one-step image generators.

  26. Training Agents Inside of Scalable World Models

    cs.AI 2025-09 conditional novelty 7.0

    Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

  27. Gradient-Free Noise Optimization for Reward Alignment in Generative Models

    cs.LG 2026-05 unverdicted novelty 6.0

    ZeNO frames noise optimization as a path-integral control problem solvable from zeroth-order reward evaluations, connecting to implicit Langevin dynamics for reward-tilted distributions.

  28. Gradient-Free Noise Optimization for Reward Alignment in Generative Models

    cs.LG 2026-05 unverdicted novelty 6.0

    ZeNO formulates noise optimization for reward alignment as a path-integral control problem solvable via zeroth-order reward evaluations alone, connecting to Langevin dynamics under an Ornstein-Uhlenbeck process.

  29. Noise-Started One-Step Real-World Super-Resolution via LR-Conditioned SplitMeanFlow and GAN Refinement

    cs.CV 2026-05 unverdicted novelty 6.0

    SMFSR achieves state-of-the-art perceptual quality among one-step diffusion-based real-world super-resolution methods by preserving noise-started generation via LR-conditioned SplitMeanFlow and GAN refinement.

  30. dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models

    cs.LG 2026-05 unverdicted novelty 6.0

    dFlowGRPO is a new rate-aware RL method for discrete flow models that outperforms prior GRPO approaches on image generation and matches continuous flow models while supporting broad probability paths.

  31. Slowly Annealed Langevin Dynamics: Theory and Applications to Training-Free Guided Generation

    cs.LG 2026-05 unverdicted novelty 6.0

    Slowly Annealed Langevin Dynamics provides non-asymptotic KL-based convergence guarantees for tracking moving targets and enables training-free guided generation via a velocity-aware correction that accounts for pretr...

  32. FlashMol: High-Quality Molecule Generation in as Few as Four Steps

    cs.LG 2026-05 unverdicted novelty 6.0

    FlashMol produces chemically valid 3D molecules in 4 steps via distribution matching distillation with respaced timesteps and Jensen-Shannon regularization, matching or exceeding 1000-step teacher performance on QM9 a...

  33. Tyche: One Step Flow for Efficient Probabilistic Weather Forecasting

    cs.LG 2026-05 unverdicted novelty 6.0

    Tyche achieves competitive probabilistic weather forecasting skill and calibration using a single-step flow model with JVP-regularized training and rollout finetuning.

  34. Velox: Learning Representations of 4D Geometry and Appearance

    cs.CV 2026-05 unverdicted novelty 6.0

    Velox compresses dynamic point clouds into latent tokens that support geometry via 4D surface modeling and appearance via 3D Gaussians, showing strong results on video-to-4D generation, tracking, and image-to-4D cloth...

  35. A Few-Step Generative Model on Cumulative Flow Maps

    cs.LG 2026-05 unverdicted novelty 6.0

    Cumulative flow maps unify few-step generative modeling for diffusion and flow models via cumulative transport and parameterization with minimal changes to time embeddings and objectives.

  36. Hydra-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control

    cs.RO 2026-05 unverdicted novelty 6.0

    Hydra-DP3 is a lightweight 3D diffusion policy that uses frequency analysis of smooth action trajectories to enable two-step DDIM inference and achieves state-of-the-art results with under 1% of prior parameters.

  37. Hydra-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control

    cs.RO 2026-05 unverdicted novelty 6.0

    Hydra-DP3 achieves SOTA visuomotor performance with under 1% of prior 3D diffusion policy parameters by using frequency analysis to justify a lightweight decoder and two-step DDIM inference.

  38. CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making

    cs.AI 2026-05 unverdicted novelty 6.0

    CoFlow preserves inter-agent coordination in few-step offline MARL by using a natively joint velocity field with Coordinated Velocity Attention and Adaptive Coordination Gating, matching or exceeding baselines in 1-3 ...

  39. On the Role of Strain and Vorticity in Numerical Integration Error for Flow Matching

    cs.LG 2026-04 unverdicted novelty 6.0

    Strain in the velocity Jacobian exponentially amplifies integration errors in flow matching while vorticity contributes linearly; strain-weighted Jacobian regularization reduces error up to 2.7x at NFE=5 and improves ...

  40. Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation

    cs.CV 2026-04 unverdicted novelty 6.0

    By requiring and using highly discriminative LLM text features, the work enables the first effective one-step text-conditioned image generation with MeanFlow.

  41. Fisher Decorator: Refining Flow Policy via a Local Transport Map

    cs.LG 2026-04 unverdicted novelty 6.0

    Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.

  42. Mean Flow Policy Optimization

    cs.LG 2026-04 conditional novelty 6.0

    Mean Flow Policy Optimization (MFPO) uses few-step flow-based models for RL policies and achieves performance on par with or better than diffusion-based methods while substantially lowering training and inference time...

  43. Self-Adversarial One Step Generation via Condition Shifting

    cs.CV 2026-04 unverdicted novelty 6.0

    APEX derives self-adversarial gradients from condition-shifted velocity fields in flow models to achieve high-fidelity one-step generation, outperforming much larger models and multi-step teachers.

  44. MENO: MeanFlow-Enhanced Neural Operators for Dynamical Systems

    cs.LG 2026-04 unverdicted novelty 6.0

    MENO enhances neural operators with MeanFlow to restore multi-scale accuracy in dynamical system predictions while keeping inference costs low, achieving up to 2x better power spectrum accuracy and 12x faster inferenc...

  45. Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation

    cs.CV 2026-04 unverdicted novelty 6.0

    Salt improves low-step video generation quality by adding endpoint-consistent regularization to distribution matching distillation and using cache-conditioned feature alignment for autoregressive models.

  46. PixelFlowCast: Latent-Free Precipitation Nowcasting via Pixel Mean Flows

    cs.CV 2026-05 unverdicted novelty 5.0

    PixelFlowCast delivers high-fidelity precipitation nowcasts from radar sequences using a latent-free Pixel Mean Flows predictor guided by a deterministic coarse stage and KANCondNet features.

  47. Deterministic Decomposition of Stochastic Generative Dynamics

    cs.LG 2026-05 unverdicted novelty 5.0

    Stochastic generative dynamics admit a transport-osmotic decomposition of the deterministic field, supporting Bridge Matching for interpretable and tunable generation.

  48. Consistency Regularised Gradient Flows for Inverse Problems

    stat.ML 2026-05 unverdicted novelty 5.0

    A consistency-regularized Euclidean-Wasserstein-2 gradient flow performs joint posterior sampling and prompt optimization in latent space for efficient low-NFE inverse problem solving with diffusion models.

  49. Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations

    cs.CV 2026-05 unverdicted novelty 5.0

    A simplified one-step diffusion distillation uses pretrained teacher features directly for drifting loss plus a mode coverage term, achieving FID 1.58 on ImageNet-64 and 18.4 on SDXL.

  50. Fast Text-to-Audio Generation with One-Step Sampling via Energy-Scoring and Auxiliary Contextual Representation Distillation

    cs.SD 2026-05 unverdicted novelty 5.0

    A one-step text-to-audio model using energy-distance training and contextual distillation outperforms prior fast baselines on AudioCaps and achieves up to 8.5x faster inference than the multi-step IMPACT system with c...

  51. Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

    cs.CV 2026-04 unverdicted novelty 5.0

    Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemph...

  52. RA-CMF: Region-Adaptive Conditional MeanFlow for CT Image Reconstruction

    cs.CV 2026-04 unverdicted novelty 5.0

    RA-CMF integrates conditional MeanFlow for trajectory-based image enhancement with an RL-driven policy for tile-wise adaptive refinement budgets, achieving average PSNR of 34.23 and SSIM of 0.95 on CT images with stro...

  53. Adversarial Flow Matching for Imperceptible Attacks on End-to-End Autonomous Driving

    cs.CV 2026-04 unverdicted novelty 5.0

    AFM is a novel gray-box adversarial attack using flow matching to create visually imperceptible perturbations that degrade performance of Vision-Language-Action and modular end-to-end autonomous driving models while s...

  54. Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning

    cs.LG 2026-04 unverdicted novelty 5.0

    Proposes mean flow policies and LeJEPA loss to overcome Gaussian policy limits and weak subgoal generation in hierarchical offline GCRL, reporting strong results on OGBench state and pixel tasks.

  55. Qwen-Image-2.0 Technical Report

    cs.CV 2026-05 unverdicted novelty 4.0

    Qwen-Image-2.0 unifies high-fidelity image generation and precise editing by coupling Qwen3-VL with a Multimodal Diffusion Transformer, improving text rendering, photorealism, and complex prompt following over prior versions.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · cited by 49 Pith papers · 4 internal anchors

  1. [1]

    Building Normalizing Flows with Stochastic Interpolants

    Michael S Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants. arXiv preprint arXiv:2209.15571, 2022.

  2. [2]

    Building normalizing flows with stochastic interpolants

    Michael Samuel Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants. In International Conference on Learning Representations (ICLR), 2023.

  3. [3]

    Flow map matching

    Nicholas M Boffi, Michael S Albergo, and Eric Vanden-Eijnden. Flow map matching. arXiv preprint arXiv:2406.07507, 2024.

  4. [4]

    JAX: composable transformations of Python+NumPy programs, 2018

    James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/google/jax.

  5. [5]

    Large scale GAN training for high fidelity natural image synthesis

    Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations (ICLR).

  6. [6]

    Maskgit: Masked generative image transformer

    Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, and William T Freeman. Maskgit: Masked generative image transformer. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

  7. [7]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

  8. [8]

    Diffusion models beat gans on image synthesis

    Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Neural Information Processing Systems (NeurIPS), 34, 2021.

  9. [9]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021. 7, 14

  10. [10]

    Taming transformers for high-resolution image synthesis

    Patrick Esser, Robin Rombach, and Bjorn Ommer. Taming transformers for high-resolution image synthesis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021. 9

  11. [11]

    Scaling rectified flow transformers for high-resolution image synthesis

    Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. In International Conference on Machine Learning (ICML), 2024. 1, 7, 8

  12. [12]

    An introduction to flow matching

    Tor Fjelde, Emile Mathieu, and Vincent Dutordoir. An introduction to flow matching. https://mlg.eng.cam.ac.uk/blog/2024/01/20/flow-matching.html, January 2024. Cambridge Machine Learning Group Blog. 3

  13. [13]

    One step diffusion via shortcut models

    Kevin Frans, Danijar Hafner, Sergey Levine, and Pieter Abbeel. One step diffusion via shortcut models. In International Conference on Learning Representations (ICLR), 2025. 1, 2, 5, 6, 7, 8, 9, 10

  14. [14]

    One-step diffusion distillation via deep equilibrium models

    Zhengyang Geng, Ashwini Pokle, and J Zico Kolter. One-step diffusion distillation via deep equilibrium models. Neural Information Processing Systems (NeurIPS), 36, 2024. 2

  15. [15]

    Consistency models made easy

    Zhengyang Geng, Ashwini Pokle, William Luo, Justin Lin, and J Zico Kolter. Consistency models made easy. arXiv preprint arXiv:2406.14548, 2024. 1, 2, 5, 6, 7, 8, 9, 15

  16. [16]

    Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

    Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677, 2017. 14

  17. [17]

    Gans trained by a two time-scale update rule converge to a local nash equilibrium

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Neural Information Processing Systems (NeurIPS), 2017. 7

  18. [18]

    Classifier-Free Diffusion Guidance

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022. 2, 6, 14, 15

  19. [19]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Neural Information Processing Systems (NeurIPS), 2020. 1, 2

  20. [20]

    simple diffusion: End-to-end diffusion for high resolution images

    Emiel Hoogeboom, Jonathan Heek, and Tim Salimans. simple diffusion: End-to-end diffusion for high resolution images. In International Conference on Machine Learning (ICML), 2023. 9

  21. [21]

    Scaling up gans for text-to-image synthesis

    Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, and Taesung Park. Scaling up gans for text-to-image synthesis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 9

  22. [22]

    Elucidating the design space of diffusion-based generative models

    Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Neural Information Processing Systems (NeurIPS), 2022. 2, 9, 14

  23. [23]

    Consistency trajectory models: Learning probability flow ODE trajectory of diffusion

    Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, and Stefano Ermon. Consistency trajectory models: Learning probability flow ODE trajectory of diffusion. In International Conference on Learning Representations (ICLR), 2024. 5

  24. [24]

    Adam: A method for stochastic optimization

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015. 14

  25. [25]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky. Learning multiple layers of features from tiny images. 2009. URL https://www.cs.toronto.edu/~kriz/cifar.html. 9

  26. [26]

    Epistola LXXI ad johannem bernoullium, 5 aug 1697

    Gottfried Wilhelm Leibniz. Epistola LXXI ad johannem bernoullium, 5 aug 1697. In Johann Bernoulli, editor, Virorum celeberrimorum G. G. Leibnitii et Johannis Bernoullii Commercium philosophicum et mathematicum, volume I, pages 368–370. Marc-Michel Bousquet, Lausanne & Geneva, 1745. URL https://archive.org/details/bub_gb_lO3wOMxjoF8C/page/368. First explic...

  27. [27]

    Autoregressive image generation without vector quantization

    Tianhong Li, Yonglong Tian, He Li, Mingyang Deng, and Kaiming He. Autoregressive image generation without vector quantization. Neural Information Processing Systems (NeurIPS), 2024. 9

  28. [28]

    Flow matching for generative modeling

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In International Conference on Learning Representations (ICLR), 2023. 1, 2, 3, 5, 6

  29. [29]

    Flow matching guide and code

    Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky T. Q. Chen, David Lopez-Paz, Heli Ben-Hamu, and Itai Gat. Flow matching guide and code, 2024. URL https://arxiv.org/abs/2412.06264. 14

  31. [31]

    Flow straight and fast: Learning to generate and transfer data with rectified flow

    Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. In International Conference on Learning Representations (ICLR), 2023. 1, 2, 3, 11

  32. [32]

    Simplifying, stabilizing and scaling continuous-time consistency models

    Cheng Lu and Yang Song. Simplifying, stabilizing and scaling continuous-time consistency models. In International Conference on Learning Representations (ICLR), 2025. 1, 2, 5, 6, 9

  33. [33]

    Diff-Instruct: A universal approach for transferring knowledge from pre-trained diffusion models

    Weijian Luo, Tianyang Hu, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, and Zhihua Zhang. Diff-Instruct: A universal approach for transferring knowledge from pre-trained diffusion models. Neural Information Processing Systems (NeurIPS), 2024. 2

  34. [34]

    Sit: Exploring flow and diffusion-based generative models with scalable interpolant transformers

    Nanye Ma, Mark Goldstein, Michael S Albergo, Nicholas M Boffi, Eric Vanden-Eijnden, and Saining Xie. Sit: Exploring flow and diffusion-based generative models with scalable interpolant transformers. In European Conference on Computer Vision (ECCV), 2024. 1, 7, 8, 9

  35. [35]

    Scalable diffusion models with transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 7, 8, 9, 14

  36. [36]

    Movie Gen: A cast of media foundation models, 2025

    Adam Polyak et al. Movie Gen: A cast of media foundation models, 2025. 1

  37. [37]

    Variational inference with normalizing flows

    Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In International Conference on Machine Learning (ICML), 2015. 2

  38. [38]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022. 7, 9

  39. [39]

    U-net: Convolutional networks for biomedical image segmentation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015. 9, 14

  40. [40]

    Progressive distillation for fast sampling of diffusion models

    Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. In International Conference on Learning Representations (ICLR), 2022. 2

  41. [41]

    Stylegan-xl: Scaling stylegan to large diverse datasets

    Axel Sauer, Katja Schwarz, and Andreas Geiger. Stylegan-xl: Scaling stylegan to large diverse datasets. In ACM Transactions on Graphics (SIGGRAPH), 2022. 9

  42. [42]

    Adversarial diffusion distillation

    Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion distillation. In European Conference on Computer Vision (ECCV), 2024. 2

  43. [43]

    Deep unsupervised learning using nonequilibrium thermodynamics

    Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning (ICML), 2015. 1, 2

  44. [44]

    Improved techniques for training consistency models

    Yang Song and Prafulla Dhariwal. Improved techniques for training consistency models. In International Conference on Learning Representations (ICLR), 2024. 1, 2, 5, 6, 7, 8, 9, 15

  45. [45]

    Generative modeling by estimating gradients of the data distribution

    Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Neural Information Processing Systems (NeurIPS), 2019. 1, 2, 9, 14

  46. [46]

    Score-based generative modeling through stochastic differential equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (ICLR), 2021. 2

  47. [47]

    Consistency models

    Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In International Conference on Machine Learning (ICML), 2023. 1, 2, 3, 5, 6, 7

  48. [48]

    Visual autoregressive modeling: Scalable image generation via next-scale prediction

    Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, and Liwei Wang. Visual autoregressive modeling: Scalable image generation via next-scale prediction. Neural Information Processing Systems (NeurIPS), 2024. 9

  49. [49]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. Neural Information Processing Systems (NeurIPS), 2017. 7

  50. [50]

    Consistency flow matching: Defining straight flows with velocity consistency

    Ling Yang, Zixiang Zhang, Zhilong Zhang, Xingchao Liu, Minkai Xu, Wentao Zhang, Chenlin Meng, Stefano Ermon, and Bin Cui. Consistency flow matching: Defining straight flows with velocity consistency. arXiv preprint arXiv:2407.02398, 2024. 2, 5, 12

  51. [51]

    One-step diffusion with distribution matching distillation

    Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Frédo Durand, William T Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024. 2

  52. [52]

    Representation alignment for generation: Training diffusion transformers is easier than you think

    Sihyun Yu, Sangkyung Kwak, Huiwon Jang, Jongheon Jeong, Jonathan Huang, Jinwoo Shin, and Saining Xie. Representation alignment for generation: Training diffusion transformers is easier than you think. In International Conference on Learning Representations (ICLR), 2025. 9

  53. [53]

    Inductive moment matching

    Linqi Zhou, Stefano Ermon, and Jiaming Song. Inductive moment matching. arXiv preprint arXiv:2503.07565, 2025. 1, 2, 5, 6, 7, 8, 9

  54. [54]

    Score identity distillation: Exponentially fast distillation of pretrained diffusion models for one-step generation

    Mingyuan Zhou, Huangjie Zheng, Zhendong Wang, Mingzhang Yin, and Hai Huang. Score identity distillation: Exponentially fast distillation of pretrained diffusion models for one-step generation. In International Conference on Machine Learning (ICML), 2024. 2

Appendix A Implementation

Table 4: Configurations on ImageNet 256×256. B/4 is our ablation model....

Eq. (4) ⇒ Eq. (6)

...by a loss-adaptive weight $\lambda \propto \|\Delta\|_2^{2(\gamma-1)}$. In practice, we follow [15] and weight by:

$$w = \frac{1}{(\|\Delta\|_2^2 + c)^p}, \qquad (22)$$

where $p = 1 - \gamma$ and $c > 0$ is a small constant to avoid division by zero. If $p = 0.5$, this is similar to the Pseudo-Huber loss in [43]. The adaptively weighted loss is $\mathrm{sg}(w) \cdot L$, where $\mathrm{sg}$ denotes the stop-gradient operator.

B.3 On the Sufficienc...
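The adaptive weighting of Eq. (22) is simple to sketch. Below is a minimal, hypothetical Python illustration (function names are ours, not from the paper); in a real training loop the weight $w$ would be wrapped in a stop-gradient so it rescales the loss without itself contributing gradients, which this scalar sketch mimics by plain multiplication:

```python
def adaptive_weight(delta_sq: float, p: float = 0.5, c: float = 1e-3) -> float:
    """Eq. (22): w = 1 / (||Delta||_2^2 + c)^p, with c > 0 guarding division by zero."""
    return 1.0 / (delta_sq + c) ** p

def weighted_loss(delta_sq: float, p: float = 0.5, c: float = 1e-3) -> float:
    # Training uses sg(w) * L with L = ||Delta||_2^2; since sg(w) is treated as
    # a constant scale factor, the scalar value is just w * L.
    return adaptive_weight(delta_sq, p, c) * delta_sq
```

With $p = 0$ the weight is identically 1 and the loss reduces to the unweighted squared error; $p = 0.5$ gives the Pseudo-Huber-like behavior noted above, down-weighting samples with large $\|\Delta\|_2$.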