Recognition: 3 theorem links
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
Pith reviewed 2026-05-10 13:12 UTC · model grok-4.3
The pith
Rectified flow learns ODEs that follow straight paths between data distributions by solving a simple least-squares problem.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By learning a velocity field that minimizes the squared deviation from straight-line interpolations between paired samples of the two distributions, the rectification process produces a deterministic coupling with provably non-increasing convex transport costs. Recursively applying rectification yields a sequence of flows with progressively straighter trajectories that can be integrated accurately without fine discretization.
What carries the argument
The rectification procedure, which refines a coupling of π₀ and π₁ into one where the learned ODE follows straight paths, reducing transport costs and enabling coarse simulation.
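The least-squares objective that carries this argument fits in a few lines. A minimal NumPy sketch, assuming paired samples and a generic velocity callable `v` standing in for the paper's trained network (the parametrization and optimizer are deliberately left out):

```python
import numpy as np

def rectified_flow_loss(v, x0, x1, t):
    """Monte Carlo estimate of the rectified flow objective
    E[ || v((1-t)*x0 + t*x1, t) - (x1 - x0) ||^2 ].

    v      -- callable (x, t) -> predicted velocity, same shape as x
    x0, x1 -- paired samples from pi_0 and pi_1, shape (n, d)
    t      -- interpolation times in [0, 1], shape (n, 1)
    """
    xt = (1.0 - t) * x0 + t * x1      # point on the straight interpolant
    target = x1 - x0                  # velocity of the straight path
    residual = v(xt, t) - target
    return float(np.mean(np.sum(residual ** 2, axis=1)))
```

The exact minimizer of this objective is the conditional expectation of X₁ − X₀ given the interpolant, which is the quantity the paper's theory analyzes; everything below concerns how far a finite-sample neural approximation can drift from it.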
If this is right
- Straight paths can be simulated exactly without discretization error.
- Recursive rectification increases path straightness for better efficiency.
- High-quality image generation and translation achievable with one Euler step.
- Unified approach for generative modeling and domain adaptation tasks.
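The single-Euler-step claim in the list above can be made concrete. A hedged NumPy sketch, again with a generic velocity callable rather than the paper's network: one step gives x₁ ≈ x₀ + v(x₀, 0), and for an exactly straight field the result does not depend on the step count at all.

```python
import numpy as np

def euler_sample(v, x0, n_steps=1):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with n_steps Euler
    steps. For a well-rectified (near-straight) flow, n_steps = 1 is
    already accurate: x1 is approximately x0 + v(x0, 0)."""
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * v(x, t)
    return x
```

With a constant (perfectly straight) field, `n_steps=1` and `n_steps=100` land on the same endpoint, which is the "simulated exactly without discretization error" claim in miniature.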
Where Pith is reading between the lines
- This could simplify training of other continuous normalizing flows by encouraging straight trajectories.
- Applications might extend to sequential data or non-image domains where fast sampling is critical.
- Connections to optimal transport suggest potential for lower cost solutions in distribution matching.
Load-bearing premise
The nonlinear least-squares optimization reliably finds a velocity field that closely approximates the straight paths without suffering from scalability issues or optimization failures.
What would settle it
If after one or more rectifications the learned flow paths remain significantly curved or if single-step Euler integration yields poor sample quality on image tasks, the claim of increasingly straight and efficient flows would be falsified.
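The falsification test described above can be run directly: simulate the ODE finely and compare the velocity along each trajectory with the chord between its endpoints. A minimal NumPy sketch of this straightness diagnostic (the callable `v` stands in for the trained network):

```python
import numpy as np

def straightness_gap(v, x0, n_steps=100):
    """Approximate S = integral of E||(x1 - x0) - v(x_t, t)||^2 dt along
    simulated trajectories; S = 0 exactly when every path is straight."""
    x, dt = x0, 1.0 / n_steps
    xs, vs = [x0], []
    for k in range(n_steps):
        t = np.full((x.shape[0], 1), k * dt)
        vk = v(x, t)
        vs.append(vk)
        x = x + dt * vk
        xs.append(x)
    chord = xs[-1] - xs[0]              # x1 - x0 per trajectory
    gaps = [np.mean(np.sum((vk - chord) ** 2, axis=1)) for vk in vs]
    return float(np.mean(gaps))
```

A gap near zero after rectification supports the claim; a gap that stays large after repeated rectification, or poor single-step sample quality, would falsify it.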
Original abstract
We present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions π₀ and π₁, hence providing a unified solution to generative modeling and domain transfer, among various other tasks involving distribution transport. The idea of rectified flow is to learn the ODE to follow the straight paths connecting the points drawn from π₀ and π₁ as much as possible. This is achieved by solving a straightforward nonlinear least squares optimization problem, which can be easily scaled to large models without introducing extra parameters beyond standard supervised learning. The straight paths are special and preferred because they are the shortest paths between two points, and can be simulated exactly without time discretization and hence yield computationally efficient models. We show that the procedure of learning a rectified flow from data, called rectification, turns an arbitrary coupling of π₀ and π₁ to a new deterministic coupling with provably non-increasing convex transport costs. In addition, recursively applying rectification allows us to obtain a sequence of flows with increasingly straight paths, which can be simulated accurately with coarse time discretization in the inference phase. In empirical studies, we show that rectified flow performs superbly on image generation, image-to-image translation, and domain adaptation. In particular, on image generation and translation, our method yields nearly straight flows that give high quality results even with a single Euler discretization step.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes rectified flow, a simple method to learn neural ODEs transporting between empirical distributions π₀ and π₁ by solving a nonlinear least-squares problem that encourages the velocity field to follow straight-line paths between paired samples. Rectification converts an arbitrary coupling into a deterministic one with provably non-increasing convex transport costs; recursive rectification produces successively straighter flows that can be simulated accurately with coarse (even single-step) Euler discretization. The approach unifies generative modeling and domain transfer and is shown empirically to perform well on image generation, image-to-image translation, and domain adaptation.
Significance. If the learned neural flows approximately inherit the exact-case straightness and cost-reduction properties, the method supplies a parameter-efficient, unified framework for flow-based transport that reduces inference cost via coarse discretization while maintaining competitive sample quality. The empirical results on high-dimensional vision tasks indicate practical promise, and the absence of extra architectural parameters beyond standard supervised learning is a notable engineering strength.
major comments (2)
- [§3] §3 (Rectification theorem): The non-increasing convex transport-cost guarantee and the straight-path property are derived for the exact population minimizer of E[‖v((1−t)X₀ + t X₁, t) − (X₁ − X₀)‖²]. The manuscript instead optimizes a neural-network approximator on finite samples; no quantitative bounds on optimization or approximation error are supplied to ensure the induced endpoint map still satisfies the cost inequality or remains sufficiently close to linear interpolants for the single-step Euler claim to be reliable. This gap is load-bearing for the central practical advantage.
- [§4] §4 (Empirical validation): The reported single-step generation and translation results are strong, yet the manuscript provides no direct diagnostic (e.g., average deviation from straight lines, measured curvature, or empirical verification of the transport-cost inequality on held-out pairs) that would confirm the performance originates from the rectification properties rather than model capacity or training heuristics.
minor comments (2)
- [§2–3] The notation for couplings and the precise statement of the population versus empirical objective could be clarified in the main text to make the transition from theory to implementation more transparent.
- [§4] Figure captions and axis labels in the experimental section would benefit from explicit mention of the number of function evaluations used for each baseline to facilitate direct comparison of discretization efficiency.
Simulated Author's Rebuttal
We thank the referee for the constructive review and positive assessment of rectified flow's potential as a unified framework. We address each major comment point by point below, acknowledging where the manuscript can be strengthened through clarification and added analysis.
Point-by-point responses
-
Referee: [§3] §3 (Rectification theorem): The non-increasing convex transport-cost guarantee and the straight-path property are derived for the exact population minimizer of E[‖v((1−t)X₀ + t X₁, t) − (X₁ − X₀)‖²]. The manuscript instead optimizes a neural-network approximator on finite samples; no quantitative bounds on optimization or approximation error are supplied to ensure the induced endpoint map still satisfies the cost inequality or remains sufficiently close to linear interpolants for the single-step Euler claim to be reliable. This gap is load-bearing for the central practical advantage.
Authors: We agree that the rectification theorem, including the non-increasing convex transport cost and straight-path properties, is stated for the exact population minimizer of the least-squares objective. The manuscript optimizes a neural-network parametrization of the velocity field on finite samples and does not supply quantitative bounds on optimization or approximation error. This is a genuine gap between the exact-case analysis and the practical implementation. At the same time, the training objective is explicitly designed to minimize deviation from straight-line paths, and the empirical results on image tasks demonstrate that single-step Euler integration yields competitive sample quality. In revision we will add a dedicated paragraph in §3 discussing the approximation gap, its implications for the guarantees, and related literature on neural approximations to optimal transport maps. revision: yes
-
Referee: [§4] §4 (Empirical validation): The reported single-step generation and translation results are strong, yet the manuscript provides no direct diagnostic (e.g., average deviation from straight lines, measured curvature, or empirical verification of the transport-cost inequality on held-out pairs) that would confirm the performance originates from the rectification properties rather than model capacity or training heuristics.
Authors: The referee is correct that the current version lacks direct quantitative diagnostics linking performance to the rectification mechanism. While we report strong single-step results and note that paths become straighter under recursive rectification, we do not include metrics such as average path deviation, curvature, or held-out transport-cost comparisons. In the revised manuscript we will add these diagnostics: for example, plots of average ||v(x,t) − (x₁ − x₀)|| over t on held-out pairs, empirical verification of the transport-cost inequality before and after rectification, and curvature measures. These additions will help isolate the contribution of the learned straightness from model capacity. revision: yes
Circularity Check
Central claims derive from exact least-squares properties of rectification; neural approximation introduces no definitional circularity.
full rationale
The paper defines rectification as solving, for a given coupling (X₀, X₁) of π₀ and π₁, the nonlinear least-squares problem min over v of E[||v((1−t)X₀ + tX₁, t) − (X₁ − X₀)||²]. Mathematical properties (non-increasing convex transport cost, straighter paths under recursion) are direct consequences of the exact minimizer being the conditional expectation of the straight-line velocity; these follow from standard optimal transport identities rather than self-reference or renaming. The practical implementation uses a neural network trained on finite samples, so the 'provably' statements hold only for the population minimizer. This creates a validity gap between theory and implementation but does not constitute circularity: no step renames a fitted quantity as an independent prediction, no self-citation is load-bearing for the core inequality, and no ansatz is smuggled. The derivation chain remains self-contained once the exact-case math is granted.
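The exact-case statements the rationale grants can be written down explicitly. With (X₀, X₁) the given coupling and (Z₀, Z₁) the endpoints of the rectified ODE, the population minimizer and the cost guarantee (for every convex cost function c) read:

```latex
v^{\star}(x, t) \;=\; \mathbb{E}\!\left[\, X_1 - X_0 \;\middle|\; (1-t)X_0 + t X_1 = x \,\right],
\qquad
\mathbb{E}\!\left[\, c(Z_1 - Z_0) \,\right] \;\le\; \mathbb{E}\!\left[\, c(X_1 - X_0) \,\right].
```

Both statements concern v⋆ itself; the circularity check's point is that the practical gap arises only when a trained network replaces v⋆, not from any self-referential definition.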
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] The learned velocity field admits unique solutions to the ODE initial-value problem for almost all initial conditions.
Lean theorems connected to this paper
-
IndisputableMonolith.Cost.FunctionalEquation.washburn_uniqueness_aczel echoes: "the procedure of learning a rectified flow from data, called rectification, turns an arbitrary coupling of π₀ and π₁ to a new deterministic coupling with provably non-increasing convex transport costs"
-
IndisputableMonolith.Foundation.DAlembert.Inevitability.bilinear_family_forced echoes: "recursively applying rectification allows us to obtain a sequence of flows with increasingly straight paths, which can be simulated accurately with coarse time discretization"
Forward citations
Cited by 60 Pith papers
-
What Time Is It? How Data Geometry Makes Time Conditioning Optional for Flow Matching
Data geometry makes time identifiable from noisy interpolants at rate O(1/sqrt(d-k)), rendering the time-blindness gap asymptotically negligible relative to coupling variance.
-
Generative Modeling with Flux Matching
Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices be...
-
Divergence is Uncertainty: A Closed-Form Posterior Covariance for Flow Matching
In flow matching, the uncertainty of the clean data given the current state is exactly the divergence of the velocity field (up to a known scalar).
-
ReConText3D: Replay-based Continual Text-to-3D Generation
ReConText3D is the first replay-memory framework for continual text-to-3D generation that prevents catastrophic forgetting on new textual categories while preserving quality on previously seen classes.
-
OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models
OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
-
Flow-GRPO: Training Flow Matching Models via Online RL
Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
-
Consistency Models
Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.
-
Building Normalizing Flows with Stochastic Interpolants
Normalizing flows are constructed by learning the velocity of a stochastic interpolant via a quadratic loss derived from its probability current, yielding an efficient ODE-based alternative to diffusion models.
-
Aligning Flow Map Policies with Optimal Q-Guidance
Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.
-
Expected Batch Optimal Transport Plans and Consequences for Flow Matching
The expected minibatch OT plan converges to the true OT plan with quantifiable bias and convergence rates, yielding a regular velocity field for unique flows from source to discrete target in flow matching.
-
$h$-control: Training-Free Camera Control via Block-Conditional Gibbs Refinement
h-control introduces block-conditional pseudo-Gibbs refinement for training-free camera control in flow-matching video generators, achieving superior FVD scores on RealEstate10K and DAVIS benchmarks.
-
One-Step Generative Modeling via Wasserstein Gradient Flows
W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...
-
SABER: A Scalable Action-Based Embodied Dataset for Real-World VLA Adaptation
SABER provides 44.8K multi-representation action samples from unscripted retail environments that raise a VLA model's mean success rate on ten manipulation tasks from 13.4% to 29.3%.
-
Physics-Informed Neural PDE Solvers via Spatio-Temporal MeanFlow
Spatio-Temporal MeanFlow adapts MeanFlow to PDEs by replacing the generative velocity field with the physical operator and extending the integral constraint to the spatio-temporal domain, yielding a unified solver for...
-
Quantile-Coupled Flow Matching for Distributional Reinforcement Learning
FlowIQN is a quantile-coupled CFM critic that yields the first explicit Wasserstein-aligned approximate projection for distributional RL, with improved return-distribution accuracy and competitive offline RL performance.
-
Geometry-Aware Discretization Error of Diffusion Models
First-order asymptotic expansions of weak and Fréchet discretization errors in diffusion sampling are derived, explicit under Gaussian data through covariance geometry and robust to other data geometries.
-
Arena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Models
ArenaPO infers Gaussian capability distributions from pairwise preferences and applies truncated-normal latent inference to derive fine-grained offline rewards for preference optimization of text-to-image diffusion models.
-
SDFlow: Similarity-Driven Flow Matching for Time Series Generation
SDFlow uses similarity-driven flow matching with low-rank manifold decomposition and a categorical posterior to generate high-fidelity long time series in VQ space without step-wise error accumulation.
-
PerFlow: Physics-Embedded Rectified Flow for Efficient Reconstruction and Uncertainty Quantification of Spatiotemporal Dynamics
PerFlow embeds physics constraints into rectified flow sampling through guidance-free conditioning and constraint-preserving projections, achieving efficient sparse reconstruction and uncertainty quantification for sp...
-
Mixture Prototype Flow Matching for Open-Set Supervised Anomaly Detection
MPFM uses flow matching with a Gaussian mixture prior on the velocity field and a mutual information maximizer to improve open-set anomaly detection over unimodal prototype methods.
-
DirectEdit: Step-Level Accurate Inversion for Flow-Based Image Editing
DirectEdit achieves step-level accurate inversion for flow-based image editing by directly aligning forward paths, using attention feature injection and mask-guided noise blending to balance fidelity and editability w...
-
Generative Modeling with Orbit-Space Particle Flow Matching
OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.
-
Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning
FAN achieves state-of-the-art offline RL performance on robotic tasks by anchoring flow policies and using single-sample noise-conditioned Q-learning, with proven convergence and reduced runtimes.
-
Hydra-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control
Frequency analysis of smooth robot actions bounds denoising error to low-frequency modes, enabling a sub-1% parameter 3D diffusion policy with two-step inference that reaches SOTA on manipulation benchmarks.
-
CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making
CoFlow achieves state-of-the-art coordination quality in offline MARL using only 1-3 denoising steps by natively coupling velocity fields across agents via coordinated attention and gating.
-
TimeTok: Granularity-Controllable Time-Series Generation via Hierarchical Tokenization
TimeTok is a unified framework using hierarchical tokenization for granularity-controllable time-series generation that achieves state-of-the-art performance in standard tasks and shows transferability across heteroge...
-
How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance
FMRG is a training-free, single-trajectory guidance method for flow models derived from optimal control that achieves strong reward alignment with only 3 NFEs.
-
Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization
Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.
-
$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models
Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a di...
-
ML-Guided Primal Heuristics for Mixed Binary Quadratic Programs
New neural architectures and combined contrastive plus weighted cross-entropy losses let ML models predict good solutions for MBQPs and beat existing primal heuristics and solvers on benchmarks and a wind-farm task.
-
HP-Edit: A Human-Preference Post-Training Framework for Image Editing
HP-Edit introduces a post-training framework and RealPref-50K dataset that uses a VLM-based HP-Scorer to align diffusion image editing models with human preferences, improving outputs on Qwen-Image-Edit-2509.
-
Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation
OTCA improves GRPO training for visual generation by estimating step importance in trajectories and adaptively weighting multiple reward objectives.
-
Generative Texture Filtering
A two-stage fine-tuning strategy on pre-trained generative models enables effective texture filtering that outperforms prior methods on challenging cases.
-
Self-Improving Tabular Language Models via Iterative Group Alignment
TabGRAA enables self-improving tabular language models through iterative group-relative advantage alignment using modular automated quality signals like distinguishability classifiers.
-
Long-Text-to-Image Generation via Compositional Prompt Decomposition
PRISM lets pre-trained text-to-image models handle long prompts by breaking them into compositional parts, predicting noise separately, and merging outputs via energy-based conjunction, matching fine-tuned models whil...
-
Grokking of Diffusion Models: Case Study on Modular Addition
Diffusion models show grokking on modular addition by composing periodic operand representations in simple data regimes or by separating arithmetic computation from visual denoising across timesteps in varied regimes.
-
UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models
UniGeo unifies geometric guidance across three levels in video models to reduce geometric drift and improve consistency in camera-controllable image editing.
-
Efficient Video Diffusion Models: Advancements and Challenges
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
-
LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories
LeapAlign fine-tunes flow matching models by constructing two consecutive leaps that skip multiple ODE steps with randomized timesteps and consistency weighting, enabling stable updates at any generation step.
-
Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes
Text-to-3D models lose prompt sensitivity for out-of-distribution shapes due to sink traps but retain geometric diversity via unconditional priors, enabling a decoupled inversion method for robust editing.
-
LayerCache: Exploiting Layer-wise Velocity Heterogeneity for Efficient Flow Matching Inference
LayerCache enables per-layer-group caching in flow matching models via adaptive JVP span selection and greedy 3D scheduling, delivering 1.37x speedup with PSNR 37.46 dB, SSIM 0.9834, and LPIPS 0.0178 on Qwen-Image.
-
Any 3D Scene is Worth 1K Tokens: 3D-Grounded Representation for Scene Generation at Scale
A 3D-grounded autoencoder and diffusion transformer allow direct generation of 3D scenes in an implicit latent space using a fixed 1K-token representation for arbitrary views and resolutions.
-
GeRM: A Generative Rendering Model From Physically Realistic to Photorealistic
GeRM learns a distribution transfer vector field to convert PBR images into photorealistic ones using a multi-condition ControlNet guided by G-buffers and text prompts.
-
Physically Grounded 3D Generative Reconstruction under Hand Occlusion using Proprioception and Multi-Contact Touch
A conditional diffusion model using proprioception and multi-contact touch produces metric-scale, physically consistent 3D object reconstructions under hand occlusion.
-
Large-Scale Universal Defect Generation: Foundation Models and Datasets
A 300K quadruplet dataset and UniDG foundation model enable reference- or text-driven defect generation across categories, outperforming few-shot baselines on anomaly detection tasks.
-
AniGen: Unified $S^3$ Fields for Animatable 3D Asset Generation
AniGen directly generates animatable 3D assets with consistent shape, skeleton, and skinning from single images using unified S^3 fields and a two-stage flow-matching pipeline.
-
Score Shocks: The Burgers Equation Structure of Diffusion Generative Models
The score in diffusion models obeys viscous Burgers dynamics, with binary mode boundaries producing a universal tanh interfacial profile whose sharpening marks speciation transitions.
-
Isokinetic Flow Matching for Pathwise Straightening of Generative Flows
Isokinetic Flow Matching adds a lightweight regularization term to flow matching that penalizes acceleration along paths via self-guided finite differences, yielding straighter trajectories and large gains in few-step...
-
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
DiffusionNFT performs online RL for diffusion models on the forward process via flow matching and positive-negative contrasts, delivering up to 25x efficiency gains and rapid benchmark improvements over prior reverse-...
-
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.
-
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Latent Consistency Models enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space, distilled from pre-trained LDMs.
-
Reinforcing VLAs in Task-Agnostic World Models
RAW-Dream lets VLAs learn new tasks in zero-shot imagination by using a world model pre-trained only on task-free behaviors and an unmodified VLM to supply rewards, with dual-noise verification to limit hallucinations.
-
EPIC: Efficient Predicate-Guided Inference-Time Control for Compositional Text-to-Image Generation
EPIC introduces predicate-guided inference-time search that lifts compositional T2I prompt accuracy from 34% to 71% on GenEval2 with 31-81% lower execution costs.
-
Operator Spectroscopy of Trained Lattice Samplers
Operator projections of trained sampler functions in 2D phi^4 lattice theory decompose residuals into zero-mode Binder and finite-k correlator components, distinguishing flow-matching, diffusion, and normalizing-flow models.
-
SF-Flow: Sound field magnitude estimation via flow matching guided by sparse measurements
SF-Flow applies flow matching with a permutation-invariant set encoder and 3D U-Net to reconstruct ATF magnitudes from sparse inputs, showing accurate results up to 1 kHz with faster training than autoencoder baselines.
-
Fashion130K: An E-commerce Fashion Dataset for Outfit Generation with Unified Multi-modal Condition
Fashion130K dataset and UMC framework align text and visual prompts with embedding refiner, Fusion Transformer, and redesigned attention to generate more consistent outfits than prior methods.
-
Learning Generative Dynamics with Soft Law Constraints: A McKean-Vlasov FBSDE Approach
A McKean-Vlasov FBSDE generative model learns stochastic path laws that match observed terminal and time-marginal distributions via soft energy constraints rather than hard interpolation.
-
Discrete Flow Matching: Convergence Guarantees Under Minimal Assumptions
Discrete flow matching on Z_m^d achieves non-asymptotic KL bounds for early-stopped targets and explicit TV convergence to the true target under an approximation error assumption, with improved scaling in dimension d ...
-
ACWM-Phys: Investigating Generalized Physical Interaction in Action-Conditioned Video World Models
ACWM-Phys benchmark shows action-conditioned world models generalize on simple geometric interactions but drop sharply on deformable contacts, high-dimensional control, and complex articulated motion, indicating relia...
-
Slowly Annealed Langevin Dynamics: Theory and Applications to Training-Free Guided Generation
Slowly Annealed Langevin Dynamics provides non-asymptotic KL-based convergence guarantees for tracking moving targets and enables training-free guided generation via a velocity-aware correction that accounts for pretr...
Reference graph
Works this paper leans on
-
[1]
Luigi Ambrosio and Gianluca Crippa. Existence, uniqueness, stability and differentiability properties of the flow associated to weakly differentiable vector fields. In Transport equations and multi-D hyperbolic conservation laws, pages 3–57. Springer, 2008
-
[2]
Luigi Ambrosio, Elia Brué, and Daniele Semola. Lectures on optimal transport. Springer, 2021
-
[3]
Reverse-time diffusion equation models
Brian DO Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982
-
[4]
Wasserstein generative adversarial networks
Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In International conference on machine learning, pages 214–223. PMLR, 2017
-
[5]
Fan Bao, Chongxuan Li, Jun Zhu, and Bo Zhang. Analytic-DPM: an analytic estimate of the optimal reverse variance in diffusion probabilistic models. arXiv preprint arXiv:2201.06503, 2022
-
[6]
Neural ordinary differential equations
Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. Advances in neural information processing systems, 31, 2018
-
[7]
Likelihood training of Schrödinger bridge using forward-backward SDEs theory
Tianrong Chen, Guan-Horng Liu, and Evangelos A Theodorou. Likelihood training of Schrödinger bridge using forward-backward SDEs theory. arXiv preprint arXiv:2110.11291, 2021
-
[8]
Ilvr: Conditioning method for denoising diffusion probabilistic models
Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, and Sungroh Yoon. Ilvr: Conditioning method for denoising diffusion probabilistic models. arXiv preprint arXiv:2108.02938, 2021
-
[9]
StarGAN v2: Diverse image synthesis for multiple domains
Yunjey Choi, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. StarGAN v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8188–8197, 2020
-
[10]
Score-based generative neural networks for large-scale optimal transport
Max Daniels, Tyler Maunu, and Paul Hand. Score-based generative neural networks for large-scale optimal transport. Advances in neural information processing systems, 34:12955–12965, 2021
-
[11]
Diffusion Schrödinger bridge with applications to score-based generative modeling
Valentin De Bortoli, James Thornton, Jeremy Heng, and Arnaud Doucet. Diffusion Schrödinger bridge with applications to score-based generative modeling. Advances in Neural Information Processing Systems, 34, 2021
-
[12]
Diffusion models beat GANs on image synthesis
Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34, 2021
-
[13]
NICE: Non-linear Independent Components Estimation
Laurent Dinh, David Krueger, and Yoshua Bengio. Nice: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2014
-
[14]
Density estimation using Real NVP
Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. arXiv preprint arXiv:1605.08803, 2016
-
[15]
An Invitation to Optimal Transport, Wasserstein Distances, and Gradient Flows
Alessio Figalli and Federico Glaudo. An Invitation to Optimal Transport, Wasserstein Distances, and Gradient Flows. 2021
-
[16]
Optimal transport for domain adaptation
R Flamary, N Courty, D Tuia, and A Rakotomamonjy. Optimal transport for domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell., 2016
-
[17]
An entropy approach to the time reversal of diffusion processes
Hans F ¨ollmer. An entropy approach to the time reversal of diffusion processes. In Stochastic Differ- ential Systems Filtering and Control, pages 156–163. Springer, 1985
work page 1985
-
[18]
How much is enough? a study on diffusion times in score-based genera- tive models
Giulio Franzese, Simone Rossi, Lixuan Yang, Alessandro Finamore, Dario Rossi, Maurizio Filip- pone, and Pietro Michiardi. How much is enough? a study on diffusion times in score-based genera- tive models. arXiv preprint arXiv:2206.05173, 2022
-
[19]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in neural information processing systems, 27, 2014
work page 2014
-
[20]
In search of lost domain generalization.arXiv preprint arXiv:2007.01434,
Ishaan Gulrajani and David Lopez-Paz. In search of lost domain generalization. arXiv preprint arXiv:2007.01434, 2020
-
[21]
Flexible diffusion modeling of long videos
William Harvey, Saeid Naderiparizi, Vaden Masrani, Christian Weilbach, and Frank Wood. Flexible diffusion modeling of long videos. arXiv preprint arXiv:2205.11495, 2022
-
[22]
Ulrich G Haussmann and Etienne Pardoux. Time reversal of diffusions. The Annals of Probability, pages 1188–1205, 1986
work page 1986
-
[23]
Denoising diffusion probabilistic models
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
[24]
Video diffusion models
Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J Fleet. Video diffusion models. arXiv preprint arXiv:2204.03458, 2022.
[25]
Image-to-image translation with conditional adversarial networks
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1125–1134, 2017.
[26]
TransGAN: Two pure transformers can make one strong GAN, and that can scale up
Yifan Jiang, Shiyu Chang, and Zhangyang Wang. TransGAN: Two pure transformers can make one strong GAN, and that can scale up. Advances in Neural Information Processing Systems, 34, 2021.
[27]
Progressive growing of GANs for improved quality, stability, and variation
Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations, 2018.
[28]
Training generative adversarial networks with limited data
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems, 33:12104–12114, 2020.
[29]
Elucidating the Design Space of Diffusion-Based Generative Models
Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. arXiv preprint arXiv:2206.00364, 2022.
[30]
Understanding DDPM latent codes through optimal transport
Valentin Khrulkov and Ivan Oseledets. Understanding DDPM latent codes through optimal transport. arXiv preprint arXiv:2202.07477, 2022.
[31]
Adam: A Method for Stochastic Optimization
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[32]
Auto-Encoding Variational Bayes
Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[33]
DiffWave: A versatile diffusion model for audio synthesis
Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. DiffWave: A versatile diffusion model for audio synthesis. In International Conference on Learning Representations, 2020.
[34]
Do neural optimal transport solvers work? A continuous Wasserstein-2 benchmark
Alexander Korotin, Lingxiao Li, Aude Genevay, Justin M Solomon, Alexander Filippov, and Evgeny Burnaev. Do neural optimal transport solvers work? A continuous Wasserstein-2 benchmark. Advances in Neural Information Processing Systems, 34:14593–14605, 2021.
[35]
Neural optimal transport
Alexander Korotin, Daniil Selikhanovych, and Evgeny Burnaev. Neural optimal transport. arXiv preprint arXiv:2201.12220, 2022.
[36]
Learning multiple layers of features from tiny images
Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
[37]
Equivalence of stochastic equations and martingale problems
Thomas G Kurtz. Equivalence of stochastic equations and martingale problems. In Stochastic Analysis 2010, pages 113–130. Springer, 2011.
[38]
Improved precision and recall metric for assessing generative models
Tuomas Kynkäänniemi, Tero Karras, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Improved precision and recall metric for assessing generative models. Advances in Neural Information Processing Systems, 32, 2019.
[39]
The flow map of the Fokker–Planck equation does not provide optimal transport
Hugo Lavenant and Filippo Santambrogio. The flow map of the Fokker–Planck equation does not provide optimal transport. Applied Mathematics Letters, page 108225, 2022.
[40]
NU-Wave: A diffusion probabilistic model for neural audio upsampling
Junhyeok Lee and Seungu Han. NU-Wave: A diffusion probabilistic model for neural audio upsampling. arXiv preprint arXiv:2104.02321, 2021.
[41]
Diffusion-LM improves controllable text generation
Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, and Tatsunori B Hashimoto. Diffusion-LM improves controllable text generation. arXiv preprint arXiv:2205.14217, 2022.
[42]
On rectified flow and optimal coupling
Qiang Liu. On rectified flow and optimal coupling. Preprint, 2022.
[43]
FuseDream: Training-free text-to-image generation with improved CLIP+GAN space optimization
Xingchao Liu, Chengyue Gong, Lemeng Wu, Shujian Zhang, Hao Su, and Qiang Liu. FuseDream: Training-free text-to-image generation with improved CLIP+GAN space optimization. arXiv preprint arXiv:2112.01573, 2021.
[44]
Let us build bridges: Understanding and extending diffusion generative models
Xingchao Liu, Lemeng Wu, Mao Ye, and Qiang Liu. Let us build bridges: Understanding and extending diffusion generative models. arXiv preprint arXiv:2208.14699, 2022.
[45]
Decoupled Weight Decay Regularization
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
[46]
DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. arXiv preprint arXiv:2206.00927, 2022.
[47]
Knowledge distillation in iterative generative models for improved sampling speed
Eric Luhman and Troy Luhman. Knowledge distillation in iterative generative models for improved sampling speed. arXiv preprint arXiv:2101.02388, 2021.
[48]
Accelerating diffusion models via early stop of the diffusion process
Zhaoyang Lyu, Xudong Xu, Ceyuan Yang, Dahua Lin, and Bo Dai. Accelerating diffusion models via early stop of the diffusion process. arXiv preprint arXiv:2205.12524, 2022.
[49]
Optimal transport mapping via input convex neural networks
Ashok Makkuva, Amirhossein Taghvaei, Sewoong Oh, and Jason Lee. Optimal transport mapping via input convex neural networks. In International Conference on Machine Learning, pages 6672–6681. PMLR, 2020.
[50]
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
Chenlin Meng, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073, 2021.
[51]
Symbolic music generation with diffusion models
Gautam Mittal, Jesse Engel, Curtis Hawthorne, and Ian Simon. Symbolic music generation with diffusion models. arXiv preprint arXiv:2103.16091, 2021.
[52]
Spectral normalization for generative adversarial networks
Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, 2018.
[53]
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021.
[54]
Improved denoising diffusion probabilistic models
Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021.
[55]
OT-Flow: Fast and accurate continuous normalizing flows via optimal transport
Derek Onken, Samy Wu Fung, Xingjian Li, and Lars Ruthotto. OT-Flow: Fast and accurate continuous normalizing flows via optimal transport. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 9223–9232, 2021.
[56]
Normalizing flows for probabilistic modeling and inference
George Papamakarios, Eric T Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. Normalizing flows for probabilistic modeling and inference. J. Mach. Learn. Res., 22(57):1–64, 2021.
[57]
Non-denoising forward-time diffusions
Stefano Peluchetti. Non-denoising forward-time diffusions. 2021.
[58]
Moment matching for multi-source domain adaptation
Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, and Bo Wang. Moment matching for multi-source domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1406–1415, 2019.
[59]
Computational optimal transport: With applications to data science
Gabriel Peyré, Marco Cuturi, et al. Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
[60]
Grad-TTS: A diffusion probabilistic model for text-to-speech
Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, and Mikhail Kudinov. Grad-TTS: A diffusion probabilistic model for text-to-speech. In International Conference on Machine Learning, pages 8599–8608. PMLR, 2021.
[61]
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125, 2022.
[62]
Variational inference with normalizing flows
Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In International Conference on Machine Learning, pages 1530–1538. PMLR, 2015.
[63]
Generative modeling with optimal transport maps
Litu Rout, Alexander Korotin, and Evgeny Burnaev. Generative modeling with optimal transport maps. arXiv preprint arXiv:2110.02999, 2021.
[64]
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S Sara Mahdavi, Rapha Gontijo Lopes, et al. Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint arXiv:2205.11487, 2022.
[65]
Optimal transport for applied mathematicians
Filippo Santambrogio. Optimal transport for applied mathematicians. Birkhäuser, NY, 55(58-63):94, 2015.
[66]
StyleGAN-XL: Scaling StyleGAN to large diverse datasets
Axel Sauer, Katja Schwarz, and Andreas Geiger. StyleGAN-XL: Scaling StyleGAN to large diverse datasets. In Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings, pages 1–10, 2022.
[67]
Large-scale optimal transport and mapping estimation
Vivien Seguy, Bharath Bhushan Damodaran, Rémi Flamary, Nicolas Courty, Antoine Rolet, and Mathieu Blondel. Large-scale optimal transport and mapping estimation. arXiv preprint arXiv:1711.02283, 2017.
[68]
D2C: Diffusion-decoding models for few-shot conditional generation
Abhishek Sinha, Jiaming Song, Chenlin Meng, and Stefano Ermon. D2C: Diffusion-decoding models for few-shot conditional generation. Advances in Neural Information Processing Systems, 34:12533–12548, 2021.
[69]
Super-convergence: Very fast training of neural networks using large learning rates
Leslie N Smith and Nicholay Topin. Super-convergence: Very fast training of neural networks using large learning rates. In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, volume 11006, pages 369–386. SPIE, 2019.
[70]
Denoising diffusion implicit models
Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2020.
[71]
Generative modeling by estimating gradients of the data distribution
Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32, 2019.
[72]
Improved techniques for training score-based generative models
Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. Advances in Neural Information Processing Systems, 33:12438–12448, 2020.
[73]
Score-based generative modeling through stochastic differential equations
Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2020.
[74]
Maximum likelihood training of score-based diffusion models
Yang Song, Conor Durkan, Iain Murray, and Stefano Ermon. Maximum likelihood training of score-based diffusion models. Advances in Neural Information Processing Systems, 34, 2021.
[75]
Dual diffusion implicit bridges for image-to-image translation
Xuan Su, Jiaming Song, Chenlin Meng, and Stefano Ermon. Dual diffusion implicit bridges for image-to-image translation. arXiv preprint arXiv:2203.08382, 2022.
[76]
Deep CORAL: Correlation alignment for deep domain adaptation
Baochen Sun and Kate Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In European Conference on Computer Vision, pages 443–450. Springer, 2016.
[77]
EfficientNet: Rethinking model scaling for convolutional neural networks
Mingxing Tan and Quoc Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, pages 6105–6114. PMLR, 2019.
[78]
Comparison of transport map generated by heat flow interpolation and the optimal transport Brenier map
Anastasiya Tanana. Comparison of transport map generated by heat flow interpolation and the optimal transport Brenier map. Communications in Contemporary Mathematics, 23(06):2050025, 2021.
[79]
Data-driven optimal transport
Giulio Trigila and Esteban G Tabak. Data-driven optimal transport. Communications on Pure and Applied Mathematics, 69(4):613–648, 2016.
[80]
Theoretical guarantees for sampling and inference in generative models with latent diffusions
Belinda Tzen and Maxim Raginsky. Theoretical guarantees for sampling and inference in generative models with latent diffusions. In Conference on Learning Theory, pages 3084–3114. PMLR, 2019.