pith. machine review for the scientific record.

arxiv: 2605.11755 · v1 · submitted 2026-05-12 · 💻 cs.LG · cs.CV · stat.ML

Recognition: 2 theorem links · Lean Theorem

One-Step Generative Modeling via Wasserstein Gradient Flows

Emmanuel J. Candès, Jiaqi Han, Puheng Li, Qiushan Guo, Renyuan Xu, Stefano Ermon

Pith reviewed 2026-05-13 07:41 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML
keywords generative modeling · Wasserstein gradient flow · one-step generation · Sinkhorn divergence · optimal transport · diffusion models · ImageNet

The pith

W-Flow achieves one-step ImageNet 256×256 generation at 1.29 FID by training a neural network to compress a Wasserstein gradient flow.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces W-Flow, a framework for training a generator that maps simple reference samples to target data in a single forward pass. It first evolves the reference distribution toward the target by following a Wasserstein gradient flow that minimizes an energy functional given by the Sinkhorn divergence; a static neural network is then trained to compress the full continuous evolution into one step. This yields better mode coverage and domain transfer than prior one-step methods while sampling roughly 100 times faster than multi-step diffusion models of comparable quality. A reader would care because the approach replaces expensive iterative sampling with a principled, transport-based shortcut that still reaches high fidelity.
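
To make the two-stage recipe concrete, here is a minimal, hypothetical PyTorch sketch: stage one evolves a particle approximation of the reference distribution by explicit Euler steps on the Sinkhorn energy (a discrete-time surrogate of the Wasserstein gradient flow), and stage two regresses a static generator onto the transported particles. Names, step sizes, and the endpoint-regression loss are illustrative rather than the paper's implementation; sinkhorn_divergence stands for any differentiable Sinkhorn-divergence estimator (one is sketched under "What carries the argument" below).

```python
# A minimal, hypothetical sketch of the two-stage W-Flow recipe; not the paper's code.
import torch

def run_wasserstein_flow(x_ref, x_data, energy, n_steps=200, step_size=0.5):
    """Stage 1: discrete-time surrogate of the Wasserstein gradient flow.
    Particles descend the energy (e.g., a Sinkhorn divergence to a data batch)."""
    x = x_ref.clone().requires_grad_(True)
    for _ in range(n_steps):
        (grad,) = torch.autograd.grad(energy(x, x_data), x)
        with torch.no_grad():
            x -= step_size * grad  # explicit Euler step along the negative gradient
    return x.detach()

def generator_step(G, opt, x_ref, x_flow_end):
    """Stage 2: fit a static one-step generator so that G(x_ref) lands where the
    flow ends. Plain endpoint regression is a simplification of the paper's objective."""
    opt.zero_grad()
    loss = ((G(x_ref) - x_flow_end) ** 2).mean()
    loss.backward()
    opt.step()
    return loss.item()
```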

Core claim

W-Flow defines an evolution from reference to target distribution through a Wasserstein gradient flow minimizing the Sinkhorn divergence energy functional, then trains a static neural generator to realize this entire evolution in one step. The finite-sample training dynamics converge to the continuous-time distributional dynamics under suitable assumptions. Empirically the resulting model reaches 1.29 FID on one-step ImageNet 256×256 generation, improves mode coverage and domain transfer, and yields approximately 100 times faster sampling than multi-step diffusion models with similar FID scores.

What carries the argument

The Wasserstein gradient flow of the Sinkhorn divergence energy functional, compressed into a single forward pass by a static neural generator.
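
Since the Sinkhorn divergence is the load-bearing energy here, a compact reference implementation helps fix ideas. The following is a hedged, self-contained sketch of a debiased Sinkhorn divergence with log-domain iterations (in the spirit of Feydy et al. [14]); the paper's estimator, batching, and debiasing details may differ.

```python
import math
import torch

def entropic_ot(x, y, eps=0.05, n_iter=200):
    """Entropic OT cost OT_eps between uniform empirical measures on x and y,
    computed with numerically stable log-domain Sinkhorn iterations."""
    C = torch.cdist(x, y) ** 2  # squared-Euclidean ground cost
    n, m = C.shape
    log_a = torch.full((n,), -math.log(n))  # log of uniform weights 1/n
    log_b = torch.full((m,), -math.log(m))
    f, g = torch.zeros(n), torch.zeros(m)
    for _ in range(n_iter):
        f = -eps * torch.logsumexp(log_b[None, :] + (g[None, :] - C) / eps, dim=1)
        g = -eps * torch.logsumexp(log_a[:, None] + (f[:, None] - C) / eps, dim=0)
    # optimal entropic coupling and its transport cost
    log_pi = log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - C) / eps
    return (torch.exp(log_pi) * C).sum()

def sinkhorn_divergence(x, y, eps=0.05):
    """Debiased divergence S_eps(p, q) = OT_eps(p, q)
    - (OT_eps(p, p) + OT_eps(q, q)) / 2; nonnegative, zero iff p = q."""
    return entropic_ot(x, y, eps) - 0.5 * (entropic_ot(x, x, eps) + entropic_ot(y, y, eps))
```

Being differentiable end-to-end, this estimator can serve as the energy in the flow sketch above; its gradient with respect to the particles plays the role of the flow's velocity field.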

Load-bearing premise

Finite-sample training dynamics converge to the continuous-time Wasserstein gradient flow dynamics under suitable assumptions.

What would settle it

A direct comparison testing whether samples from the trained one-step generator match the distribution reached by running the full multi-step Wasserstein flow on the same reference inputs; a persistent deviation would break the compression claim.
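
A hedged sketch of that test, reusing the hypothetical helpers above: generate from the one-step network and from the full discrete flow on the same reference batch, then score the gap with a simple two-sample statistic such as the energy distance.

```python
import torch

def energy_distance(x, y):
    """Energy distance 2*E|X-Y| - E|X-X'| - E|Y-Y'|; zero for identical
    distributions (slightly biased here by the zero diagonal terms)."""
    return (2 * torch.cdist(x, y).mean()
            - torch.cdist(x, x).mean()
            - torch.cdist(y, y).mean())

def flow_vs_generator_gap(G, x_ref, x_data):
    """Run the multi-step flow and the one-step generator on the SAME
    reference inputs; a persistently large gap would falsify the claim
    that the generator compresses the flow faithfully."""
    x_flow = run_wasserstein_flow(x_ref, x_data, sinkhorn_divergence)
    with torch.no_grad():
        x_gen = G(x_ref)
    return energy_distance(x_flow, x_gen).item()
```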

Figures

Figures reproduced from arXiv: 2605.11755 by Emmanuel J. Candès, Jiaqi Han, Puheng Li, Qiushan Guo, Renyuan Xu, Stefano Ermon.

Figure 1
Figure 1. (Left) 1-NFE samples from W-Flow-L/2 trained from scratch on ImageNet 256×256. (Right) Sample quality (measured by FID) vs. effective sampling compute [39] (billion parameters × number of function evaluations during sampling), evaluated on ImageNet 256×256.
Figure 2
Figure 2. (a) Conceptual diagram of W-Flow. (b) Visualization of the training dynamics projected onto the Sinkhorn divergence landscape on 8 Gaussian mixtures, shown on a logarithmic scale.
Figure 3
Figure 3. Comparison between one-batch and two-batch estimators on learning a 2D Gaussian. (A minimal sketch of the two estimators follows this list.)
Figure 4
Figure 4. Classifier-free guidance. Left: FID and Inception Score curves when sweeping over CFG scales. Right: Image samples from W-Flow-L/2 with CFG increasing from 0.0 to 2.0.
Figure 5
Figure 5. (a) Oval-to-circle domain transfer. Source and target are constructed by sampling angles uniformly from [0, 2π), with parametric curves corrupted by Gaussian noise. (b) & (c) One-step facial age translation on FFHQ, mapping older faces to younger ones. (b) Histogram of the latent ℓ2 distance between 2,000 source images and their generated targets. (c) Visual comparison.
Figure 6
Figure 6. Evaluation of mode coverage under imbalanced target distributions. (a) Mode coverage on a 2D Gaussian mixtures dataset featuring six dominant modes and two distant minority modes. (b) PCA scatter plot of generated latent codes for an artificially imbalanced FFHQ target distribution (95% senior faces, 5% child faces). See Appendix F for generated samples showing the comparison of mode coverage.
Figure 7
Figure 7. Evaluation of self-transport estimators on a 2D Gaussian mixtures dataset featuring six …
Figure 8
Figure 8. Comparison of velocity guidance and distribution guidance for conditional generation on a …
Figure 9
Figure 9. Illustration of the difference in the velocity field computation between Drifting Model …
Figure 10
Figure 10. Uncurated samples generated by W-Flow-L/2 with CFG.
Figure 11
Figure 11. Uncurated samples generated by W-Flow-XL/2.
Figure 12
Figure 12. Uncurated samples generated by W-Flow-XL/2 with CFG.
Figure 13
Figure 13. Uncurated samples generated by Drifting Model in the mode coverage experiment.
Figure 14
Figure 14. Uncurated samples generated by W-Flow in the mode coverage experiment (Sec. …).
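
Figure 3 above contrasts one-batch and two-batch estimators of the self-transport term OTε(q, q) that appears inside the debiased Sinkhorn divergence. Estimated naively from a single batch, every particle can be matched to itself at zero cost, which biases the self-term toward zero. Below is a hedged sketch of the two estimators, reusing the hypothetical entropic_ot helper sketched earlier on this page; the paper's actual estimator may differ in detail.

```python
def self_ot_one_batch(x, eps=0.05):
    """One-batch estimate of OT_eps(q, q): the cost matrix has a zero
    diagonal, so self-matching pulls the estimate toward zero."""
    return entropic_ot(x, x, eps)

def self_ot_two_batch(x, eps=0.05):
    """Two-batch estimate: split the batch and transport one half onto the
    other, so no particle can trivially match itself at zero cost."""
    half = x.shape[0] // 2
    return entropic_ot(x[:half], x[half:], eps)
```
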
Original abstract

Diffusion models and flow-based methods have shown impressive generative capability, especially for images, but their sampling is expensive because it requires many iterative updates. We introduce W-Flow, a framework for training a generator that transforms samples from a simple reference distribution into samples from a target data distribution in a single step. This is achieved in two steps: we first define an evolution from the reference distribution to the target distribution through a Wasserstein gradient flow that minimizes an energy functional; second, we train a static neural generator to compress this evolution into one-step generation. We instantiate the energy functional with the Sinkhorn divergence, which yields an efficient optimal-transport-based update rule that captures global distributional discrepancy and improves coverage of the target distribution. We further prove that the finite-sample training dynamics converge to the continuous-time distributional dynamics under suitable assumptions. Empirically, W-Flow sets a new state of the art for one-step ImageNet 256×256 generation, achieving 1.29 FID, with improved mode coverage and domain transfer. Compared to multi-step diffusion models with similar FID scores, our method yields approximately 100× faster sampling. These results show that Wasserstein gradient flows provide a principled and effective foundation for fast and high-fidelity generative modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces W-Flow, a two-stage framework that first evolves samples from a reference distribution to a target data distribution via a Wasserstein gradient flow minimizing a Sinkhorn-divergence energy functional, then trains a static neural generator to compress this continuous evolution into a single forward pass. It asserts a convergence result for finite-sample training dynamics to the continuous-time flow under suitable assumptions, and reports new state-of-the-art one-step performance on ImageNet 256×256 (1.29 FID) together with improved mode coverage, domain transfer, and roughly 100× faster sampling than multi-step diffusion models of comparable FID.

Significance. If the convergence result can be made rigorous and the empirical gains hold under controlled ablations, the work would supply a principled optimal-transport route to high-fidelity one-step generation that improves upon both diffusion and existing one-step baselines in coverage and speed, with potential impact on downstream tasks requiring fast sampling.

major comments (1)
  1. [Abstract and convergence theorem] Abstract and theoretical development: the central claim that the trained one-step generator faithfully realizes the Wasserstein flow rests on a convergence statement for finite-sample dynamics that is conditioned on unspecified 'suitable assumptions.' Because the 1.29 FID result is presented as evidence that the discrete network compresses the continuous dynamics, the precise conditions (regularity of the energy functional, Lipschitz bounds on the velocity field, uniform convergence rates of empirical measures, or control of discretization error in 256×256 image space) must be stated explicitly and shown to be satisfied; without them the link between theory and the reported FID remains unverified.
minor comments (2)
  1. [Method section] The precise definition of the Sinkhorn-regularized energy functional and the architecture/hyper-parameters of the one-step generator should be moved from supplementary material into the main text to support reproducibility of the 1.29 FID number.
  2. [Experiments] Figure captions and experimental tables should explicitly report the number of function evaluations and wall-clock time per sample when claiming the 100× speedup relative to diffusion baselines.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful and constructive review. The feedback on clarifying the convergence result is well-taken and will strengthen the manuscript. We respond point-by-point below.

Point-by-point responses
  1. Referee: [Abstract and convergence theorem] Abstract and theoretical development: the central claim that the trained one-step generator faithfully realizes the Wasserstein flow rests on a convergence statement for finite-sample dynamics that is conditioned on unspecified 'suitable assumptions.' Because the 1.29 FID result is presented as evidence that the discrete network compresses the continuous dynamics, the precise conditions (regularity of the energy functional, Lipschitz bounds on the velocity field, uniform convergence rates of empirical measures, or control of discretization error in 256×256 image space) must be stated explicitly and shown to be satisfied; without them the link between theory and the reported FID remains unverified.

    Authors: We agree that the assumptions require explicit statement to make the theoretical-empirical connection rigorous. In the revision we will expand the theorem (Section 3) to list them verbatim: (i) the Sinkhorn energy is λ-convex and C²-smooth w.r.t. the 2-Wasserstein metric for ε>0; (ii) the resulting velocity field is globally L-Lipschitz; (iii) the empirical measures satisfy a uniform Glivenko–Cantelli property with rate O(n^{-1/2} log n) under the covering numbers of the RKHS induced by the kernel; (iv) the Euler–Maruyama discretization error is O(Δt) uniformly on compact time intervals when the velocity is bounded. We will add a short verification paragraph showing that (i)–(iii) hold for the entropic Sinkhorn divergence on the image manifold (citing standard OT regularity results) and that (iv) is controlled by our chosen step-size schedule. The 1.29 FID remains an empirical illustration of practical performance; the revised theorem will now make the approximation guarantee precise rather than conditional on unspecified assumptions. revision: yes
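
To make the shape of the claimed guarantee concrete, the sketch below renders, in LaTeX, the kind of bound that assumptions (i)–(iv) would support. The paper's actual Theorem 3.1 is not reproduced on this page, so the constants and rates are illustrative, not verbatim.

```latex
% Illustrative shape of the guarantee under assumptions (i)-(iv);
% not the paper's verbatim Theorem 3.1.
\begin{theorem}[informal]
Assume (i) $\mathcal{E}_\varepsilon(\cdot) = S_\varepsilon(\cdot,\, p_{\mathrm{data}})$
is $\lambda$-convex and $C^2$-smooth in the $2$-Wasserstein metric;
(ii) the velocity field $v_t = -\nabla_{W_2}\mathcal{E}_\varepsilon(q_t)$ is
globally $L$-Lipschitz; (iii) empirical measures of $n$ samples converge
uniformly at rate $O(n^{-1/2}\log n)$; and (iv) the explicit Euler
discretization with step $\Delta t$ incurs $O(\Delta t)$ error on $[0,T]$.
Then the finite-sample, discrete-time dynamics $\hat{q}^{\,n,\Delta t}_t$
track the continuous-time flow $q_t$:
\[
  \sup_{t \in [0,T]} W_2\bigl(\hat{q}^{\,n,\Delta t}_t,\, q_t\bigr)
  \le C(T, L, \lambda)\bigl(n^{-1/2}\log n + \Delta t\bigr)
  \quad \text{with high probability.}
\]
\end{theorem}
```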

Circularity Check

0 steps flagged

No circularity detected in the derivation chain

Full rationale

The paper defines an evolution via Wasserstein gradient flow minimizing an energy functional instantiated with Sinkhorn divergence, then trains a neural generator to compress the flow into one step. This is a standard two-stage procedure using established optimal transport geometry and neural approximation; the claimed one-step generator is optimized against the flow rather than defined to equal it by construction. The convergence of finite-sample dynamics is asserted under suitable assumptions without any equation reducing the reported FID or sampling speed directly to a fitted internal parameter. No load-bearing self-citation, uniqueness theorem imported from prior author work, or ansatz smuggled via citation appears in the provided text. The ImageNet results are presented as empirical outcomes, not forced predictions.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the existence of a Wasserstein gradient flow for the chosen energy and on the ability of a neural network to approximate its finite-time evolution; both are standard in the literature but invoked without new justification here.

free parameters (1)
  • Sinkhorn regularization strength
    Controls the approximation quality of the divergence and must be chosen or tuned for each dataset.
axioms (1)
  • Domain assumption: finite-sample training dynamics converge to continuous-time distributional dynamics under suitable assumptions.
    Invoked to justify that the trained generator faithfully follows the flow; the assumptions themselves are left unspecified in the abstract.

pith-pipeline@v0.9.0 · 5541 in / 1304 out tokens · 44302 ms · 2026-05-13T07:41:20.795142+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 18 internal anchors

  1. [1]

    Building Normalizing Flows with Stochastic Interpolants

    Michael S. Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants. arXiv preprint arXiv:2209.15571, 2022.

  2. [2]

    LightSBB-M: Bridging Schrödinger and Bass for Generative Diffusion Modeling

    Alexandre Alouadi, Pierre Henry-Labordère, Grégoire Loeper, Othmane Mazhar, Huyên Pham, and Nizar Touzi. LightSBB-M: Bridging Schrödinger and Bass for generative diffusion modeling. arXiv preprint arXiv:2601.19312, 2026.

  3. [3]

    Wasserstein Generative Adversarial Networks

    Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pages 214–223. PMLR, 2017.

  4. [4]

    How to Build a Consistency Model: Learning Flow Maps via Self-Distillation

    Nicholas M. Boffi, Michael S. Albergo, and Eric Vanden-Eijnden. How to build a consistency model: Learning flow maps via self-distillation. arXiv preprint arXiv:2505.18825, 2025.

  5. [5]

    Large Scale GAN Training for High Fidelity Natural Image Synthesis

    Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.

  6. [6]

    Gradient Flow Drifting: Generative Modeling via Wasserstein Gradient Flows of KDE-Approximated Divergences

    Jiarui Cao, Zixuan Wei, and Yuxin Liu. Gradient flow drifting: Generative modeling via Wasserstein gradient flows of KDE-approximated divergences. arXiv preprint arXiv:2603.10592, 2026.

  7. [7]

    MaskGIT: Masked Generative Image Transformer

    Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, and William T. Freeman. MaskGIT: Masked generative image transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11315–11325, 2022.

  8. [8]

    Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport

    Jaemoo Choi, Jaewoong Choi, and Myungjoo Kang. Scalable Wasserstein gradient flow for generative modeling through unbalanced optimal transport. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 8629–8650. PMLR, 2024.

  9. [9]

    Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling

    Valentin De Bortoli, James Thornton, Jeremy Heng, and Arnaud Doucet. Diffusion Schrödinger bridge with applications to score-based generative modeling. arXiv preprint arXiv:2106.01357, 2021.

  10. [10]

    ImageNet: A Large-Scale Hierarchical Image Database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248–255. IEEE, 2009.

  11. [11]

    Generative Modeling via Drifting

    Mingyang Deng, He Li, Tianhong Li, Yilun Du, and Kaiming He. Generative modeling via drifting. arXiv preprint arXiv:2602.04770, 2026.

  12. [12]

    Diffusion Models Beat GANs on Image Synthesis

    Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. NeurIPS, 34:8780–8794, 2021.

  13. [13]

    Variational Wasserstein Gradient Flow

    Jiaojiao Fan, Qinsheng Zhang, Amirhossein Taghvaei, and Yongxin Chen. Variational Wasserstein gradient flow. arXiv preprint arXiv:2112.02424, 2021.

  14. [14]

    Interpolating between Optimal Transport and MMD Using Sinkhorn Divergences

    Jean Feydy, Thibault Séjourné, François-Xavier Vialard, Shun-ichi Amari, Alain Trouvé, and Gabriel Peyré. Interpolating between optimal transport and MMD using Sinkhorn divergences. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2681–2690. PMLR, 2019.

  15. [15]

    One Step Diffusion via Shortcut Models

    Kevin Frans, Danijar Hafner, Sergey Levine, and Pieter Abbeel. One step diffusion via shortcut models. arXiv preprint arXiv:2410.12557, 2024.

  16. [16]

    Drift-Based Policy Optimization: Native One-Step Policy Learning for Online Robot Control

    Yuxuan Gao, Yedong Shen, Shiqi Zhang, Wenhao Yu, Yifan Duan, Jiajia Wu, Jiajun Deng, Yanyong Zhang, et al. Drift-based policy optimization: Native one-step policy learning for online robot control. arXiv preprint arXiv:2604.03540, 2026.

  17. [17]

    Learning Generative Models with Sinkhorn Divergences

    Aude Genevay, Gabriel Peyré, and Marco Cuturi. Learning generative models with Sinkhorn divergences. In International Conference on Artificial Intelligence and Statistics, pages 1608–. PMLR, 2018.

  18. [18]

  19. [19]

    Mean Flows for One-Step Generative Modeling

    Zhengyang Geng, Mingyang Deng, Xingjian Bai, J. Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling. arXiv preprint arXiv:2505.13447, 2025.

  20. [20]

    Improved Mean Flows: On the Challenges of Fastforward Generative Models

    Zhengyang Geng, Yiyang Lu, Zongze Wu, Eli Shechtman, J. Zico Kolter, and Kaiming He. Improved mean flows: On the challenges of fastforward generative models. arXiv preprint arXiv:2512.02012, 2025.

  21. [21]

    Generative Adversarial Nets

    Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. NeurIPS, 2014.

  22. [22]

    Improved Training of Wasserstein GANs

    Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville. Improved training of Wasserstein GANs. Advances in Neural Information Processing Systems, 30, 2017.

  23. [23]

    The Wasserstein Gradient Flow of the Sinkhorn Divergence between Gaussian Distributions

    Mathis Hardion and Théo Lacombe. The Wasserstein gradient flow of the Sinkhorn divergence between Gaussian distributions. arXiv preprint arXiv:2602.10726, 2026.

  24. [24]

    Deep Residual Learning for Image Recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.

  25. [25]

    Sinkhorn-Drifting Generative Models

    Ping He, Om Khangaonkar, Hamed Pirsiavash, Yikun Bai, and Soheil Kolouri. Sinkhorn-drifting generative models. arXiv preprint arXiv:2603.12366, 2026.

  26. [26]

    GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NeurIPS, 2017.

  27. [27]

    Denoising Diffusion Probabilistic Models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. NeurIPS, 33:6840–6851, 2020.

  28. [28]

    Classifier-Free Diffusion Guidance

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.

  29. [29]

    The Variational Formulation of the Fokker–Planck Equation

    Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of the Fokker–Planck equation. SIAM Journal on Mathematical Analysis, 29(1):1–17, 1998.

  30. [30]

    Scaling Up GANs for Text-to-Image Synthesis

    Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, and Taesung Park. Scaling up GANs for text-to-image synthesis. In CVPR, pages 10124–10134, 2023.

  31. [31]

    Marlowe: Stanford's GPU-Based Computational Instrument

    Craig Kapfer, Kurt Stine, Balasubramanian Narasimhan, Christopher Mentzel, and Emmanuel Candès. Marlowe: Stanford's GPU-based computational instrument, 2025.

  32. [32]

    A Style-Based Generator Architecture for Generative Adversarial Networks

    Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019.

  33. [33]

    A Unified View of Drifting and Score-Based Models

    Chieh-Hsin Lai, Bac Nguyen, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon, and Molei Tao. A unified view of drifting and score-based models. arXiv preprint arXiv:2603.07514, 2026.

  34. [34]

    Autoregressive Image Generation without Vector Quantization

    Tianhong Li, Yonglong Tian, He Li, Mingyang Deng, and Kaiming He. Autoregressive image generation without vector quantization. NeurIPS, 37:56424–56445, 2024.

  35. [35]

    Generative Moment Matching Networks

    Yujia Li, Kevin Swersky, and Rich Zemel. Generative moment matching networks. In ICML, pages 1718–1727. PMLR, 2015.

  36. [36]

    Generative Drifting for Conditional Medical Image Generation

    Zirong Li, Siyuan Mei, Weiwen Wu, Andreas Maier, Lina Gölz, and Yan Xia. Generative drifting for conditional medical image generation. arXiv preprint arXiv:2604.19736, 2026.

  37. [37]

    Adversarial Flow Models

    Shanchuan Lin, Ceyuan Yang, Zhijie Lin, Hao Chen, and Haoqi Fan. Adversarial flow models. arXiv preprint arXiv:2511.22475, 2025.

  38. [38]

    Flow Matching for Generative Modeling

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022.

  39. [39]

    Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

    Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003, 2022.

  40. [40]

    Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

    Cheng Lu and Yang Song. Simplifying, stabilizing and scaling continuous-time consistency models. arXiv preprint arXiv:2410.11081, 2024.

  41. [41]

    Schrödinger Bridge for Generative AI: Soft-Constrained Formulation and Convergence Analysis

    Jin Ma, Ying Tan, and Renyuan Xu. Schrödinger bridge for generative AI: Soft-constrained formulation and convergence analysis. arXiv preprint arXiv:2510.11829, 2025.

  42. [42]

    SiT: Exploring Flow and Diffusion-Based Generative Models with Scalable Interpolant Transformers

    Nanye Ma, Mark Goldstein, Michael S. Albergo, Nicholas M. Boffi, Eric Vanden-Eijnden, and Saining Xie. SiT: Exploring flow and diffusion-based generative models with scalable interpolant transformers. In ECCV, pages 23–40. Springer, 2024.

  43. [43]

    Large-Scale Wasserstein Gradient Flows

    Petr Mokrov, Alexander Korotin, Lingxiao Li, Aude Genevay, Justin M. Solomon, and Evgeny Burnaev. Large-scale Wasserstein gradient flows. Advances in Neural Information Processing Systems, 34:15243–15256, 2021.

  44. [44]

    Entropic Optimal Transport: Convergence of Potentials

    Marcel Nutz and Johannes Wiesel. Entropic optimal transport: Convergence of potentials. Probability Theory and Related Fields, 184(1):401–424, 2022.

  45. [45]

    Scalable Diffusion Models with Transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. In CVPR, pages 4195–4205, 2023.

  46. [46]

    Computational Optimal Transport: With Applications to Data Science

    Gabriel Peyré and Marco Cuturi. Computational Optimal Transport: With Applications to Data Science. Now Foundations and Trends, 2019.

  47. [47]

    Adversarial Latent Autoencoders

    Stanislav Pidhorskyi, Donald A. Adjeroh, and Gianfranco Doretto. Adversarial latent autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14104–14113, 2020.

  48. [48]

    Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

    Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.

  49. [49]

    High-Resolution Image Synthesis with Latent Diffusion Models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, pages 10684–10695, 2022.

  50. [50]

    Progressive Distillation for Fast Sampling of Diffusion Models

    Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512, 2022.

  51. [51]

    Multistep Distillation of Diffusion Models via Moment Matching

    Tim Salimans, Thomas Mensink, Jonathan Heek, and Emiel Hoogeboom. Multistep distillation of diffusion models via moment matching. NeurIPS, 37:36046–36070, 2024.

  52. [52]

    StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

    Axel Sauer, Katja Schwarz, and Andreas Geiger. StyleGAN-XL: Scaling StyleGAN to large diverse datasets. In SIGGRAPH, pages 1–10, 2022.

  53. [53]

    Concerning Nonnegative Matrices and Doubly Stochastic Matrices

    Richard Sinkhorn and Paul Knopp. Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics, 21(2):343–348, 1967.

  54. [54]

    Deep Unsupervised Learning Using Nonequilibrium Thermodynamics

    Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, pages 2256–2265. PMLR, 2015.

  55. [55]

    Improved Techniques for Training Consistency Models

    Yang Song and Prafulla Dhariwal. Improved techniques for training consistency models. arXiv preprint arXiv:2310.14189, 2023.

  56. [56]

    Consistency Models

    Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. 2023.

  57. [57]

    Generative Modeling by Estimating Gradients of the Data Distribution

    Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32, 2019.

  58. [58]

    Score-Based Generative Modeling through Stochastic Differential Equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.

  59. [59]

    Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

    Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, and Liwei Wang. Visual autoregressive modeling: Scalable image generation via next-scale prediction. NeurIPS, 37:84839–84865, 2024.

  60. [60]

    Wasserstein Auto-Encoders

    Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schoelkopf. Wasserstein auto-encoders. arXiv preprint arXiv:1711.01558, 2017.

  61. [61]

    Generative Drifting Is Secretly Score Matching: A Spectral and Variational Perspective

    Erkan Turan and Maks Ovsjanikov. Generative drifting is secretly score matching: a spectral and variational perspective. arXiv preprint arXiv:2603.09936, 2026.

  62. [62]

    ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation

    Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. ProlificDreamer: High-fidelity and diverse text-to-3D generation with variational score distillation. Advances in Neural Information Processing Systems, 36:8406–8441, 2023.

  63. [63]

    Transition Models: Rethinking the Generative Learning Objective

    Zidong Wang, Yiyuan Zhang, Xiaoyu Yue, Xiangyu Yue, Yangguang Li, Wanli Ouyang, and Lei Bai. Transition models: Rethinking the generative learning objective. arXiv preprint arXiv:2509.04394, 2025.

  64. [64]

    Flow-Based Generative Models as Iterative Algorithms in Probability Space

    Yao Xie and Xiuyuan Cheng. Flow-based generative models as iterative algorithms in probability space. arXiv preprint arXiv:2502.13394, 2025.

  65. [65]

    Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

    Jingfeng Yao, Bin Yang, and Xinggang Wang. Reconstruction vs. generation: Taming optimization dilemma in latent diffusion models. In CVPR, pages 15703–15712, 2025.

  66. [66]

    Improved Distribution Matching Distillation for Fast Image Synthesis

    Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and William T. Freeman. Improved distribution matching distillation for fast image synthesis. Advances in Neural Information Processing Systems, 37:47455–47487, 2024.

  67. [67]

    One-Step Diffusion with Distribution Matching Distillation

    Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. In CVPR, pages 6613–6623, 2024.

  68. [68]

    Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

    Sihyun Yu, Sangkyung Kwak, Huiwon Jang, Jongheon Jeong, Jonathan Huang, Jinwoo Shin, and Saining Xie. Representation alignment for generation: Training diffusion transformers is easier than you think. arXiv preprint arXiv:2410.06940, 2024.

  69. [69]

    AlphaFlow: Understanding and Improving MeanFlow Models

    Huijie Zhang, Aliaksandr Siarohin, Willi Menapace, Michael Vasilkovsky, Sergey Tulyakov, Qing Qu, and Ivan Skorokhodov. AlphaFlow: Understanding and improving MeanFlow models. arXiv preprint arXiv:2510.20771, 2025.

  70. [70]

    Diffusion Transformers with Representation Autoencoders

    Boyang Zheng, Nanye Ma, Shengbang Tong, and Saining Xie. Diffusion transformers with representation autoencoders. arXiv preprint arXiv:2510.11690, 2025.

  71. [71]

    Inductive Moment Matching

    Linqi Zhou, Stefano Ermon, and Jiaming Song. Inductive moment matching. arXiv preprint arXiv:2503.07565, 2025.

    Linqi Zhou, Stefano Ermon, and Jiaming Song. Inductive moment matching.arXiv preprint arXiv:2503.07565, 2025. 2, 3, 30 15 Appendix Table of Contents A Proofs 16 A.1 Complete statement and the proof of Theorem 3.1 . . . . . . . . . . . . . . . . 16 B Additional discussions 22 B.1 Wasserstein gradient flows of energy functionals . . . . . . . . . . . . . . ...