arxiv: 2303.01469 · v2 · submitted 2023-03-02 · 💻 cs.LG · cs.CV· stat.ML

Recognition: 2 theorem links

· Lean Theorem

Consistency Models

Ilya Sutskever, Mark Chen, Prafulla Dhariwal, Yang Song

Pith reviewed 2026-05-13 15:41 UTC · model grok-4.3

classification 💻 cs.LG cs.CVstat.ML

keywords consistency modelsdiffusion modelsgenerative modelsone-step generationimage synthesismodel distillationzero-shot editing

0 comments

The pith

Consistency models generate high-quality samples by directly mapping noise to data in one step.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces consistency models as a new family of generative models that produce high quality samples by directly mapping noise to data. This overcomes the slow iterative sampling process required by diffusion models. The models support fast one-step generation by design while still allowing multistep sampling to trade compute for better quality. They also enable zero-shot data editing tasks such as inpainting and super-resolution without explicit training on those tasks. The approach matters because it achieves new state-of-the-art one-step FID scores on CIFAR-10 and ImageNet 64x64 and can be trained either by distilling existing diffusion models or from scratch as standalone models.

Core claim

We propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether, outperforming existing distillation techniques and one-step non-adversarial generative models on standard benchmarks.

What carries the argument

The consistency function, which maps any noisy input at any noise level to the identical clean data output, enforcing consistency along the entire noise trajectory.

If this is right

One-step sampling from consistency models achieves FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64.
Multistep sampling can be applied to trade additional compute for higher sample quality.
Zero-shot editing capabilities such as inpainting, colorization, and super-resolution are available without dedicated training.
Standalone consistency models outperform prior one-step non-adversarial generative models on CIFAR-10, ImageNet 64x64, and LSUN 256x256.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These models could lower the barrier to real-time image synthesis in applications where multiple denoising steps are currently too slow.
The consistency principle might extend naturally to other iterative generative processes such as those used in audio or video synthesis.
Further gains could come from hybridizing consistency training with small amounts of adversarial fine-tuning.
Scaling experiments on higher-resolution datasets would test whether the direct noise-to-data mapping remains stable without additional regularization.

Load-bearing premise

A single learned consistency function can map noise at any level to the same clean output and generalize to zero-shot editing tasks without task-specific supervision.

What would settle it

If one-step samples produced by a trained consistency model show FID scores no better than existing one-step baselines on CIFAR-10 or ImageNet 64x64, or if zero-shot inpainting results contain visible inconsistencies not present in supervised editing methods.

read the original abstract

Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64x64 and LSUN 256x256.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Consistency models give a direct noise-to-data map that delivers strong one-step FID scores, but the pairwise loss leaves open whether full trajectory invariance holds in practice.

read the letter

The main point is that this paper defines a consistency function trained to output the same clean image from any point on a diffusion trajectory, which lets you sample in one step while still supporting multi-step refinement. They report 3.55 FID on CIFAR-10 and 6.20 on ImageNet 64x64 for single-step generation, beating prior distillation baselines, and the same models handle zero-shot inpainting and colorization without extra supervision. That combination of speed and editing is the practical win. The standalone training option, without needing a pre-trained diffusion teacher, is also useful and widens the applicability. The empirical section looks solid on the benchmarks they chose. The soft spot is the training objective. The loss only penalizes inconsistency between randomly drawn pairs of timesteps, so it does not directly force the output to be identical for every point along a full continuous trajectory. Small gaps could remain for times not seen in training, which would weaken the guarantee that one-step sampling matches the quality of iterative refinement. The paper likely adds some regularization or empirical checks to mitigate this, but the concern is real enough that it should be addressed more explicitly. This work is aimed at researchers already working on diffusion or fast generative sampling who want lower inference cost without switching to adversarial methods. A reader who cares about practical speed-ups on image tasks will find the numbers and the editing results worth looking at. It deserves peer review because the results are competitive and the framework is distinct enough to be worth discussing, even with the open question on the loss.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces consistency models as a new family of generative models that directly map noise to data, enabling high-quality one-step generation while supporting multi-step refinement and zero-shot editing tasks such as inpainting, colorization, and super-resolution. Models can be trained either by distilling from pre-trained diffusion models or independently from scratch. Extensive experiments report new state-of-the-art FID scores of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step sampling, outperforming prior distillation techniques, with additional results on LSUN 256x256.

Significance. If the central claims hold, this work offers a meaningful advance toward computationally efficient sampling in generative modeling by largely eliminating iterative processes while preserving sample quality. The reported FID improvements on standard benchmarks are substantial, and the zero-shot editing results without task-specific supervision add practical value. The dual training options (distillation and standalone) broaden applicability. Strengths include the scale of empirical validation across datasets and the introduction of a distinct model family that competes with existing one-step non-adversarial generators.

major comments (1)

[§3.2] §3.2: The consistency loss minimizes ||f_θ(x_t, t) - f_θ(x_s, s)|| only over randomly sampled discrete pairs (t, s). This does not enforce exact invariance of f_θ(·, t) to the same x_0 along the full continuous trajectory, which is required for the one-step generation claim (t=1 to t=0) and for reliable zero-shot editing. Residual inconsistencies on unseen times could degrade performance; the manuscript should provide either theoretical bounds on trajectory consistency error or empirical measurements of invariance across dense time grids.

minor comments (2)

[§4.1] §4.1 and Table 1: The one-step FID numbers are presented without reported standard deviations or number of independent runs; adding these would allow readers to assess the statistical reliability of the claimed improvements over baselines.
[Figure 3] Figure 3: The caption and axis labels for the multi-step sampling curves could more explicitly indicate the compute-quality trade-off relative to the one-step baseline.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review of our manuscript. We address the major comment point by point below.

read point-by-point responses

Referee: [§3.2] §3.2: The consistency loss minimizes ||f_θ(x_t, t) - f_θ(x_s, s)|| only over randomly sampled discrete pairs (t, s). This does not enforce exact invariance of f_θ(·, t) to the same x_0 along the full continuous trajectory, which is required for the one-step generation claim (t=1 to t=0) and for reliable zero-shot editing. Residual inconsistencies on unseen times could degrade performance; the manuscript should provide either theoretical bounds on trajectory consistency error or empirical measurements of invariance across dense time grids.

Authors: We appreciate the referee's observation on the formulation of the consistency loss. The loss is indeed defined over discrete pairs (t, s) drawn from the continuous time distribution, rather than enforcing exact invariance at every point along the trajectory. While the repeated sampling of such pairs during training, together with the self-consistency objective, is intended to promote approximate invariance in practice, we acknowledge that this does not constitute a strict guarantee for all unseen times. To address the concern directly, we will add to the revised manuscript a new set of empirical measurements: consistency error evaluated on a dense grid of time points (e.g., 100 uniformly spaced values) not encountered during training, along with plots of ||f_θ(x_t, t) - x_0|| for fixed x_0 across the trajectory. These additions will provide quantitative support for the reliability of one-step generation and zero-shot editing results. revision: yes

Circularity Check

0 steps flagged

No significant circularity; results are empirical and self-contained

full rationale

The paper defines consistency models via a loss that enforces pairwise agreement on randomly sampled (t,s) pairs drawn from diffusion trajectories, then evaluates one-step and multi-step sampling performance via FID on held-out benchmarks (CIFAR-10, ImageNet 64x64). This training objective is an approximation to the desired trajectory invariance and does not presuppose the final sample quality or editing behavior. No load-bearing step reduces by construction to a fitted parameter renamed as prediction, a self-citation chain, or an ansatz smuggled from prior work by the same authors. The reported SOTA numbers rest on external benchmark comparison rather than internal redefinition of inputs. The derivation chain therefore remains independent of its measured outputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the learnability of a consistency function that produces identical outputs for all noise levels along a trajectory; this is treated as a domain assumption without independent proof in the abstract.

free parameters (1)

sampling step count
Variable number of steps is used to trade compute for quality, but no specific fitted values are stated in the abstract.

axioms (1)

domain assumption A consistency function exists that maps any point on a diffusion trajectory to the same clean data point
This is the defining property invoked to enable one-step generation.

pith-pipeline@v0.9.0 · 5511 in / 1197 out tokens · 108446 ms · 2026-05-13T15:41:20.707568+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

DAlembert.Inevitability bilinear_family_forced contradicts
Training minimizes ||f_θ(x_t, t) - f_θ(x_s, s)|| for randomly drawn pairs t,s (typically via the consistency loss in §3.2). This objective only penalizes inconsistency on the sampled pairs and does not constrain the function to be exactly constant along the entire continuous trajectory.

Forward citations

Cited by 23 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Query Lower Bounds for Diffusion Sampling
cs.LG 2026-04 unverdicted novelty 8.0

Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.
Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
cs.RO 2023-03 accept novelty 8.0

Diffusion Policy models robot actions as a conditional diffusion process, outperforming prior state-of-the-art methods by 46.9% on average across 12 manipulation tasks from four benchmarks.
ExpoCM: Exposure-Aware One-Step Generative Single-Image HDR Reconstruction
cs.CV 2026-05 unverdicted novelty 7.0

ExpoCM enables fast one-step single-image HDR reconstruction via exposure-dependent perturbations and region-conditioned consistency trajectories derived from a probability flow ODE.
How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance
cs.LG 2026-04 unverdicted novelty 7.0

FMRG is a training-free, single-trajectory guidance method for flow models derived from optimal control that achieves strong reward alignment with only 3 NFEs.
Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment
cs.RO 2026-04 unverdicted novelty 7.0

VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with margin...
From Competition to Coopetition: Coopetitive Training-Free Image Editing Based on Text Guidance
cs.CV 2026-04 unverdicted novelty 7.0

CoEdit is a zero-shot coopetitive framework for text-guided image editing that uses dual-entropy attention manipulation and entropic latent refinement to improve editing harmony and structural preservation.
Isokinetic Flow Matching for Pathwise Straightening of Generative Flows
cs.LG 2026-04 unverdicted novelty 7.0

Isokinetic Flow Matching adds a lightweight regularization term to flow matching that penalizes acceleration along paths via self-guided finite differences, yielding straighter trajectories and large gains in few-step...
VOSR: A Vision-Only Generative Model for Image Super-Resolution
cs.CV 2026-04 conditional novelty 7.0

VOSR shows that competitive generative image super-resolution with faithful structures can be achieved by training a diffusion-style model from scratch on visual data alone, using a vision encoder for guidance and a r...
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
cs.CV 2023-10 unverdicted novelty 7.0

Latent Consistency Models enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space, distilled from pre-trained LDMs.
Gradient-Free Noise Optimization for Reward Alignment in Generative Models
cs.LG 2026-05 unverdicted novelty 6.0

ZeNO formulates noise optimization for reward alignment as a path-integral control problem solvable via zeroth-order reward evaluations alone, connecting to Langevin dynamics under an Ornstein-Uhlenbeck process.
dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models
cs.LG 2026-05 unverdicted novelty 6.0

dFlowGRPO is a new rate-aware RL method for discrete flow models that outperforms prior GRPO approaches on image generation and matches continuous flow models while supporting broad probability paths.
Tyche: One Step Flow for Efficient Probabilistic Weather Forecasting
cs.LG 2026-05 unverdicted novelty 6.0

Tyche achieves competitive probabilistic weather forecasting skill and calibration using a single-step flow model with JVP-regularized training and rollout finetuning.
GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model
cs.AI 2026-05 unverdicted novelty 6.0

GCCM prevents shortcut collapse in consistency models for graph prediction by using contrastive negative pairs and input feature perturbation, leading to better performance than deterministic baselines.
MetaSR: Content-Adaptive Metadata Orchestration for Generative Super-Resolution
cs.CV 2026-04 unverdicted novelty 6.0

MetaSR adaptively orchestrates metadata in a DiT-based generative SR model to deliver up to 1 dB PSNR gains and 50% bitrate savings across diverse content and degradations.
Pairing Regularization for Mitigating Many-to-One Collapse in GANs
cs.LG 2026-04 unverdicted novelty 6.0

Pairing regularization mitigates intra-mode collapse in GANs by penalizing redundant latent-to-sample mappings, improving recall under collapse-prone conditions or precision under stabilized training.
ELT: Elastic Looped Transformers for Visual Generation
cs.CV 2026-04 unverdicted novelty 6.0

Elastic Looped Transformers share weights across recurrent blocks and apply intra-loop self-distillation to deliver 4x parameter reduction while matching competitive FID and FVD scores on ImageNet and UCF-101.
Unified Video Action Model
cs.RO 2025-02 unverdicted novelty 6.0

UVA learns a joint video-action latent representation with decoupled diffusion decoding heads, enabling a single model to perform accurate fast policy learning, forward/inverse dynamics, and video generation without p...
Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations
cs.CV 2026-05 unverdicted novelty 5.0

A simplified one-step diffusion distillation uses pretrained teacher features directly for drifting loss plus a mode coverage term, achieving FID 1.58 on ImageNet-64 and 18.4 on SDXL.
Lightning Unified Video Editing via In-Context Sparse Attention
cs.CV 2026-05 unverdicted novelty 5.0

ISA prunes low-saliency context tokens and routes queries by sharpness to either full or 0-th order Taylor sparse attention, enabling LIVEditor to cut attention latency ~60% while beating prior video editing methods o...
Fast Text-to-Audio Generation with One-Step Sampling via Energy-Scoring and Auxiliary Contextual Representation Distillation
cs.SD 2026-05 unverdicted novelty 5.0

A one-step text-to-audio model using energy-distance training and contextual distillation outperforms prior fast baselines on AudioCaps and achieves up to 8.5x faster inference than the multi-step IMPACT system with c...
A Unified Measure-Theoretic View of Diffusion, Score-Based, and Flow Matching Generative Models
cs.LG 2026-05 unverdicted novelty 4.0

Diffusion, score-based, and flow matching models are unified as instances of learning time-dependent vector fields inducing marginal distributions governed by continuity and Fokker-Planck equations.
OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL
cs.RO 2026-04 unverdicted novelty 4.0

OmniVLA-RL uses a mix-of-transformers architecture and flow-matching reformulated as SDE with group segmented policy optimization to surpass prior VLA models on LIBERO benchmarks.
Discrete Meanflow Training Curriculum
cs.LG 2026-04 unverdicted novelty 4.0

A DMF curriculum initialized from pretrained flow models achieves one-step FID 3.36 on CIFAR-10 after only 2000 epochs by exploiting a discretized consistency property in the Meanflow objective.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · cited by 23 Pith papers · 10 internal anchors

[1]

ediff-i: Text-to-image diffusion models with ensemble of expert denoisers

Balaji, Y., Nah, S., Huang, X., Vahdat, A., Song, J., Kreis, K., Aittala, M., Aila, T., Laine, S., Catanzaro, B., Karras, T., and Liu, M.-Y. ediff-i: Text-to-image diffusion models with ensemble of expert denoisers. arXiv preprint arXiv:2211.01324, 2022

work page arXiv 2022
[2]

S., Januschowski, T., and G \"u nnemann, S

Bilo s , M., Sommer, J., Rangapuram, S. S., Januschowski, T., and G \"u nnemann, S. Neural flows: Efficient alternative to neural odes. Advances in Neural Information Processing Systems, 34: 0 21325--21337, 2021

work page 2021
[3]

Large scale GAN training for high fidelity natural image synthesis

Brock, A., Donahue, J., and Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=B1xsqj09Fm

work page 2019
[4]

J., Norouzi, M., and Chan, W

Chen, N., Zhang, Y., Zen, H., Weiss, R. J., Norouzi, M., and Chan, W. Wavegrad: Estimating gradients for waveform generation. In International Conference on Learning Representations (ICLR), 2021

work page 2021
[5]

T., Rubanova, Y., Bettencourt, J., and Duvenaud, D

Chen, R. T., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. Neural O rdinary D ifferential E quations. In Advances in neural information processing systems, pp.\ 6571--6583, 2018

work page 2018
[6]

T., Behrmann, J., Duvenaud, D

Chen, R. T., Behrmann, J., Duvenaud, D. K., and Jacobsen, J.-H. Residual flows for invertible generative modeling. In Advances in Neural Information Processing Systems, pp.\ 9916--9926, 2019

work page 2019
[7]

T., Klasky, M

Chung, H., Kim, J., Mccann, M. T., Klasky, M. L., and Ye, J. C. Diffusion posterior sampling for general noisy inverse problems. In International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=OnD9zGAGT0k

work page 2023
[8]

Imagenet: A large-scale hierarchical image database

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp.\ 248--255. Ieee, 2009

work page 2009
[9]

and Nichol, A

Dhariwal, P. and Nichol, A. Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems (NeurIPS), 2021

work page 2021
[10]

NICE : Non-linear independent components estimation

Dinh, L., Krueger, D., and Bengio, Y. NICE : Non-linear independent components estimation. International Conference in Learning Representations Workshop Track, 2015

work page 2015
[11]

Density estimation using real NVP

Dinh, L., Sohl - Dickstein, J., and Bengio, S. Density estimation using real NVP . In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings . OpenReview.net, 2017. URL https://openreview.net/forum?id=HkpbnH9lx

work page 2017
[12]

Genie: Higher-order denoising diffusion solvers

Dockhorn, T., Vahdat, A., and Kreis, K. Genie: Higher-order denoising diffusion solvers. arXiv preprint arXiv:2210.05475, 2022

work page arXiv 2022
[13]

Autogan: Neural architecture search for generative adversarial networks

Gong, X., Chang, S., Jiang, Y., and Wang, Z. Autogan: Neural architecture search for generative adversarial networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.\ 3224--3234, 2019

work page 2019
[14]

Generative adversarial nets

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in neural information processing systems, pp.\ 2672--2680, 2014

work page 2014
[15]

Densely connected normalizing flows

Grci \'c , M., Grubi s i \'c , I., and S egvi \'c , S. Densely connected normalizing flows. Advances in Neural Information Processing Systems, 34: 0 23968--23982, 2021

work page 2021
[16]

Bootstrap your own latent-a new approach to self-supervised learning

Grill, J.-B., Strub, F., Altch \'e , F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al. Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems, 33: 0 21271--21284, 2020

work page 2020
[17]

Momentum contrast for unsupervised visual representation learning

He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.\ 9729--9738, 2020

work page 2020
[18]

GANs trained by a two time-scale update rule converge to a local Nash equilibrium

Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pp.\ 6626--6637, 2017

work page 2017
[19]

Denoising D iffusion P robabilistic M odels

Ho, J., Jain, A., and Abbeel, P. Denoising D iffusion P robabilistic M odels. Advances in Neural Information Processing Systems, 33, 2020

work page 2020
[20]

Imagen Video: High Definition Video Generation with Diffusion Models

Ho, J., Chan, W., Saharia, C., Whang, J., Gao, R., Gritsenko, A., Kingma, D. P., Poole, B., Norouzi, M., Fleet, D. J., et al. Imagen video: High definition video generation with diffusion models. arXiv preprint arXiv:2210.02303, 2022 a

work page internal anchor Pith review Pith/arXiv arXiv 2022
[21]

A., Chan, W., Norouzi, M., and Fleet, D

Ho, J., Salimans, T., Gritsenko, A. A., Chan, W., Norouzi, M., and Fleet, D. J. Video diffusion models. In ICLR Workshop on Deep Generative Models for Highly Structured Data, 2022 b . URL https://openreview.net/forum?id=BBelR2NdDZ5

work page 2022
[22]

and Dayan, P

Hyv \"a rinen, A. and Dayan, P. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research (JMLR), 6 0 (4), 2005

work page 2005
[23]

Transgan: Two pure transformers can make one strong gan, and that can scale up

Jiang, Y., Chang, S., and Wang, Z. Transgan: Two pure transformers can make one strong gan, and that can scale up. Advances in Neural Information Processing Systems, 34: 0 14745--14758, 2021

work page 2021
[24]

Progressive growing of GAN s for improved quality, stability, and variation

Karras, T., Aila, T., Laine, S., and Lehtinen, J. Progressive growing of GAN s for improved quality, stability, and variation. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=Hk99zCeAb

work page 2018
[25]

Analyzing and improving the image quality of stylegan

Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., and Aila, T. Analyzing and improving the image quality of stylegan. 2020

work page 2020
[26]

Elucidating the design space of diffusion-based generative models

Karras, T., Aittala, M., Aila, T., and Laine, S. Elucidating the design space of diffusion-based generative models. In Proc. NeurIPS, 2022

work page 2022
[27]

Snips: Solving noisy inverse problems stochastically

Kawar, B., Vaksman, G., and Elad, M. Snips: Solving noisy inverse problems stochastically. arXiv preprint arXiv:2105.14951, 2021

work page arXiv 2021
[28]

Denoising diffusion restoration models

Kawar, B., Elad, M., Ermon, S., and Song, J. Denoising diffusion restoration models. In Advances in Neural Information Processing Systems, 2022

work page 2022
[29]

Kingma, D. P. and Dhariwal, P. Glow: Generative flow with invertible 1x1 convolutions. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 31, pp.\ 10215--10224. 2018

work page 2018
[30]

Kingma, D. P. and Welling, M. Auto-encoding variational bayes. In International Conference on Learning Representations, 2014

work page 2014
[31]

Diffwave: A versatile diffusion model for audio synthesis.arXiv preprint arXiv:2009.09761, 2020

Kong, Z., Ping, W., Huang, J., Zhao, K., and Catanzaro, B. Diff W ave: A V ersatile D iffusion M odel for A udio S ynthesis. arXiv preprint arXiv:2009.09761, 2020

work page arXiv 2009
[32]

Learning multiple layers of features from tiny images

Krizhevsky, A., Hinton, G., et al. Learning multiple layers of features from tiny images. 2009

work page 2009
[33]

Improved precision and recall metric for assessing generative models

Kynk \"a \"a nniemi, T., Karras, T., Laine, S., Lehtinen, J., and Aila, T. Improved precision and recall metric for assessing generative models. Advances in Neural Information Processing Systems, 32, 2019

work page 2019
[34]

Vitgan: Training gans with vision transformers

Lee, K., Chang, H., Jiang, L., Zhang, H., Tu, Z., and Liu, C. Vitgan: Training gans with vision transformers. arXiv preprint arXiv:2107.04589, 2021

work page arXiv 2021
[35]

Continuous control with deep reinforcement learning

Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[36]

On the variance of the adaptive learning rate and beyond

Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., and Han, J. On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265, 2019

work page arXiv 1908
[37]

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Liu, X., Gong, C., and Liu, Q. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[38]

Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps

Lu, C., Zhou, Y., Bao, F., Chen, J., Li, C., and Zhu, J. Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps. arXiv preprint arXiv:2206.00927, 2022

work page arXiv 2022
[39]

Luhman and T

Luhman, E. and Luhman, T. Knowledge distillation in iterative generative models for improved sampling speed. arXiv preprint arXiv:2101.02388, 2021

work page arXiv 2021
[40]

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Meng, C., Song, Y., Song, J., Wu, J., Zhu, J.-Y., and Ermon, S. Sdedit: Image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[41]

P., Ermon, S., Ho, J., and Salimans, T

Meng, C., Gao, R., Kingma, D. P., Ermon, S., Ho, J., and Salimans, T. On distillation of guided diffusion models. arXiv preprint arXiv:2210.03142, 2022

work page arXiv 2022
[42]

Playing Atari with Deep Reinforcement Learning

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[43]

A., Veness, J., Bellemare, M

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. Human-level control through deep reinforcement learning. nature, 518 0 (7540): 0 529--533, 2015

work page 2015
[44]

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I., and Chen, M. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[45]

Dual contradistinctive generative autoencoder

Parmar, G., Li, D., Lee, K., and Tu, Z. Dual contradistinctive generative autoencoder. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 823--832, 2021

work page 2021
[46]

Grad- TTS : A diffusion probabilistic model for text-to-speech

Popov, V., Vovk, I., Gogoryan, V., Sadekova, T., and Kudinov, M. Grad- TTS : A diffusion probabilistic model for text-to-speech. arXiv preprint arXiv:2105.06337, 2021

work page arXiv 2021
[47]

Hierarchical Text-Conditional Image Generation with CLIP Latents

Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., and Chen, M. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[48]

J., Mohamed, S., and Wierstra, D

Rezende, D. J., Mohamed, S., and Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning, pp.\ 1278--1286, 2014

work page 2014
[49]

High-resolution image synthesis with latent diffusion models

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 10684--10695, 2022

work page 2022
[50]

Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E., Ghasemipour, S. K. S., Ayan, B. K., Mahdavi, S. S., Lopes, R. G., et al. Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint arXiv:2205.11487, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[51]

and Ho, J

Salimans, T. and Ho, J. Progressive distillation for fast sampling of diffusion models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=TIdIXIpzhoI

work page 2022
[52]

Improved techniques for training gans

Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., and Chen, X. Improved techniques for training gans. In Advances in neural information processing systems, pp.\ 2234--2242, 2016

work page 2016
[53]

Stylegan-xl: Scaling stylegan to large diverse datasets

Sauer, A., Schwarz, K., and Geiger, A. Stylegan-xl: Scaling stylegan to large diverse datasets. In ACM SIGGRAPH 2022 conference proceedings, pp.\ 1--10, 2022

work page 2022
[54]

Deep U nsupervised L earning U sing N onequilibrium T hermodynamics

Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., and Ganguli, S. Deep U nsupervised L earning U sing N onequilibrium T hermodynamics. In International Conference on Machine Learning, pp.\ 2256--2265, 2015

work page 2015
[55]

Denoising Diffusion Implicit Models

Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010
[56]

Pseudoinverse-guided diffusion models for inverse problems

Song, J., Vahdat, A., Mardani, M., and Kautz, J. Pseudoinverse-guided diffusion models for inverse problems. In International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=9_gsMA8MRKQ

work page 2023
[57]

and Ermon, S

Song, Y. and Ermon, S. Generative M odeling by E stimating G radients of the D ata D istribution. In Advances in Neural Information Processing Systems, pp.\ 11918--11930, 2019

work page 2019
[58]

and Ermon, S

Song, Y. and Ermon, S. Improved T echniques for T raining S core- B ased G enerative M odels. Advances in Neural Information Processing Systems, 33, 2020

work page 2020
[59]

Sliced score matching: A scalable approach to density and score estimation

Song, Y., Garg, S., Shi, J., and Ermon, S. Sliced score matching: A scalable approach to density and score estimation. In Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI 2019, Tel Aviv, Israel, July 22-25, 2019 , pp.\ 204, 2019

work page 2019
[60]

P., Kumar, A., Ermon, S., and Poole, B

Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=PxTIG12RRHS

work page 2021
[61]

Solving inverse problems in medical imaging with score-based generative models

Song, Y., Shen, L., Xing, L., and Ermon, S. Solving inverse problems in medical imaging with score-based generative models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=vaRCHVj0uGI

work page 2022
[62]

and Mayers, D

S \"u li, E. and Mayers, D. F. An introduction to numerical analysis. Cambridge university press, 2003

work page 2003
[63]

Off-policy reinforcement learning for efficient and effective gan architecture search

Tian, Y., Wang, Q., Huang, Z., Li, W., Dai, D., Yang, M., Wang, J., and Fink, O. Off-policy reinforcement learning for efficient and effective gan architecture search. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part VII 16, pp.\ 175--192. Springer, 2020

work page 2020
[64]

Score-based generative modeling in latent space

Vahdat, A., Kreis, K., and Kautz, J. Score-based generative modeling in latent space. Advances in Neural Information Processing Systems, 34: 0 11287--11302, 2021

work page 2021
[65]

A C onnection B etween S core M atching and D enoising A utoencoders

Vincent, P. A C onnection B etween S core M atching and D enoising A utoencoders. Neural Computation, 23 0 (7): 0 1661--1674, 2011

work page 2011
[66]

P., and Gool, L

Wu, J., Huang, Z., Acharya, D., Li, W., Thoma, J., Paudel, D. P., and Gool, L. V. Sliced wasserstein generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 3713--3722, 2019

work page 2019
[67]

Generative latent flow

Xiao, Z., Yan, Q., and Amit, Y. Generative latent flow. arXiv preprint arXiv:1905.10485, 2019

work page arXiv 1905
[68]

Tackling the generative learning trilemma with denoising diffusion GAN s

Xiao, Z., Kreis, K., and Vahdat, A. Tackling the generative learning trilemma with denoising diffusion GAN s. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=JprM0p-q0Co

work page 2022
[69]

Xu, Y., Liu, Z., Tegmark, M., and Jaakkola, T. S. Poisson flow generative models. In Oh, A. H., Agarwal, A., Belgrave, D., and Cho, K. (eds.), Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=voV_TRqcWh

work page 2022
[70]

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T., and Xiao, J. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015

work page internal anchor Pith review arXiv 2015
[71]

arXiv preprint arXiv:2204.13902 , year=

Zhang, Q. and Chen, Y. Fast sampling of diffusion models with exponential integrator. arXiv preprint arXiv:2204.13902, 2022

work page arXiv 2022
[72]

A., Shechtman, E., and Wang, O

Zhang, R., Isola, P., Efros, A. A., Shechtman, E., and Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018

work page 2018
[73]

Fast sampling of diffusion models via operator learning

Zheng, H., Nie, W., Vahdat, A., Azizzadenesheli, K., and Anandkumar, A. Fast sampling of diffusion models via operator learning. arXiv preprint arXiv:2211.13449, 2022

work page arXiv 2022
[74]

Truncated diffusion probabilistic models and diffusion-based adversarial auto-encoders

Zheng, H., He, P., Chen, W., and Zhou, M. Truncated diffusion probabilistic models and diffusion-based adversarial auto-encoders. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=HDxgaKk956l

work page 2023