Consistency Models
Recognition: 2 theorem links · Lean Theorem
Pith reviewed 2026-05-13 15:41 UTC · model grok-4.3
The pith
Consistency models generate high-quality samples by directly mapping noise to data in one step.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether, outperforming existing distillation techniques and one-step non-adversarial generative models on standard benchmarks.
What carries the argument
The consistency function, which maps any noisy input at any noise level to the identical clean data output, enforcing consistency along the entire noise trajectory.
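The consistency function above can be made concrete via a boundary-respecting parameterization. Below is a minimal numpy sketch, assuming an EDM-style choice of `c_skip`/`c_out` schedules (the constants `EPS` and `SIGMA_DATA` are illustrative, and `identity_net` is a placeholder for the trained network F_θ):

```python
import numpy as np

# Sketch of the parameterization f_theta(x, t) = c_skip(t)*x + c_out(t)*F_theta(x, t),
# chosen so that c_skip(eps) = 1 and c_out(eps) = 0: at the smallest noise level,
# the consistency function is exactly the identity, as the definition requires.

EPS, SIGMA_DATA = 0.002, 0.5  # illustrative constants, not the paper's tuned values

def c_skip(t):
    return SIGMA_DATA**2 / ((t - EPS)**2 + SIGMA_DATA**2)

def c_out(t):
    return SIGMA_DATA * (t - EPS) / np.sqrt(t**2 + SIGMA_DATA**2)

def f(x, t, net):
    """Consistency function: maps a noisy input at noise level t toward clean data."""
    return c_skip(t) * x + c_out(t) * net(x, t)

# At the boundary t = EPS the output equals the input regardless of the network.
identity_net = lambda x, t: np.zeros_like(x)  # stand-in for F_theta
x = np.array([1.0, -2.0, 3.0])
assert np.allclose(f(x, EPS, identity_net), x)
```

The boundary condition is what makes one-step generation well defined: evaluating f at the largest noise level directly yields a sample, with no iteration required.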
If this is right
- One-step sampling from consistency models achieves FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64.
- Multistep sampling can be applied to trade additional compute for higher sample quality.
- Zero-shot editing capabilities such as inpainting, colorization, and super-resolution are available without dedicated training.
- Standalone consistency models outperform prior one-step non-adversarial generative models on CIFAR-10, ImageNet 64x64, and LSUN 256x256.
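The compute-for-quality trade in the second bullet can be sketched as a toy multistep sampler: generate once from pure noise, then alternately re-noise to a lower level and re-apply the consistency function. The `f` below is a made-up stand-in for a trained network, and the time schedule is illustrative:

```python
import numpy as np

EPS = 0.002  # smallest noise level (illustrative)

def f(x, t):
    # Toy consistency function standing in for a trained model.
    return x / (1.0 + t)

def multistep_sample(shape, t_schedule, rng):
    """One-step generation followed by optional noise-and-denoise refinements."""
    t_max = t_schedule[0]
    x = f(rng.standard_normal(shape) * t_max, t_max)  # one-step sample
    for t in t_schedule[1:]:                          # each pass costs one more NFE
        z = rng.standard_normal(shape)
        x_t = x + np.sqrt(t**2 - EPS**2) * z          # re-noise to level t
        x = f(x_t, t)                                 # map back toward data
    return x

rng = np.random.default_rng(0)
sample = multistep_sample((4,), [80.0, 10.0, 1.0], rng)
print(sample.shape)  # (4,)
```

Each extra schedule entry adds one network evaluation, which is the knob that trades compute for sample quality.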
Where Pith is reading between the lines
- These models could lower the barrier to real-time image synthesis in applications where multiple denoising steps are currently too slow.
- The consistency principle might extend naturally to other iterative generative processes such as those used in audio or video synthesis.
- Further gains could come from hybridizing consistency training with small amounts of adversarial fine-tuning.
- Scaling experiments on higher-resolution datasets would test whether the direct noise-to-data mapping remains stable without additional regularization.
Load-bearing premise
A single learned consistency function can map noise at any level to the same clean output and generalize to zero-shot editing tasks without task-specific supervision.
What would settle it
If one-step samples produced by a trained consistency model show FID scores no better than existing one-step baselines on CIFAR-10 or ImageNet 64x64, or if zero-shot inpainting results contain visible inconsistencies not present in supervised editing methods.
Original abstract
Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64x64 and LSUN 256x256.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces consistency models as a new family of generative models that directly map noise to data, enabling high-quality one-step generation while supporting multi-step refinement and zero-shot editing tasks such as inpainting, colorization, and super-resolution. Models can be trained either by distilling from pre-trained diffusion models or independently from scratch. Extensive experiments report new state-of-the-art FID scores of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step sampling, outperforming prior distillation techniques, with additional results on LSUN 256x256.
Significance. If the central claims hold, this work offers a meaningful advance toward computationally efficient sampling in generative modeling by largely eliminating iterative processes while preserving sample quality. The reported FID improvements on standard benchmarks are substantial, and the zero-shot editing results without task-specific supervision add practical value. The dual training options (distillation and standalone) broaden applicability. Strengths include the scale of empirical validation across datasets and the introduction of a distinct model family that competes with existing one-step non-adversarial generators.
major comments (1)
- [§3.2] The consistency loss minimizes ||f_θ(x_t, t) - f_θ(x_s, s)|| only over randomly sampled discrete pairs (t, s). This does not enforce exact invariance of f_θ(·, t) to the same x_0 along the full continuous trajectory, which is required for the one-step generation claim (t=1 to t=0) and for reliable zero-shot editing. Residual inconsistencies at unseen times could degrade performance; the manuscript should provide either theoretical bounds on the trajectory consistency error or empirical measurements of invariance across dense time grids.
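The pairwise objective this comment describes can be sketched in a few lines of numpy. The linear `f` below is a toy stand-in for the trained network, and the trajectory x_t = x_0 + t·z is a simplified noising path; only sampled adjacent pairs are penalized, which is precisely the gap the comment points at:

```python
import numpy as np

def f(x, t, w):
    # Toy parametric consistency function (stand-in for the trained network).
    return w * x / (1.0 + t)

def pairwise_consistency_loss(x0, z, times, w):
    """Mean squared disagreement of f across adjacent sampled pairs (t, s)."""
    losses = []
    for t, s in zip(times[1:], times[:-1]):   # only the sampled discrete pairs
        x_t, x_s = x0 + t * z, x0 + s * z     # two points on one noising trajectory
        losses.append(np.mean((f(x_t, t, w) - f(x_s, s, w))**2))
    return float(np.mean(losses))

rng = np.random.default_rng(1)
x0, z = rng.standard_normal(8), rng.standard_normal(8)
times = np.linspace(0.002, 80.0, 18)          # discrete training grid
loss = pairwise_consistency_loss(x0, z, times, w=1.0)
assert loss >= 0.0
```

Driving this loss to zero enforces agreement only on the grid of sampled pairs; times between grid points are constrained only implicitly, through the smoothness of f.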
minor comments (2)
- [§4.1] In §4.1 and Table 1, the one-step FID numbers are presented without standard deviations or the number of independent runs; adding these would let readers assess the statistical reliability of the claimed improvements over baselines.
- [Figure 3] The caption and axis labels for the multi-step sampling curves could more explicitly indicate the compute-quality trade-off relative to the one-step baseline.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive review of our manuscript. We address the major comment point by point below.
Point-by-point responses
Referee: [§3.2] The consistency loss minimizes ||f_θ(x_t, t) - f_θ(x_s, s)|| only over randomly sampled discrete pairs (t, s). This does not enforce exact invariance of f_θ(·, t) to the same x_0 along the full continuous trajectory, which is required for the one-step generation claim (t=1 to t=0) and for reliable zero-shot editing. Residual inconsistencies at unseen times could degrade performance; the manuscript should provide either theoretical bounds on the trajectory consistency error or empirical measurements of invariance across dense time grids.
Authors: We appreciate the referee's observation on the formulation of the consistency loss. The loss is indeed defined over discrete pairs (t, s) drawn from the continuous time distribution, rather than enforcing exact invariance at every point along the trajectory. While repeated sampling of such pairs during training, together with the self-consistency objective, is intended to promote approximate invariance in practice, we acknowledge that this does not constitute a strict guarantee for all unseen times. To address the concern directly, we will add to the revised manuscript a new set of empirical measurements: consistency error evaluated on a dense grid of time points (e.g., 100 uniformly spaced values) not encountered during training, along with plots of ||f_θ(x_t, t) - x_0|| for fixed x_0 across the trajectory. These additions will provide quantitative support for the reliability of the one-step generation and zero-shot editing results.
Revision: yes
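The promised diagnostic can be sketched directly: evaluate ||f_θ(x_t, t) - x_0|| for a fixed x_0 on a dense grid of times never used in training. This numpy sketch uses a toy, deliberately imperfect `f`; for a perfect consistency model the resulting curve would be flat at zero:

```python
import numpy as np

def f(x, t):
    # Toy, imperfect consistency function (stand-in for a trained network).
    return x / (1.0 + 0.1 * t)

def consistency_error_curve(x0, z, dense_times):
    """||f(x_t, t) - x0|| along one noising trajectory x_t = x0 + t*z."""
    return np.array([np.linalg.norm(f(x0 + t * z, t) - x0) for t in dense_times])

rng = np.random.default_rng(2)
x0, z = rng.standard_normal(16), rng.standard_normal(16)
grid = np.linspace(0.002, 80.0, 100)   # 100 uniformly spaced, unseen times
errors = consistency_error_curve(x0, z, grid)
print(errors.shape)  # (100,)
```

Plotting `errors` against `grid` (averaged over many x_0) is the measurement the rebuttal commits to: any bump at times between the training grid points is exactly the residual inconsistency the referee flags.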
Circularity Check
No significant circularity; results are empirical and self-contained
Full rationale
The paper defines consistency models via a loss that enforces pairwise agreement on randomly sampled (t,s) pairs drawn from diffusion trajectories, then evaluates one-step and multi-step sampling performance via FID on held-out benchmarks (CIFAR-10, ImageNet 64x64). This training objective is an approximation to the desired trajectory invariance and does not presuppose the final sample quality or editing behavior. No load-bearing step reduces by construction to a fitted parameter renamed as prediction, a self-citation chain, or an ansatz smuggled from prior work by the same authors. The reported SOTA numbers rest on external benchmark comparison rather than internal redefinition of inputs. The derivation chain therefore remains independent of its measured outputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- sampling step count
axioms (1)
- Domain assumption: a consistency function exists that maps any point on a diffusion trajectory to the same clean data point.
Lean theorems connected to this paper
- DAlembert.Inevitabilitybilinear_family_forced · contradicts: training minimizes ||f_θ(x_t, t) - f_θ(x_s, s)|| for randomly drawn pairs (t, s) (typically via the consistency loss in §3.2). This objective only penalizes inconsistency on the sampled pairs and does not constrain the function to be exactly constant along the entire continuous trajectory.
Forward citations
Cited by 23 Pith papers
- Query Lower Bounds for Diffusion Sampling. Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.
- Diffusion Policy: Visuomotor Policy Learning via Action Diffusion. Diffusion Policy models robot actions as a conditional diffusion process, outperforming prior state-of-the-art methods by 46.9% on average across 12 manipulation tasks from four benchmarks.
- ExpoCM: Exposure-Aware One-Step Generative Single-Image HDR Reconstruction. ExpoCM enables fast one-step single-image HDR reconstruction via exposure-dependent perturbations and region-conditioned consistency trajectories derived from a probability flow ODE.
- How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance. FMRG is a training-free, single-trajectory guidance method for flow models derived from optimal control that achieves strong reward alignment with only 3 NFEs.
- Characterizing Vision-Language-Action Models across XPUs: Constraints and Acceleration for On-Robot Deployment. VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with margin...
- From Competition to Coopetition: Coopetitive Training-Free Image Editing Based on Text Guidance. CoEdit is a zero-shot coopetitive framework for text-guided image editing that uses dual-entropy attention manipulation and entropic latent refinement to improve editing harmony and structural preservation.
- Isokinetic Flow Matching for Pathwise Straightening of Generative Flows. Isokinetic Flow Matching adds a lightweight regularization term to flow matching that penalizes acceleration along paths via self-guided finite differences, yielding straighter trajectories and large gains in few-step...
- VOSR: A Vision-Only Generative Model for Image Super-Resolution. VOSR shows that competitive generative image super-resolution with faithful structures can be achieved by training a diffusion-style model from scratch on visual data alone, using a vision encoder for guidance and a r...
- Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference. Latent Consistency Models enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space, distilled from pre-trained LDMs.
- Gradient-Free Noise Optimization for Reward Alignment in Generative Models. ZeNO formulates noise optimization for reward alignment as a path-integral control problem solvable via zeroth-order reward evaluations alone, connecting to Langevin dynamics under an Ornstein-Uhlenbeck process.
- dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models. dFlowGRPO is a new rate-aware RL method for discrete flow models that outperforms prior GRPO approaches on image generation and matches continuous flow models while supporting broad probability paths.
- Tyche: One Step Flow for Efficient Probabilistic Weather Forecasting. Tyche achieves competitive probabilistic weather forecasting skill and calibration using a single-step flow model with JVP-regularized training and rollout finetuning.
- GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model. GCCM prevents shortcut collapse in consistency models for graph prediction by using contrastive negative pairs and input feature perturbation, leading to better performance than deterministic baselines.
- MetaSR: Content-Adaptive Metadata Orchestration for Generative Super-Resolution. MetaSR adaptively orchestrates metadata in a DiT-based generative SR model to deliver up to 1 dB PSNR gains and 50% bitrate savings across diverse content and degradations.
- Pairing Regularization for Mitigating Many-to-One Collapse in GANs. Pairing regularization mitigates intra-mode collapse in GANs by penalizing redundant latent-to-sample mappings, improving recall under collapse-prone conditions or precision under stabilized training.
- ELT: Elastic Looped Transformers for Visual Generation. Elastic Looped Transformers share weights across recurrent blocks and apply intra-loop self-distillation to deliver 4x parameter reduction while matching competitive FID and FVD scores on ImageNet and UCF-101.
- Unified Video Action Model. UVA learns a joint video-action latent representation with decoupled diffusion decoding heads, enabling a single model to perform accurate fast policy learning, forward/inverse dynamics, and video generation without p...
- Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations. A simplified one-step diffusion distillation uses pretrained teacher features directly for drifting loss plus a mode coverage term, achieving FID 1.58 on ImageNet-64 and 18.4 on SDXL.
- Lightning Unified Video Editing via In-Context Sparse Attention. ISA prunes low-saliency context tokens and routes queries by sharpness to either full or 0-th order Taylor sparse attention, enabling LIVEditor to cut attention latency ~60% while beating prior video editing methods o...
- Fast Text-to-Audio Generation with One-Step Sampling via Energy-Scoring and Auxiliary Contextual Representation Distillation. A one-step text-to-audio model using energy-distance training and contextual distillation outperforms prior fast baselines on AudioCaps and achieves up to 8.5x faster inference than the multi-step IMPACT system with c...
- A Unified Measure-Theoretic View of Diffusion, Score-Based, and Flow Matching Generative Models. Diffusion, score-based, and flow matching models are unified as instances of learning time-dependent vector fields inducing marginal distributions governed by continuity and Fokker-Planck equations.
- OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL. OmniVLA-RL uses a mix-of-transformers architecture and flow-matching reformulated as SDE with group segmented policy optimization to surpass prior VLA models on LIBERO benchmarks.
- Discrete Meanflow Training Curriculum. A DMF curriculum initialized from pretrained flow models achieves one-step FID 3.36 on CIFAR-10 after only 2000 epochs by exploiting a discretized consistency property in the Meanflow objective.