Recognition: 3 theorem links
Denoising Diffusion Probabilistic Models
Pith reviewed 2026-05-11 03:19 UTC · model grok-4.3
The pith
Denoising diffusion probabilistic models generate high-quality images by reversing a fixed Gaussian noise-adding process.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Diffusion probabilistic models achieve high-quality image synthesis by training on a weighted variational bound derived from a connection to denoising score matching with Langevin dynamics. On unconditional CIFAR-10 this produces an Inception score of 9.46 and a state-of-the-art FID score of 3.17, while 256x256 LSUN samples reach quality comparable to ProgressiveGAN, and the models admit a progressive lossy decompression scheme.
What carries the argument
The reverse denoising process, learned to undo a fixed forward Markov chain of Gaussian transitions whose variance schedule is chosen independently of the data.
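The fixed forward chain described here has a closed form: after t steps, x_t is a known Gaussian perturbation of x_0. A minimal sketch, assuming the paper's notation (β_t, ᾱ_t) and the commonly cited linear schedule endpoints (1e-4 to 0.02 over T = 1000 steps, an assumption here, not quoted from this page):

```python
import numpy as np

# Fixed, data-independent variance schedule (linear endpoints are an
# illustrative default, not taken from this review page).
T = 1000
beta = np.linspace(1e-4, 0.02, T)
alpha = 1.0 - beta
alpha_bar = np.cumprod(alpha)  # \bar{alpha}_t = prod_{s <= t} alpha_s

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) directly."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))
xT = q_sample(x0, T - 1, rng)
# By t = T the signal coefficient sqrt(abar_T) is tiny, so x_T is near pure noise.
```

Because ᾱ_T is driven close to zero by the schedule alone, the terminal distribution is (approximately) a standard Gaussian regardless of the data, which is what makes the schedule's data-independence tenable.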
If this is right
- The models reach an Inception score of 9.46 and FID of 3.17 on unconditional CIFAR-10.
- Sample quality on 256x256 LSUN becomes comparable to ProgressiveGAN.
- A progressive lossy decompression scheme emerges that generalizes autoregressive decoding.
- Training succeeds via a weighted variational bound tied to score matching.
Where Pith is reading between the lines
- Because the variance schedule is fixed and data-independent, the same training recipe may transfer to new image resolutions or domains with minimal retuning.
- The explicit link to Langevin dynamics suggests the learned reverse process approximates the score function, which could be reused for tasks such as inpainting or super-resolution.
- The thermodynamic framing may encourage analysis of convergence rates or mode coverage that differs from the analysis typically applied to GANs.
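The Langevin connection mentioned above is concrete: with the paper's noise-prediction parameterization, one reverse step uses ε_θ where a Langevin update would use the score (s_θ = −ε_θ/√(1−ᾱ_t)). A hedged sketch of a single ancestral step, with `eps_theta` as a placeholder rather than a trained network:

```python
import numpy as np

T = 1000
beta = np.linspace(1e-4, 0.02, T)   # assumed linear schedule, as elsewhere
alpha = 1.0 - beta
alpha_bar = np.cumprod(alpha)

def eps_theta(x_t, t):
    # Stand-in for the trained noise-prediction network.
    return np.zeros_like(x_t)

def p_sample(x_t, t, rng):
    """One step x_t -> x_{t-1} of the learned reverse chain (sigma_t^2 = beta_t)."""
    coef = beta[t] / np.sqrt(1.0 - alpha_bar[t])
    mean = (x_t - coef * eps_theta(x_t, t)) / np.sqrt(alpha[t])
    if t == 0:
        return mean
    return mean + np.sqrt(beta[t]) * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))
x_prev = p_sample(x, 500, rng)
```

Conditioning tasks such as inpainting amount to steering this same step with additional score terms, which is why the reuse speculated above is plausible.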
Load-bearing premise
The forward diffusion process is a fixed Markov chain of Gaussian transitions whose variance schedule can be chosen independently of the data distribution.
What would settle it
Training the models on CIFAR-10 and measuring an FID score substantially above 3.17, together with visibly incoherent samples, would falsify the performance claim.
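The falsification test above hinges on FID, which compares Gaussian fits to feature distributions: FID = ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). A minimal numpy sketch on toy features (real evaluations use Inception activations, which this deliberately omits):

```python
import numpy as np

def fid(feat1, feat2):
    """Frechet distance between Gaussian fits to two feature sets (rows = samples)."""
    mu1, mu2 = feat1.mean(0), feat2.mean(0)
    s1 = np.cov(feat1, rowvar=False)
    s2 = np.cov(feat2, rowvar=False)
    # Tr((S1 S2)^{1/2}) equals the sum of square roots of the eigenvalues of
    # S1 @ S2, which has a real nonnegative spectrum for covariance matrices.
    eig = np.linalg.eigvals(s1 @ s2).real
    tr_sqrt = np.sqrt(np.clip(eig, 0.0, None)).sum()
    return float(((mu1 - mu2) ** 2).sum() + np.trace(s1) + np.trace(s2) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
a = rng.standard_normal((5000, 8))
b = rng.standard_normal((5000, 8))
# Identical feature sets give FID ~ 0; independent draws give a small positive value.
```

This also motivates the referee's replication comment: FID depends on sample count and feature extractor, so "substantially above 3.17" is only meaningful under a fixed protocol.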
read the original abstract
We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at https://github.com/hojonathanho/diffusion
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces denoising diffusion probabilistic models (DDPMs) as latent variable models for high-quality unconditional image synthesis, inspired by nonequilibrium thermodynamics. It derives a weighted variational bound objective from a novel connection to denoising score matching with Langevin dynamics, interprets the reverse process as a progressive lossy decompression scheme generalizing autoregressive decoding, and reports an Inception score of 9.46 together with a state-of-the-art FID of 3.17 on CIFAR-10 plus sample quality on 256×256 LSUN comparable to ProgressiveGAN. The implementation is released publicly.
Significance. If the reported metrics hold under the stated evaluation protocol, the work is significant for establishing diffusion models as a competitive, likelihood-based alternative to GANs for image synthesis. The explicit fixed forward Markov chain, the score-matching-derived training objective, and the public code repository together provide a reproducible and extensible framework whose central numbers can be directly verified.
minor comments (3)
- [§3.2] The variance schedule β_t is presented as a fixed linear choice in the forward process without an exhaustive ablation; while not load-bearing for the central claim, a short sensitivity discussion would strengthen the experimental section.
- [Abstract and §4] The abstract states that the FID of 3.17 is state-of-the-art; the main text should explicitly list the exact number of samples and the precise FID implementation used for all compared methods to allow direct replication.
- [§3.4] Notation for the simplified loss L_simple and the weighting λ_t could be cross-referenced more clearly to the earlier variational bound derivation to aid readers following the score-matching connection.
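The L_simple objective the last comment refers to is the weighted variational bound with the per-term weights λ_t set to 1: L_simple(θ) = E_{t,x₀,ε} ||ε − ε_θ(x_t, t)||². An illustrative training-loss sketch, with `model` a hypothetical noise predictor rather than the paper's U-Net:

```python
import numpy as np

T = 1000
beta = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alpha_bar = np.cumprod(1.0 - beta)

def l_simple(model, x0, rng):
    """One Monte Carlo estimate of L_simple for a single image x0."""
    t = int(rng.integers(0, T))                         # uniform timestep
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return float(np.mean((eps - model(x_t, t)) ** 2))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((16, 16))
# With a zero predictor the loss reduces to E||eps||^2, close to 1 per dimension.
loss = l_simple(lambda x_t, t: np.zeros_like(x_t), x0, rng)
```

Cross-referencing this form against the ELBO terms (as the comment asks) amounts to noting that each KL term in the bound is a λ_t-weighted version of this same squared error.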
Simulated Author's Rebuttal
We thank the referee for the positive review and for recognizing the significance of diffusion models as a likelihood-based alternative to GANs. We appreciate the emphasis on reproducibility through the public code release.
Circularity Check
Derivation self-contained from variational bound and score-matching connection
full rationale
The paper starts from the standard variational lower bound on the negative log-likelihood for a latent variable model whose forward process is an explicit, fixed Markov chain of Gaussians with a hand-chosen variance schedule independent of the data. The training objective is obtained by algebraic simplification of the ELBO terms, followed by a derived equivalence to a weighted denoising score-matching loss; neither step redefines the target metric in terms of itself nor substitutes a fitted parameter for a prediction. Reported IS and FID numbers are measured empirical outcomes after training, not quantities forced by construction from the inputs. No load-bearing premise relies on a self-citation chain or an ansatz smuggled from prior work by the same authors.
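The ELBO decomposition referenced in this rationale can be written explicitly; following the paper's notation, the bound splits into a prior term, a sum of tractable Gaussian KLs, and a reconstruction term:

```latex
L = \mathbb{E}_q\Big[
      \underbrace{D_{\mathrm{KL}}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big)}_{L_T}
    + \sum_{t>1} \underbrace{D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)}_{L_{t-1}}
    - \underbrace{\log p_\theta(x_0 \mid x_1)}_{L_0}
  \Big]
```

Each L_{t-1} compares two Gaussians in closed form, which is the algebraic simplification the rationale describes: no term defines the target metric in terms of itself.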
Axiom & Free-Parameter Ledger
free parameters (1)
- variance schedule β_t
axioms (2)
- domain assumption: The forward process is a Markov chain of isotropic Gaussian transitions.
- domain assumption: The reverse process can be approximated by a Gaussian with learned mean.
Lean theorems connected to this paper
- LedgerCanonicality.no_free_knobs — contradicts: "The forward diffusion process is a fixed Markov chain of Gaussian transitions whose variance schedule can be chosen independently of the data distribution."
- DAlembert.Inevitability.bilinear_family_forced — contradicts: "Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics."
- HierarchyForcing.uniform_scaling_forced — echoes: "On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17."
Forward citations
Cited by 60 Pith papers
-
Generative models on phase space
Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.
-
TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models
TRACE creates valid conformal prediction sets for complex generative models by scoring outputs via averaged denoising or velocity errors along stochastic transport paths instead of likelihoods.
-
Deep Dreams Are Made of This: Visualizing Monosemantic Features in Diffusion Models
LVO applies optimization-based feature visualization to latent diffusion models after disentangling their representations with sparse autoencoders, yielding recognizable concept images on a fine-tuned Stable Diffusion...
-
Tempered Guided Diffusion
Tempered Guided Diffusion uses annealed SMC to produce consistent particle approximations to the posterior for training-free conditional diffusion sampling, outperforming independent guided trajectories in experiments.
-
Action Agent: Agentic Video Generation Meets Flow-Constrained Diffusion
Action Agent pairs LLM-driven video generation with a flow-constrained diffusion transformer to produce velocity commands, raising video success to 86% and delivering 64.7% real-world navigation on a Unitree G1 humanoid.
-
Generative diffusion models for spatiotemporal influenza forecasting
Influpaint uses generative diffusion models on image-encoded influenza data to produce realistic and diverse epidemic trajectories that match leading ensemble methods in accuracy.
-
Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization
Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.
-
$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models
Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a di...
-
Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading
Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.
-
Conflated Inverse Modeling to Generate Diverse and Temperature-Change Inducing Urban Vegetation Patterns
A diffusion generative inverse model conditioned on temperature targets produces diverse, physically plausible urban vegetation patterns that achieve specified regional temperature shifts.
-
Causal Diffusion Models for Counterfactual Outcome Distributions in Longitudinal Data
Causal Diffusion Model is the first diffusion-based method to produce full probabilistic counterfactual outcome distributions for sequential interventions in longitudinal data, showing 15-30% better distributional acc...
-
ExpertEdit: Learning Skill-Aware Motion Editing from Expert Videos
ExpertEdit edits novice motions to expert skill levels by learning a motion prior from unpaired videos and infilling masked skill-critical spans.
-
VASR: Variance-Aware Systematic Resampling for Reward-Guided Diffusion
FVD applies Fleming-Viot population dynamics to diffusion model sampling at inference time to reduce diversity collapse while improving reward alignment and FID scores.
-
Anchored Cyclic Generation: A Novel Paradigm for Long-Sequence Symbolic Music Generation
Anchored Cyclic Generation uses anchor features from known music to mitigate error accumulation in autoregressive models, with the Hi-ACG framework delivering better long-sequence symbolic music and music completion p...
-
Unlocking Prompt Infilling Capability for Diffusion Language Models
Full-sequence masking in SFT unlocks prompt infilling for masked diffusion language models, producing templates that match or surpass hand-designed ones and transfer across models.
-
Hierarchical Text-Conditional Image Generation with CLIP Latents
A hierarchical prior-decoder model using CLIP latents generates more diverse text-conditional images than direct methods while preserving photorealism and caption fidelity.
-
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
A 3.5-billion-parameter diffusion model with classifier-free guidance generates images preferred over DALL-E by human raters and can be fine-tuned for text-guided inpainting.
-
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
SDEdit performs guided image synthesis and editing by adding noise to inputs and refining them via denoising with a diffusion model's SDE prior, outperforming GAN methods in human studies without task-specific training.
-
Diffusion Models Beat GANs on Image Synthesis
Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
-
CUBic: Coordinated Unified Bimanual Perception and Control Framework
CUBic learns a shared tokenized representation for bimanual robot perception and control via unidirectional aggregation, bidirectional codebook coordination, and a unified diffusion policy, yielding higher coordinatio...
-
TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning
TMRL bridges behavioral cloning pretraining and RL finetuning via diffusion noise and timestep modulation to enable controlled exploration, improving sample efficiency and enabling real-world robot training in under one hour.
-
DiffSegLung: Diffusion Radiomic Distillation for Unsupervised Lung Pathology Segmentation
DiffSegLung distills pathology-discriminative structure from radiomic descriptors into a diffusion U-Net bottleneck for unsupervised CT lung pathology segmentation, outperforming baselines on heterogeneous cohorts.
-
Diffusion model for SU(N) gauge theories
Implicit score matching trains diffusion models that successfully sample SU(3) Wilson gauge configurations on lattices, with a Hamiltonian-dynamics corrector needed for strong coupling.
-
Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities
LQL stabilizes Q-learning by penalizing violations of n-step action-sequence lower bounds with a hinge loss computed from standard network outputs.
-
Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities
LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.
-
GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model
GCCM prevents shortcut collapse in consistency models for graph prediction by using contrastive negative pairs and input feature perturbation, leading to better performance than deterministic baselines.
-
Delta Score Matters! Spatial Adaptive Multi Guidance in Diffusion Models
SAMG uses spatially adaptive guidance scales derived from a geometric analysis of classifier-free guidance to resolve the detail-artifact dilemma in diffusion-based image and video generation.
-
Beyond Fixed Formulas: Data-Driven Linear Predictor for Efficient Diffusion Models
L2P trains per-timestep linear weights on feature trajectories in about 20 seconds to enable aggressive caching in DiT models, delivering up to 4.55x FLOPs reduction with maintained visual quality.
-
MetaSR: Content-Adaptive Metadata Orchestration for Generative Super-Resolution
MetaSR adaptively orchestrates metadata in a DiT-based generative SR model to deliver up to 1 dB PSNR gains and 50% bitrate savings across diverse content and degradations.
-
Mapping License Plate Recoverability Under Extreme Viewing Angles for Oppor-tunistic Urban Sensing
Recoverability maps use synthetic sweeps of viewing angles and artifacts to quantify the recoverable fraction of parameter space for license plate restoration, with the best model succeeding on 93% and geometry settin...
-
COMPASS: A Unified Decision-Intelligence System for Navigating Performance Trade-off in HPC
COMPASS formalizes HPC configuration questions as ML tasks on traces, quantifies recommendation trustworthiness, and delivers 65.93% lower average job turnaround time plus 80.93% lower node usage versus prior methods ...
-
Breaking Watermarks in the Frequency Domain: A Modulated Diffusion Attack Framework
FMDiffWA uses frequency-domain modulation inside diffusion sampling to neutralize watermarks in images while preserving visual quality and generalizing across watermarking schemes.
-
Normalizing Flows with Iterative Denoising
iTARFlow augments normalizing flows with diffusion-style iterative denoising during sampling while preserving end-to-end likelihood training, reaching competitive results on ImageNet 64/128/256.
-
Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression
RDVQ enables joint rate-distortion optimization for vector-quantized generative image compression via differentiable codebook distribution relaxation and an autoregressive entropy model.
-
Make it Simple, Make it Dance: Dance Motion Simplification to Support Novices' Dance Learning
Rule-based and learning-based algorithms simplify dance motions to help novices learn more effectively while maintaining naturalness and style.
-
MuPPet: Multi-person 2D-to-3D Pose Lifting
MuPPet introduces person encoding, permutation augmentation, and dynamic multi-person attention to outperform prior single- and multi-person 2D-to-3D pose lifting methods on group interaction datasets while improving ...
-
VASR: Variance-Aware Systematic Resampling for Reward-Guided Diffusion
VASR separates continuation and residual variance in reward-guided diffusion SMC, using optimal mass allocation and systematic resampling to achieve up to 26% better FID scores and faster runtimes than prior SMC and M...
-
Train-Small Deploy-Large: Leveraging Diffusion-Based Multi-Robot Planning
A diffusion-based multi-robot planner trained on few agents generalizes to larger numbers during deployment using inter-agent attention and temporal convolution.
-
JD-BP: A Joint-Decision Generative Framework for Auto-Bidding and Pricing
JD-BP jointly generates bids and pricing corrections via generative models, memory-less return-to-go, trajectory augmentation, and energy-based DPO to improve auto-bidding performance despite prediction errors and latency.
-
Generative Path-Law Jump-Diffusion: Sequential MMD-Gradient Flows and Generalisation Bounds in Marcus-Signature RKHS
The paper proposes the ANJD flow and AVNSG operator to generate càdlàg trajectories via sequential MMD-gradient descent in Marcus-signature RKHS with generalisation bounds.
-
Anticipatory Reinforcement Learning: From Generative Path-Laws to Distributional Value Functions
ARL lifts states into signature-augmented manifolds and employs self-consistent proxies of future path-laws to enable deterministic expected-return evaluation while preserving contraction mappings in jump-diffusion en...
-
MRI-to-CT synthesis using drifting models
Drifting models outperform diffusion, CNN, VAE, and GAN baselines in MRI-to-CT synthesis on two pelvis datasets with higher SSIM/PSNR, lower RMSE, and millisecond one-step inference.
-
LTX-Video: Realtime Video Latent Diffusion
LTX-Video integrates Video-VAE and transformer for 1:192 latent compression and real-time video diffusion by moving patchifying to the VAE and letting the decoder finish denoising in pixel space.
-
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-t...
-
Make-A-Video: Text-to-Video Generation without Text-Video Data
Make-A-Video achieves state-of-the-art text-to-video generation by decomposing temporal U-Net and attention structures to add space-time modeling to text-to-image models, trained without any paired text-video data.
-
CaloArt: Large-Patch x-Prediction Diffusion Transformers for High-Granularity Calorimeter Shower Generation
CaloArt achieves top FPD, high-level, and classifier metrics on CaloChallenge datasets 2 and 3 while keeping single-GPU generation at 9-11 ms per shower by combining large-patch tokenization, x-prediction, and conditi...
-
MFVLR: Multi-domain Fine-grained Vision-Language Reconstruction for Generalizable Diffusion Face Forgery Detection and Localization
MFVLR uses multi-domain vision-language reconstruction with a fine-grained language transformer, multi-domain vision encoder, and vision injection module to achieve generalizable detection and localization of diffusio...
-
Uncertainty-Aware and Decoder-Aligned Learning for Video Summarization
VASTSum uses variational modeling of uncertainty in frame scores plus decoder-aligned regularization to achieve competitive Kendall and Spearman correlations on SumMe and TVSum while remaining a single-pass model.
-
On the Tradeoffs of On-Device Generative Models in Federated Predictive Maintenance Systems
Experiments on real industrial time series show that partial model sharing improves diffusion model performance in bandwidth-limited non-IID settings, while full sharing stabilizes GAN training but offers less robustn...
-
Repurposing Image Diffusion Models for Adversarial Synthetic Structured Data: A Case Study of Ground Truth Drift
Off-the-shelf image diffusion models can be repurposed to create synthetic structured data capable of inducing ground truth drift in machine pipelines.
-
Flow matching for Sentinel-2 super-resolution: implementation, application, and implications
Flow matching achieves single-step pixel accuracy and 20-step perceptual quality for Sentinel-2 super-resolution, outperforming diffusion and Real-ESRGAN while enabling large-scale 2.5 m land-cover products.
-
Seeing Is No Longer Believing: Frontier Image Generation Models, Synthetic Visual Evidence, and Real-World Risk
Frontier image models enable synthetic visual evidence that erodes trust in photos through combined realism, text, and identity features, calling for layered technical and policy controls.
-
Style-Based Neural Architectures for Real-Time Weather Classification
Three style-based neural architectures are proposed for real-time weather classification from images, with two truncated ResNet variants claimed to outperform prior methods and generalize across public datasets.
-
Score-Based Matching with Target Guidance for Cryo-EM Denoising
Score-based denoising with reference-density guidance improves particle-background separability and downstream 3D reconstruction consistency on cryo-EM datasets.
-
Diffusion-Based Optimization for Accelerated Convergence of Redundant Dual-Arm Minimum Time Problems
A novel diffusion variant accelerates minimum-time planning for redundant dual-arm robots by replacing gradient-based solving of the nonconvex high-level problem with probabilistic sampling, yielding 35x faster runtim...
-
Heuristic Style Transfer for Real-Time, Efficient Weather Attribute Detection
Lightweight multi-task models using Gram matrices and PatchGAN-style architectures detect 53 weather classes from RGB images with F1 scores above 96% internally and 78% zero-shot externally, supported by a new 503k-im...
-
Rethinking the Diffusion Model from a Langevin Perspective
Diffusion models are reorganized under a Langevin perspective that unifies ODE and SDE formulations and shows flow matching is equivalent to denoising under maximum likelihood.
-
Extending Tabular Denoising Diffusion Probabilistic Models for Time-Series Data Generation
A temporal extension of TabDDPM generates coherent synthetic time-series sequences on the WISDM dataset that match real distributions and support downstream classification with macro F1 of 0.64.
-
A Unified Measure-Theoretic View of Diffusion, Score-Based, and Flow Matching Generative Models
Diffusion, score-based, and flow matching models are unified as instances of learning time-dependent vector fields inducing marginal distributions governed by continuity and Fokker-Planck equations.
-
A Co-Evolutionary Theory of Human-AI Coexistence: Mutualism, Governance, and Dynamics in Complex Societies
Human-AI coexistence is best modeled as conditional mutualism under governance, formalized as a multiplex dynamical system whose simulations show stable high-coexistence equilibria only under balanced institutional oversight.
Reference graph
Works this paper leans on
-
[1]
GSNs: generative stochastic networks
Guillaume Alain, Yoshua Bengio, Li Yao, Jason Yosinski, Eric Thibodeau-Laufer, Saizheng Zhang, and Pascal Vincent. GSNs: generative stochastic networks. Information and Inference: A Journal of the IMA , 5(2):210–249, 2016
work page 2016
-
[2]
Learning to generate samples from noise through infusion training
Florian Bordes, Sina Honari, and Pascal Vincent. Learning to generate samples from noise through infusion training. In International Conference on Learning Representations , 2017
work page 2017
-
[3]
Large scale GAN training for high fidelity natural image synthesis
Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations , 2019
work page 2019
-
[4]
Your GAN is secretly an energy-based model and you should use discriminator driven latent sampling
Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, and Yoshua Bengio. Your GAN is secretly an energy-based model and you should use discriminator driven latent sampling. arXiv preprint arXiv:2003.06060, 2020
-
[5]
Neural ordinary differential equations
Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems , pages 6571–6583, 2018
work page 2018
-
[6]
PixelSNAIL: An improved autoregressive generative model
Xi Chen, Nikhil Mishra, Mostafa Rohaninejad, and Pieter Abbeel. PixelSNAIL: An improved autoregressive generative model. In International Conference on Machine Learning, pages 863–871, 2018
work page 2018
-
[7]
Generating Long Sequences with Sparse Transformers
Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509, 2019
work page 2019
-
[8]
Residual Energy-Based Models for Text Generation, April 2020
Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, and Marc’Aurelio Ranzato. Residual energy-based models for text generation. arXiv preprint arXiv:2004.11714, 2020
-
[9]
NICE: Non-linear Independent Components Estimation
Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2014
work page 2014
-
[10]
Density estimation using Real NVP
Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. arXiv preprint arXiv:1605.08803, 2016
work page 2016
-
[11]
Implicit generation and modeling with energy based models
Yilun Du and Igor Mordatch. Implicit generation and modeling with energy based models. In Advances in Neural Information Processing Systems, pages 3603–3613, 2019
work page 2019
-
[12]
Learning generative ConvNets via multi-grid modeling and sampling
Ruiqi Gao, Yang Lu, Junpei Zhou, Song-Chun Zhu, and Ying Nian Wu. Learning generative ConvNets via multi-grid modeling and sampling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9155–9164, 2018
work page 2018
-
[13]
Flow contrastive estimation of energy-based models
Ruiqi Gao, Erik Nijkamp, Diederik P Kingma, Zhen Xu, Andrew M Dai, and Ying Nian Wu. Flow contrastive estimation of energy-based models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7518–7528, 2020
work page 2020
-
[14]
Generative adversarial nets
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014
work page 2014
-
[15]
Variational walkback: Learning a transition operator as a stochastic recurrent net
Anirudh Goyal, Nan Rosemary Ke, Surya Ganguli, and Yoshua Bengio. Variational walkback: Learning a transition operator as a stochastic recurrent net. In Advances in Neural Information Processing Systems , pages 4392–4402, 2017
work page 2017
-
[16]
FFJORD: Free-form continuous dynamics for scalable reversible generative models
Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, and David Duvenaud. FFJORD: Free-form continuous dynamics for scalable reversible generative models. In International Conference on Learning Representations, 2019
work page 2019
-
[17]
Your classifier is secretly an energy based model and you should treat it like one
Will Grathwohl, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. Your classifier is secretly an energy based model and you should treat it like one. In International Conference on Learning Representations , 2020
work page 2020
-
[18]
Towards conceptual compression
Karol Gregor, Frederic Besse, Danilo Jimenez Rezende, Ivo Danihelka, and Daan Wierstra. Towards conceptual compression. In Advances In Neural Information Processing Systems , pages 3549–3557, 2016
work page 2016
-
[19]
The communication complexity of correlation
Prahladh Harsha, Rahul Jain, David McAllester, and Jaikumar Radhakrishnan. The communication complexity of correlation. In Twenty-Second Annual IEEE Conference on Computational Complexity (CCC’07), pages 10–23. IEEE, 2007
work page 2007
-
[20]
Minimal random code learning: Getting bits back from compressed model parameters
Marton Havasi, Robert Peharz, and José Miguel Hernández-Lobato. Minimal random code learning: Getting bits back from compressed model parameters. In International Conference on Learning Representations, 2019
work page 2019
-
[21]
GANs trained by a two time-scale update rule converge to a local Nash equilibrium
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626–6637, 2017
work page 2017
-
[22]
beta-VAE: Learning basic visual concepts with a constrained variational framework
Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations, 2017
work page 2017
-
[23]
Flow++: Improving flow-based generative models with variational dequantization and architecture design
Jonathan Ho, Xi Chen, Aravind Srinivas, Yan Duan, and Pieter Abbeel. Flow++: Improving flow-based generative models with variational dequantization and architecture design. In International Conference on Machine Learning, 2019
work page 2019
-
[24]
Evaluating lossy compression rates of deep generative models
Sicong Huang, Alireza Makhzani, Yanshuai Cao, and Roger Grosse. Evaluating lossy compression rates of deep generative models. In International Conference on Machine Learning , 2020
work page 2020
-
[25]
Video pixel networks
Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, and Koray Kavukcuoglu. Video pixel networks. In International Conference on Machine Learning, pages 1771–1779, 2017
work page 2017
-
[26]
Efficient neural audio synthesis
Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron van den Oord, Sander Dieleman, and Koray Kavukcuoglu. Efficient neural audio synthesis. In International Conference on Machine Learning , pages 2410–2419, 2018
work page 2018
-
[27]
Progressive growing of GANs for improved quality, stability, and variation
Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations , 2018
work page 2018
-
[28]
A style-based generator architecture for generative adversarial networks
Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019
work page 2019
-
[29]
Training generative adversarial networks with limited data
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Training generative adversarial networks with limited data. arXiv preprint arXiv:2006.06676v1, 2020
-
[30] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8110–8119, 2020.
[31] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
[32] Diederik P Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pages 10215–10224, 2018.
[33] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[34] Diederik P Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. Improved variational inference with inverse autoregressive flow. In Advances in Neural Information Processing Systems, pages 4743–4751, 2016.
[35] John Lawson, George Tucker, Bo Dai, and Rajesh Ranganath. Energy-inspired models: Learning with sampler-induced distributions. In Advances in Neural Information Processing Systems, pages 8501–8513, 2019.
[36] Daniel Levy, Matt D. Hoffman, and Jascha Sohl-Dickstein. Generalizing Hamiltonian Monte Carlo with neural networks. In International Conference on Learning Representations, 2018.
[37] Lars Maaløe, Marco Fraccaro, Valentin Liévin, and Ole Winther. BIVA: A very deep hierarchy of latent variables for generative modeling. In Advances in Neural Information Processing Systems, pages 6548–6558, 2019.
[38] Jacob Menick and Nal Kalchbrenner. Generating high fidelity images with subscale pixel networks and multidimensional upscaling. In International Conference on Learning Representations, 2019.
[39] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, 2018.
[40] Alex Nichol. VQ-DRAW: A sequential discrete VAE. arXiv preprint arXiv:2003.01599, 2020.
[41] Erik Nijkamp, Mitch Hill, Tian Han, Song-Chun Zhu, and Ying Nian Wu. On the anatomy of MCMC-based maximum likelihood learning of energy-based models. arXiv preprint arXiv:1903.12370, 2019.
[42] Erik Nijkamp, Mitch Hill, Song-Chun Zhu, and Ying Nian Wu. Learning non-convergent non-persistent short-run MCMC toward energy-based model. In Advances in Neural Information Processing Systems, pages 5233–5243, 2019.
[43] Georg Ostrovski, Will Dabney, and Remi Munos. Autoregressive quantile networks for generative modeling. In International Conference on Machine Learning, pages 3936–3945, 2018.
[44] Ryan Prenger, Rafael Valle, and Bryan Catanzaro. WaveGlow: A flow-based generative network for speech synthesis. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3617–3621. IEEE, 2019.
[45] Ali Razavi, Aaron van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images with VQ-VAE-2. In Advances in Neural Information Processing Systems, pages 14837–14847, 2019.
[46] Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In International Conference on Machine Learning, pages 1530–1538, 2015.
[47] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning, pages 1278–1286, 2014.
[48] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
[49] Tim Salimans and Durk P Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In Advances in Neural Information Processing Systems, pages 901–909, 2016.
[50] Tim Salimans, Diederik Kingma, and Max Welling. Markov Chain Monte Carlo and variational inference: Bridging the gap. In International Conference on Machine Learning, pages 1218–1226, 2015.
[51] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2234–2242, 2016.
[52] Tim Salimans, Andrej Karpathy, Xi Chen, and Diederik P Kingma. PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications. In International Conference on Learning Representations, 2017.
[53] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265, 2015.
[54] Jiaming Song, Shengjia Zhao, and Stefano Ermon. A-NICE-MC: Adversarial training for MCMC. In Advances in Neural Information Processing Systems, pages 5140–5150, 2017.
[55] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, pages 11895–11907, 2019.
[56] Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. arXiv preprint arXiv:2006.09011, 2020.
[57] Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.
[58] Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. In International Conference on Machine Learning, 2016.
[59] Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, and Koray Kavukcuoglu. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems, pages 4790–4798, 2016.
[60] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.
[61] Pascal Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011.
[62] Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A Efros. CNN-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[63] Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7794–7803, 2018.
[64] Auke J Wiggers and Emiel Hoogeboom. Predictive sampling with forecasting autoregressive models. arXiv preprint arXiv:2002.09928, 2020.
[65] Hao Wu, Jonas Köhler, and Frank Noé. Stochastic normalizing flows. arXiv preprint arXiv:2002.06707, 2020.
[66] Yuxin Wu and Kaiming He. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), pages 3–19, 2018.
[67] Jianwen Xie, Yang Lu, Song-Chun Zhu, and Yingnian Wu. A theory of generative convnet. In International Conference on Machine Learning, pages 2635–2644, 2016.
[68] Jianwen Xie, Song-Chun Zhu, and Ying Nian Wu. Synthesizing dynamic patterns by spatial-temporal generative convnet. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7093–7101, 2017.
[69] Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, and Ying Nian Wu. Learning descriptor networks for 3d shape synthesis and analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8629–8638, 2018.
[70] Jianwen Xie, Song-Chun Zhu, and Ying Nian Wu. Learning energy-based spatial-temporal generative convnets for dynamic patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
[71] Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
[72] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.

Extra information

FID scores for LSUN datasets are included in Table 3. Scores marked with ∗ are reported by StyleGAN2 as baselines; other scores are reported by their respective authors.

Table 3: FID scores for LSUN 256×256 datasets
[73] We use a U-Net with self-attention; NCSN uses a RefineNet with dilated convolutions. We condition all layers on t by adding in the Transformer sinusoidal position embedding, rather than only in normalization layers (NCSNv1) or only at the output (NCSNv2).
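The sinusoidal timestep conditioning mentioned above can be sketched as follows. This is a minimal illustration, not the paper's exact configuration: the embedding dimension, `max_period`, and the function name are illustrative choices.

```python
import numpy as np

def timestep_embedding(t, dim=128, max_period=10000.0):
    """Transformer-style sinusoidal embedding of a diffusion timestep t.

    In the architecture described above, an embedding like this is added
    into every layer of the U-Net so the network knows which noise level
    it is denoising.
    """
    half = dim // 2
    # Geometrically spaced frequencies, as in the Transformer position encoding.
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])

emb = timestep_embedding(t=10, dim=8)  # one 8-dimensional embedding vector
```

Because the embedding is a smooth function of t, nearby timesteps receive similar conditioning vectors.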
[74] Diffusion models scale down the data with each forward process step (by a √(1−β_t) factor) so that variance does not grow when adding noise, thus providing consistently scaled inputs to the neural net reverse process. NCSN omits this scaling factor.
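The variance argument above can be checked numerically. Assuming unit-variance inputs and an arbitrary illustrative β, the scaled step preserves variance exactly in expectation, while the unscaled step inflates it by β:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 0.02                            # one forward-process variance (illustrative)
x = rng.standard_normal(1_000_000)     # unit-variance "data"

# DDPM forward step: scale by sqrt(1 - beta) before adding noise,
# so Var = (1 - beta) * 1 + beta = 1.
x_scaled = np.sqrt(1 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)

# Without the scaling factor, variance grows: Var = 1 + beta.
x_unscaled = x + np.sqrt(beta) * rng.standard_normal(x.shape)

print(x_scaled.var(), x_unscaled.var())  # ≈ 1.0 vs ≈ 1.02
```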
[75] Unlike NCSN, our forward process destroys signal (D_KL(q(x_T | x_0) ∥ N(0, I)) ≈ 0), ensuring a close match between the prior and the aggregate posterior of x_T. Also unlike NCSN, our β_t are very small, which ensures that the forward process is reversible by a Markov chain with conditional Gaussians. Both of these factors prevent distribution shift when sampling.
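The "destroys signal" claim can be verified from the schedule alone. Assuming the paper's linear schedule (β_1 = 10⁻⁴ to β_T = 0.02 over T = 1000 steps) and a sample pixel value x₀ = 1 as an illustrative input, the cumulative product ᾱ_T and the per-dimension Gaussian KL to the prior both come out vanishingly small:

```python
import numpy as np

# Linear variance schedule as in the DDPM paper.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar_T = np.prod(1.0 - betas)     # signal retained after T steps

# Per-dimension KL between q(x_T | x_0) = N(sqrt(abar) x0, (1 - abar) I)
# and the prior N(0, I), using the closed form for two Gaussians.
x0 = 1.0
kl = 0.5 * (alpha_bar_T * x0**2 + (1 - alpha_bar_T) - 1 - np.log(1 - alpha_bar_T))
print(alpha_bar_T, kl)  # both very close to 0
```

With essentially no signal left in x_T, sampling can safely start from the N(0, I) prior.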
[76] Our Langevin-like sampler has coefficients (learning rate, noise scale, etc.) derived rigorously from β_t in the forward process. Thus, our training procedure directly trains our sampler to match the data distribution after T steps: it trains the sampler as a latent variable model using variational inference. In contrast, NCSN's sampler coefficients are set...
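The note above says the sampler's coefficients are fixed functions of the forward schedule. A sketch of one such ancestral sampling step, assuming a placeholder `eps_pred` in place of the trained noise-prediction network ε_θ(x_t, t):

```python
import numpy as np

def reverse_step(x_t, eps_pred, t, betas, rng):
    """One ancestral step of the DDPM reverse process.

    Every coefficient below is derived from the forward schedule betas;
    nothing about the sampler is tuned separately from training.
    """
    alphas = 1.0 - betas
    alpha_bar = np.prod(alphas[: t + 1])
    # Posterior mean: remove the predicted noise, then rescale.
    mean = (x_t - betas[t] / np.sqrt(1 - alpha_bar) * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean                     # final step is noiseless
    sigma = np.sqrt(betas[t])           # one of the paper's two variance choices
    return mean + sigma * rng.standard_normal(x_t.shape)

betas = np.linspace(1e-4, 0.02, 1000)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))
x_prev = reverse_step(x, np.zeros_like(x), t=500, betas=betas, rng=rng)
```

Iterating this step from t = T−1 down to 0, starting at x_T ~ N(0, I), yields a sample.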