Deep Unsupervised Learning using Nonequilibrium Thermodynamics
22 Pith papers cite this work. Polarity classification is still in progress.
abstract
A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data. This approach allows us to rapidly learn, sample from, and evaluate probabilities in deep generative models with thousands of layers or time steps, as well as to compute conditional and posterior probabilities under the learned model. We additionally release an open source reference implementation of the algorithm.
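The abstract's forward/reverse construction can be made concrete in a few lines. Below is a minimal sketch in the now-standard DDPM-style epsilon-prediction parameterization, not the paper's original Gaussian/binomial Theano reference implementation; the noise schedule, the tiny network, and the toy 2-D data are all assumptions made for illustration.

    import torch
    import torch.nn as nn

    T = 1000
    betas = torch.linspace(1e-4, 2e-2, T)            # assumed forward noise schedule
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal retention

    def q_sample(x0, t, eps):
        # Forward diffusion: iteratively destroy structure with Gaussian noise,
        # q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I).
        ab = alphas_bar[t].view(-1, 1)
        return ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps

    class TinyDenoiser(nn.Module):
        # Toy reverse model: predicts the noise that was added at step t.
        def __init__(self, dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(),
                                     nn.Linear(128, dim))

        def forward(self, xt, t):
            tt = (t.float() / T).view(-1, 1)
            return self.net(torch.cat([xt, tt], dim=-1))

    model = TinyDenoiser(dim=2)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x0 = 0.1 * torch.randn(256, 2) + 1.0             # stand-in 2-D "data"

    for step in range(200):                          # learn the reverse process
        t = torch.randint(0, T, (x0.shape[0],))
        eps = torch.randn_like(x0)
        loss = ((model(q_sample(x0, t, eps), t) - eps) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    @torch.no_grad()
    def p_sample(n=64):
        # Reverse diffusion: start from noise and iteratively restore structure.
        x = torch.randn(n, 2)
        for t in reversed(range(T)):
            tv = torch.full((n,), t)
            eps_hat = model(x, tv)
            x = (x - betas[t] / (1.0 - alphas_bar[t]).sqrt() * eps_hat) / (1.0 - betas[t]).sqrt()
            if t > 0:
                x = x + betas[t].sqrt() * torch.randn_like(x)
        return x

Each reverse step inverts one small forward step, which is why the paper can stack thousands of them while keeping training, sampling, and likelihood evaluation tractable.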
roles
background 1
polarities
background 1
citing papers explorer
-
Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds
Joint KL yields horizon-free approximation error but an information-theoretic Ω(H) lower bound on estimation error in autoregressive learning, with matching computationally efficient upper bounds.
-
Generative models on phase space
Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.
-
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
DCGANs with architectural constraints learn a hierarchy of representations from object parts to scenes in both generator and discriminator across image datasets.
-
Hierarchical Text-Conditional Image Generation with CLIP Latents
A hierarchical prior-decoder model using CLIP latents generates more diverse text-conditional images than direct methods while preserving photorealism and caption fidelity.
-
High-Resolution Image Synthesis with Latent Diffusion Models
Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while substantially cutting computational requirements relative to pixel-space diffusion models (see the pipeline sketch after this list).
-
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
A 3.5-billion-parameter diffusion model with classifier-free guidance generates images preferred over DALL-E by human raters and can be fine-tuned for text-guided inpainting (a classifier-free guidance sketch follows after this list).
-
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
SDEdit performs guided image synthesis and editing by adding noise to inputs and refining them via denoising with a diffusion model's SDE prior, outperforming GAN methods in human studies without task-specific training (see the SDEdit-style sketch after this list).
-
Diffusion Models Beat GANs on Image Synthesis
Diffusion models with architecture improvements and classifier guidance achieve better FID scores than GANs on unconditional and conditional ImageNet image synthesis.
-
PG-3DGS: Optimizing 3D Gaussian Splatting to Satisfy Physics Objectives
PG-3DGS couples 3D Gaussian Splatting with differentiable physics so that optimized shapes meet both visual-fidelity and physical objectives such as pouring and aerodynamic lift, with real-world 3D-printed validation.
-
Diffusion model for SU(N) gauge theories
Implicit score matching trains diffusion models that successfully sample SU(3) Wilson gauge configurations on lattices, with a Hamiltonian-dynamics corrector needed for strong coupling.
-
GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model
GCCM prevents shortcut collapse in consistency models for graph prediction by using contrastive negative pairs and input feature perturbation, leading to better performance than deterministic baselines.
-
Breaking Watermarks in the Frequency Domain: A Modulated Diffusion Attack Framework
FMDiffWA uses frequency-domain modulation inside diffusion sampling to neutralize watermarks in images while preserving visual quality and generalizing across watermarking schemes.
-
Deepfake Detection Generalization with Diffusion Noise
ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.
-
Discrete Flow Maps
Discrete Flow Maps recast flow map training for discrete domains using simplex geometry to enable single-step text generation from noise and outperform prior discrete flow models.
-
MuPPet: Multi-person 2D-to-3D Pose Lifting
MuPPet introduces person encoding, permutation augmentation, and dynamic multi-person attention to outperform prior single- and multi-person 2D-to-3D pose lifting methods on group interaction datasets while improving occlusion robustness.
-
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.
-
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.
-
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.
-
VideoGPT: Video Generation using VQ-VAE and Transformers
VideoGPT generates competitive natural videos by learning discrete latents with VQ-VAE and modeling them autoregressively with a transformer.
-
Mesh Based Simulations with Spatial and Temporal awareness
A unified training framework for mesh-based ML surrogates in CFD improves accuracy and long-horizon stability by enforcing spatial derivative consistency via multi-node prediction, using temporal cross-attention correction, and adding 3D rotary positional embeddings.
-
Generating Satellite Imagery Data for Wildfire Detection through Mask-Conditioned Generative AI
A pre-trained Earth Observation diffusion model generates realistic post-wildfire Sentinel-2 imagery from burn masks via inpainting, achieving Burn IoU 0.456 and improved saliency over full generation.
-
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.
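For the latent-diffusion entry above, the pipeline reduced to its skeleton: encode once with a pretrained autoencoder, run the entire diffusion in the compact latent space, decode once. `autoencoder` and `denoise_latents` are placeholders assumed for illustration, not the actual LDM modules.

    import torch

    @torch.no_grad()
    def generate_latent(autoencoder, denoise_latents, cond, latent_shape):
        z = torch.randn(latent_shape)        # noise in latent, not pixel, space
        z = denoise_latents(z, cond)         # full reverse diffusion over latents
        return autoencoder.decode(z)         # one decode back to pixels

Sampling in the lower-dimensional latent space is where the computational savings claimed in the summary come from.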
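For the GLIDE entry, a sketch of the classifier-free guidance rule it relies on: one denoiser is evaluated with and without the text conditioning, and the two noise predictions are extrapolated. Function and argument names are illustrative assumptions, not GLIDE's actual API.

    import torch

    def guided_eps(model, x_t, t, cond, null_cond, scale=3.0):
        # eps_hat = eps(x,t,null) + scale * (eps(x,t,cond) - eps(x,t,null))
        eps_uncond = model(x_t, t, null_cond)   # unconditional prediction
        eps_cond = model(x_t, t, cond)          # text-conditional prediction
        return eps_uncond + scale * (eps_cond - eps_uncond)

Scales above 1 push samples toward the prompt, trading diversity for fidelity.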
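And for the SDEdit entry, its noise-then-denoise recipe sketched against a DDPM-style model such as the one under the abstract: perturb the guide input to an intermediate noise level t0, then run only the tail of the reverse process, so no task-specific training is required. The value of t0 and all names are assumptions.

    import torch

    @torch.no_grad()
    def sdedit(x_guide, model, betas, alphas_bar, t0=400):
        # Partial forward noising of the user-provided guide.
        ab = alphas_bar[t0]
        x = ab.sqrt() * x_guide + (1.0 - ab).sqrt() * torch.randn_like(x_guide)
        # Partial reverse denoising, from step t0 back to 0.
        for t in reversed(range(t0 + 1)):
            tv = torch.full((x.shape[0],), t)
            eps_hat = model(x, tv)
            x = (x - betas[t] / (1.0 - alphas_bar[t]).sqrt() * eps_hat) / (1.0 - betas[t]).sqrt()
            if t > 0:
                x = x + betas[t].sqrt() * torch.randn_like(x)
        return x

Larger t0 erases more of the guide and yields more realistic but less faithful outputs; smaller t0 does the opposite.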