pith. machine review for the scientific record.


Deep Unsupervised Learning using Nonequilibrium Thermodynamics

22 Pith papers cite this work. Polarity classification is still indexing.

abstract

A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data. This approach allows us to rapidly learn, sample from, and evaluate probabilities in deep generative models with thousands of layers or time steps, as well as to compute conditional and posterior probabilities under the learned model. We additionally release an open source reference implementation of the algorithm.
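The forward process the abstract describes — iteratively and slowly destroying structure in the data with small Gaussian perturbations until only noise remains — can be sketched as below. The linear beta schedule, step count, and ring-shaped toy data are illustrative assumptions for this sketch, not the paper's actual settings; the reverse (generative) process would be a learned model and is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule: beta_t sets how much structure is destroyed
# at forward step t (values here are illustrative, not the paper's).
T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alpha_bars = np.cumprod(1.0 - betas)  # cumulative signal retention

def diffuse(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) for a Gaussian forward diffusion.

    Uses the closed form
        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    with eps ~ N(0, I), so any step t is reachable in one draw.
    """
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

# Structured toy "data": points on a unit ring in 2D.
theta = rng.uniform(0.0, 2.0 * np.pi, size=5000)
x0 = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# After many steps the marginal is close to an isotropic Gaussian,
# i.e. the structure of the data distribution has been destroyed.
xT = diffuse(x0, T - 1, rng)
print(np.round(xT.std(axis=0), 1))  # per-dimension std close to 1.0
```

Training the reverse process then amounts to learning small Gaussian denoising steps that invert this chain, which is what makes the resulting generative model both flexible and tractable.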

hub tools

citation-role summary

background 1

citation-polarity summary


representative citing papers

Generative models on phase space

hep-ph · 2026-04-02 · unverdicted · novelty 8.0

Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.

High-Resolution Image Synthesis with Latent Diffusion Models

cs.CV · 2021-12-20 · conditional · novelty 7.0

Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational requirements relative to pixel-space diffusion models.

Diffusion Models Beat GANs on Image Synthesis

cs.LG · 2021-05-11 · accept · novelty 7.0

Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.

Diffusion model for SU(N) gauge theories

hep-lat · 2026-05-07 · unverdicted · novelty 6.0

Implicit score matching trains diffusion models that successfully sample SU(3) Wilson gauge configurations on lattices, with a Hamiltonian-dynamics corrector needed for strong coupling.

Deepfake Detection Generalization with Diffusion Noise

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.

Discrete Flow Maps

stat.ML · 2026-04-10 · unverdicted · novelty 6.0

Discrete Flow Maps recast flow map training for discrete domains using simplex geometry to enable single-step text generation from noise and outperform prior discrete flow models.

MuPPet: Multi-person 2D-to-3D Pose Lifting

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

MuPPet introduces person encoding, permutation augmentation, and dynamic multi-person attention to outperform prior single- and multi-person 2D-to-3D pose lifting methods on group interaction datasets while improving occlusion robustness.

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

cs.CV · 2023-11-25 · conditional · novelty 6.0

Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.

Mesh Based Simulations with Spatial and Temporal awareness

cs.LG · 2026-05-02 · unverdicted · novelty 5.0

A unified training framework for mesh-based ML surrogates in CFD improves accuracy and long-horizon stability by enforcing spatial derivative consistency via multi-node prediction, using temporal cross-attention correction, and adding 3D rotary positional embeddings.

citing papers explorer

Showing 22 of 22 citing papers.

  • Autoregressive Learning in Joint KL: Sharp Oracle Bounds and Lower Bounds cs.LG · 2026-05-12 · unverdicted · none · ref 24 · internal anchor

    Joint KL yields horizon-free approximation but an information-theoretic lower bound of order Omega(H) for estimation error in autoregressive learning, with matching computationally efficient upper bounds.

  • Generative models on phase space hep-ph · 2026-04-02 · unverdicted · none · ref 13 · internal anchor

    Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.

  • Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks cs.LG · 2015-11-19 · accept · none · ref 15 · internal anchor

    DCGANs with architectural constraints learn a hierarchy of representations from object parts to scenes in both generator and discriminator across image datasets.

  • Hierarchical Text-Conditional Image Generation with CLIP Latents cs.CV · 2022-04-13 · accept · none · ref 47 · internal anchor

    A hierarchical prior-decoder model using CLIP latents generates more diverse text-conditional images than direct methods while preserving photorealism and caption fidelity.

  • High-Resolution Image Synthesis with Latent Diffusion Models cs.CV · 2021-12-20 · conditional · none · ref 82 · internal anchor

Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational requirements relative to pixel-space diffusion models.

  • GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models cs.CV · 2021-12-20 · accept · none · ref 25 · internal anchor

    A 3.5-billion-parameter diffusion model with classifier-free guidance generates images preferred over DALL-E by human raters and can be fine-tuned for text-guided inpainting.

  • SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations cs.CV · 2021-08-02 · conditional · none · ref 9 · internal anchor

    SDEdit performs guided image synthesis and editing by adding noise to inputs and refining them via denoising with a diffusion model's SDE prior, outperforming GAN methods in human studies without task-specific training.

  • Diffusion Models Beat GANs on Image Synthesis cs.LG · 2021-05-11 · accept · none · ref 56 · internal anchor

    Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.

  • PG-3DGS: Optimizing 3D Gaussian Splatting to Satisfy Physics Objectives cs.CV · 2026-05-11 · unverdicted · none · ref 28 · internal anchor

    PG-3DGS couples 3D Gaussian Splatting with differentiable physics so that optimized shapes satisfy both visual fidelity and physical objectives such as pouring and aerodynamic lift, with real-world 3D-printed validation.

  • Diffusion model for SU(N) gauge theories hep-lat · 2026-05-07 · unverdicted · none · ref 10 · internal anchor

    Implicit score matching trains diffusion models that successfully sample SU(3) Wilson gauge configurations on lattices, with a Hamiltonian-dynamics corrector needed for strong coupling.

  • GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model cs.AI · 2026-05-07 · unverdicted · none · ref 17 · internal anchor

    GCCM prevents shortcut collapse in consistency models for graph prediction by using contrastive negative pairs and input feature perturbation, leading to better performance than deterministic baselines.

  • Breaking Watermarks in the Frequency Domain: A Modulated Diffusion Attack Framework cs.CV · 2026-04-24 · unverdicted · none · ref 27 · internal anchor

    FMDiffWA uses frequency-domain modulation inside diffusion sampling to neutralize watermarks in images while preserving visual quality and generalizing across watermarking schemes.

  • Deepfake Detection Generalization with Diffusion Noise cs.CV · 2026-04-16 · unverdicted · none · ref 62 · internal anchor

    ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.

  • Discrete Flow Maps stat.ML · 2026-04-10 · unverdicted · none · ref 2 · internal anchor

    Discrete Flow Maps recast flow map training for discrete domains using simplex geometry to enable single-step text generation from noise and outperform prior discrete flow models.

  • MuPPet: Multi-person 2D-to-3D Pose Lifting cs.CV · 2026-04-08 · unverdicted · none · ref 51 · internal anchor

    MuPPet introduces person encoding, permutation augmentation, and dynamic multi-person attention to outperform prior single- and multi-person 2D-to-3D pose lifting methods on group interaction datasets while improving occlusion robustness.

  • Scaling Rectified Flow Transformers for High-Resolution Image Synthesis cs.CV · 2024-03-05 · conditional · none · ref 181 · internal anchor

    Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.

  • Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets cs.CV · 2023-11-25 · conditional · none · ref 84 · internal anchor

    Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.

  • SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis cs.CV · 2023-07-04 · conditional · none · ref 45 · internal anchor

    SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.

  • VideoGPT: Video Generation using VQ-VAE and Transformers cs.CV · 2021-04-20 · accept · none · ref 33 · internal anchor

    VideoGPT generates competitive natural videos by learning discrete latents with VQ-VAE and modeling them autoregressively with a transformer.

  • Mesh Based Simulations with Spatial and Temporal awareness cs.LG · 2026-05-02 · unverdicted · none · ref 53 · internal anchor

    A unified training framework for mesh-based ML surrogates in CFD improves accuracy and long-horizon stability by enforcing spatial derivative consistency via multi-node prediction, using temporal cross-attention correction, and adding 3D rotary positional embeddings.

  • Generating Satellite Imagery Data for Wildfire Detection through Mask-Conditioned Generative AI cs.CV · 2026-04-02 · conditional · none · ref 19 · internal anchor

    A pre-trained Earth Observation diffusion model generates realistic post-wildfire Sentinel-2 imagery from burn masks via inpainting, achieving Burn IoU 0.456 and improved saliency over full generation.

  • Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models cs.CV · 2024-02-27 · unverdicted · none · ref 51 · internal anchor

    The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.