Joint KL yields horizon-free approximation but an information-theoretic lower bound of order Omega(H) for estimation error in autoregressive learning, with matching computationally efficient upper bounds.
hub Mixed citations
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Mixed citation behavior. Most common role is background (58%).
abstract
A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data. This approach allows us to rapidly learn, sample from, and evaluate probabilities in deep generative models with thousands of layers or time steps, as well as to compute conditional and posterior probabilities under the learned model. We additionally release an open source reference implementation of the algorithm.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.
DDIMs construct non-Markovian diffusion processes that share DDPM training objectives but allow much faster reverse sampling, demonstrated empirically at 10-50x wall-clock speedup.
DiffWave is a non-autoregressive diffusion model that generates high-fidelity audio waveforms from noise in constant steps, matching WaveNet vocoder quality while being orders of magnitude faster and outperforming prior models in unconditional generation.
DCGANs with architectural constraints learn a hierarchy of representations from object parts to scenes in both generator and discriminator across image datasets.
Hybrid simulation and non-Euclidean elasticity theory demonstrate that clathrin coats develop adaptive rigidity and memory during growth, producing flat, stalled, or closed outcomes through two energy-landscape gates and matching experiments without fitted parameters.
A gauge-equivariant diffusion model samples Schwinger model configurations, yielding unbiased observables matching MCMC and qualitatively less topological freezing than HMC.
Closed-Form Diffusion Policies enable training-free imitation learning by using closed-form scores derived from demonstration data, achieving competitive benchmark performance with millisecond inference and composable editing of pre-trained policies.
LGS pretrained on 2.5M trajectories across 16 systems matches deterministic baselines at one step and halves 20-step error while using far less compute and adapting to held-out higher-resolution flows.
A LoRA-adapted conditional diffusion surrogate for electromagnetic calorimeter showers matches key observables within 2% RMSE and reproduces directional trends in design-utility gradients.
Proposes an advection-diffusion PDE corruption process with stochastic velocity fields and Lattice Boltzmann solver for diffusion models, generalizing prior PDE methods.
Diffusion-QL uses conditional diffusion models as expressive policies in offline RL by coupling behavior cloning with Q-value maximization, achieving SOTA on most D4RL tasks.
A hierarchical prior-decoder model using CLIP latents generates more diverse text-conditional images than direct methods while preserving photorealism and caption fidelity.
Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and
A 3.5-billion-parameter diffusion model with classifier-free guidance generates images preferred over DALL-E by human raters and can be fine-tuned for text-guided inpainting.
SDEdit performs guided image synthesis and editing by adding noise to inputs and refining them via denoising with a diffusion model's SDE prior, outperforming GAN methods in human studies without task-specific training.
Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
21cmEMUv3 emulates the cylindrical 21cm power spectrum via score-based diffusion and six other 21cmFAST observables via LSTM networks at sub-percent accuracy, then uses the emulator to infer a lower limit on soft-band X-ray luminosity from HERA data.
SITA performs scalable inference-time annealing of flow-based models on molecular systems by substituting energy-based surrogate likelihoods for divergence-based importance weights.
GenSBI delivers JAX-native implementations of generative SBI methods with transformer backbones and reports near-ideal calibration scores on standard benchmarks.
REPA-P aligns intermediate representations in diffusion models with physical states using first-principles PDE residuals to accelerate convergence and boost out-of-distribution robustness on PDE tasks.
A flow-matching generative model trained on CoLBT-hydro data conditionally generates marginal final-state hadron spectra from jet-induced hydro responses in 0-10% Pb+Pb collisions at 5.02 TeV, matching training data statistics with approximately six orders of magnitude computational speedup.
Diffusion LLMs can act as their own efficiency teachers by using revokable parallel decoding to identify reliable token orders and then distilling those orders into the model parameters for faster inference.
Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.
citing papers explorer
-
High-Resolution Image Synthesis with Latent Diffusion Models
Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and
-
Diffusion Models Beat GANs on Image Synthesis
Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
-
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.