Elucidating the design space of diffusion-based generative models
4 Pith papers cite this work. Polarity classification is still indexing.
citation summary
verdicts: UNVERDICTED 4
roles: background 1
polarities: background 1
citing papers explorer
- Metropolis-Adjusted Diffusion Models
  Metropolis-adjusted Langevin correctors using score-based acceptance probabilities, including an exact Bernoulli factory method and a Simpson's rule approximation, reduce sampling bias in diffusion models and improve FID scores.
- From Score Matching to Diffusion: A Fine-Grained Error Analysis in the Gaussian Setting
  In the Gaussian setting, the Wasserstein error of score-matching-plus-diffusion sampling equals a kernel norm of the data power spectrum, whose kernel is determined by the four error sources and the algorithm parameters.
- Improving Music Source Separation with Diffusion and Consistency Refinement
  Diffusion-based refinement followed by consistency distillation improves music source separation quality and inference speed across U-Net and BS-RoFormer backbones on Slakh2100 and MUSDB18.
- Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
  Aligning noisy hidden states in diffusion transformers to clean features from pretrained visual encoders speeds up training by over 17x and reaches FID 1.42.