Score-Based Generative Modeling through Stochastic Differential Equations
Pith reviewed 2026-05-24 13:44 UTC · model grok-4.3
The pith
Generative modeling reduces to solving a reverse-time SDE whose drift depends only on the score of the perturbed data distribution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that the reverse-time SDE for recovering data from noise depends only on the time-dependent score function, which can be estimated by neural networks. This allows the framework to encompass previous score-based and diffusion approaches while introducing a predictor-corrector sampler, a neural ODE for likelihoods, and applications to conditional generation and inpainting, achieving an Inception score of 9.89 and FID of 2.20 on CIFAR-10.
What carries the argument
The reverse-time stochastic differential equation whose drift is set by the score (gradient of the log probability) of the time-dependent perturbed data distribution.
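For reference, the forward and reverse-time pair this statement refers to is usually written as below. The symbols are the conventional ones and are an assumption here, since this page does not define them: f is the drift, g the diffusion coefficient, w and w-bar the forward and reverse-time Wiener processes, and p_t the marginal density of the perturbed data at time t.

```latex
\begin{align}
  \mathrm{d}\mathbf{x} &= \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}
    && \text{(forward, } t: 0 \to T\text{)} \\
  \mathrm{d}\mathbf{x} &= \bigl[\mathbf{f}(\mathbf{x}, t) - g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})\bigr]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}
    && \text{(reverse, } t: T \to 0\text{)}
\end{align}
```

The only term in the reverse equation that is not fixed by the choice of forward SDE is the score, which is why estimating it is sufficient for sampling.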
If this is right
- A predictor-corrector sampler can be used to correct discretization errors during reverse-time evolution.
- An equivalent neural ODE yields exact likelihood computation and improved sampling efficiency.
- The same score model solves inverse problems including class-conditional generation, inpainting, and colorization.
- Unconditional image generation reaches an Inception score of 9.89 and FID of 2.20 on CIFAR-10.
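The predictor-corrector idea in the first bullet can be sketched on a toy problem. This is a minimal illustration, not the paper's implementation. It assumes one-dimensional Gaussian "data", a geometric variance-exploding noise schedule, and the exact analytic score standing in for a trained network. The corrector step size is a fixed fraction of the current noise variance, a simplification rather than the paper's signal-to-noise rule. All names (`sigma`, `score`, `pc_sample`, `step_scale`) are hypothetical.

```python
import numpy as np

S0 = 0.5                            # std of the toy data distribution N(0, S0^2) (assumption)
SIGMA_MIN, SIGMA_MAX = 0.01, 10.0   # illustrative noise range (assumption)

def sigma(t):
    """Geometric variance-exploding noise schedule for t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def score(x, t):
    """Exact score of p_t = N(0, S0^2 + sigma(t)^2); stands in for a learned s_theta."""
    return -x / (S0**2 + sigma(t) ** 2)

def pc_sample(n_samples=20000, n_steps=500, n_corrector=1, step_scale=0.01, seed=0):
    """Draw samples by alternating a reverse-diffusion predictor and a Langevin corrector."""
    rng = np.random.default_rng(seed)
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    x = rng.normal(0.0, SIGMA_MAX, size=n_samples)    # sample from the prior
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        # Predictor: one discretized reverse-time SDE step from t_cur to t_next.
        dvar = sigma(t_cur) ** 2 - sigma(t_next) ** 2
        x = x + dvar * score(x, t_cur) + np.sqrt(dvar) * rng.normal(size=n_samples)
        # Corrector: Langevin steps targeting the marginal at t_next.
        eps = step_scale * sigma(t_next) ** 2          # simplified step size (assumption)
        for _ in range(n_corrector):
            x = x + eps * score(x, t_next) + np.sqrt(2 * eps) * rng.normal(size=n_samples)
    return x
```

Calling `pc_sample()` should return samples whose standard deviation is close to `S0`, which gives a quick sanity check that the predictor and corrector steps are consistent with each other.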
Where Pith is reading between the lines
- The continuous-time formulation implies that many discrete diffusion steps are approximations to the underlying SDE limit.
- Score estimation accuracy becomes the shared performance bottleneck across previously separate generative techniques.
- The same machinery could be applied directly to sequential data without requiring discrete token adaptations.
Load-bearing premise
Neural networks can estimate the score function of the perturbed data distribution accurately enough for the numerical reverse-time SDE solver to produce valid samples.
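To make the premise concrete, here is a minimal sketch of a denoising score matching loss at a single noise level. It assumes a Gaussian perturbation kernel and a sigma-squared weighting. The function name `dsm_loss` and its arguments are hypothetical, and `score_fn` is any callable that stands in for a neural network.

```python
import numpy as np

def dsm_loss(score_fn, x0, sigma, rng):
    """Monte Carlo estimate of the denoising score matching loss at noise level sigma.

    score_fn: callable mapping noisy samples to estimated scores (stand-in for a network).
    x0:       array of clean data samples.
    """
    z = rng.normal(size=x0.shape)
    x_t = x0 + sigma * z          # sample from the Gaussian perturbation kernel p(x_t | x0)
    target = -z / sigma           # gradient of log p(x_t | x0) with respect to x_t
    # The sigma^2 weighting keeps the loss on a comparable scale across noise levels.
    return sigma**2 * np.mean((score_fn(x_t) - target) ** 2)
```

For standard normal data the perturbed distribution is N(0, 1 + sigma^2), so its exact score is `-x / (1 + sigma**2)`. Plugging that in should give a lower loss than a constant-zero score, which is one way to check the objective before training anything.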
What would settle it
A neural network trained to estimate the score is inserted into the reverse-time SDE solver and the resulting samples fail to match the training distribution under metrics such as FID or Inception score.
Original abstract
Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a. score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a unified framework for generative modeling based on forward and reverse stochastic differential equations (SDEs). A forward SDE perturbs data distributions toward a simple prior by injecting noise; the corresponding reverse-time SDE recovers the data distribution and depends only on the time-dependent score (gradient of the log-density) of the perturbed distribution. Neural networks estimate this score via denoising score matching, enabling sampling through numerical SDE solvers. The framework recovers prior score-based and diffusion probabilistic models as special cases, introduces a predictor-corrector sampler to mitigate discretization error, derives an equivalent neural ODE permitting exact likelihood computation, and demonstrates applications to inverse problems. On CIFAR-10 it reports an Inception score of 9.89 and FID of 2.20.
Significance. If the numerical stability and score-estimation claims hold, the work is significant: it supplies a single continuous-time formalism that subsumes existing discrete diffusion and score-matching methods, supplies new sampling algorithms (predictor-corrector and probability-flow ODE), and adds exact-likelihood capability via the ODE formulation. The empirical results on unconditional and conditional image synthesis, together with the first 1024×1024 score-based generations, would constitute a clear advance in high-dimensional generative modeling.
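The deterministic sampler referred to here as the probability-flow ODE can be sketched on a toy case where the answer is known in closed form. The setup is an assumption made for illustration: zero drift, unit diffusion coefficient, and Gaussian data with variance `S0_SQ`, so the perturbed marginal at time t is N(0, S0_SQ + t) and the exact score is available. The names `score`, `ode_drift` and `probability_flow_sample` are hypothetical.

```python
import numpy as np

S0_SQ = 0.25   # variance of the toy data distribution (assumption)
T = 4.0        # final diffusion time (assumption)

def score(x, t):
    """Exact score of the perturbed marginal N(0, S0_SQ + t)."""
    return -x / (S0_SQ + t)

def ode_drift(x, t, g=1.0):
    """Probability-flow drift f - 0.5 * g^2 * score, with f = 0 in this toy SDE."""
    return -0.5 * g**2 * score(x, t)

def probability_flow_sample(x_T, n_steps=1000):
    """Integrate the ODE from t = T down to t = 0 with explicit Euler steps."""
    ts = np.linspace(T, 0.0, n_steps + 1)
    x = np.asarray(x_T, dtype=float)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * ode_drift(x, t_cur)   # step size is negative
    return x
```

In this toy case the ODE has the closed-form solution x(0) = x(T) * sqrt(S0_SQ / (S0_SQ + T)), so the Euler result can be compared against it directly. Because the map from x(T) to x(0) is deterministic and invertible, the same trajectory can in principle be used for likelihood evaluation, which this sketch does not attempt.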
major comments (3)
- [§3, Eq. (4)] The continuous-time equivalence between the reverse SDE and the data distribution is correctly derived when the exact score is supplied, yet the manuscript provides no error-propagation bounds showing that a neural approximation s_θ(x,t) trained by denoising score matching yields marginals whose total variation or Wasserstein distance to the target remains controlled under the finite-step predictor-corrector or Euler–Maruyama discretizations used for the reported CIFAR-10 results.
- [§4.2] (predictor-corrector sampler) The claim that the corrector step “corrects errors in the evolution of the discretized reverse-time SDE” is central to the performance numbers, but the paper supplies neither a convergence analysis nor an ablation quantifying how many corrector steps are required as a function of score-estimation error or step-size Δt; without this, it is unclear whether the reported FID of 2.20 is robust or an artifact of a particular discretization schedule.
- [Experiments section] (CIFAR-10 results) The record FID of 2.20 and the 1024×1024 generations are presented as evidence that the framework succeeds at scale, yet no diagnostic is given on the magnitude of the score-estimation residual ||s_θ − ∇log p_t|| across time steps or on its correlation with sample quality; this diagnostic is load-bearing for the assertion that neural score estimation plus numerical SDE solvers suffice.
minor comments (2)
- [§2] Notation for the diffusion coefficient and noise schedule is introduced without an explicit table mapping each choice to the corresponding special cases recovered from prior work.
- [Figures] Figure captions for the sampling trajectories do not state the number of function evaluations or the precise discretization scheme employed, making direct reproduction of the likelihood numbers difficult.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work's significance and for the detailed major comments. We respond point-by-point below, acknowledging limitations where the manuscript lacks requested analyses and indicating revisions where feasible.
-
Referee: [§3, Eq. (4)] The continuous-time equivalence between the reverse SDE and the data distribution is correctly derived when the exact score is supplied, yet the manuscript provides no error-propagation bounds showing that a neural approximation s_θ(x,t) trained by denoising score matching yields marginals whose total variation or Wasserstein distance to the target remains controlled under the finite-step predictor-corrector or Euler–Maruyama discretizations used for the reported CIFAR-10 results.
Authors: We agree that error-propagation bounds for the neural score approximation would strengthen the theoretical claims. Deriving such general bounds for high-dimensional learned scores under discretization is a substantial open problem beyond the scope of this work, which prioritizes the unifying framework and empirical results. In revision we will add a short discussion paragraph in Section 3 noting this as a limitation and suggesting it as future work.
revision: partial
-
Referee: [§4.2] The claim that the corrector step “corrects errors in the evolution of the discretized reverse-time SDE” is central to the performance numbers, but the paper supplies neither a convergence analysis nor an ablation quantifying how many corrector steps are required as a function of score-estimation error or step-size Δt; without this, it is unclear whether the reported FID of 2.20 is robust or an artifact of a particular discretization schedule.
Authors: The predictor-corrector sampler is motivated by standard numerical SDE techniques, and our experiments show consistent gains from the corrector steps. We did not include a full convergence proof or exhaustive ablation on step count versus error. We will add an ablation study varying the number of corrector steps and discretization schedules to the revised experiments section to better substantiate robustness of the FID result.
revision: yes
-
Referee: [Experiments section] The record FID of 2.20 and the 1024×1024 generations are presented as evidence that the framework succeeds at scale, yet no diagnostic is given on the magnitude of the score-estimation residual ||s_θ − ∇log p_t|| across time steps or on its correlation with sample quality; this diagnostic is load-bearing for the assertion that neural score estimation plus numerical SDE solvers suffice.
Authors: Direct evaluation of the residual is impossible without the true score for image data. We rely on the denoising score matching training loss and downstream sample quality as proxies. We will add a brief discussion in the experiments section on training loss behavior across time and its relation to generation metrics as an indirect diagnostic.
revision: partial
- Rigorous error-propagation bounds for neural score approximations under the reported discretizations
- Full convergence analysis of the predictor-corrector sampler
Circularity Check
No significant circularity; derivation and empirical results are self-contained
The paper derives the forward SDE and reverse-time SDE from standard stochastic calculus (Anderson's theorem referenced externally), shows prior score-based and diffusion methods as special cases of the general SDE without redefining them circularly, introduces predictor-corrector and neural ODE samplers as new procedures, and reports CIFAR-10 metrics from trained neural networks estimating the score. No load-bearing step reduces claimed results (unification, sampling, or FID/IS) to fitted quantities or self-citations by construction. Self-citations to prior score-matching work exist but support independent components and are not required for the central claims.
Axiom & Free-Parameter Ledger
free parameters (2)
- noise schedule / diffusion coefficient
- neural network parameters for score estimation
axioms (2)
- domain assumption A reverse-time SDE exists whose drift depends only on the score of the perturbed distribution.
- domain assumption Numerical SDE solvers can accurately integrate the reverse process when the score is well-estimated.
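The "noise schedule / diffusion coefficient" entry in the ledger corresponds to the choice of forward SDE. As a reference, the three families usually associated with this framework (VE, VP and sub-VP) are commonly written as below. The schedule functions σ(t) and β(t) are the conventional symbols and are an assumption here, since this page does not define them.

```latex
\begin{align}
  \text{VE:}     \quad \mathrm{d}\mathbf{x} &= \sqrt{\frac{\mathrm{d}[\sigma^2(t)]}{\mathrm{d}t}}\,\mathrm{d}\mathbf{w} \\
  \text{VP:}     \quad \mathrm{d}\mathbf{x} &= -\tfrac{1}{2}\beta(t)\,\mathbf{x}\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}\mathbf{w} \\
  \text{sub-VP:} \quad \mathrm{d}\mathbf{x} &= -\tfrac{1}{2}\beta(t)\,\mathbf{x}\,\mathrm{d}t
      + \sqrt{\beta(t)\bigl(1 - e^{-2\int_0^t \beta(s)\,\mathrm{d}s}\bigr)}\,\mathrm{d}\mathbf{w}
\end{align}
```

In each case the free parameter is the schedule function itself, while the score network's weights are the second free parameter listed above.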
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation to the paper passage: unclear
reverse-time SDE depends only on the time-dependent gradient field (score) of the perturbed data distribution
-
IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking · relation to the paper passage: unclear
VE, VP and sub-VP SDEs ... 8-tick period not mentioned
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 60 Pith papers
-
Generative Modeling with Flux Matching
Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices be...
-
A-CODE: Fully Atomic Protein Co-Design with Unified Multimodal Diffusion
A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain...
-
Quotient-Space Diffusion Models
Quotient-space diffusion models generate correct symmetric distributions by removing redundancy on the quotient space, simplifying learning and improving results on small molecules and proteins under SE(3) symmetry.
-
The Feedback Hamiltonian is the Score Function: A Diffusion-Model Framework for Quantum Trajectory Reversal
The García-Pintos feedback Hamiltonian equals the score function of the quantum trajectory distribution, linking quantum feedback to diffusion-model reversal.
-
Query Lower Bounds for Diffusion Sampling
Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.
-
OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models
OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
-
Generative models on phase space
Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.
-
A Priori Sampling of Transition States with Guided Diffusion
ASTRA reframes transition-state search as guided diffusion inference that samples the isodensity surface between metastable basins and converges to first-order saddles via score differences and physical forces.
-
Mean-Field Path-Integral Diffusion: From Samples to Interacting Agents
MF-PID turns independent diffusion samples into mean-field interacting agents, proving that quadratic interactions yield exact linear mean interpolation and delivering 19-24% energy savings in demand-response control.
-
Variational Optimality of Föllmer Processes in Generative Diffusions
Föllmer processes are variationally optimal among generative diffusions because they minimize the impact of drift estimation error on path-space KL divergence, rendering different interpolation schedules statistically...
-
Flow-GRPO: Training Flow Matching Models via Online RL
Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
-
Large Language Diffusion Models
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
-
Denoising Diffusion Implicit Models
DDIMs construct non-Markovian diffusion processes that share DDPM training objectives but allow much faster reverse sampling, demonstrated empirically at 10-50x wall-clock speedup.
-
Learned Relay Representations for Forward-Thinking Discrete Diffusion Models
Learned Relay Representations enable masked diffusion models to propagate useful latent information across denoising steps, scaling to Fast-dLLM v2 to outperform supervised finetuning on coding tasks while cutting inf...
-
Generative Modeling by Value-Driven Transport
A control-theoretic linear program yields value-driven transport policies for generative modeling with straight paths and simulation-free training.
-
Let EEG Models Learn EEG
JET is a conditional flow matching framework that generates EEG as continuous raw sequences with added constraints for spectral and temporal properties, achieving over 40% lower TS-FID than prior discrete denoising me...
-
Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models
Linear-DPO replaces sigmoid utility with linear utility and adds EMA reference to improve preference alignment in diffusion and flow-matching text-to-image models.
-
CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation
CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.
-
Matérn Noise for Triangulation-Agnostic Flow Matching on Meshes
Proposes discretized Matérn process noise for triangulation-agnostic flow matching on meshes with PoissonNet denoiser, tested on elastic states and humanoid poses for meshes exceeding one million triangles.
-
SURGE: Approximation and Training Free Particle Filter for Diffusion Surrogate
SURGE performs unbiased path-wise importance reweighting via Girsanov estimation for derivative-free inference-time scaling in diffusion models, proving equivalence to particle-wise SMC and outperforming baselines empirically.
-
Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms
Nested-GPT is an autoregressive Transformer surrogate that generates variable-multiplicity parton showers while enforcing ordered Markovian branching and matches reference Monte Carlo results for leading-log non-globa...
-
Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms
Nested-GPT is an autoregressive Transformer that dynamically generates variable-multiplicity parton showers matching Monte Carlo references for non-global logarithm resummation in the large-Nc limit.
-
Functionalization via Structure Completion and Motion Rectification
Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture wi...
-
Towards Generalized Image Manipulation Localization via Score-based Model
DiffIML applies score-based generative modeling to image manipulation localization, recovering coherent masks iteratively from noise to improve generalization on unseen manipulation types.
-
Training-Free Generative Sampling via Moment-Matched Score Smoothing
MM-SOLD is a training-free particle sampler whose large-particle limit converges to a moment-matched Gibbs distribution obtained by exponentially tilting a score-smoothed target.
-
Sampling from Flow Language Models via Marginal-Conditioned Bridges
Marginal-conditioned bridges enable training-free sampling from Flow Language Models by drawing clean one-hot endpoints from factorized posteriors and using Ornstein-Uhlenbeck bridges, preserving token marginals and r...
-
HIR-ALIGN: Enhancing Hyperspectral Image Restoration via Diffusion-Based Data Generation
HIR-ALIGN augments limited target data for hyperspectral restoration by creating proxy clean images, synthesizing aligned HSIs with blur-robust diffusion and warp-based transfer, then finetuning models to lower target...
-
Proximal-Based Generative Modeling for Bayesian Inverse Problems
PGM replaces the intractable likelihood score in diffusion models with a closed-form Moreau score computed via proximal operators, enabling non-asymptotic sampling for inverse problems trained only on prior data.
-
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
Edit-Compass and EditReward-Compass are new unified benchmarks for fine-grained image editing evaluation and realistic reward modeling in reinforcement learning optimization.
-
Bridging Domain Gaps with Target-Aligned Generation for Offline Reinforcement Learning
TCE bridges domain gaps in offline RL by selectively using source data or generating target-aligned transitions via a dual score-based model, outperforming baselines in experiments.
-
Amortized Guidance for Image Inpainting with Pretrained Diffusion Models
AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.
-
MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving
MindVLA-U1 introduces a unified streaming VLA with shared backbone, framewise memory, and language-guided action diffusion that surpasses human drivers on WOD-E2E planning metrics.
-
Aligning Flow Map Policies with Optimal Q-Guidance
Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.
-
One-Step Generative Modeling via Wasserstein Gradient Flows
W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...
-
On the Approximation Complexity of Matrix Product Operator Born Machines
MPO-BMs have NP-hard KL approximation in continuous settings but admit efficient polynomial-bond-dimension approximations with provable KL guarantees for structured targets under locality and spectral-gap conditions.
-
Muninn: Your Trajectory Diffusion Model But Faster
Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.
-
Discrete Langevin-Inspired Posterior Sampling
ΔLPS is a gradient-guided discrete posterior sampler for inverse problems that works with masked or uniform discrete diffusion priors and outperforms prior discrete methods on image restoration tasks.
-
Remix the Timbre: Diffusion-Based Style Transfer Across Polyphonic Stems
MixtureTT performs direct per-stem timbre transfer on polyphonic mixtures via a shared diffusion transformer, outperforming single-stem baselines on SATB choral data while eliminating cascaded separation errors.
-
A Call to Lagrangian Action: Learning Population Mechanics from Temporal Snapshots
Wasserstein Lagrangian Mechanics formalizes second-order dynamics in Wasserstein space and provides an algorithm to learn them from observed marginals without specifying the Lagrangian, outperforming gradient flows on...
-
A Call to Lagrangian Action: Learning Population Mechanics from Temporal Snapshots
Wasserstein Lagrangian Mechanics learns second-order population dynamics from observed marginal snapshots without specifying the Lagrangian and outperforms gradient flow methods on tasks like vortex dynamics and embry...
-
A Call to Lagrangian Action: Learning Population Mechanics from Temporal Snapshots
Wasserstein Lagrangian Mechanics learns second-order population dynamics from observed marginals without specifying the Lagrangian and outperforms gradient flow methods on periodic dynamics like vortex motion and flocking.
-
Adaptive Subspace Projection for Generative Personalization
A training-free adaptive subspace projection method mitigates semantic collapsing in generative personalization by isolating and adjusting drift in a low-dimensional subspace using the stable pre-trained embedding as anchor.
-
Kurtosis-Guided Denoising Score Matching for Tabular Anomaly Detection
K-DSM uses per-feature kurtosis to set noise scales in DSM, enabling effective single-scale anomaly detection on tabular benchmarks in both semi-supervised and unsupervised settings.
-
Path-Coupled Bellman Flows for Distributional Reinforcement Learning
Path-Coupled Bellman Flows use source-consistent Bellman-coupled paths and a lambda-parameterized control-variate to learn return distributions via flow matching, improving fidelity and stability over prior DRL approaches.
-
Arena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Models
ArenaPO infers Gaussian capability distributions from pairwise preferences and applies truncated-normal latent inference to derive fine-grained offline rewards for preference optimization of text-to-image diffusion models.
-
DBMSolver: A Training-free Diffusion Bridge Sampler for High-Quality Image-to-Image Translation
DBMSolver is a new training-free sampler using exponential integrators that reduces NFEs by up to 5x and improves quality in diffusion bridge model-based image-to-image translation tasks.
-
Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems
An ensemble-based information-theoretic active learning method with ensemble Kalman inversion selects valuable tasks to optimize communication structures in LLM multi-agent systems under constrained budgets.
-
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
D-OPSD formulates supervised fine-tuning of step-distilled diffusion models as on-policy self-distillation by minimizing distribution differences between a text-only student and a multimodal teacher on the student's o...
-
Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning
DOSER detects OOD actions via diffusion-model denoising error and applies selective regularization based on predicted transitions, proving gamma-contraction with performance bounds and outperforming priors on offline ...
-
FluxFlow: Conservative Flow-Matching for Astronomical Image Super-Resolution
FluxFlow is a conservative pixel-space flow-matching framework for astronomical super-resolution that incorporates real atmospheric uncertainty and a training-free Wiener correction, outperforming baselines on a new 1...
-
Tempered Guided Diffusion
Tempered Guided Diffusion uses annealed SMC to produce consistent particle approximations to the posterior for training-free conditional diffusion sampling, outperforming independent guided trajectories in experiments.
-
PerFlow: Physics-Embedded Rectified Flow for Efficient Reconstruction and Uncertainty Quantification of Spatiotemporal Dynamics
PerFlow embeds physics constraints into rectified flow sampling through guidance-free conditioning and constraint-preserving projections, achieving efficient sparse reconstruction and uncertainty quantification for sp...
-
PODiff: Latent Diffusion in Proper Orthogonal Decomposition Space for Scientific Super-Resolution
PODiff performs conditional diffusion in a fixed, variance-ordered POD latent space to enable efficient probabilistic super-resolution of high-dimensional scientific fields with lower memory and better-calibrated unce...
-
Inferring Active Neural Circuits Using Diffusion Scores
SBTG recovers the Jacobian of the nonlinear transition map between brain states by multiplying cross-block scores from denoising models, enabling inference of lag-specific directed interactions in neural population da...
-
ExpoCM: Exposure-Aware One-Step Generative Single-Image HDR Reconstruction
ExpoCM enables fast one-step single-image HDR reconstruction via exposure-dependent perturbations and region-conditioned consistency trajectories derived from a probability flow ODE.
-
Generative Modeling with Orbit-Space Particle Flow Matching
OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.
-
Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning
FAN achieves state-of-the-art offline RL performance on robotic tasks by anchoring flow policies and using single-sample noise-conditioned Q-learning, with proven convergence and reduced runtimes.
-
Cast3: Translating numerical weather prediction principles into data-driven forecasting
Cast3 translates NWP principles into a data-driven model using cubed-sphere grids, super-ensembles, and generative nudging to achieve state-of-the-art ensemble predictions that outperform baselines.
-
Decentralized Proximal Stochastic Gradient Langevin Dynamics
DE-PSGLD is the first decentralized MCMC sampler for constrained convex domains that converges to a regularized Gibbs distribution with explicit 2-Wasserstein bounds for agents and network averages.
-
GD4: Graph-based Discrete Denoising Diffusion for MIMO Detection
GD4 is a graph-based discrete denoising diffusion method for MIMO detection that yields higher-quality suboptimal solutions than prior diffusion detectors and classical baselines under similar compute budgets in both ...
Reference graph
Works this paper leans on
-
[2]
Numerical continuation methods: an introduction, volume 13
Eugene L Allgower and Kurt Georg. Numerical continuation methods: an introduction, volume 13. Springer Science & Business Media, 2012
-
[3]
Reverse-time diffusion equation models
Brian D O Anderson. Reverse-time diffusion equation models. Stochastic Process. Appl., 12(3):313–326, May 1982
-
[4]
Jens Behrmann, Will Grathwohl, Ricky TQ Chen, David Duvenaud, and Jörn-Henrik Jacobsen. Invertible residual networks. In International Conference on Machine Learning, pp. 573–582, 2019
-
[5]
Learning to Generate Samples from Noise through Infusion Training
Florian Bordes, Sina Honari, and Pascal Vincent. Learning to generate samples from noise through infusion training. arXiv preprint arXiv:1703.06975, 2017
-
[6]
Large scale gan training for high fidelity natural image synthesis
Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale gan training for high fidelity natural image synthesis. In International Conference on Learning Representations, 2018
-
[7]
Learning gradient fields for shape generation
Ruojin Cai, Guandao Yang, Hadar Averbuch-Elor, Zekun Hao, Serge Belongie, Noah Snavely, and Bharath Hariharan. Learning gradient fields for shape generation. In Proceedings of the European Conference on Computer Vision (ECCV), 2020
-
[8]
WaveGrad: Estimating gradients for waveform generation
Nanxin Chen, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, and William Chan. Wavegrad: Estimating gradients for waveform generation. arXiv preprint arXiv:2009.00713, 2020
-
[9]
Neural ordinary differential equations
Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. In Advances in neural information processing systems, pp. 6571–6583, 2018
-
[10]
Residual flows for invertible generative modeling
Ricky TQ Chen, Jens Behrmann, David K Duvenaud, and Jörn-Henrik Jacobsen. Residual flows for invertible generative modeling. In Advances in Neural Information Processing Systems, pp. 9916–9926, 2019
work page 2019
-
[11]
Density estimation using Real NVP
Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. arXiv preprint arXiv:1605.08803, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[12]
A family of embedded Runge-Kutta formulae
John R Dormand and Peter J Prince. A family of embedded Runge-Kutta formulae. Journal of Computational and Applied Mathematics, 6(1):19–26, 1980
work page 1980
-
[13]
Implicit generation and modeling with energy based models
Yilun Du and Igor Mordatch. Implicit generation and modeling with energy based models. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 32, pp. 3608–3618. Curran Associates, Inc., 2019
work page 2019
-
[14]
Tweedie’s formula and selection bias
Bradley Efron. Tweedie’s formula and selection bias. Journal of the American Statistical Association, 106(496):1602–1614, 2011
work page 2011
-
[15]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680, 2014
work page 2014
-
[16]
Variational walkback: Learning a transition operator as a stochastic recurrent net
Anirudh Goyal Alias Parth Goyal, Nan Rosemary Ke, Surya Ganguli, and Yoshua Bengio. Variational walkback: Learning a transition operator as a stochastic recurrent net. In Advances in Neural Information Processing Systems, pp. 4392–4402, 2017
work page 2017
-
[17]
FFJORD: Free-form continuous dynamics for scalable reversible generative models
Will Grathwohl, Ricky TQ Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. FFJORD: Free-form continuous dynamics for scalable reversible generative models. In International Conference on Learning Representations, 2018
work page 2018
-
[18]
Representations of knowledge in complex systems
Ulf Grenander and Michael I Miller. Representations of knowledge in complex systems. Journal of the Royal Statistical Society: Series B (Methodological), 56(4):549–581, 1994
work page 1994
-
[19]
Jonathan Ho, Xi Chen, Aravind Srinivas, Yan Duan, and Pieter Abbeel. Flow++: Improving flow-based generative models with variational dequantization and architecture design. In International Conference on Machine Learning, pp. 2722–2730, 2019
work page 2019
-
[20]
Denoising diffusion probabilistic models
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 2020
work page 2020
-
[21]
A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines
Michael F Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics-Simulation and Computation, 19(2):433–450, 1990
work page 1990
-
[22]
Estimation of non-normalized statistical models by score matching
Aapo Hyvärinen. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(Apr):695–709, 2005
work page 2005
-
[23]
Adversarial score matching and improved sampling for image generation
Alexia Jolicoeur-Martineau, Rémi Piché-Taillefer, Rémi Tachet des Combes, and Ioannis Mitliagkas. Adversarial score matching and improved sampling for image generation. arXiv preprint arXiv:2009.05475, 2020
-
[24]
Progressive growing of GANs for improved quality, stability, and variation
Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations, 2018
work page 2018
-
[25]
A style-based generator architecture for generative adversarial networks
Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4401–4410, 2019
work page 2019
-
[26]
Training generative adversarial networks with limited data
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems, 33, 2020a
work page 2020
-
[27]
Analyzing and improving the image quality of StyleGAN
Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of StyleGAN. In Proc. CVPR, 2020b
work page 2020
-
[28]
Glow: Generative flow with invertible 1x1 convolutions
Durk P Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pp. 10215–10224, 2018
work page 2018
-
[29]
Numerical solution of stochastic differential equations, volume 23
Peter E Kloeden and Eckhard Platen. Numerical solution of stochastic differential equations, volume 23. Springer Science & Business Media, 2013
work page 2013
-
[30]
DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. DiffWave: A versatile diffusion model for audio synthesis. arXiv preprint arXiv:2009.09761, 2020
work page internal anchor Pith review Pith/arXiv arXiv 2009
-
[31]
Learning multiple layers of features from tiny images
Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009
work page 2009
-
[32]
Deep learning face attributes in the wild
Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015
work page 2015
-
[33]
Interacting particle solutions of Fokker-Planck equations through gradient-log-density estimation
Dimitra Maoutsa, Sebastian Reich, and Manfred Opper. Interacting particle solutions of Fokker-Planck equations through gradient-log-density estimation. arXiv preprint arXiv:2006.00702, 2020
-
[34]
MCMC using Hamiltonian dynamics
Radford M Neal et al. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11):2, 2011
work page 2011
-
[35]
Permutation invariant graph generation via score-based generative modeling
Chenhao Niu, Yang Song, Jiaming Song, Shengjia Zhao, Aditya Grover, and Stefano Ermon. Permutation invariant graph generation via score-based generative modeling. volume 108 of Proceedings of Machine Learning Research, pp. 4474–4484, Online, 26–28 Aug 2020. PMLR
work page 2020
-
[36]
Stochastic differential equations
Bernt Øksendal. Stochastic differential equations. In Stochastic differential equations, pp. 65–84. Springer, 2003
work page 2003
-
[37]
Efficient learning of generative models via finite-difference score matching
Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, and Jun Zhu. Efficient learning of generative models via finite-difference score matching. arXiv preprint arXiv:2007.03317, 2020
-
[38]
Correlation functions and computer simulations
Giorgio Parisi. Correlation functions and computer simulations. Nuclear Physics B, 180(3):378–384, 1981
work page 1981
-
[39]
Generating diverse high-fidelity images with VQ-VAE-2
Ali Razavi, Aaron van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images with VQ-VAE-2. In Advances in Neural Information Processing Systems, pp. 14837–14847, 2019
work page 2019
-
[40]
On linear identifiability of learned representations
Geoffrey Roeder, Luke Metz, and Diederik P Kingma. On linear identifiability of learned representations. arXiv preprint arXiv:2007.00810, 2020
-
[41]
Applied stochastic differential equations, volume 10
Simo Särkkä and Arno Solin. Applied stochastic differential equations, volume 10. Cambridge University Press, 2019
work page 2019
-
[42]
The eigenvalues of mega-dimensional matrices
John Skilling. The eigenvalues of mega-dimensional matrices. In Maximum Entropy and Bayesian Methods, pp. 455–466. Springer, 1989
work page 1989
-
[43]
Deep unsupervised learning using nonequilibrium thermodynamics
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pp. 2256–2265, 2015
work page 2015
-
[44]
Generative modeling by estimating gradients of the data distribution
Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, pp. 11895–11907, 2019
work page 2019
-
[45]
Improved techniques for training score-based generative models
Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. Advances in Neural Information Processing Systems, 33, 2020
work page 2020
-
[46]
Sliced score matching: A scalable approach to density and score estimation
Yang Song, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. Sliced score matching: A scalable approach to density and score estimation. In Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI 2019, Tel Aviv, Israel, July 22-25, 2019, p. 204, 2019a
work page 2019
-
[47]
MintNet: Building invertible neural networks with masked convolutions
Yang Song, Chenlin Meng, and Stefano Ermon. MintNet: Building invertible neural networks with masked convolutions. In Advances in Neural Information Processing Systems, pp. 11002–11012, 2019b
work page 2019
-
[48]
Matthew Tancik, Pratul P. Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan T. Barron, and Ren Ng. Fourier features let networks learn high frequency functions in low dimensional domains. NeurIPS, 2020
work page 2020
-
[49]
A connection between score matching and denoising autoencoders
Pascal Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011
work page 2011
-
[50]
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop
Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[51]
Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[52]
Making convolutional networks shift-invariant again
Richard Zhang. Making convolutional networks shift-invariant again. In ICML, 2019
work page 2019
-
[55]
NICE: Non-linear Independent Components Estimation
work page internal anchor Pith review Pith/arXiv arXiv doi:10.1007/s11263-015-0816-y 2019