Score-Based Generative Modeling through Stochastic Differential Equations

Abhishek Kumar; Ben Poole; Diederik P. Kingma; Jascha Sohl-Dickstein; Stefano Ermon; Yang Song

arxiv: 2011.13456 · v2 · submitted 2020-11-26 · 💻 cs.LG · stat.ML

Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

Pith reviewed 2026-05-24 13:44 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords score-based generative models · stochastic differential equations · diffusion probabilistic models · generative modeling · image generation · neural ODE · inverse problems

The pith

Generative modeling reduces to solving a reverse-time SDE whose drift depends only on the score of the perturbed data distribution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that data can be transformed into a simple prior by a forward SDE that adds noise, and recovered by solving the corresponding reverse-time SDE. This reverse SDE is completely determined by the time-dependent score function of the noisy data distribution. Neural networks estimate the score, numerical solvers generate samples, and the same framework unifies earlier score-based and diffusion models while adding a predictor-corrector sampler and an equivalent neural ODE. The approach also handles inverse problems and reaches new performance levels on image generation.
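As a rough illustration of the sampling loop described above, here is a minimal sketch on a one-dimensional toy problem where the score is known in closed form. Everything in it is an assumption made for illustration and not taken from the paper: the constants `MU`, `S`, `G`, `T`, the constant-diffusion forward SDE dx = G dw, and the helper names `true_score` and `reverse_sde_sample`. In practice a trained network s_θ(x, t) would replace `true_score`.

```python
import numpy as np

# Toy setup (illustrative assumptions, not values from the paper):
# data ~ N(MU, S^2), forward SDE dx = G dw, so p_t = N(MU, S^2 + G^2 t).
MU, S, G, T = 1.0, 0.5, 3.0, 1.0

def true_score(x, t):
    """Exact score of p_t; a trained network s_theta(x, t) would replace this."""
    return -(x - MU) / (S**2 + G**2 * t)

def reverse_sde_sample(score_fn, n_samples=20_000, n_steps=2_000, seed=0):
    """Euler-Maruyama integration of dx = -G^2 * score dt + G dw_bar from t=T to t=0."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = rng.normal(0.0, G * np.sqrt(T), size=n_samples)  # draw from the prior N(0, G^2 T)
    for k in range(n_steps, 0, -1):
        t = k * dt
        noise = rng.standard_normal(n_samples)
        # Time runs backward, so the drift -G^2 * score enters with a plus sign.
        x = x + G**2 * score_fn(x, t) * dt + G * np.sqrt(dt) * noise
    return x

samples = reverse_sde_sample(true_score)
print(samples.mean(), samples.std())  # should land near MU and S
```

With the exact score and a fine step size, the sample mean and standard deviation should come out close to those of the toy data distribution; the small remaining gap comes from the prior mismatch and the discretization.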

Core claim

The authors establish that the reverse-time SDE for recovering data from noise depends only on the time-dependent score function, which can be estimated by neural networks. This allows the framework to encompass previous score-based and diffusion approaches while introducing a predictor-corrector sampler, a neural ODE for likelihoods, and applications to conditional generation and inpainting, achieving an Inception score of 9.89 and FID of 2.20 on CIFAR-10.

What carries the argument

The reverse-time stochastic differential equation whose drift is set by the score (gradient of the log probability) of the time-dependent perturbed data distribution.
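For reference, the three equations behind this statement can be written as follows. The symbols are not defined on this page and follow the usual notation for this framework: f is the drift and g the diffusion coefficient of the forward SDE, w and w̄ are forward-time and reverse-time Wiener processes, and p_t is the marginal density of the perturbed data at time t.

```latex
\begin{align}
  \mathrm{d}\mathbf{x} &= \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}
    && \text{(forward SDE)} \\
  \mathrm{d}\mathbf{x} &= \left[\mathbf{f}(\mathbf{x}, t) - g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}
    && \text{(reverse-time SDE)} \\
  \mathrm{d}\mathbf{x} &= \left[\mathbf{f}(\mathbf{x}, t) - \tfrac{1}{2}\,g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})\right]\mathrm{d}t
    && \text{(probability flow ODE)}
\end{align}
```

The only quantity in the second and third lines that is not fixed by the choice of forward SDE is the score ∇ log p_t, which is why estimating it is enough to sample.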

If this is right

A predictor-corrector sampler can be used to correct discretization errors during reverse-time evolution.
An equivalent neural ODE yields exact likelihood computation and improved sampling efficiency.
The same score model solves inverse problems including class-conditional generation, inpainting, and colorization.
Unconditional image generation reaches an Inception score of 9.89 and FID of 2.20 on CIFAR-10.
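The predictor-corrector idea in the first item can be sketched on a toy problem. The code below is a simplified illustration under stated assumptions, not the paper's implementation: the one-dimensional Gaussian data, the constants `MU`, `S`, `G`, `T`, the signal-to-noise parameter `r`, and the function names are all hypothetical. The predictor is an Euler-Maruyama step of the reverse-time SDE and the corrector is a Langevin step whose size is set from the ratio of noise norm to score norm.

```python
import numpy as np

MU, S, G, T = 1.0, 0.5, 3.0, 1.0  # toy values, assumed for illustration

def score(x, t):
    """Exact score of p_t = N(MU, S^2 + G^2 t); stands in for a trained network."""
    return -(x - MU) / (S**2 + G**2 * t)

def pc_sample(n_corrector, n_samples=20_000, n_steps=50, r=0.2, seed=0):
    """Coarse reverse-time sampling with n_corrector Langevin steps after each predictor step."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = rng.normal(0.0, G * np.sqrt(T), size=n_samples)  # prior N(0, G^2 T)
    for k in range(n_steps, 0, -1):
        t = k * dt
        # Predictor: one Euler-Maruyama step of the reverse-time SDE, t -> t - dt.
        x = x + G**2 * score(x, t) * dt + G * np.sqrt(dt) * rng.standard_normal(n_samples)
        # Corrector: Langevin dynamics targeting the marginal at time t - dt.
        for _ in range(n_corrector):
            z = rng.standard_normal(n_samples)
            s = score(x, t - dt)
            eps = 2.0 * (r * np.linalg.norm(z) / np.linalg.norm(s)) ** 2  # step size heuristic
            x = x + eps * s + np.sqrt(2.0 * eps) * z
    return x

predictor_only = pc_sample(n_corrector=0)
with_corrector = pc_sample(n_corrector=5)
# Absolute error in the sample standard deviation, without and with corrector steps.
print(abs(predictor_only.std() - S), abs(with_corrector.std() - S))
```

In this toy setting the corrector steps pull the coarse 50-step samples closer to the target spread. Note that they also cost extra score evaluations, so this sketch says nothing about the trade-off at equal compute.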

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The continuous-time formulation implies that many discrete diffusion steps are approximations to the underlying SDE limit.
Score estimation accuracy becomes the shared performance bottleneck across previously separate generative techniques.
The same machinery could be applied directly to sequential data without requiring discrete token adaptations.

Load-bearing premise

Neural networks can estimate the score function of the perturbed data distribution accurately enough for the numerical reverse-time SDE solver to produce valid samples.

What would settle it

A neural network trained to estimate the score is inserted into the reverse-time SDE solver and the resulting samples fail to match the training distribution under metrics such as FID or Inception score.

Figures

Figures reproduced from arXiv: 2011.13456 by Abhishek Kumar, Ben Poole, Diederik P. Kingma, Jascha Sohl-Dickstein, Stefano Ermon, Yang Song.

**Figure 2.** Overview of score-based generative modeling through SDEs. We can map data to a noise distribution (the prior) with an SDE (Section 3.1), and reverse this SDE for generative modeling (Section 3.2). We can also reverse the associated probability flow ODE (Section 4.3), which yields a deterministic process that samples from the same distribution as the SDE. Both the reverse-time SDE and probability flow ODE c…

**Figure 3.** Probability flow ODE enables fast sampling with adaptive stepsizes as the numerical precision is varied (left), and reduces the number of score function evaluations (NFE) without harming quality (middle). The invertible mapping from latents to images allows for interpolations (right). ODE, Eq. (13), provides the same trajectories given perfectly estimated scores. We provide additional empirical verificatio…

**Figure 4.** Left: Class-conditional samples on 32 × 32 CIFAR-10. Top four rows are automobiles and bottom four rows are horses. Right: Inpainting (top two rows) and colorization (bottom two rows) results on 256 × 256 LSUN. First column is the original image, second column is the masked/grayscale image, remaining columns are sampled image completions or colorizations. from $p_t(\mathbf{x}(t) \mid \mathbf{y})$ by starting from $p_T(\mathbf{x}(T) \mid \mathbf{y})$ a…

**Figure 5.** Discrete-time perturbation kernels and our continuous generalizations match each other

**Figure 6.** Samples from the probability flow ODE for VP SDE on

**Figure 7.** Comparing the first 100 dimensions of the latent code obtained for a random CIFAR-10

**Figure 8.** Left: The dimension-wise difference between encodings obtained by Model A and B. As a baseline, we also report the difference between shuffled representations of these two models. Right: The dimension-wise correlation coefficients of encodings obtained by Model A and Model B. because $\mathbf{x}_i = \frac{1}{\sqrt{1-\beta_{i+1}}}\big(\mathbf{x}_{i+1} + \beta_{i+1}\,\mathbf{s}_{\theta^*}(\mathbf{x}_{i+1}, i+1)\big) + \sqrt{\beta_{i+1}}\,\mathbf{z}_{i+1} = \big(1 + \tfrac{1}{2}\beta_{i+1} + o(\beta_{i+1})\big)\big(\mathbf{x}_{i+1} + \beta_{i+1}\,\mathbf{s}_{\theta^*}(\mathbf{x}_{i+1}, i+1)\big) +$ …

**Figure 9.** PC sampling for LSUN bedroom and church. The vertical axis corresponds to the total

**Figure 10.** The effects of different architecture components for score-based models trained with VE

**Figure 11.** Unconditional CIFAR-10 samples from NCSN++ cont. (deep, VE).

**Figure 12.** Samples on 1024 × 1024 CelebA-HQ from a modified NCSN++ model trained with the VE SDE.

**Figure 13.** Class-conditional image generation by solving the conditional reverse-time SDE with PC.

**Figure 14.** Extended inpainting results for 256 × 256 bedroom images.

**Figure 15.** Extended inpainting results for 256 × 256 church images.

**Figure 16.** Extended colorization results for 256 × 256 bedroom images.

**Figure 17.** Extended colorization results for 256 × 256 church images.

Original abstract

Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a. score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 × 1024 images for the first time from a score-based generative model.
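The deterministic ODE mentioned in the abstract can be illustrated on a toy problem where the answer is known. The sketch below is an assumption-laden example rather than the paper's code: it uses one-dimensional Gaussian data, a constant-diffusion forward SDE dx = G dw, a plain Euler integrator, and hypothetical names (`score`, `probability_flow`). It maps latents to data, checks the result against the closed-form solution for this toy case, and then integrates forward again to show the map is (approximately) invertible.

```python
import numpy as np

MU, S, G, T = 1.0, 0.5, 3.0, 1.0  # toy values, assumed for illustration

def score(x, t):
    """Exact score of p_t = N(MU, S^2 + G^2 t); stands in for a trained network."""
    return -(x - MU) / (S**2 + G**2 * t)

def probability_flow(x, t_start, t_end, n_steps=2_000):
    """Euler integration of dx/dt = -0.5 * G^2 * score(x, t) from t_start to t_end."""
    dt = (t_end - t_start) / n_steps  # negative when integrating toward the data
    t = t_start
    for _ in range(n_steps):
        x = x + (-0.5 * G**2 * score(x, t)) * dt
        t += dt
    return x

latents = np.linspace(-6.0, 6.0, 7)
data = probability_flow(latents, T, 0.0)       # latent -> data, no randomness involved
# Closed form for this toy case: deviations from MU shrink by sqrt(var_0 / var_T).
expected = MU + (latents - MU) * np.sqrt(S**2 / (S**2 + G**2 * T))
recovered = probability_flow(data, 0.0, T)     # data -> latent, the inverse map

print(np.max(np.abs(data - expected)))
print(np.max(np.abs(recovered - latents)))
```

Because no noise is injected, each latent maps to exactly one data point, which is what makes interpolation in latent space and likelihood evaluation possible in the full framework.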

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The SDE unification and predictor-corrector sampler are the real additions here, with the CIFAR numbers holding up under the framework.

The paper's core move is to cast both score-based generative models and diffusion probabilistic models as forward and reverse SDEs. This recovers the earlier approaches as special cases while adding a predictor-corrector sampler that corrects discretization drift and an equivalent neural ODE that supports exact likelihoods. Those two extensions are not just rephrasings of prior work; they give concrete new procedures for sampling and for solving inverse problems like inpainting. The CIFAR-10 results (IS 9.89, FID 2.20) and the 1024x1024 generations are reported with the new samplers, so the empirical claims are tied to the proposed methods rather than to black-box baselines. The derivations for the reverse SDE and the ODE equivalence look clean on the page, and the experiments use standard architectures plus the stated improvements. The main soft spot is exactly the one flagged in the stress test: once the score is replaced by a neural approximation, the numerical integration of the reverse SDE can accumulate bias, especially under aggressive noise schedules or at higher resolutions. The predictor-corrector step is their direct response to that issue, and it improves the numbers, but the paper does not supply a full error analysis or stability bounds for the discretized case. That leaves some room for later work on tighter guarantees. This is written for people already working on score matching or diffusion; the unification and the new samplers make it worth their time. It is grounded enough and the results are sharp enough that a serious editor should send it to referees rather than desk-reject.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces a unified framework for generative modeling based on forward and reverse stochastic differential equations (SDEs). A forward SDE perturbs data distributions toward a simple prior by injecting noise; the corresponding reverse-time SDE recovers the data distribution and depends only on the time-dependent score (gradient of the log-density) of the perturbed distribution. Neural networks estimate this score via denoising score matching, enabling sampling through numerical SDE solvers. The framework recovers prior score-based and diffusion probabilistic models as special cases, introduces a predictor-corrector sampler to mitigate discretization error, derives an equivalent neural ODE permitting exact likelihood computation, and demonstrates applications to inverse problems. On CIFAR-10 it reports an Inception score of 9.89 and FID of 2.20.
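To make the denoising score matching step in this summary concrete, here is a minimal sketch at a single noise level. It is an illustration under assumptions that are not in the paper: one-dimensional Gaussian data with hypothetical parameters `MU`, `S`, `SIGMA`, and a linear score model fitted by least squares in place of a neural network. The regression target is the score of the Gaussian perturbation kernel, and the fitted model should approach the score of the perturbed marginal N(MU, S² + SIGMA²).

```python
import numpy as np

rng = np.random.default_rng(0)
MU, S, SIGMA = 1.0, 0.5, 0.5  # toy data N(MU, S^2) and one noise level (assumed)

x = rng.normal(MU, S, size=200_000)      # clean data
z = rng.standard_normal(x.shape)
x_tilde = x + SIGMA * z                  # perturbed data
target = -(x_tilde - x) / SIGMA**2       # score of the kernel p(x_tilde | x)

# Linear score model s(x) = w * x + c. Least squares minimizes the empirical
# denoising score matching loss mean((s(x_tilde) - target)^2) over (w, c).
w, c = np.polyfit(x_tilde, target, deg=1)

var_t = S**2 + SIGMA**2                  # variance of the perturbed marginal
print(w, -1.0 / var_t)                   # fitted slope vs. slope of the true score
print(c, MU / var_t)                     # fitted intercept vs. true intercept
```

The point of the example is that the regression never sees the score of the perturbed marginal directly, yet the minimizer recovers it, which is the property the framework relies on when a network is trained across all noise levels.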

Significance. If the numerical stability and score-estimation claims hold, the work is significant: it supplies a single continuous-time formalism that subsumes existing discrete diffusion and score-matching methods, supplies new sampling algorithms (predictor-corrector and probability-flow ODE), and adds exact-likelihood capability via the ODE formulation. The empirical results on unconditional and conditional image synthesis, together with the first 1024×1024 score-based generations, would constitute a clear advance in high-dimensional generative modeling.

major comments (3)

[§3, Eq. (4)] The continuous-time equivalence between the reverse SDE and the data distribution is correctly derived when the exact score is supplied, yet the manuscript provides no error-propagation bounds showing that a neural approximation s_θ(x,t) trained by denoising score matching yields marginals whose total variation or Wasserstein distance to the target remains controlled under the finite-step predictor-corrector or Euler–Maruyama discretizations used for the reported CIFAR-10 results.
[§4.2, predictor-corrector sampler] The claim that the corrector step “corrects errors in the evolution of the discretized reverse-time SDE” is central to the performance numbers, but the paper supplies neither a convergence analysis nor an ablation quantifying how many corrector steps are required as a function of score-estimation error or step-size Δt; without this, it is unclear whether the reported FID of 2.20 is robust or an artifact of a particular discretization schedule.
[Experiments section, CIFAR-10 results] The record FID of 2.20 and the 1024×1024 generations are presented as evidence that the framework succeeds at scale, yet no diagnostic is given on the magnitude of the score-estimation residual ||s_θ − ∇log p_t|| across time steps or on its correlation with sample quality; this diagnostic is load-bearing for the assertion that neural score estimation plus numerical SDE solvers suffice.
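A diagnostic of the kind requested in the third comment can at least be run on synthetic data where ∇log p_t is known. The sketch below is a hypothetical illustration, not an experiment from the paper: it uses one-dimensional Gaussian data, models score error as a multiplicative factor (1 + delta) on the exact score (an assumed error model chosen only for simplicity), and reports the residual measured along the sampler's trajectories next to the error in the final sample spread.

```python
import numpy as np

MU, S, G, T = 1.0, 0.5, 3.0, 1.0  # toy values, assumed for illustration

def true_score(x, t):
    """Exact score of p_t = N(MU, S^2 + G^2 t)."""
    return -(x - MU) / (S**2 + G**2 * t)

def sample_and_residual(delta, n_samples=20_000, n_steps=1_000, seed=0):
    """Sample with a score scaled by (1 + delta).

    Returns (RMS score residual averaged over steps, absolute error of the sample std).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = rng.normal(0.0, G * np.sqrt(T), size=n_samples)
    residuals = []
    for k in range(n_steps, 0, -1):
        t = k * dt
        exact = true_score(x, t)
        approx = (1.0 + delta) * exact  # hypothetical imperfect score model
        residuals.append(np.sqrt(np.mean((approx - exact) ** 2)))
        x = x + G**2 * approx * dt + G * np.sqrt(dt) * rng.standard_normal(n_samples)
    return float(np.mean(residuals)), abs(float(x.std()) - S)

for delta in (0.0, 0.1, 0.2):
    print(delta, sample_and_residual(delta))
```

In this toy case a larger residual goes together with a larger error in the sample spread. Whether the same relationship holds for image-scale models is exactly what the referee is asking the authors to show.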

minor comments (2)

[§2] Notation for the diffusion coefficient and noise schedule is introduced without an explicit table mapping each choice to the corresponding special cases recovered from prior work.
[Figures] Figure captions for the sampling trajectories do not state the number of function evaluations or the precise discretization scheme employed, making direct reproduction of the likelihood numbers difficult.

Simulated Author's Rebuttal

3 responses · 2 unresolved

We thank the referee for the positive assessment of our work's significance and for the detailed major comments. We respond point-by-point below, acknowledging limitations where the manuscript lacks requested analyses and indicating revisions where feasible.

Referee: [§3, Eq. (4)] The continuous-time equivalence between the reverse SDE and the data distribution is correctly derived when the exact score is supplied, yet the manuscript provides no error-propagation bounds showing that a neural approximation s_θ(x,t) trained by denoising score matching yields marginals whose total variation or Wasserstein distance to the target remains controlled under the finite-step predictor-corrector or Euler–Maruyama discretizations used for the reported CIFAR-10 results.

Authors: We agree that error-propagation bounds for the neural score approximation would strengthen the theoretical claims. Deriving such general bounds for high-dimensional learned scores under discretization is a substantial open problem beyond the scope of this work, which prioritizes the unifying framework and empirical results. In revision we will add a short discussion paragraph in Section 3 noting this as a limitation and suggesting it as future work.
revision: partial
Referee: [§4.2] The claim that the corrector step “corrects errors in the evolution of the discretized reverse-time SDE” is central to the performance numbers, but the paper supplies neither a convergence analysis nor an ablation quantifying how many corrector steps are required as a function of score-estimation error or step-size Δt; without this, it is unclear whether the reported FID of 2.20 is robust or an artifact of a particular discretization schedule.

Authors: The predictor-corrector sampler is motivated by standard numerical SDE techniques, and our experiments show consistent gains from the corrector steps. We did not include a full convergence proof or exhaustive ablation on step count versus error. We will add an ablation study varying the number of corrector steps and discretization schedules to the revised experiments section to better substantiate robustness of the FID result.
revision: yes
Referee: [Experiments section] The record FID of 2.20 and the 1024×1024 generations are presented as evidence that the framework succeeds at scale, yet no diagnostic is given on the magnitude of the score-estimation residual ||s_θ − ∇log p_t|| across time steps or on its correlation with sample quality; this diagnostic is load-bearing for the assertion that neural score estimation plus numerical SDE solvers suffice.

Authors: Direct evaluation of the residual is impossible without the true score for image data. We rely on the denoising score matching training loss and downstream sample quality as proxies. We will add a brief discussion in the experiments section on training loss behavior across time and its relation to generation metrics as an indirect diagnostic.
revision: partial

standing simulated objections not resolved

Rigorous error-propagation bounds for neural score approximations under the reported discretizations
Full convergence analysis of the predictor-corrector sampler

Circularity Check

0 steps flagged

No significant circularity; derivation and empirical results are self-contained

full rationale

The paper derives the forward SDE and reverse-time SDE from standard stochastic calculus (Anderson's theorem referenced externally), shows prior score-based and diffusion methods as special cases of the general SDE without redefining them circularly, introduces predictor-corrector and neural ODE samplers as new procedures, and reports CIFAR-10 metrics from trained neural networks estimating the score. No load-bearing step reduces claimed results (unification, sampling, or FID/IS) to fitted quantities or self-citations by construction. Self-citations to prior score-matching work exist but support independent components and are not required for the central claims.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the existence of a well-behaved forward SDE that reaches a known prior, the accuracy of neural score estimation, and the validity of numerical SDE solvers for the reverse process. No invented entities; free parameters are the noise schedule and neural network weights fitted during training.

free parameters (2)

noise schedule / diffusion coefficient
Time-dependent noise injection rate chosen to define the forward SDE; fitted or hand-selected to reach the prior.
neural network parameters for score estimation
Weights of the score network trained on perturbed data; central to all sampling claims.
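To show where the first free parameter enters, the two main families of forward SDE unified by the framework can be written as below. The notation is an assumption on this page's part, since the ledger does not define symbols: σ(t) is the noise scale of the variance exploding (VE) SDE and β(t) is the noise rate of the variance preserving (VP) SDE.

```latex
\begin{align}
  \text{VE SDE:}\quad \mathrm{d}\mathbf{x} &= \sqrt{\frac{\mathrm{d}\,[\sigma^2(t)]}{\mathrm{d}t}}\;\mathrm{d}\mathbf{w} \\
  \text{VP SDE:}\quad \mathrm{d}\mathbf{x} &= -\tfrac{1}{2}\,\beta(t)\,\mathbf{x}\,\mathrm{d}t + \sqrt{\beta(t)}\;\mathrm{d}\mathbf{w}
\end{align}
```

Choosing σ(t) or β(t) fixes both the drift and the diffusion coefficient of the forward process, so the noise schedule is the only hand-set ingredient before the score network is trained.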

axioms (2)

domain assumption: A reverse-time SDE exists whose drift depends only on the score of the perturbed distribution.
Invoked in the abstract when stating that the reverse SDE transforms the prior back to data using only the score.
domain assumption: Numerical SDE solvers can accurately integrate the reverse process when the score is well-estimated.
Required for the predictor-corrector and sampling claims to hold.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
reverse-time SDE depends only on the time-dependent gradient field (score) of the perturbed data distribution

IndisputableMonolith/Foundation/DimensionForcing.lean alexander_duality_circle_linking unclear
VE, VP and sub-VP SDEs ... 8-tick period not mentioned

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Generative Modeling with Flux Matching
cs.LG 2026-05 unverdicted novelty 8.0

Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices be...
A-CODE: Fully Atomic Protein Co-Design with Unified Multimodal Diffusion
q-bio.QM 2026-05 unverdicted novelty 8.0

A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain...
Quotient-Space Diffusion Models
cs.LG 2026-04 unverdicted novelty 8.0

Quotient-space diffusion models generate correct symmetric distributions by removing redundancy on the quotient space, simplifying learning and improving results on small molecules and proteins under SE(3) symmetry.
The Feedback Hamiltonian is the Score Function: A Diffusion-Model Framework for Quantum Trajectory Reversal
quant-ph 2026-04 unverdicted novelty 8.0

The García-Pintos feedback Hamiltonian equals the score function of the quantum trajectory distribution, linking quantum feedback to diffusion-model reversal.
Query Lower Bounds for Diffusion Sampling
cs.LG 2026-04 unverdicted novelty 8.0

Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.
OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models
cs.CV 2026-04 unverdicted novelty 8.0

OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
Generative models on phase space
hep-ph 2026-04 unverdicted novelty 8.0

Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.
A Priori Sampling of Transition States with Guided Diffusion
physics.chem-ph 2026-03 conditional novelty 8.0

ASTRA reframes transition-state search as guided diffusion inference that samples the isodensity surface between metastable basins and converges to first-order saddles via score differences and physical forces.
Mean-Field Path-Integral Diffusion: From Samples to Interacting Agents
math.OC 2026-02 unverdicted novelty 8.0

MF-PID turns independent diffusion samples into mean-field interacting agents, proving that quadratic interactions yield exact linear mean interpolation and delivering 19-24% energy savings in demand-response control.
Variational Optimality of Föllmer Processes in Generative Diffusions
math.ST 2026-02 unverdicted novelty 8.0

Föllmer processes are variationally optimal among generative diffusions because they minimize the impact of drift estimation error on path-space KL divergence, rendering different interpolation schedules statistically...
Flow-GRPO: Training Flow Matching Models via Online RL
cs.CV 2025-05 unverdicted novelty 8.0

Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
Large Language Diffusion Models
cs.CL 2025-02 unverdicted novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
Denoising Diffusion Implicit Models
cs.LG 2020-10 unverdicted novelty 8.0

DDIMs construct non-Markovian diffusion processes that share DDPM training objectives but allow much faster reverse sampling, demonstrated empirically at 10-50x wall-clock speedup.
Learned Relay Representations for Forward-Thinking Discrete Diffusion Models
cs.LG 2026-05 unverdicted novelty 7.0

Learned Relay Representations enable masked diffusion models to propagate useful latent information across denoising steps, scaling to Fast-dLLM v2 to outperform supervised finetuning on coding tasks while cutting inf...
Generative Modeling by Value-Driven Transport
cs.LG 2026-05 unverdicted novelty 7.0

A control-theoretic linear program yields value-driven transport policies for generative modeling with straight paths and simulation-free training.
Let EEG Models Learn EEG
cs.CV 2026-05 unverdicted novelty 7.0

JET is a conditional flow matching framework that generates EEG as continuous raw sequences with added constraints for spectral and temporal properties, achieving over 40% lower TS-FID than prior discrete denoising me...
Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models
cs.CV 2026-05 unverdicted novelty 7.0

Linear-DPO replaces sigmoid utility with linear utility and adds EMA reference to improve preference alignment in diffusion and flow-matching text-to-image models.
CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation
cs.LG 2026-05 unverdicted novelty 7.0

CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.
Matérn Noise for Triangulation-Agnostic Flow Matching on Meshes
cs.GR 2026-05 unverdicted novelty 7.0

Proposes discretized Matérn process noise for triangulation-agnostic flow matching on meshes with PoissonNet denoiser, tested on elastic states and humanoid poses for meshes exceeding one million triangles.
SURGE: Approximation and Training Free Particle Filter for Diffusion Surrogate
stat.ML 2026-05 unverdicted novelty 7.0

SURGE performs unbiased path-wise importance reweighting via Girsanov estimation for derivative-free inference-time scaling in diffusion models, proving equivalence to particle-wise SMC and outperforming baselines empirically.
Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms
hep-ph 2026-05 unverdicted novelty 7.0

Nested-GPT is an autoregressive Transformer surrogate that generates variable-multiplicity parton showers while enforcing ordered Markovian branching and matches reference Monte Carlo results for leading-log non-globa...
Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms
hep-ph 2026-05 unverdicted novelty 7.0

Nested-GPT is an autoregressive Transformer that dynamically generates variable-multiplicity parton showers matching Monte Carlo references for non-global logarithm resummation in the large-Nc limit.
Functionalization via Structure Completion and Motion Rectification
cs.CV 2026-05 unverdicted novelty 7.0

Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture wi...
Towards Generalized Image Manipulation Localization via Score-based Model
cs.CV 2026-05 conditional novelty 7.0

DiffIML applies score-based generative modeling to image manipulation localization, recovering coherent masks iteratively from noise to improve generalization on unseen manipulation types.
Training-Free Generative Sampling via Moment-Matched Score Smoothing
stat.ML 2026-05 unverdicted novelty 7.0

MM-SOLD is a training-free particle sampler whose large-particle limit converges to a moment-matched Gibbs distribution obtained by exponentially tilting a score-smoothed target.
Sampling from Flow Language Models via Marginal-Conditioned Bridges
cs.LG 2026-05 unverdicted novelty 7.0

Marginal-conditioned bridges enable training-free sampling from Flow Language Models by drawing clean one-hot endpoints from factorized posteriors and using Ornstein-Uhlenbeck bridges, preserving token marginals and r...
HIR-ALIGN: Enhancing Hyperspectral Image Restoration via Diffusion-Based Data Generation
cs.CV 2026-05 unverdicted novelty 7.0

HIR-ALIGN augments limited target data for hyperspectral restoration by creating proxy clean images, synthesizing aligned HSIs with blur-robust diffusion and warp-based transfer, then finetuning models to lower target...
Proximal-Based Generative Modeling for Bayesian Inverse Problems
math.OC 2026-05 unverdicted novelty 7.0

PGM replaces the intractable likelihood score in diffusion models with a closed-form Moreau score computed via proximal operators, enabling non-asymptotic sampling for inverse problems trained only on prior data.
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
cs.CV 2026-05 unverdicted novelty 7.0

Edit-Compass and EditReward-Compass are new unified benchmarks for fine-grained image editing evaluation and realistic reward modeling in reinforcement learning optimization.
Bridging Domain Gaps with Target-Aligned Generation for Offline Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 7.0

TCE bridges domain gaps in offline RL by selectively using source data or generating target-aligned transitions via a dual score-based model, outperforming baselines in experiments.
Amortized Guidance for Image Inpainting with Pretrained Diffusion Models
cs.CV 2026-05 unverdicted novelty 7.0

AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.
MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving
cs.RO 2026-05 unverdicted novelty 7.0

MindVLA-U1 introduces a unified streaming VLA with shared backbone, framewise memory, and language-guided action diffusion that surpasses human drivers on WOD-E2E planning metrics.
Aligning Flow Map Policies with Optimal Q-Guidance
cs.LG 2026-05 unverdicted novelty 7.0

Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.
One-Step Generative Modeling via Wasserstein Gradient Flows
cs.LG 2026-05 conditional novelty 7.0

W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...
On the Approximation Complexity of Matrix Product Operator Born Machines
cs.LG 2026-05 unverdicted novelty 7.0

MPO-BMs have NP-hard KL approximation in continuous settings but admit efficient polynomial-bond-dimension approximations with provable KL guarantees for structured targets under locality and spectral-gap conditions.
Muninn: Your Trajectory Diffusion Model But Faster
cs.RO 2026-05 unverdicted novelty 7.0

Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.
Discrete Langevin-Inspired Posterior Sampling
cs.LG 2026-05 unverdicted novelty 7.0

ΔLPS is a gradient-guided discrete posterior sampler for inverse problems that works with masked or uniform discrete diffusion priors and outperforms prior discrete methods on image restoration tasks.
Remix the Timbre: Diffusion-Based Style Transfer Across Polyphonic Stems
cs.SD 2026-05 unverdicted novelty 7.0

MixtureTT performs direct per-stem timbre transfer on polyphonic mixtures via a shared diffusion transformer, outperforming single-stem baselines on SATB choral data while eliminating cascaded separation errors.
A Call to Lagrangian Action: Learning Population Mechanics from Temporal Snapshots
cs.LG 2026-05 unverdicted novelty 7.0

Wasserstein Lagrangian Mechanics learns second-order population dynamics from observed marginals without specifying the Lagrangian and outperforms gradient flow methods on periodic dynamics like vortex motion and flocking.
Adaptive Subspace Projection for Generative Personalization
cs.CV 2026-05 unverdicted novelty 7.0

A training-free adaptive subspace projection method mitigates semantic collapsing in generative personalization by isolating and adjusting drift in a low-dimensional subspace using the stable pre-trained embedding as anchor.
Kurtosis-Guided Denoising Score Matching for Tabular Anomaly Detection
cs.LG 2026-05 unverdicted novelty 7.0

K-DSM uses per-feature kurtosis to set noise scales in DSM, enabling effective single-scale anomaly detection on tabular benchmarks in both semi-supervised and unsupervised settings.
Path-Coupled Bellman Flows for Distributional Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 7.0

Path-Coupled Bellman Flows use source-consistent Bellman-coupled paths and a lambda-parameterized control-variate to learn return distributions via flow matching, improving fidelity and stability over prior DRL approaches.
Arena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Models
cs.CV 2026-05 unverdicted novelty 7.0

ArenaPO infers Gaussian capability distributions from pairwise preferences and applies truncated-normal latent inference to derive fine-grained offline rewards for preference optimization of text-to-image diffusion models.
DBMSolver: A Training-free Diffusion Bridge Sampler for High-Quality Image-to-Image Translation
cs.CV 2026-05 unverdicted novelty 7.0

DBMSolver is a new training-free sampler using exponential integrators that reduces NFEs by up to 5x and improves quality in diffusion bridge model-based image-to-image translation tasks.
Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems
cs.MA 2026-05 unverdicted novelty 7.0

An ensemble-based information-theoretic active learning method with ensemble Kalman inversion selects valuable tasks to optimize communication structures in LLM multi-agent systems under constrained budgets.
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
cs.CV 2026-05 unverdicted novelty 7.0

D-OPSD formulates supervised fine-tuning of step-distilled diffusion models as on-policy self-distillation by minimizing distribution differences between a text-only student and a multimodal teacher on the student's o...
Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 7.0

DOSER detects OOD actions via diffusion-model denoising error and applies selective regularization based on predicted transitions, proving gamma-contraction with performance bounds and outperforming priors on offline ...
FluxFlow: Conservative Flow-Matching for Astronomical Image Super-Resolution
cs.CV 2026-05 unverdicted novelty 7.0

FluxFlow is a conservative pixel-space flow-matching framework for astronomical super-resolution that incorporates real atmospheric uncertainty and a training-free Wiener correction, outperforming baselines on a new 1...
Tempered Guided Diffusion
stat.ML 2026-05 unverdicted novelty 7.0

Tempered Guided Diffusion uses annealed SMC to produce consistent particle approximations to the posterior for training-free conditional diffusion sampling, outperforming independent guided trajectories in experiments.
PerFlow: Physics-Embedded Rectified Flow for Efficient Reconstruction and Uncertainty Quantification of Spatiotemporal Dynamics
cs.LG 2026-05 unverdicted novelty 7.0

PerFlow embeds physics constraints into rectified flow sampling through guidance-free conditioning and constraint-preserving projections, achieving efficient sparse reconstruction and uncertainty quantification for sp...
PODiff: Latent Diffusion in Proper Orthogonal Decomposition Space for Scientific Super-Resolution
cs.LG 2026-05 unverdicted novelty 7.0

PODiff performs conditional diffusion in a fixed, variance-ordered POD latent space to enable efficient probabilistic super-resolution of high-dimensional scientific fields with lower memory and better-calibrated unce...
Inferring Active Neural Circuits Using Diffusion Scores
q-bio.NC 2026-05 unverdicted novelty 7.0

SBTG recovers the Jacobian of the nonlinear transition map between brain states by multiplying cross-block scores from denoising models, enabling inference of lag-specific directed interactions in neural population da...
ExpoCM: Exposure-Aware One-Step Generative Single-Image HDR Reconstruction
cs.CV 2026-05 unverdicted novelty 7.0

ExpoCM enables fast one-step single-image HDR reconstruction via exposure-dependent perturbations and region-conditioned consistency trajectories derived from a probability flow ODE.
Generative Modeling with Orbit-Space Particle Flow Matching
cs.GR 2026-05 unverdicted novelty 7.0

OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.
Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning
cs.LG 2026-05 unverdicted novelty 7.0

FAN achieves state-of-the-art offline RL performance on robotic tasks by anchoring flow policies and using single-sample noise-conditioned Q-learning, with proven convergence and reduced runtimes.
Cast3: Translating numerical weather prediction principles into data-driven forecasting
physics.ao-ph 2026-05 unverdicted novelty 7.0

Cast3 translates NWP principles into a data-driven model using cubed-sphere grids, super-ensembles, and generative nudging to achieve state-of-the-art ensemble predictions that outperform baselines.
Decentralized Proximal Stochastic Gradient Langevin Dynamics
stat.ML 2026-05 unverdicted novelty 7.0

DE-PSGLD is the first decentralized MCMC sampler for constrained convex domains that converges to a regularized Gibbs distribution with explicit 2-Wasserstein bounds for agents and network averages.
GD4: Graph-based Discrete Denoising Diffusion for MIMO Detection
cs.LG 2026-05 unverdicted novelty 7.0

GD4 is a graph-based discrete denoising diffusion method for MIMO detection that yields higher-quality suboptimal solutions than prior diffusion detectors and classical baselines under similar compute budgets in both ...

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · cited by 363 Pith papers · 6 internal anchors

[2]

Numerical continuation methods: an introduction, volume 13

Eugene L Allgower and Kurt Georg. Numerical continuation methods: an introduction, volume 13. Springer Science & Business Media, 2012

work page 2012
[3]

Reverse-time diffusion equation models

Brian D O Anderson. Reverse-time diffusion equation models. Stochastic Process. Appl., 12(3):313–326, May 1982

work page 1982
[4]

Invertible residual networks

Jens Behrmann, Will Grathwohl, Ricky TQ Chen, David Duvenaud, and Jörn-Henrik Jacobsen. Invertible residual networks. In International Conference on Machine Learning, pp. 573–582, 2019

work page 2019
[5]

Learning to Generate Samples from Noise through Infusion Training

Florian Bordes, Sina Honari, and Pascal Vincent. Learning to generate samples from noise through infusion training. arXiv preprint arXiv:1703.06975, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[6]

Large scale gan training for high fidelity natural image synthesis

Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale gan training for high fidelity natural image synthesis. In International Conference on Learning Representations, 2018

work page 2018
[7]

Learning gradient fields for shape generation

Ruojin Cai, Guandao Yang, Hadar Averbuch-Elor, Zekun Hao, Serge Belongie, Noah Snavely, and Bharath Hariharan. Learning gradient fields for shape generation. In Proceedings of the European Conference on Computer Vision (ECCV), 2020

work page 2020
[8]

WaveGrad: Estimating gradients for waveform generation

Nanxin Chen, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, and William Chan. Wavegrad: Estimating gradients for waveform generation. arXiv preprint arXiv:2009.00713, 2020

work page arXiv 2020
[9]

Neural ordinary differential equations

Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. In Advances in neural information processing systems, pp. 6571–6583, 2018

work page 2018
[10]

Residual flows for invertible generative modeling

Ricky TQ Chen, Jens Behrmann, David K Duvenaud, and Jörn-Henrik Jacobsen. Residual flows for invertible generative modeling. In Advances in Neural Information Processing Systems, pp. 9916–9926, 2019

work page 2019
[11]

Density estimation using Real NVP

Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real nvp. arXiv preprint arXiv:1605.08803, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[12]

A family of embedded runge-kutta formulae

John R Dormand and Peter J Prince. A family of embedded runge-kutta formulae. Journal of computational and applied mathematics, 6(1):19–26, 1980

work page 1980
[13]

Implicit generation and modeling with energy based models

Yilun Du and Igor Mordatch. Implicit generation and modeling with energy based models. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 32, pp. 3608–3618. Curran Associates, Inc., 2019

work page 2019
[14]

Tweedie’s formula and selection bias

Bradley Efron. Tweedie’s formula and selection bias. Journal of the American Statistical Association, 106(496):1602–1614, 2011

work page 2011
[15]

Generative adversarial nets

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680, 2014

work page 2014
[16]

Variational walkback: Learning a transition operator as a stochastic recurrent net

Anirudh Goyal Alias Parth Goyal, Nan Rosemary Ke, Surya Ganguli, and Yoshua Bengio. Variational walkback: Learning a transition operator as a stochastic recurrent net. In Advances in Neural Information Processing Systems, pp. 4392–4402, 2017

work page 2017
[17]

Ffjord: Free-form continuous dynamics for scalable reversible generative models

Will Grathwohl, Ricky TQ Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. Ffjord: Free-form continuous dynamics for scalable reversible generative models. In International Conference on Learning Representations, 2018

work page 2018
[18]

Representations of knowledge in complex systems

Ulf Grenander and Michael I Miller. Representations of knowledge in complex systems. Journal of the Royal Statistical Society: Series B (Methodological), 56(4):549–581, 1994

work page 1994
[19]

Flow++: Improving flow-based generative models with variational dequantization and architecture design

Jonathan Ho, Xi Chen, Aravind Srinivas, Yan Duan, and Pieter Abbeel. Flow++: Improving flow-based generative models with variational dequantization and architecture design. In International Conference on Machine Learning, pp. 2722–2730, 2019

work page 2019
[20]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 2020

work page 2020
[21]

A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines

Michael F Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics-Simulation and Computation, 19(2):433–450, 1990

work page 1990
[22]

Estimation of non-normalized statistical models by score matching

Aapo Hyvärinen. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(Apr):695–709, 2005

work page 2005
[23]

Adversarial score matching and improved sampling for image generation

Alexia Jolicoeur-Martineau, Rémi Piché-Taillefer, Rémi Tachet des Combes, and Ioannis Mitliagkas. Adversarial score matching and improved sampling for image generation. arXiv preprint arXiv:2009.05475, 2020

work page arXiv 2020
[24]

Progressive growing of gans for improved quality, stability, and variation

Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation. In International Conference on Learning Representations, 2018

work page 2018
[25]

A style-based generator architecture for generative adversarial networks

Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4401–4410, 2019

work page 2019
[26]

Training generative adversarial networks with limited data

Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems, 33, 2020a

work page 2020
[27]

Analyzing and improving the image quality of StyleGAN

Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of StyleGAN. In Proc. CVPR, 2020b

work page 2020
[28]

Glow: Generative flow with invertible 1x1 convolutions

Durk P Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pp. 10215–10224, 2018

work page 2018
[29]

Numerical solution of stochastic differential equations, volume 23

Peter E Kloeden and Eckhard Platen. Numerical solution of stochastic differential equations, volume 23. Springer Science & Business Media, 2013

work page 2013
[30]

DiffWave: A Versatile Diffusion Model for Audio Synthesis

Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. Diffwave: A versatile diffusion model for audio synthesis. arXiv preprint arXiv:2009.09761, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2020
[31]

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

work page 2009
[32]

Deep learning face attributes in the wild

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015

work page 2015
[33]

Interacting particle solutions of fokker-planck equations through gradient-log-density estimation

Dimitra Maoutsa, Sebastian Reich, and Manfred Opper. Interacting particle solutions of fokker-planck equations through gradient-log-density estimation. arXiv preprint arXiv:2006.00702, 2020

work page arXiv 2020
[34]

Mcmc using hamiltonian dynamics

Radford M Neal et al. Mcmc using hamiltonian dynamics. Handbook of markov chain monte carlo, 2(11):2, 2011

work page 2011
[35]

Permutation invariant graph generation via score-based generative modeling

Chenhao Niu, Yang Song, Jiaming Song, Shengjia Zhao, Aditya Grover, and Stefano Ermon. Permutation invariant graph generation via score-based generative modeling. volume 108 of Proceedings of Machine Learning Research, pp. 4474–4484, Online, 26–28 Aug 2020. PMLR

work page 2020
[36]

Stochastic differential equations

Bernt Øksendal. Stochastic differential equations. In Stochastic differential equations, pp. 65–84. Springer, 2003

work page 2003
[37]

Efficient learning of generative models via finite-difference score matching

Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, and Jun Zhu. Efficient learning of generative models via finite-difference score matching. arXiv preprint arXiv:2007.03317, 2020

work page arXiv 2020
[38]

Correlation functions and computer simulations

Giorgio Parisi. Correlation functions and computer simulations. Nuclear Physics B, 180(3):378–384, 1981

work page 1981
[39]

Generating diverse high-fidelity images with vq-vae-2

Ali Razavi, Aaron van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images with vq-vae-2. In Advances in Neural Information Processing Systems, pp. 14837–14847, 2019

work page 2019
[40]

On linear identifiability of learned representations

Geoffrey Roeder, Luke Metz, and Diederik P Kingma. On linear identifiability of learned representations. arXiv preprint arXiv:2007.00810, 2020

work page arXiv 2020
[41]

Applied stochastic differential equations, volume 10

Simo Särkkä and Arno Solin. Applied stochastic differential equations, volume 10. Cambridge University Press, 2019

work page 2019
[42]

The eigenvalues of mega-dimensional matrices

John Skilling. The eigenvalues of mega-dimensional matrices. In Maximum Entropy and Bayesian Methods, pp. 455–466. Springer, 1989

work page 1989
[43]

Deep unsupervised learning using nonequilibrium thermodynamics

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pp. 2256–2265, 2015

work page 2015
[44]

Generative modeling by estimating gradients of the data distribution

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, pp. 11895–11907, 2019

work page 2019
[45]

Improved techniques for training score-based generative models

Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. Advances in Neural Information Processing Systems, 33, 2020

work page 2020
[46]

Sliced score matching: A scalable approach to density and score estimation

Yang Song, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. Sliced score matching: A scalable approach to density and score estimation. In Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI 2019, Tel Aviv, Israel, July 22-25, 2019, pp. 204, 2019a

work page 2019
[47]

Mintnet: Building invertible neural networks with masked convolutions

Yang Song, Chenlin Meng, and Stefano Ermon. Mintnet: Building invertible neural networks with masked convolutions. In Advances in Neural Information Processing Systems, pp. 11002–11012, 2019b

work page 2019
[48]

Fourier features let networks learn high frequency functions in low dimensional domains

Matthew Tancik, Pratul P. Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan T. Barron, and Ren Ng. Fourier features let networks learn high frequency functions in low dimensional domains. NeurIPS, 2020

work page 2020
[49]

A connection between score matching and denoising autoencoders

Pascal Vincent. A connection between score matching and denoising autoencoders. Neural computation, 23(7):1661–1674, 2011

work page 2011
[50]

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[51]

Wide Residual Networks

Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[52]

Making convolutional networks shift-invariant again

Richard Zhang. Making convolutional networks shift-invariant again. In ICML, 2019

work page 2019
[55]

NICE: Non-linear Independent Components Estimation

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1007/s11263-015-0816-y 2019
[56]

Our framework allows general SDEs with matrix-valued diffusion coefficients that depend on the state, for which we provide a detailed discussion in app:general_sde

work page 2000

[1] [1]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page

[2] [2]

Numerical continuation methods: an introduction, volume 13

Eugene L Allgower and Kurt Georg. Numerical continuation methods: an introduction, volume 13. Springer Science & Business Media, 2012

work page 2012

[3] [3]

Reverse-time diffusion equation models

Brian D O Anderson. Reverse-time diffusion equation models. Stochastic Process. Appl., 12 0 (3): 0 313--326, May 1982

work page 1982

[4] [4]

Invertible residual networks

Jens Behrmann, Will Grathwohl, Ricky TQ Chen, David Duvenaud, and J \"o rn-Henrik Jacobsen. Invertible residual networks. In International Conference on Machine Learning, pp.\ 573--582, 2019

work page 2019

[5] [5]

Learning to Generate Samples from Noise through Infusion Training

Florian Bordes, Sina Honari, and Pascal Vincent. Learning to generate samples from noise through infusion training. arXiv preprint arXiv:1703.06975, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[6] [6]

Large scale gan training for high fidelity natural image synthesis

Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale gan training for high fidelity natural image synthesis. In International Conference on Learning Representations, 2018

work page 2018

[7] [7]

Learning gradient fields for shape generation

Ruojin Cai, Guandao Yang, Hadar Averbuch-Elor, Zekun Hao, Serge Belongie, Noah Snavely, and Bharath Hariharan. Learning gradient fields for shape generation. In Proceedings of the European Conference on Computer Vision (ECCV), 2020

work page 2020

[8] [8]

WaveG- rad: Estimating gradients for waveform generation

Nanxin Chen, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, and William Chan. Wavegrad: Estimating gradients for waveform generation. arXiv preprint arXiv:2009.00713, 2020

work page arXiv 2009

[9] [9]

Neural ordinary differential equations

Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. In Advances in neural information processing systems, pp.\ 6571--6583, 2018

work page 2018

[10] [10]

Residual flows for invertible generative modeling

Ricky TQ Chen, Jens Behrmann, David K Duvenaud, and J \"o rn-Henrik Jacobsen. Residual flows for invertible generative modeling. In Advances in Neural Information Processing Systems, pp.\ 9916--9926, 2019

work page 2019

[11] [11]

Density estimation using Real NVP

Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real nvp. arXiv preprint arXiv:1605.08803, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[12] [12]

A family of embedded runge-kutta formulae

John R Dormand and Peter J Prince. A family of embedded runge-kutta formulae. Journal of computational and applied mathematics, 6 0 (1): 0 19--26, 1980

work page 1980

[13] [13]

Implicit generation and modeling with energy based models

Yilun Du and Igor Mordatch. Implicit generation and modeling with energy based models. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d Alch\' e -Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 32, pp.\ 3608--3618. Curran Associates, Inc., 2019

work page 2019

[14] [14]

Tweedie’s formula and selection bias

Bradley Efron. Tweedie’s formula and selection bias. Journal of the American Statistical Association, 106 0 (496): 0 1602--1614, 2011

work page 2011

[15] [15]

Generative adversarial nets

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pp.\ 2672--2680, 2014

work page 2014

[16] [16]

Variational walkback: Learning a transition operator as a stochastic recurrent net

Anirudh Goyal Alias Parth Goyal, Nan Rosemary Ke, Surya Ganguli, and Yoshua Bengio. Variational walkback: Learning a transition operator as a stochastic recurrent net. In Advances in Neural Information Processing Systems, pp.\ 4392--4402, 2017

work page 2017

[17] [17]

Ffjord: Free-form continuous dynamics for scalable reversible generative models

Will Grathwohl, Ricky TQ Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. Ffjord: Free-form continuous dynamics for scalable reversible generative models. In International Conference on Learning Representations, 2018

work page 2018

[18] [18]

Representations of knowledge in complex systems

Ulf Grenander and Michael I Miller. Representations of knowledge in complex systems. Journal of the Royal Statistical Society: Series B (Methodological), 56 0 (4): 0 549--581, 1994

work page 1994

[19] [19]

Flow++: Improving flow-based generative models with variational dequantization and architecture design

Jonathan Ho, Xi Chen, Aravind Srinivas, Yan Duan, and Pieter Abbeel. Flow++: Improving flow-based generative models with variational dequantization and architecture design. In International Conference on Machine Learning, pp.\ 2722--2730, 2019

work page 2019

[20] [20]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 2020

work page 2020

[21] [21]

A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines

Michael F Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics-Simulation and Computation, 19 0 (2): 0 433--450, 1990

work page 1990

[22] [22]

Estimation of non-normalized statistical models by score matching

Aapo Hyv \"a rinen. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6 0 (Apr): 0 695--709, 2005

work page 2005

[23] [23]

Adversarial score matching and improved sampling for image generation

Alexia Jolicoeur-Martineau, R \'e mi Pich \'e -Taillefer, R \'e mi Tachet des Combes, and Ioannis Mitliagkas. Adversarial score matching and improved sampling for image generation. arXiv preprint arXiv:2009.05475, 2020

work page arXiv 2009

[24] [24]

Progressive growing of gans for improved quality, stability, and variation

Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation. In International Conference on Learning Representations, 2018

work page 2018

[25] [25]

A style-based generator architecture for generative adversarial networks

Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.\ 4401--4410, 2019

work page 2019

[26] [26]

Training generative adversarial networks with limited data

Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems, 33, 2020 a

work page 2020

[27] [27]

Analyzing and improving the image quality of StyleGAN

Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of StyleGAN . In Proc. CVPR, 2020 b

work page 2020

[28] [28]

Glow: Generative flow with invertible 1x1 convolutions

Durk P Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pp. 10215--10224, 2018

[29] [29]

Numerical solution of stochastic differential equations, volume 23

Peter E Kloeden and Eckhard Platen. Numerical solution of stochastic differential equations, volume 23. Springer Science & Business Media, 2013


[30] [30]

DiffWave: A Versatile Diffusion Model for Audio Synthesis

Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. DiffWave: A versatile diffusion model for audio synthesis. arXiv preprint arXiv:2009.09761, 2020

[31] [31]

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009


[32] [32]

Deep learning face attributes in the wild

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015


[33] [33]

Interacting particle solutions of Fokker-Planck equations through gradient-log-density estimation

Dimitra Maoutsa, Sebastian Reich, and Manfred Opper. Interacting particle solutions of Fokker-Planck equations through gradient-log-density estimation. arXiv preprint arXiv:2006.00702, 2020

[34] [34]

MCMC using Hamiltonian dynamics

Radford M Neal et al. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11):2, 2011

[35] [35]

Permutation invariant graph generation via score-based generative modeling

Chenhao Niu, Yang Song, Jiaming Song, Shengjia Zhao, Aditya Grover, and Stefano Ermon. Permutation invariant graph generation via score-based generative modeling. Volume 108 of Proceedings of Machine Learning Research, pp. 4474--4484, Online, 26--28 Aug 2020. PMLR

[36] [36]

Stochastic differential equations

Bernt Øksendal. Stochastic differential equations. In Stochastic differential equations, pp. 65--84. Springer, 2003

[37] [37]

Efficient learning of generative models via finite-difference score matching

Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, and Jun Zhu. Efficient learning of generative models via finite-difference score matching. arXiv preprint arXiv:2007.03317, 2020


[38] [38]

Correlation functions and computer simulations

Giorgio Parisi. Correlation functions and computer simulations. Nuclear Physics B, 180(3):378--384, 1981

[39] [39]

Generating diverse high-fidelity images with VQ-VAE-2

Ali Razavi, Aaron van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images with VQ-VAE-2. In Advances in Neural Information Processing Systems, pp. 14837--14847, 2019

[40] [40]

On linear identifiability of learned representations

Geoffrey Roeder, Luke Metz, and Diederik P Kingma. On linear identifiability of learned representations. arXiv preprint arXiv:2007.00810, 2020


[41] [41]

Applied stochastic differential equations, volume 10

Simo Särkkä and Arno Solin. Applied stochastic differential equations, volume 10. Cambridge University Press, 2019

[42] [42]

The eigenvalues of mega-dimensional matrices

John Skilling. The eigenvalues of mega-dimensional matrices. In Maximum Entropy and Bayesian Methods, pp. 455--466. Springer, 1989

[43] [43]

Deep unsupervised learning using nonequilibrium thermodynamics

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pp. 2256--2265, 2015

[44] [44]

Generative modeling by estimating gradients of the data distribution

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, pp. 11895--11907, 2019

[45] [45]

Improved techniques for training score-based generative models

Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. Advances in Neural Information Processing Systems, 33, 2020


[46] [46]

Sliced score matching: A scalable approach to density and score estimation

Yang Song, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. Sliced score matching: A scalable approach to density and score estimation. In Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI 2019, Tel Aviv, Israel, July 22-25, 2019, pp. 204, 2019a

[47] [47]

Mintnet: Building invertible neural networks with masked convolutions

Yang Song, Chenlin Meng, and Stefano Ermon. Mintnet: Building invertible neural networks with masked convolutions. In Advances in Neural Information Processing Systems, pp. 11002--11012, 2019b

[48] [48]

Fourier features let networks learn high frequency functions in low dimensional domains

Matthew Tancik, Pratul P. Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan T. Barron, and Ren Ng. Fourier features let networks learn high frequency functions in low dimensional domains. NeurIPS, 2020

[49] [49]

A connection between score matching and denoising autoencoders

Pascal Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661--1674, 2011

[50] [50]

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015

[51] [51]

Wide Residual Networks

Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016


[52] [52]

Making convolutional networks shift-invariant again

Richard Zhang. Making convolutional networks shift-invariant again. In ICML, 2019



[55] [55]

NICE: Non-linear Independent Components Estimation


[56] [56]

Our framework allows general SDEs with matrix-valued diffusion coefficients that depend on the state, for which we provide a detailed discussion in app:general_sde
