Flux Matching generalizes score-based generative modeling with a weaker objective that admits infinitely many non-conservative vector fields whose stationary distribution is the data distribution, enabling design choices beyond traditional score matching.
Score-Based Generative Modeling through Stochastic Differential Equations
191 Pith papers cite this work.
abstract
Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a. score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.
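To make the predictor-corrector sampling described in the abstract concrete, here is a minimal sketch. It makes loud simplifying assumptions: the exact score of a 1-D Gaussian toy target stands in for the learned score network, the variance-exploding schedule g(t)^2 = 25, the step counts, and the signal-to-noise step-size heuristic for the Langevin corrector are illustrative choices, not the paper's settings.

```python
import numpy as np

# Toy target: N(MU, S0^2). Forward (variance-exploding) SDE: dx = g(t) dW with
# g(t)^2 = G2 constant, so the perturbed marginal at time t is N(MU, S0^2 + G2*t).
MU, S0, G2 = 2.0, 0.5, 25.0

def score(x, t):
    # Exact score of the perturbed marginal; a trained network would replace this.
    return (MU - x) / (S0**2 + G2 * t)

def pc_sample(n=20000, steps=500, snr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    ts = np.linspace(1.0, 1e-3, steps)
    dt = ts[0] - ts[1]
    # Start from the prior implied by the forward SDE at t = 1.
    x = rng.normal(MU, np.sqrt(S0**2 + G2), size=n)
    for t in ts:
        # Predictor: Euler-Maruyama step of the reverse-time SDE
        #   dx = -g(t)^2 * score(x, t) dt + g(t) dW;
        # stepping from t to t - dt flips the drift sign, giving this update.
        x += G2 * score(x, t) * dt + np.sqrt(G2 * dt) * rng.standard_normal(n)
        # Corrector: one Langevin step targeting the same intermediate marginal,
        # with a heuristic signal-to-noise step size (an assumption of this sketch).
        g = score(x, t)
        eps = 2.0 * (snr / (np.abs(g).mean() + 1e-12))**2
        x += eps * g + np.sqrt(2.0 * eps) * rng.standard_normal(n)
    return x

samples = pc_sample()
print(f"mean {samples.mean():.3f} (target {MU}), std {samples.std():.3f} (target {S0})")
```

Running this should print moments close to the target; the std lands slightly above 0.5 because the sketch stops at t = 1e-3 rather than t = 0, so a little forward noise remains.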
roles: background (2 polarities; 2 representative citing papers)

claims ledger
A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain on hard binder-design tasks.
The García-Pintos feedback Hamiltonian equals the score function of the quantum trajectory distribution, linking quantum feedback to diffusion-model reversal.
Diffusion sampling from d-dimensional distributions requires at least ~√d adaptive score queries when score estimates have polynomial accuracy.
OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.
ASTRA reframes transition-state search as guided diffusion inference that samples the isodensity surface between metastable basins and converges to first-order saddles via score differences and physical forces.
Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
Marginal-conditioned bridges enable training-free sampling from Flow Language Models by drawing clean one-hot endpoints from factorized posteriors and using Ornstein-Uhlenbeck bridges, preserving token marginals and reducing denoising error versus conditional-mean bridges.
HIR-ALIGN augments limited target data for hyperspectral restoration by creating proxy clean images, synthesizing aligned HSIs with blur-robust diffusion and warp-based transfer, then finetuning models to lower target-domain risk.
PGM replaces the intractable likelihood score in diffusion models with a closed-form Moreau score computed via proximal operators, enabling non-asymptotic sampling for inverse problems trained only on prior data.
Edit-Compass and EditReward-Compass are new unified benchmarks for fine-grained image editing evaluation and realistic reward modeling in reinforcement learning optimization.
TCE bridges domain gaps in offline RL by selectively using source data or generating target-aligned transitions via a dual score-based model, outperforming baselines in experiments.
AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.
MindVLA-U1 introduces a unified streaming VLA with shared backbone, framewise memory, and language-guided action diffusion that surpasses human drivers on WOD-E2E planning metrics.
Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.
W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x faster sampling than comparable multi-step models.
MPO-BMs have NP-hard KL approximation in continuous settings but admit efficient polynomial-bond-dimension approximations with provable KL guarantees for structured targets under locality and spectral-gap conditions.
Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.
ΔLPS is a gradient-guided discrete posterior sampler for inverse problems that works with masked or uniform discrete diffusion priors and outperforms prior discrete methods on image restoration tasks.
MixtureTT performs direct per-stem timbre transfer on polyphonic mixtures via a shared diffusion transformer, outperforming single-stem baselines on SATB choral data while eliminating cascaded separation errors.
Wasserstein Lagrangian Mechanics learns second-order population dynamics from observed marginals without specifying the Lagrangian and outperforms gradient flow methods on periodic dynamics like vortex motion and flocking.
A training-free adaptive subspace projection method mitigates semantic collapsing in generative personalization by isolating and adjusting drift in a low-dimensional subspace using the stable pre-trained embedding as anchor.
citing papers explorer
- Structure-Adaptive Sparse Diffusion in Voxel Space for 3D Medical Image Enhancement
  A sparse voxel-space diffusion method with structure-adaptive modulation achieves up to 10x training speedup and state-of-the-art results for 3D medical image denoising and super-resolution.
- Grokking of Diffusion Models: Case Study on Modular Addition
  Diffusion models show grokking on modular addition by composing periodic operand representations in simple data regimes or by separating arithmetic computation from visual denoising across timesteps in varied regimes.
- Reinforcement Learning via Value Gradient Flow
  VGF solves behavior-regularized RL by transporting particles from a reference distribution to the value-induced optimal policy via discrete value-guided gradient flow.
- Beyond Reconstruction: Reconstruction-to-Vector Diffusion for Hyperspectral Anomaly Detection
  R2VD redefines reconstruction as the origin for residual-guided vector diffusion across PPE, GMP, RSM, and VDI stages to achieve superior anomaly detectability and background suppression on eight datasets.
- ScoRe-Flow: Complete Distributional Control via Score-Based Reinforcement Learning for Flow Matching
  ScoRe-Flow achieves decoupled mean-variance control in stochastic flow matching by deriving a closed-form score for drift modulation plus learned variance, yielding faster RL convergence and higher success rates on locomotion and manipulation benchmarks.
- Novel View Synthesis as Video Completion
  Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.
- Diffusion Processes on Implicit Manifolds
  Implicit Manifold-valued Diffusions (IMDs) are data-driven SDEs built from proximity graphs that converge in law to smooth manifold diffusions as sample count increases.
- Lipschitz regularity in Flow Matching and Diffusion Models: sharp sampling rates and functional inequalities
  Sharp Lipschitz regularity for flow-matching vector fields and diffusion scores, with optimal time/dimension dependence, gives √d/N Wasserstein discretization error for Euler samplers and globally Lipschitz Gaussian-to-target transport maps implying Poincaré and log-Sobolev inequalities.
- HumANDiff: Articulated Noise Diffusion for Motion-Consistent Human Video Generation
  HumANDiff improves motion consistency in human video generation by sampling diffusion noise on an articulated human body template and adding joint appearance-motion prediction plus a geometric consistency loss.
- Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation
  RSBM exploits velocity field invariance across regularization levels to achieve over 94% cosine similarity and 92% success in visual navigation using only 3 integration steps.
- Generative modeling of granular flow on inclined planes using conditional flow matching
  A conditional flow matching model trained on DEM simulations reconstructs granular flow velocity fields from as little as 11-16% sparse boundary data, outperforming deterministic CNN baselines while providing uncertainty estimates via ensemble generation.
- 1.x-Distill: Breaking the Diversity, Quality, and Efficiency Barrier in Distribution Matching Distillation
  1.x-Distill achieves better quality and diversity than prior few-step distillation methods at 1.67 and 1.74 effective NFEs on SD3 models with up to 33x speedup.
- StrADiff: A Structured Source-Wise Adaptive Diffusion Framework for Linear and Nonlinear Blind Source Separation
  StrADiff recovers latent source trajectories from linear and nonlinear mixtures via source-wise adaptive diffusion and a Gaussian process prior in a single unsupervised end-to-end objective.
- Unlocking Prompt Infilling Capability for Diffusion Language Models
  Full-sequence masking in SFT unlocks prompt infilling for masked diffusion language models, producing templates that match or surpass hand-designed ones and transfer across models.
- VOSR: A Vision-Only Generative Model for Image Super-Resolution
  VOSR shows that competitive generative image super-resolution with faithful structures can be achieved by training a diffusion-style model from scratch on visual data alone, using a vision encoder for guidance and a restoration-oriented sampling strategy.
- Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting
  Drift-AR achieves a 3.8-5.5x speedup in AR-diffusion image models via entropy-informed speculative decoding and single-step (1-NFE) anti-symmetric drifting decoding.
- MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
  MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.
- Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
  Latent Consistency Models enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space, distilled from pre-trained LDMs.
- High-Resolution Image Synthesis with Latent Diffusion Models
  Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and memory costs.
- Diffusion Models Beat GANs on Image Synthesis
  Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
- ExpoCM: Exposure-Aware One-Step Generative Single-Image HDR Reconstruction
  ExpoCM enables fast one-step single-image HDR reconstruction via exposure-dependent perturbations and region-conditioned consistency trajectories derived from a probability flow ODE.
- Cast3: Translating numerical weather prediction principles into data-driven forecasting
  Cast3 translates NWP principles into a data-driven model using cubed-sphere grids, super-ensembles, and generative nudging to achieve state-of-the-art ensemble predictions that outperform baselines.
- Generative Modeling of Approximately Periodic Time Series by a Posterior-Weighted Gaussian Process
  A two-stage posterior-weighted Gaussian Process generates approximately periodic time series by keeping an identical mean function across repetitions while permitting smooth inter-repetition variation.
- GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction
  GeoQuery replaces corrupted rendering features with geometry-aligned proxy queries and restricts cross-view attention to local windows, enabling robust diffusion-based refinement under extreme view sparsity.
- BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion
  BitLM replaces per-token softmax with bitwise continuous diffusion inside causal blocks to generate multiple tokens in parallel while preserving autoregressive structure.
- A Cold Diffusion Approach for Percussive Dereverberation
  A cold diffusion model with direct and delta-normalized reverse processes, using UNet and transformer backbones, outperforms diffusion baselines for dereverberating acoustic and electronic drum stems on in-domain and out-of-domain tests.
- Encoding and Decoding Temporal Signals with Spiking Bandpass Wavelets
  Spike encoders are reformulated as time-causal bandpass wavelets that preserve sparsity and locality while providing reconstruction error bounds comparable to continuous wavelet transforms on ECG and audio signals.
- DiffATS: Diffusion in Aligned Tensor Space
  DiffATS trains diffusion models directly on aligned Tucker tensor primitives that are proven to be homeomorphisms, delivering efficient unconditional and conditional generation across images, videos, and PDE data with high compression.
- Learning Generative Dynamics with Soft Law Constraints: A McKean-Vlasov FBSDE Approach
  A McKean-Vlasov FBSDE generative model learns stochastic path laws that match observed terminal and time-marginal distributions via soft energy constraints rather than hard interpolation.
- How to Train Your Latent Diffusion Language Model Jointly With the Latent Space
  Joint training of the latent space with the diffusion process produces a competitive latent diffusion language model that is faster than existing discrete and continuous diffusion baselines.
- From Synthetic to Real: Toward Identity-Consistent Makeup Transfer with Synthetic and Real Data
  The work creates identity-consistent synthetic makeup data via ConsistentBeauty and adapts models to real images using reinforcement learning in RealBeauty, achieving better identity preservation and real-world performance than prior methods.
- Towards Photorealistic and Efficient Bokeh Rendering via Diffusion Framework
  MagicBokeh uses a single diffusion model with alternative training, focus-aware masked attention, and degradation-aware depth estimation to produce photorealistic bokeh on low-res zoomed images.
- FlashMol: High-Quality Molecule Generation in as Few as Four Steps
  FlashMol produces chemically valid 3D molecules in 4 steps via distribution matching distillation with respaced timesteps and Jensen-Shannon regularization, matching or exceeding 1000-step teacher performance on QM9 and GEOM-DRUG.
- Continuous Latent Diffusion Language Model
  Cola DLM proposes a hierarchical latent diffusion model that learns a text-to-latent mapping, fits a global semantic prior in continuous space with a block-causal DiT, and performs conditional decoding, establishing latent prior modeling as an alternative to token-level autoregressive language modeling.
- Diffusion model for SU(N) gauge theories
  Implicit score matching trains diffusion models that successfully sample SU(3) Wilson gauge configurations on lattices, with a Hamiltonian-dynamics corrector needed for strong coupling.
- Physical Fidelity Reconstruction via Improved Consistency-Distilled Flow Matching for Dynamical Systems
  A one-step consistency model distilled from an optimal-transport flow-matching teacher reconstructs high-fidelity dynamical system flows from low-fidelity data with 12x speedup, half the parameters, and 23.1% better SSIM than scratch-trained baselines.
- Plug-and-Play Label Map Diffusion for Universal Goal-Oriented Navigation
  PLMD applies a denoising diffusion model to predict labels for unknown map regions, allowing goal localization in unexplored environments by substituting completed labels into existing navigation pipelines.
- InkDiffuser: High-Fidelity One-shot Chinese Calligraphy via Differentiable Morphological Optimization
  InkDiffuser generates high-fidelity one-shot Chinese calligraphy using high-frequency enhancement and a differentiable ink structure loss for realistic stroke and ink rendering.
- Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities
  LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.
- Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems
  An ensemble-based information-theoretic active learning method using ensemble Kalman inversion selects valuable tasks to optimize communication structures in LLM multi-agent systems more reliably than random sampling under limited training budgets.
- Scaling Pretrained Representations Enables Label-Free Out-of-Distribution Detection Without Fine-Tuning
  Scaling pretrained representations improves label-free OOD detection on frozen backbones, causing performance gaps between global and local detectors to vanish across vision and language tasks.
- Conditional Diffusion Under Linear Constraints: Langevin Mixing and Information-Theoretic Guarantees
  Error in approximating the tangent conditional score by the unconditional score in diffusion models is bounded by dimension-free conditional mutual information, with a projected-Langevin method outperforming baselines in inpainting and super-resolution.
- Taming Outlier Tokens in Diffusion Transformers
  Outlier tokens in DiTs are addressed with Dual-Stage Registers, which reduce artifacts and improve image generation on ImageNet and text-to-image tasks.
- D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
  D-OPSD enables continuous supervised fine-tuning of few-step diffusion models via on-policy self-distillation where the model acts as both teacher (multimodal context) and student (text-only context) on its own roll-outs.
- Stage-adaptive audio diffusion modeling
  A semantic progress signal from SSL discrepancy slope enables three stage-aware mechanisms that improve training efficiency and performance in audio diffusion models over static baselines.
- Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion
  DiLAST optimizes 3D latents via guidance from a 2D diffusion model to enable generalizable style transfer for OOD styles in 3D asset generation.
- Towards accurate extreme event likelihoods from diffusion model climate emulators
  Diffusion model climate emulators provide probability density estimates that allow likelihood calculations and odds-ratio-based importance sampling for extreme events such as tropical cyclones.
- A Few-Step Generative Model on Cumulative Flow Maps
  Cumulative flow maps unify few-step generative modeling for diffusion and flow models via cumulative transport and parameterization with minimal changes to time embeddings and objectives.
- NoiseRater: Meta-Learned Noise Valuation for Diffusion Model Training
  NoiseRater meta-learns instance-level importance scores for noise in diffusion training via bilevel optimization, then uses a two-stage pipeline to improve efficiency and generation quality on FFHQ and ImageNet.
- SixthSense: Task-Agnostic Proprioception-Only Whole-Body Wrench Estimation for Humanoids
  SixthSense infers whole-body contact events and wrenches in humanoids from proprioception and IMU data alone by tokenizing histories and estimating a sparse contact-event flow with conditional flow matching.