Score-Based Generative Modeling through Stochastic Differential Equations

Abhishek Kumar, Ben Poole, Diederik P Kingma, Jascha Sohl-Dickstein, Stefano Ermon, Yang Song · 2020 · cs.LG · arXiv 2011.13456

189 Pith papers cite this work. Polarity classification is still indexing.

abstract

Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a. score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 × 1024 images for the first time from a score-based generative model.
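
The forward/reverse SDE pairing described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a one-dimensional Gaussian "data" distribution and a variance-preserving forward SDE with a constant noise rate, so the score of the perturbed distribution is known in closed form and no neural network is needed. The names BETA, MU0, SIGMA0, marginal, score, reverse_sde_sample, and sample_stats are hypothetical and chosen only for illustration. The sampler integrates the reverse-time SDE with plain Euler-Maruyama steps, without the predictor-corrector refinement the abstract mentions.

```python
import math
import random

# Assumed toy setup (not from the paper): data ~ N(MU0, SIGMA0^2) and a forward
# SDE dx = -0.5*BETA*x dt + sqrt(BETA) dw with a constant noise rate BETA.
BETA = 10.0
MU0, SIGMA0 = 2.0, 0.5
T = 1.0  # final time; with BETA = 10 the marginal at T is close to N(0, 1)


def marginal(t):
    """Mean and variance of the perturbed distribution p_t (closed form here)."""
    decay = math.exp(-BETA * t)
    mean = MU0 * math.sqrt(decay)
    var = SIGMA0 ** 2 * decay + (1.0 - decay)
    return mean, var


def score(x, t):
    """Score of p_t, i.e. d/dx log p_t(x); exact because p_t is Gaussian."""
    mean, var = marginal(t)
    return -(x - mean) / var


def reverse_sde_sample(n_steps, rng):
    """Draw one sample by integrating the reverse-time SDE from t = T to t = 0."""
    dt = T / n_steps
    x = rng.gauss(0.0, 1.0)  # start from the (approximate) prior N(0, 1)
    for i in range(n_steps):
        t = T - i * dt
        # Reverse-time drift: f(x, t) - g(t)^2 * score(x, t)
        drift = -0.5 * BETA * x - BETA * score(x, t)
        # Step backward in time by dt, adding fresh Gaussian noise
        x = x - drift * dt + math.sqrt(BETA * dt) * rng.gauss(0.0, 1.0)
    return x


def sample_stats(n_samples=1000, n_steps=400, seed=0):
    """Empirical mean and standard deviation of reverse-SDE samples."""
    rng = random.Random(seed)
    xs = [reverse_sde_sample(n_steps, rng) for _ in range(n_samples)]
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return mean, std
```

Calling sample_stats() should return a mean and standard deviation close to MU0 and SIGMA0, up to discretization and Monte Carlo error. In the setting the abstract describes, the closed-form score would be replaced by a trained neural network estimate.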

citation-role summary

background 1

citation-polarity summary

background 1

authors

Abhishek Kumar, Ben Poole, Diederik P Kingma, Jascha Sohl-Dickstein, Stefano Ermon, Yang Song

representative citing papers

Generative Modeling with Flux Matching

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices beyond traditional score matching.

A-CODE: Fully Atomic Protein Co-Design with Unified Multimodal Diffusion

q-bio.QM · 2026-05-05 · unverdicted · novelty 8.0

A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain on hard binder-design tasks.

The Feedback Hamiltonian is the Score Function: A Diffusion-Model Framework for Quantum Trajectory Reversal

quant-ph · 2026-04-23 · unverdicted · novelty 8.0

The García-Pintos feedback Hamiltonian equals the score function of the quantum trajectory distribution, linking quantum feedback to diffusion-model reversal.

Query Lower Bounds for Diffusion Sampling

cs.LG · 2026-04-12 · unverdicted · novelty 8.0

Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.

OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models

cs.CV · 2026-04-05 · unverdicted · novelty 8.0

OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.

Generative models on phase space

hep-ph · 2026-04-02 · unverdicted · novelty 8.0

Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.

Flow-GRPO: Training Flow Matching Models via Online RL

cs.CV · 2025-05-08 · unverdicted · novelty 8.0

Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

Sampling from Flow Language Models via Marginal-Conditioned Bridges

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Marginal-conditioned bridges enable training-free sampling from Flow Language Models by drawing clean one-hot endpoints from factorized posteriors and using Ornstein-Uhlenbeck bridges, preserving token marginals and reducing denoising error versus conditional-mean bridges.

HIR-ALIGN: Enhancing Hyperspectral Image Restoration via Diffusion-Based Data Generation

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

HIR-ALIGN augments limited target data for hyperspectral restoration by creating proxy clean images, synthesizing aligned HSIs with blur-robust diffusion and warp-based transfer, then finetuning models to lower target-domain risk.

Proximal-Based Generative Modeling for Bayesian Inverse Problems

math.OC · 2026-05-13 · unverdicted · novelty 7.0

PGM replaces the intractable likelihood score in diffusion models with a closed-form Moreau score computed via proximal operators, enabling non-asymptotic sampling for inverse problems trained only on prior data.

Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

Edit-Compass and EditReward-Compass are new unified benchmarks for fine-grained image editing evaluation and realistic reward modeling in reinforcement learning optimization.

Bridging Domain Gaps with Target-Aligned Generation for Offline Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

TCE bridges domain gaps in offline RL by selectively using source data or generating target-aligned transitions via a dual score-based model, outperforming baselines in experiments.

Amortized Guidance for Image Inpainting with Pretrained Diffusion Models

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.

MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving

cs.RO · 2026-05-12 · unverdicted · novelty 7.0

MindVLA-U1 introduces a unified streaming VLA with shared backbone, framewise memory, and language-guided action diffusion that surpasses human drivers on WOD-E2E planning metrics.

Aligning Flow Map Policies with Optimal Q-Guidance

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.

One-Step Generative Modeling via Wasserstein Gradient Flows

cs.LG · 2026-05-12 · conditional · novelty 7.0

W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x faster sampling than comparable multi-step models.

On the Approximation Complexity of Matrix Product Operator Born Machines

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

MPO-BMs have NP-hard KL approximation in continuous settings but admit efficient polynomial-bond-dimension approximations with provable KL guarantees for structured targets under locality and spectral-gap conditions.

Muninn: Your Trajectory Diffusion Model But Faster

cs.RO · 2026-05-11 · unverdicted · novelty 7.0

Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.

Discrete Langevin-Inspired Posterior Sampling

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

ΔLPS is a gradient-guided discrete posterior sampler for inverse problems that works with masked or uniform discrete diffusion priors and outperforms prior discrete methods on image restoration tasks.

Remix the Timbre: Diffusion-Based Style Transfer Across Polyphonic Stems

cs.SD · 2026-05-10 · unverdicted · novelty 7.0

MixtureTT performs direct per-stem timbre transfer on polyphonic mixtures via a shared diffusion transformer, outperforming single-stem baselines on SATB choral data while eliminating cascaded separation errors.

A Call to Lagrangian Action: Learning Population Mechanics from Temporal Snapshots

cs.LG · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

Wasserstein Lagrangian Mechanics learns second-order population dynamics from observed marginals without specifying the Lagrangian and outperforms gradient flow methods on periodic dynamics like vortex motion and flocking.

Adaptive Subspace Projection for Generative Personalization

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

A training-free adaptive subspace projection method mitigates semantic collapsing in generative personalization by isolating and adjusting drift in a low-dimensional subspace using the stable pre-trained embedding as anchor.

Kurtosis-Guided Denoising Score Matching for Tabular Anomaly Detection

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

K-DSM uses per-feature kurtosis to set noise scales in DSM, enabling effective single-scale anomaly detection on tabular benchmarks in both semi-supervised and unsupervised settings.

citing papers explorer

Showing 50 of 78 citing papers after filters.

OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models cs.CV · 2026-04-05 · unverdicted · none · ref 33 · internal anchor
OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
Flow-GRPO: Training Flow Matching Models via Online RL cs.CV · 2025-05-08 · unverdicted · none · ref 23 · internal anchor
Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
HIR-ALIGN: Enhancing Hyperspectral Image Restoration via Diffusion-Based Data Generation cs.CV · 2026-05-13 · unverdicted · none · ref 32 · internal anchor
HIR-ALIGN augments limited target data for hyperspectral restoration by creating proxy clean images, synthesizing aligned HSIs with blur-robust diffusion and warp-based transfer, then finetuning models to lower target-domain risk.
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling cs.CV · 2026-05-13 · unverdicted · none · ref 37 · internal anchor
Edit-Compass and EditReward-Compass are new unified benchmarks for fine-grained image editing evaluation and realistic reward modeling in reinforcement learning optimization.
Amortized Guidance for Image Inpainting with Pretrained Diffusion Models cs.CV · 2026-05-13 · unverdicted · none · ref 34 · internal anchor
AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.
Adaptive Subspace Projection for Generative Personalization cs.CV · 2026-05-08 · unverdicted · none · ref 34 · internal anchor
A training-free adaptive subspace projection method mitigates semantic collapsing in generative personalization by isolating and adjusting drift in a low-dimensional subspace using the stable pre-trained embedding as anchor.
Arena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Models cs.CV · 2026-05-07 · unverdicted · none · ref 32 · internal anchor
ArenaPO infers Gaussian capability distributions from pairwise preferences and applies truncated-normal latent inference to derive fine-grained offline rewards for preference optimization of text-to-image diffusion models.
DBMSolver: A Training-free Diffusion Bridge Sampler for High-Quality Image-to-Image Translation cs.CV · 2026-05-07 · unverdicted · none · ref 11 · internal anchor
DBMSolver is a new training-free sampler using exponential integrators that reduces NFEs by up to 5x and improves quality in diffusion bridge model-based image-to-image translation tasks.
Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization cs.CV · 2026-04-26 · unverdicted · none · ref 37 · internal anchor
Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.
$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models cs.CV · 2026-04-26 · unverdicted · none · ref 43 · internal anchor
Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.
Dream-Cubed: Controllable Generative Modeling in Minecraft by Training on Billions of Cubes cs.CV · 2026-04-22 · unverdicted · none · ref 41 · internal anchor
Dream-Cubed releases a billion-scale voxel dataset and 3D diffusion models that generate controllable Minecraft worlds by operating directly on blocks.
HP-Edit: A Human-Preference Post-Training Framework for Image Editing cs.CV · 2026-04-21 · unverdicted · none · ref 41 · internal anchor
HP-Edit introduces a post-training framework and RealPref-50K dataset that uses a VLM-based HP-Scorer to align diffusion image editing models with human preferences, improving outputs on Qwen-Image-Edit-2509.
Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation cs.CV · 2026-04-21 · unverdicted · none · ref 40 · internal anchor
OTCA improves GRPO training for visual generation by estimating step importance in trajectories and adaptively weighting multiple reward objectives.
Generative Texture Filtering cs.CV · 2026-04-21 · unverdicted · none · ref 81 · internal anchor
A two-stage fine-tuning strategy on pre-trained generative models enables effective texture filtering that outperforms prior methods on challenging cases.
Long-Text-to-Image Generation via Compositional Prompt Decomposition cs.CV · 2026-04-20 · unverdicted · none · ref 22 · internal anchor
PRISM lets pre-trained text-to-image models handle long prompts by breaking them into compositional parts, predicting noise separately, and merging outputs via energy-based conjunction, matching fine-tuned models while generalizing better to prompts over 500 tokens.
Structure-Adaptive Sparse Diffusion in Voxel Space for 3D Medical Image Enhancement cs.CV · 2026-04-20 · unverdicted · none · ref 25 · internal anchor
A sparse voxel-space diffusion method with structure-adaptive modulation achieves up to 10x training speedup and state-of-the-art results for 3D medical image denoising and super-resolution.
Beyond Reconstruction: Reconstruction-to-Vector Diffusion for Hyperspectral Anomaly Detection cs.CV · 2026-04-13 · unverdicted · none · ref 37 · internal anchor
R2VD redefines reconstruction as the origin for residual-guided vector diffusion across PPE, GMP, RSM, and VDI stages to achieve superior anomaly detectability and background suppression on eight datasets.
Novel View Synthesis as Video Completion cs.CV · 2026-04-09 · unverdicted · none · ref 36 · internal anchor
Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.
HumANDiff: Articulated Noise Diffusion for Motion-Consistent Human Video Generation cs.CV · 2026-04-07 · unverdicted · none · ref 58 · internal anchor
HumANDiff improves motion consistency in human video generation by sampling diffusion noise on an articulated human body template and adding joint appearance-motion prediction plus a geometric consistency loss.
1.x-Distill: Breaking the Diversity, Quality, and Efficiency Barrier in Distribution Matching Distillation cs.CV · 2026-04-05 · conditional · none · ref 42 · internal anchor
1.x-Distill achieves better quality and diversity than prior few-step distillation methods at 1.67 and 1.74 effective NFEs on SD3 models with up to 33x speedup.
VOSR: A Vision-Only Generative Model for Image Super-Resolution cs.CV · 2026-04-03 · conditional · none · ref 31 · internal anchor
VOSR shows that competitive generative image super-resolution with faithful structures can be achieved by training a diffusion-style model from scratch on visual data alone, using a vision encoder for guidance and a restoration-oriented sampling strategy.
Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting cs.CV · 2026-03-30 · unverdicted · none · ref 29 · internal anchor
Drift-AR achieves a 3.8-5.5x speedup in AR-diffusion image models by combining entropy-informed speculative decoding with single-step (1-NFE) anti-symmetric drifting decoding.
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference cs.CV · 2023-10-06 · unverdicted · none · ref 85 · internal anchor
Latent Consistency Models enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space, distilled from pre-trained LDMs.
High-Resolution Image Synthesis with Latent Diffusion Models cs.CV · 2021-12-20 · conditional · none · ref 85 · internal anchor
Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and
ExpoCM: Exposure-Aware One-Step Generative Single-Image HDR Reconstruction cs.CV · 2026-05-04 · unverdicted · none · ref 46
ExpoCM enables fast one-step single-image HDR reconstruction via exposure-dependent perturbations and region-conditioned consistency trajectories derived from a probability flow ODE.
GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction cs.CV · 2026-05-12 · unverdicted · none · ref 3 · internal anchor
GeoQuery replaces corrupted rendering features with geometry-aligned proxy queries and restricts cross-view attention to local windows, enabling robust diffusion-based refinement under extreme view sparsity.
From Synthetic to Real: Toward Identity-Consistent Makeup Transfer with Synthetic and Real Data cs.CV · 2026-05-08 · unverdicted · none · ref 23 · internal anchor
The work creates identity-consistent synthetic makeup data via ConsistentBeauty and adapts models to real images using reinforcement learning in RealBeauty, achieving better identity preservation and real-world performance than prior methods.
Towards Photorealistic and Efficient Bokeh Rendering via Diffusion Framework cs.CV · 2026-05-08 · unverdicted · none · ref 39 · 2 links · internal anchor
MagicBokeh uses a single diffusion model with alternative training, focus-aware masked attention, and degradation-aware depth estimation to produce photorealistic bokeh on low-res zoomed images.
InkDiffuser: High-Fidelity One-shot Chinese Calligraphy via Differentiable Morphological Optimization cs.CV · 2026-05-07 · unverdicted · none · ref 10 · internal anchor
InkDiffuser generates high-fidelity one-shot Chinese calligraphy using high-frequency enhancement and a differentiable ink structure loss for realistic stroke and ink rendering.
Taming Outlier Tokens in Diffusion Transformers cs.CV · 2026-05-06 · unverdicted · none · ref 29 · internal anchor
Outlier tokens in DiTs are addressed with Dual-Stage Registers, which reduce artifacts and improve image generation on ImageNet and text-to-image tasks.
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models cs.CV · 2026-05-06 · unverdicted · none · ref 90 · internal anchor
D-OPSD enables continuous supervised fine-tuning of few-step diffusion models via on-policy self-distillation where the model acts as both teacher (multimodal context) and student (text-only context) on its own roll-outs.
Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion cs.CV · 2026-05-06 · unverdicted · none · ref 23 · internal anchor
DiLAST optimizes 3D latents via guidance from a 2D diffusion model to enable generalizable style transfer for OOD styles in 3D asset generation.
Visual Implicit Autoregressive Modeling cs.CV · 2026-05-02 · unverdicted · none · ref 2 · internal anchor
VIAR embeds implicit equilibrium layers in visual autoregressive models to achieve ImageNet FID 2.16 with 38.4% of VAR parameters and controllable inference compute.
REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement cs.CV · 2026-04-30 · unverdicted · none · ref 42 · internal anchor
REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.
Delta Score Matters! Spatial Adaptive Multi Guidance in Diffusion Models cs.CV · 2026-04-29 · unverdicted · none · ref 32 · internal anchor
SAMG uses spatially adaptive guidance scales derived from a geometric analysis of classifier-free guidance to resolve the detail-artifact dilemma in diffusion-based image and video generation.
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization cs.CV · 2026-04-27 · unverdicted · none · ref 11 · internal anchor
Semi-DPO applies semi-supervised learning to noisy preference data in diffusion DPO by training first on consensus pairs then iteratively pseudo-labeling conflicts, yielding state-of-the-art alignment with complex human preferences.
Efficient Diffusion Distillation via Embedding Loss cs.CV · 2026-04-24 · unverdicted · none · ref 21 · internal anchor
Embedding Loss aligns feature distributions via MMD in random network embeddings to boost one-step diffusion distillation, reaching SOTA FID of 1.475 on CIFAR-10 unconditional generation.
Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation cs.CV · 2026-04-23 · unverdicted · none · ref 34 · internal anchor
Synthetic data complements real data in diffusion-based controllable human video generation, with effective sample selection improving motion realism, temporal consistency, and identity preservation.
Optimizing Diffusion Priors in Image Reconstruction from a Single Observation cs.CV · 2026-04-22 · unverdicted · none · ref 31 · internal anchor
Combining diffusion priors as a product-of-experts and optimizing exponents via Bayesian evidence maximization enables prior tuning from one observation in inverse imaging problems.
Normalizing Flows with Iterative Denoising cs.CV · 2026-04-21 · unverdicted · none · ref 16 · internal anchor
iTARFlow augments normalizing flows with diffusion-style iterative denoising during sampling while preserving end-to-end likelihood training, reaching competitive results on ImageNet 64/128/256.
Allo{SR}$^2$: Rectifying One-Step Super-Resolution to Stay Real via Allomorphic Generative Flows cs.CV · 2026-04-21 · unverdicted · none · ref 36 · internal anchor
Allo{SR}^2 rectifies one-step super-resolution trajectories with allomorphic generative flows via SNR initialization, velocity supervision, and self-adversarial matching to deliver state-of-the-art fidelity and realism.
Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation cs.CV · 2026-04-21 · unverdicted · none · ref 51 · internal anchor
Patch Forcing enables diffusion models to denoise image patches at varying rates based on predicted difficulty, advancing easier regions first to improve context and achieve better generation quality on ImageNet while scaling to text-to-image tasks.
Geometric Decoupling: Diagnosing the Structural Instability of Latent cs.CV · 2026-04-20 · unverdicted · none · ref 46 · internal anchor
Latent diffusion models exhibit geometric decoupling where curvature in out-of-distribution generation is misallocated to unstable semantic boundaries instead of image details, identifying geometric hotspots as the structural cause of editing instability.
CrossFlowDG: Bridging the Modality Gap with Cross-modal Flow Matching for Domain Generalization cs.CV · 2026-04-18 · unverdicted · none · ref 40 · internal anchor
CrossFlowDG bridges the modality gap in domain generalization by learning a continuous transformation that moves image embeddings to matching text embeddings using noise-free cross-modal flow matching.
Frequency-Aware Flow Matching for High-Quality Image Generation cs.CV · 2026-04-16 · unverdicted · none · ref 54 · internal anchor
FreqFlow introduces frequency-aware conditioning and a two-branch architecture to flow matching, reaching FID 1.38 on ImageNet-256 and outperforming DiT and SiT.
DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer cs.CV · 2026-04-15 · unverdicted · none · ref 29 · internal anchor
RTR-DiT distills a bidirectional DiT teacher into an autoregressive few-step model using Self Forcing and Distribution Matching Distillation, plus a reference-preserving KV cache, to enable stable real-time text- and reference-guided video stylization.
Structured State-Space Regularization for Compact and Generation-Friendly Image Tokenization cs.CV · 2026-04-13 · unverdicted · none · ref 57 · internal anchor
A new regularizer transfers frequency awareness from state-space models into image tokenizers, yielding more compact latents that improve diffusion-model generation quality with little reconstruction penalty.
Rein3D: Reinforced 3D Indoor Scene Generation with Panoramic Video Diffusion Models cs.CV · 2026-04-12 · unverdicted · none · ref 37 · internal anchor
Rein3D generates photorealistic, globally consistent 3D indoor scenes by using a restore-and-refine process where radial panoramic videos are restored via diffusion models and then used to update a 3D Gaussian field.
VGA-Bench: A Unified Benchmark and Multi-Model Framework for Video Aesthetics and Generation Quality Evaluation cs.CV · 2026-04-11 · unverdicted · none · ref 30 · internal anchor
VGA-Bench creates a three-tier taxonomy, 1,016-prompt dataset of 60k+ videos, and three multi-task neural models (VAQA-Net, VTag-Net, VGQA-Net) that align with human judgments for video aesthetics and generation quality.
Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation cs.CV · 2026-04-11 · conditional · none · ref 39 · internal anchor
Hybrid Forcing combines linear temporal attention for long-range retention, block-sparse attention for efficiency, and decoupled distillation to achieve real-time unbounded 832x480 streaming video generation at 29.5 FPS.
