Non-monotonic sampling schedules never improve upon monotonic baselines in diffusion models, with performance gaps ranging from substantial to negligible depending on the denoiser.
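To make the monotonic/non-monotonic distinction concrete: a monotonic schedule visits non-increasing noise levels, while a non-monotonic one revisits a higher level partway through (as restart-style samplers do). A minimal sketch with hypothetical numbers, not code from the paper:

```python
import numpy as np

def is_monotonic(sigmas):
    """True if the noise levels are non-increasing along the sampling trajectory."""
    return all(b <= a for a, b in zip(sigmas, sigmas[1:]))

# A standard monotonic schedule: noise decreases from sigma_max toward 0.
monotonic = np.linspace(10.0, 0.01, 20)

# A non-monotonic variant: the same schedule with a "restart" bump that
# re-injects noise partway through (illustrative, not the paper's schedule).
non_monotonic = monotonic.copy()
non_monotonic[10] = non_monotonic[8]  # jump back up to an earlier noise level

print(is_monotonic(monotonic))      # True
print(is_monotonic(non_monotonic))  # False
```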
DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps. Advances in Neural Information Processing Systems, 35:5775–5787, 2022.
17 papers cite this work.
Representative citing papers
OpenVLA-OFT fine-tuning boosts LIBERO success rate from 76.5% to 97.1%, speeds action generation 26x, and outperforms baselines on real bimanual dexterous tasks.
RK4 at 80 function evaluations matches Euler at 200 in sliced Wasserstein quality for flow matching sampling, with the adaptive solver concentrating steps near t=1 due to stiffening velocity fields.
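The trade-off described above is easy to reproduce on a toy velocity field: a higher-order solver buys accuracy per function evaluation on smooth (non-stiff) regions. A self-contained sketch comparing fixed-step Euler and classic RK4 on dx/dt = x, where the exact solution at t=1 is e; this illustrates the general NFE trade-off, not the paper's experiment:

```python
import numpy as np

def euler(v, x, ts):
    # Fixed-step Euler integration of dx/dt = v(x, t): one evaluation per step.
    for t0, t1 in zip(ts, ts[1:]):
        x = x + (t1 - t0) * v(x, t0)
    return x

def rk4(v, x, ts):
    # Classic 4th-order Runge-Kutta: four evaluations per step.
    for t0, t1 in zip(ts, ts[1:]):
        h = t1 - t0
        k1 = v(x, t0)
        k2 = v(x + 0.5 * h * k1, t0 + 0.5 * h)
        k3 = v(x + 0.5 * h * k2, t0 + 0.5 * h)
        k4 = v(x + h * k3, t1)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

# Toy linear velocity field with known solution x(1) = x(0) * e.
v = lambda x, t: x
x0 = np.array([1.0])

# 200 Euler steps (200 evaluations) vs 20 RK4 steps (80 evaluations).
x_euler = euler(v, x0, np.linspace(0.0, 1.0, 201))
x_rk4 = rk4(v, x0, np.linspace(0.0, 1.0, 21))

err_euler = abs(float(x_euler[0]) - np.e)
err_rk4 = abs(float(x_rk4[0]) - np.e)
# RK4 with fewer evaluations is far more accurate on this smooth field.
```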
Citing papers
- Is Monotonic Sampling Necessary in Diffusion Models?
- TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment
  TMPO uses Softmax Trajectory Balance to match policy probabilities over multiple trajectories to a Boltzmann reward distribution, improving diversity by 9.1% in diffusion alignment tasks.
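One plausible reading of "matching policy probabilities over multiple trajectories to a Boltzmann reward distribution" is a KL loss between the batch-softmax of trajectory log-probabilities and the batch-softmax of rewards over a temperature. The sketch below illustrates that idea with hypothetical names; it is not TMPO's published objective:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def softmax_tb_loss(logp, rewards, temperature=1.0):
    """KL divergence between the Boltzmann reward distribution over a batch of
    sampled trajectories and the policy's renormalized trajectory distribution.
    Illustrative reading of 'Softmax Trajectory Balance', not TMPO's exact loss."""
    target = softmax(np.asarray(rewards) / temperature)  # Boltzmann over the batch
    policy = softmax(np.asarray(logp))                   # renormalized policy probs
    return float(np.sum(target * (np.log(target) - np.log(policy))))

# If policy log-probs already equal rewards/T up to an additive constant,
# the two batch distributions coincide and the loss vanishes.
rewards = np.array([1.0, 2.0, 3.0])
print(softmax_tb_loss(rewards + 5.0, rewards))  # 0.0
```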
- Inverse Design of Multi-Layer Sub-Pixel-Resolution RF Passives Through Grayscale Diffusion with Flexible S-Parameter Conditioning
  A grayscale diffusion model generates two-layer RF passives with sub-pixel resolution from partial S-parameters, achieving low error in surrogate predictions; the designs are validated on fabricated filters.
- DiffusionNFT: Online Diffusion Reinforcement with Forward Process
  DiffusionNFT performs online RL for diffusion models on the forward process via flow matching and positive-negative contrasts, delivering up to 25x efficiency gains and rapid benchmark improvements over prior reverse-process methods.
- MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
  MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.
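The core scheduling idea — stochastic steps only inside a window, deterministic steps elsewhere — can be sketched generically. The window position, length, and noise scale below are hypothetical, and the integrator is plain Euler/Euler-Maruyama rather than MixGRPO's actual solver:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixed_sampler(x, velocity, ts, window_start, window_len, noise_scale=0.1):
    """Integrate a flow from ts[0] to ts[-1], taking stochastic (SDE-like) steps
    only inside a sliding window of timesteps and deterministic ODE steps
    elsewhere. Illustrative sketch of the mixed ODE-SDE idea."""
    stochastic_steps = 0
    for i, (t0, t1) in enumerate(zip(ts, ts[1:])):
        h = t1 - t0
        x = x + h * velocity(x, t0)  # deterministic drift (Euler)
        if window_start <= i < window_start + window_len:
            # Euler-Maruyama-style noise injection, only inside the window;
            # in MixGRPO only these steps need optimization/gradients.
            x = x + noise_scale * np.sqrt(h) * rng.standard_normal(x.shape)
            stochastic_steps += 1
    return x, stochastic_steps

v = lambda x, t: -x  # toy contracting velocity field
x, n_sde = mixed_sampler(np.ones(4), v, np.linspace(0.0, 1.0, 11),
                         window_start=3, window_len=4)
# Only 4 of the 10 steps were stochastic; the rest were deterministic.
```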
- Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
  Latent Consistency Models enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space, distilled from pre-trained LDMs.
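Few-step sampling of this kind typically alternates a direct clean-sample prediction with re-noising to the next, lower noise level. A generic multistep consistency-style loop with a toy stand-in model, assuming a decreasing sigma schedule; this is a sketch of the sampling pattern, not the LCM codebase:

```python
import numpy as np

def few_step_sample(f, sigmas, shape, rng):
    """Multistep consistency-style sampling: at each noise level the model f
    jumps directly to a clean estimate (its prediction of the PF-ODE solution),
    then fresh noise re-perturbs it to the next, lower level."""
    x = sigmas[0] * rng.standard_normal(shape)  # start from pure noise
    for sigma, sigma_next in zip(sigmas, list(sigmas[1:]) + [0.0]):
        x0 = f(x, sigma)  # direct prediction of the ODE solution
        if sigma_next == 0.0:
            x = x0
        else:
            x = x0 + sigma_next * rng.standard_normal(shape)
    return x

# Toy stand-in "model": shrinks toward the origin as noise grows (not a real
# denoiser, just enough to exercise the loop).
f = lambda x, sigma: x / (1.0 + sigma**2)
rng = np.random.default_rng(0)
sample = few_step_sample(f, [10.0, 2.0, 0.5, 0.1], (3,), rng)  # 4-step sampling
```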
- FIS-DiT: Breaking the Few-Step Video Inference Barrier via Training-Free Frame Interleaved Sparsity
  FIS-DiT achieves 2.11-2.41x speedup on video DiT models in few-step regimes with negligible quality loss by exploiting frame-wise sparsity and consistency through a training-free interleaved execution strategy.
- The two clocks and the innovation window: When and how generative models learn rules
  Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.
- Lookahead Drifting Model
  The lookahead drifting model improves upon the drifting model by sequentially computing multiple drifting terms that incorporate higher-order gradient information, leading to better performance on toy examples and CIFAR-10.
- Post-Hoc Guidance for Consistency Models by Joint Flow Distribution Learning
  JFDL allows pre-trained Consistency Models to perform guided image generation post-hoc by aligning flow distributions, reducing FID scores on CIFAR-10 and ImageNet without needing a teacher model.
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
  IP-Adapter adds effective image prompting to text-to-image diffusion models using a lightweight decoupled cross-attention adapter that works alongside text prompts and other controls.
- Outlier-Robust Diffusion Solvers for Inverse Problems
  Diffusion-based inverse problem solvers are made robust to outliers by combining explicit noise estimation with a Huber-loss IRLS objective solved via conjugate gradient.
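The Huber-IRLS component is a standard robust-regression technique and easy to sketch: reweight residuals each iteration so that large (outlier) residuals get weight delta/|r| instead of 1. The example below solves the weighted normal equations directly instead of by conjugate gradient, and omits the paper's explicit noise estimation:

```python
import numpy as np

def huber_weights(r, delta):
    # IRLS weights for the Huber loss: quadratic inside |r| <= delta, linear outside.
    a = np.abs(r)
    return np.where(a <= delta, 1.0, delta / np.maximum(a, 1e-12))

def irls_huber(A, b, delta=1.0, iters=20):
    """Robust least squares min_x sum_i huber(b_i - (Ax)_i) via iteratively
    reweighted least squares. Each subproblem is solved here by the weighted
    normal equations for brevity (the paper uses conjugate gradient)."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]   # ordinary LS warm start
    for _ in range(iters):
        w = huber_weights(b - A @ x, delta)
        Aw = A * w[:, None]                    # row-weighted design matrix
        x = np.linalg.solve(A.T @ Aw, Aw.T @ b)
    return x

# Ordinary least squares is dragged off by one gross outlier; Huber IRLS is not.
rng = np.random.default_rng(1)
A = np.column_stack([np.ones(50), np.arange(50.0)])
b = A @ np.array([2.0, 0.5]) + 0.01 * rng.standard_normal(50)
b[10] += 100.0                                 # inject one outlier
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]    # badly biased fit
x_rob = irls_huber(A, b, delta=0.1)            # close to [2.0, 0.5]
```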
- Lightning Unified Video Editing via In-Context Sparse Attention
  ISA prunes low-saliency context tokens and routes queries by sharpness to either full or zeroth-order Taylor sparse attention, enabling LIVEditor to cut attention latency ~60% while beating prior video editing methods on three benchmarks.
- Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges
  A structured diffusion bridge method achieves near fully-paired modality translation quality using alignment constraints even in unpaired or semi-paired regimes.
- Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
  Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemphasizing perceptual quality.