Adjoint-equation framework yields dimension-free convergence bounds in any IPM for discrete diffusion models under masked or uniform priors using one rate-matrix regularity assumption.
International Conference on Machine Learning, pages=
24 Pith papers cite this work.
citing papers explorer
-
Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space
Adjoint-equation framework yields dimension-free convergence bounds in any IPM for discrete diffusion models under masked or uniform priors using one rate-matrix regularity assumption.
-
Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models
Linear-DPO replaces sigmoid utility with linear utility and adds EMA reference to improve preference alignment in diffusion and flow-matching text-to-image models.
-
Functionalization via Structure Completion and Motion Rectification
Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.
-
VMU-Diff: A Coarse-to-fine Multi-source Data Fusion Framework for Precipitation Nowcasting
VMU-Diff improves precipitation nowcasting via coarse multi-source Vision Mamba fusion followed by residual conditional diffusion refinement.
-
DirectTryOn: One-Step Virtual Try-On via Straightened Conditional Transport
DirectTryOn achieves state-of-the-art one-step virtual try-on performance by applying pure conditional transport, garment preservation loss, and self-consistency loss to straighten trajectories in pretrained generative models.
-
Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning
DOSER detects OOD actions via diffusion-model denoising error and applies selective regularization based on predicted transitions, proving gamma-contraction with performance bounds and outperforming prior methods on offline RL benchmarks.
-
Generative Modeling with Orbit-Space Particle Flow Matching
OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.
-
VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation
VAnim creates open-domain text-to-SVG animations via sparse state updates on a persistent DOM tree, identification-first planning, and rendering-aware RL with a new 134k-example benchmark.
-
NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization
NI Sampling accelerates discrete diffusion language models up to 14.3 times by training a neural indicator to select which tokens to sample at each step using a trajectory-preserving objective.
-
Long-Text-to-Image Generation via Compositional Prompt Decomposition
PRISM lets pre-trained text-to-image models handle long prompts by breaking them into compositional parts, predicting noise separately, and merging outputs via energy-based conjunction, matching fine-tuned models while generalizing better to prompts over 500 tokens.
-
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Absorbing discrete diffusion models the conditional distributions of clean data; reparameterizing yields a time-independent RADD that unifies with AO-ARMs and reaches SOTA perplexity among diffusion models on zero-shot language benchmarks.
-
Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures
URGE performs unbiased inference-time scaling for diffusion models by attaching multiplicative path weights from Girsanov estimation and resampling trajectories, with a proven equivalence to prior particle-wise SMC schemes.
-
Scaling Pretrained Representations Enables Label-Free Out-of-Distribution Detection Without Fine-Tuning
Scaling pretrained representations improves label-free OOD detection on frozen backbones, causing performance gaps between global and local detectors to vanish across vision and language tasks.
-
Threshold-Guided Optimization for Visual Generative Models
A threshold-guided alignment method lets visual generative models be optimized directly from scalar human ratings instead of requiring paired preference data.
-
Colorful-Noise: Training-Free Low-Frequency Noise Manipulation for Color-Based Conditional Image Generation
A training-free technique manipulates low-frequency noise in diffusion models to control image color and structure using low-frequency priors.
-
One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models
Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.
-
DAG-STL: A Hierarchical Framework for Zero-Shot Trajectory Planning under Signal Temporal Logic Specifications
DAG-STL splits long-horizon STL planning into three stages (decomposition, timed waypoint allocation, and diffusion-based trajectory generation) to enable zero-shot planning under unknown dynamics.
-
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
VisionReward learns multi-dimensional human preferences for image and video generation via hierarchical assessment and linear weighting, outperforming VideoScore by 17.2% in prediction accuracy and yielding 31.6% higher win rates in text-to-video models.
-
Lossless Anti-Distillation Sampling
LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.
-
Stable and Near-Reversible Diffusion ODE Solvers for Image Editing
Near-reversible Runge-Kutta ODE solvers combined with vector-field smoothing deliver more stable and higher-fidelity text-guided edits in diffusion models than exactly reversible schemes.
-
Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges
A structured diffusion bridge method achieves near fully-paired modality translation quality using alignment constraints even in unpaired or semi-paired regimes.
-
Unifying Deep Stochastic Processes for Image Enhancement
Stochastic image enhancement methods are shown to be variants of a shared SDE differing in drift, diffusion, terminal distributions and boundary conditions, with controlled experiments revealing no single dominant family and a new modular library released.
-
Consistent Diffusion Language Models
CDLM introduces MPDC training for discrete diffusion models, recovering prior methods as limits and claiming new SOTA text generation performance especially at low sampling budgets.
- LASER: Learning Active Sensing for Continuum Field Reconstruction