Linear-DPO replaces sigmoid utility with linear utility and adds EMA reference to improve preference alignment in diffusion and flow-matching text-to-image models.
hub
International conference on machine learning , pages=
24 Pith papers cite this work. Polarity classification is still indexing.
hub tools
representative citing papers
Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.
VMU-Diff improves precipitation nowcasting via coarse multi-source Vision Mamba fusion followed by residual conditional diffusion refinement.
DirectTryOn achieves state-of-the-art one-step virtual try-on performance by applying pure conditional transport, garment preservation loss, and self-consistency loss to straighten trajectories in pretrained generative models.
DOSER detects OOD actions via diffusion-model denoising error and applies selective regularization based on predicted transitions, proving gamma-contraction with performance bounds and outperforming priors on offline RL benchmarks.
OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.
VAnim creates open-domain text-to-SVG animations via sparse state updates on a persistent DOM tree, identification-first planning, and rendering-aware RL with a new 134k-example benchmark.
NI Sampling accelerates discrete diffusion language models up to 14.3 times by training a neural indicator to select which tokens to sample at each step using a trajectory-preserving objective.
PRISM lets pre-trained text-to-image models handle long prompts by breaking them into compositional parts, predicting noise separately, and merging outputs via energy-based conjunction, matching fine-tuned models while generalizing better to prompts over 500 tokens.
Absorbing discrete diffusion models the conditional distributions of clean data; reparameterizing yields a time-independent RADD that unifies with AO-ARMs and reaches SOTA perplexity among diffusion models on zero-shot language benchmarks.
URGE performs unbiased inference-time scaling for diffusion models by attaching multiplicative path weights from Girsanov estimation and resampling trajectories, with a proven equivalence to prior particle-wise SMC schemes.
Scaling pretrained representations improves label-free OOD detection on frozen backbones, causing performance gaps between global and local detectors to vanish across vision and language tasks.
A threshold-guided alignment method lets visual generative models be optimized directly from scalar human ratings instead of requiring paired preference data.
A training-free technique manipulates low-frequency noise in diffusion models to control image color and structure using low-frequency priors.
Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.
DAG-STL decomposes long-horizon STL planning into decomposition, timed waypoint allocation, and diffusion-based trajectory generation to enable zero-shot planning under unknown dynamics.
VisionReward learns multi-dimensional human preferences for image and video generation via hierarchical assessment and linear weighting, outperforming VideoScore by 17.2% in prediction accuracy and yielding 31.6% higher win rates in text-to-video models.
LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.
Near-reversible Runge-Kutta diffusion ODE solvers with vector-field smoothing improve stability and edit fidelity for large changes in text-guided image editing compared to exactly reversible alternatives.
A structured diffusion bridge method achieves near fully-paired modality translation quality using alignment constraints even in unpaired or semi-paired regimes.
Stochastic image enhancement methods are shown to be variants of a shared SDE differing in drift, diffusion, terminal distributions and boundary conditions, with controlled experiments revealing no single dominant family and a new modular library released.
citing papers explorer
-
Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures
URGE performs unbiased inference-time scaling for diffusion models by attaching multiplicative path weights from Girsanov estimation and resampling trajectories, with a proven equivalence to prior particle-wise SMC schemes.