The Eleventh International Conference on Learning Representations
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
citing papers explorer
- The tractability landscape of diffusion alignment: regularization, rewards, and computational primitives
  The choice of closeness measure in diffusion reward alignment determines the computational primitives and tractable reward classes, with linear exponential tilts sufficing for KL with convex rewards and proximal oracles for Wasserstein with concave or low-dimensional Lipschitz rewards.
- dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models
  dFlowGRPO is a new rate-aware RL method for discrete flow models that outperforms prior GRPO approaches on image generation and matches continuous flow models while supporting broad probability paths.
- On the Robustness of Distribution Support under Diffusion Guidance
  Establishes robustness of distribution support for guided diffusion processes under exact score access across DDIM, DDPM, and exponential integrator discretizations.