Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
Step-aware preference optimization: Aligning preference with denoising performance at each step
Mixed citation behavior. Most common role is background (40%).
verdicts
UNVERDICTED 8
representative citing papers
PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models.
AdaScope adaptively selects optimal RL intervention points during diffusion denoising by monitoring structural and semantic changes, delivering 66% higher performance at 59% lower cost than full-trajectory RL baselines.
DanceGRPO applies GRPO to visual generation tasks to achieve stable policy optimization across diffusion models, rectified flows, multiple tasks, and diverse reward models, outperforming prior RL methods.
A human preference dataset and VideoReward model enable Flow-DPO and Flow-NRG to produce smoother, better-aligned videos from text prompts in flow-based generators.
Diff.-NPO frames diffusion alignment as a self-play game reaching Nash equilibrium and reports better text-to-image results than prior DPO-style methods.
JS divergence in a unified f-divergence framework for GRPO-style T2I alignment yields competitive performance while preserving generation diversity.
BalancedDPO applies majority-vote consensus from multiple preference scorers and dynamic reference model updates within DPO to achieve multi-metric alignment for text-to-image diffusion models, reporting improved win rates on Pick-a-Pic, PartiPrompt, and HPD datasets across SD 1.5, 2.1, and SDXL.
citing papers explorer
Balancing Performance and Diversity in GRPO Autoregressive Text-to-Image Post-Training
JS divergence in a unified f-divergence framework for GRPO-style T2I alignment yields competitive performance while preserving generation diversity.