Mixed citations

Step-aware preference optimization: Aligning preference with denoising performance at each step

Mixed citation behavior. Most common role is background (40%).

8 Pith papers citing it
Background 40% of classified citations

citation-role summary

background: 4 · baseline: 1

years

2026: 4 · 2025: 4

verdicts

UNVERDICTED 8

representative citing papers

Flow-GRPO: Training Flow Matching Models via Online RL

cs.CV · 2025-05-08 · unverdicted · novelty 8.0

Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.

DanceGRPO: Unleashing GRPO on Visual Generation

cs.CV · 2025-05-12 · unverdicted · novelty 6.0

DanceGRPO applies GRPO to visual generation tasks to achieve stable policy optimization across diffusion models, rectified flows, multiple tasks, and diverse reward models, outperforming prior RL methods.

Improving Video Generation with Human Feedback

cs.CV · 2025-01-23 · unverdicted · novelty 6.0

A human preference dataset and VideoReward model enable Flow-DPO and Flow-NRG to produce smoother, better-aligned videos from text prompts in flow-based generators.

BalancedDPO: Adaptive Multi-Metric Alignment

cs.CV · 2025-03-16 · unverdicted · novelty 4.0

BalancedDPO applies majority-vote consensus from multiple preference scorers and dynamic reference model updates within DPO to achieve multi-metric alignment for text-to-image diffusion models, reporting improved win rates on Pick-a-Pic, PartiPrompt, and HPD datasets across SD 1.5, 2.1, and SDXL.
