Sudo: Enhancing text-to-image diffusion models with self-supervised direct preference optimization. arXiv preprint arXiv:2504.14534,
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by field: cs.CV (2). By year: 2026 (2). By verdict: unverdicted (2).

Citing papers:
- Pave-GRPO: Beyond Instantaneous Guidance through Principled Average Velocity Decomposition
  Pave-GRPO reformulates GRPO via principled average velocity decomposition to enable denser temporal supervision in flow-based generative model alignment without increasing rollout cost.
- AdaGRPO: A Capability-Aware Adaptive Enhancement for Flow-based GRPO
  AdaGRPO enhances GRPO for flow models via online curriculum filtering of prompts and cross-level advantage fusion, yielding performance gains and improved training stability.