OmniNFT introduces modality-wise advantage routing, layer-wise gradient surgery, and region-wise loss reweighting in an online diffusion RL framework to improve audio-video quality, alignment, and synchronization.
Maskfocus: Focusing policy optimization on critical steps for masked image generation
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 5years
2026 5roles
background 3representative citing papers
Super-Linear Advantage Shaping (SLAS) introduces a non-linear geometric policy update for RL post-training of text-to-image models that reshapes the local policy space via advantage-dependent Fisher-Rao weighting to reduce reward hacking and improve performance over GRPO baselines.
Flow-OPD is a two-stage on-policy distillation method for flow matching models that lifts GenEval from 63 to 92 and OCR from 59 to 94 on SD 3.5 Medium while preserving fidelity.
MAR-GRPO stabilizes GRPO for AR-diffusion hybrids via multi-trajectory expectation and uncertainty-based token selection, yielding better visual quality, stability, and spatial understanding than baselines.
E²PO uses embedding-level perturbations to maintain intra-group variance and discriminative signal in RL-based preference optimization for generative flow models.
citing papers explorer
-
Flow-OPD: On-Policy Distillation for Flow Matching Models
Flow-OPD is a two-stage on-policy distillation method for flow matching models that lifts GenEval from 63 to 92 and OCR from 59 to 94 on SD 3.5 Medium while preserving fidelity.