hub Mixed citations

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

Mixed citation behavior. Most common role is background (69%).

39 Pith papers citing it
Background 69% of classified citations
abstract

Although GRPO substantially enhances flow matching models in human preference alignment of image generation, methods such as FlowGRPO and DanceGRPO remain inefficient because they must sample and optimize over all denoising steps specified by the Markov Decision Process (MDP). In this paper, we propose $\textbf{MixGRPO}$, a novel framework that leverages the flexibility of mixed sampling strategies through the integration of stochastic differential equations (SDE) and ordinary differential equations (ODE). This streamlines the optimization process within the MDP to improve efficiency and boost performance. Specifically, MixGRPO introduces a sliding window mechanism, using SDE sampling and GRPO-guided optimization only within the window, while applying ODE sampling outside. This design confines sampling randomness to the time-steps within the window, thereby reducing optimization overhead and allowing more focused gradient updates that accelerate convergence. Additionally, because time-steps beyond the sliding window are not involved in optimization, higher-order solvers can be used there for faster sampling. We therefore present a faster variant, termed $\textbf{MixGRPO-Flash}$, which further improves training efficiency while achieving comparable performance. MixGRPO exhibits substantial gains across multiple dimensions of human preference alignment, outperforming DanceGRPO in both effectiveness and efficiency, with nearly 50% lower training time. Notably, MixGRPO-Flash further reduces training time by 71%.
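The sliding-window idea in the abstract can be illustrated with a toy sampler. The following is a minimal sketch under stated assumptions, not the paper's implementation: `window_steps`, `mixed_sample`, and `toy_velocity` are hypothetical names, the "SDE" step is approximated by adding Gaussian noise to a plain Euler step (the paper's actual SDE formulation is not given in the abstract), and the velocity field stands in for a learned flow-matching network.

```python
import math
import random


def window_steps(start, size, num_steps):
    """Indices of the denoising steps that fall inside the sliding window."""
    return range(start, min(start + size, num_steps))


def mixed_sample(x, velocity, num_steps, start, size, noise_scale, rng):
    """Euler integration over [0, 1] with noise injected only inside the window.

    Returns the final sample and the step indices that used stochastic sampling.
    Only those steps would take part in GRPO-style optimization.
    """
    dt = 1.0 / num_steps
    window = set(window_steps(start, size, num_steps))
    sde_steps = []
    for i in range(num_steps):
        t = i * dt
        # Deterministic (ODE) Euler step, applied at every time-step.
        x = [xi + vi * dt for xi, vi in zip(x, velocity(x, t))]
        if i in window:
            # Assumption: a simplified stochastic step (Euler step plus noise),
            # used here only as a placeholder for SDE sampling.
            x = [xi + noise_scale * math.sqrt(dt) * rng.gauss(0.0, 1.0) for xi in x]
            sde_steps.append(i)
    return x, sde_steps


def toy_velocity(x, t):
    # Hypothetical stand-in for a learned velocity network.
    return [-xi for xi in x]


rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(4)]
sample, sde_steps = mixed_sample(
    x0, toy_velocity, num_steps=10, start=2, size=3, noise_scale=0.5, rng=rng
)
print(sde_steps)
```

In a training loop, `start` would be advanced over iterations so the window slides across the denoising trajectory; the abstract does not specify the schedule, so none is shown here. Steps outside the window are deterministic, which is what makes it possible to swap in a higher-order solver for them.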

citation-role summary

background 12 · baseline 2 · dataset 1 · method 1

years

2026: 33 · 2025: 6

representative citing papers

OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models

cs.CV · 2026-04-05 · unverdicted · novelty 8.0

OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.

CreFlow: Corrective Reflow for Sparse-Reward Embodied Video Diffusion RL

cs.CV · 2026-05-14 · conditional · novelty 7.0

CreFlow combines LTL compositional rewards with credit-aware NFT and corrective reflow losses in online RL to improve embodied video diffusion models, raising downstream task success by 23.8 percentage points on eight bimanual manipulation tasks.

HunyuanVideo 1.5 Technical Report

cs.CV · 2025-11-24 · unverdicted · novelty 6.0

HunyuanVideo 1.5 delivers state-of-the-art open-source text-to-video and image-to-video generation with an 8.3B parameter DiT model featuring SSTA attention, glyph-aware encoding, and progressive training.

HunyuanImage 3.0 Technical Report

cs.CV · 2025-09-28 · accept · novelty 6.0

HunyuanImage 3.0 delivers an 80B-parameter MoE model unifying multimodal understanding and generation that matches prior state-of-the-art results while being fully open-sourced.

citing papers explorer

Showing 39 of 39 citing papers.