arXiv preprint arXiv:2511.16955 (2025)

He, D · 2025 · arXiv 2511.16955

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 4

citation-polarity summary

background 3 unclear 1

representative citing papers

OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models

cs.CV · 2026-04-05 · unverdicted · novelty 8.0

OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.

KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

KVPO aligns streaming autoregressive video generators with human preferences via ODE-native GRPO, using KV cache for semantic exploration and TVE for velocity-based policy modeling, yielding gains in quality and alignment.

Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation

cs.CV · 2026-04-21 · unverdicted · novelty 7.0

OTCA improves GRPO training for visual generation by estimating step importance in trajectories and adaptively weighting multiple reward objectives.

Power Reinforcement Post-Training of Text-to-Image Models with Super-Linear Advantage Shaping

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Super-Linear Advantage Shaping (SLAS) introduces a non-linear geometric policy update for RL post-training of text-to-image models that reshapes the local policy space via advantage-dependent Fisher-Rao weighting to reduce reward hacking and improve performance over GRPO baselines.

Region-Constrained Group Relative Policy Optimization for Flow-Based Image Editing

cs.CV · 2026-04-10 · unverdicted · novelty 6.0

RC-GRPO-Editing constrains GRPO exploration to editing regions via localized noise and attention rewards, improving instruction adherence and non-target preservation in flow-based image editing.

Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

Salt improves low-step video generation quality by adding endpoint-consistent regularization to distribution matching distillation and using cache-conditioned feature alignment for autoregressive models.

citing papers explorer

Showing 6 of 6 citing papers.

OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models cs.CV · 2026-04-05 · unverdicted · none · ref 8
OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration cs.CV · 2026-05-14 · unverdicted · none · ref 1
KVPO aligns streaming autoregressive video generators with human preferences via ODE-native GRPO, using KV cache for semantic exploration and TVE for velocity-based policy modeling, yielding gains in quality and alignment.
Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation cs.CV · 2026-04-21 · unverdicted · none · ref 13
OTCA improves GRPO training for visual generation by estimating step importance in trajectories and adaptively weighting multiple reward objectives.
Power Reinforcement Post-Training of Text-to-Image Models with Super-Linear Advantage Shaping cs.CV · 2026-05-11 · unverdicted · none · ref 107
Super-Linear Advantage Shaping (SLAS) introduces a non-linear geometric policy update for RL post-training of text-to-image models that reshapes the local policy space via advantage-dependent Fisher-Rao weighting to reduce reward hacking and improve performance over GRPO baselines.
Region-Constrained Group Relative Policy Optimization for Flow-Based Image Editing cs.CV · 2026-04-10 · unverdicted · none · ref 12
RC-GRPO-Editing constrains GRPO exploration to editing regions via localized noise and attention rewards, improving instruction adherence and non-target preservation in flow-based image editing.
Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation cs.CV · 2026-04-03 · unverdicted · none · ref 11
Salt improves low-step video generation quality by adding endpoint-consistent regularization to distribution matching distillation and using cache-conditioned feature alignment for autoregressive models.

arXiv preprint arXiv:2511.16955 (2025)

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer