Flow-Map GRPO uses anchored stochastic flow map composition to enable GRPO-based RL alignment of deterministic few-step flow-map generators while preserving their marginal paths.
arXiv preprint arXiv:2512.13006 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
K-Forcing introduces progressive self-forcing distillation to train a conditional push-forward model that jointly decodes k future tokens per forward pass, yielding 2.4-3.5x speedup at k=4 with modest quality loss on LM1B and OpenWebText.
citing papers explorer
-
Flow-Map GRPO: Reinforcement Learning for Few-Step Flow-Map Generators via Anchored Stochastic Composition
Flow-Map GRPO uses anchored stochastic flow map composition to enable GRPO-based RL alignment of deterministic few-step flow-map generators while preserving their marginal paths.
-
K-Forcing: Joint Next-K-Token Decoding via Push-Forward Language Modeling
K-Forcing introduces progressive self-forcing distillation to train a conditional push-forward model that jointly decodes k future tokens per forward pass, yielding 2.4-3.5x speedup at k=4 with modest quality loss on LM1B and OpenWebText.