In Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
Two Pith papers cite this work. Polarity classification is still being indexed.
Citing papers by field: cs.LG (2)
Citing papers by year: 2026 (2)
Citing papers by verdict: UNVERDICTED (2)
Citing papers:
- ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation
  ADWIN adaptively selects training horizons in on-policy distillation via prefix alignment checks, cutting end-to-end cost by up to 4.1x while matching or exceeding full-rollout accuracy on math and code benchmarks.
- BPPO: Binary Prefix Policy Optimization for Efficient GRPO-Style Reasoning RL with Concise Responses
  BPPO selects the shortest correct and shortest incorrect completions for GRPO updates and applies prefix-focused optimization, delivering up to a 6.08x speedup and 30-50% shorter responses on math reasoning tasks.