In Thirty-seventh Conference on Neural Information Processing Systems

Hejian Sang, Yuanda Xu, Zhengze Zhou, Ran He, Zhipeng Wang · 2026 · arXiv 2603.06619

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation

cs.LG · 2026-05-27 · unverdicted · novelty 6.0

ADWIN adaptively selects training horizons in on-policy distillation via prefix alignment checks, cutting end-to-end cost by up to 4.1x while matching or exceeding full-rollout accuracy on math and code benchmarks.

BPPO: Binary Prefix Policy Optimization for Efficient GRPO-Style Reasoning RL with Concise Responses

cs.LG · 2026-05-27 · unverdicted · novelty 6.0

BPPO selects shortest correct and incorrect completions for GRPO updates with prefix-focused optimization to deliver up to 6.08x speedup and 30-50% shorter responses on math reasoning tasks.

citing papers explorer

Showing 2 of 2 citing papers.

ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation · cs.LG · 2026-05-27 · unverdicted · none · ref 29
BPPO: Binary Prefix Policy Optimization for Efficient GRPO-Style Reasoning RL with Concise Responses · cs.LG · 2026-05-27 · unverdicted · none · ref 6
