Independent skill transfer for deep reinforcement learning
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CL (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Learning at the Right Pace: Adaptive Data Scheduling Improves LLM Reinforcement Learning
ADS improves average accuracy by 5.2% over GRPO across three LLMs and seven benchmarks by adaptively scheduling data at cluster and sample levels based on semantic patterns and policy boundaries.