RL-PLUS is a hybrid RL approach for LLMs that combines internal exploitation with external data via importance sampling and exploration advantages to prevent capability boundary collapse and achieve gains on math and OOD reasoning benchmarks.
The second category consists of four straightforward baselines: 1)SFT, supervised fine-tuning using external reasoning trajectory data
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
RL-PLUS is a hybrid RL approach for LLMs that combines internal exploitation with external data via importance sampling and exploration advantages to prevent capability boundary collapse and achieve gains on math and OOD reasoning benchmarks.