Vapo: Efficient and reliable reinforcement learning for advanced reasoning tasks

Yu Yue, Yufeng Yuan, Qiying Yu, Xiaochen Zuo, Ruofei Zhu, Wenyuan Xu, Jiaze Chen, Chengyi Wang, TianTian Fan, Zhengyin Du, Xiangpeng Wei, Xiangyu Yu, Gaohong Liu, Juncai Liu, Lingjun Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Chi Zhang, Mofan Zha · 2025

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

cs.LG · 2025-03-31 · unverdicted · novelty 6.0

A simple PPO-based RL training pipeline on base models scales reasoning performance and response length, outperforming prior work on math and science benchmarks with one-tenth the training steps.

citing papers explorer

Showing 1 of 1 citing paper.

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model cs.LG · 2025-03-31 · unverdicted · none · ref 18
A simple PPO-based RL training pipeline on base models scales reasoning performance and response length, outperforming prior work on math and science benchmarks with one-tenth the training steps.

Vapo: Efficient and reliable reinforcement learning for advanced reasoning tasks

fields

years

verdicts

representative citing papers

citing papers explorer