DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL
2 Pith papers cite this work. Polarity classification is still indexing.

Citation-role summary: background (1)
Verdicts: unverdicted (2)
Citing papers
-
Trust the Batch, On- or Off-Policy: Adaptive Policy Optimization for RL Post-Training
A new RL objective adapts trust-region and off-policy handling automatically via normalized effective sample size of batch policy ratios, matching tuned baselines without new hyperparameters.
-
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
InternVL3.5 advances open-source multimodal models with Cascade RL for +16% reasoning gains and ViR for 4x inference speedup, with the 241B model reaching SOTA among open-source MLLMs on multimodal, reasoning, and agentic tasks.