A training recipe for tool-integrated reasoning models achieves state-of-the-art open-source results on math benchmarks such as 96.7% and 99.2% on AIME 2025 at 4B and 30B scales by balancing tool-use trajectories and optimizing for pass@k during SFT before stable RLVR.
P1: Mastering physics olympiads with reinforcement learning
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
baseline 1polarities
baseline 1representative citing papers
TEMPO scales test-time training for large reasoning models by interleaving policy refinement on unlabeled data with critic recalibration on labeled data via an EM formulation, yielding large gains on AIME tasks.
citing papers explorer
-
Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning
A training recipe for tool-integrated reasoning models achieves state-of-the-art open-source results on math benchmarks such as 96.7% and 99.2% on AIME 2025 at 4B and 30B scales by balancing tool-use trajectories and optimizing for pass@k during SFT before stable RLVR.
-
TEMPO: Scaling Test-time Training for Large Reasoning Models
TEMPO scales test-time training for large reasoning models by interleaving policy refinement on unlabeled data with critic recalibration on labeled data via an EM formulation, yielding large gains on AIME tasks.