{"paper":{"title":"rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking","license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","headline":"Small language models reach expert math reasoning by evolving their own search and evaluation processes over repeated rounds.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Fan Yang, Li Lyna Zhang, Mao Yang, Ning Shang, Xinyu Guan, Yifei Liu, Yi Zhu, Youran Sun","submitted_at":"2025-01-08T14:12:57Z","abstract_excerpt":"We present rStar-Math to demonstrate that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1, without distillation from superior models. rStar-Math achieves this by exercising \"deep thinking\" through Monte Carlo Tree Search (MCTS), where a math policy SLM performs test-time search guided by an SLM-based process reward model. rStar-Math introduces three innovations to tackle the challenges in training the two SLMs: (1) a novel code-augmented CoT data synthesis method, which performs extensive MCTS rollouts to generate step-by-step verified reasoning"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Through 4 rounds of self-evolution with millions of synthesized solutions for 747k math problems, rStar-Math boosts SLMs' math reasoning to state-of-the-art levels. 
On the MATH benchmark, it improves Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%, surpassing o1-preview by +4.5% and +0.9%.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The process preference model trained on self-generated trajectories provides unbiased, accurate step-level guidance during MCTS search and does not overfit to patterns in the synthesized data or the specific benchmarks used for evaluation.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Small LLMs reach 90% on the MATH benchmark and solve 53% of AIME problems by self-evolving through MCTS with a process preference model, surpassing o1-preview without distillation from larger models.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Small language models reach expert math reasoning by evolving their own search and evaluation processes over repeated rounds.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"363770b40a8df31b40ecc59add77611f89c0582ce5628d6b8f95df8052aa63f1"},"source":{"id":"2501.04519","kind":"arxiv","version":1},"verdict":{"id":"c06e96ef-f23b-47cd-b08f-5ee9e54e914b","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T04:38:20.584079Z","strongest_claim":"Through 4 rounds of self-evolution with millions of synthesized solutions for 747k math problems, rStar-Math boosts SLMs' math reasoning to state-of-the-art levels. 
On the MATH benchmark, it improves Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%, surpassing o1-preview by +4.5% and +0.9%.","one_line_summary":"Small LLMs reach 90% on the MATH benchmark and solve 53% of AIME problems by self-evolving through MCTS with a process preference model, surpassing o1-preview without distillation from larger models.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The process preference model trained on self-generated trajectories provides unbiased, accurate step-level guidance during MCTS search and does not overfit to patterns in the synthesized data or the specific benchmarks used for evaluation.","pith_extraction_headline":"Small language models reach expert math reasoning by evolving their own search and evaluation processes over repeated rounds."},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"e63b6c453d16ae8b6f6b36b6eb270c5c9a58f4ea36427a2214c3e1996dcc3af6"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}