An Empirical Study on Eliciting and Improving R1-like Reasoning Models
2 Pith papers cite this work. Polarity classification is still indexing.
Citing years: 2025 (2)
Verdicts: UNVERDICTED (2)
Citation roles: background (1)
Citation polarities: background (1)
Citing papers
- Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models
  OlymMATH is a 350-problem Olympiad math benchmark combining bilingual natural-language evaluation with Lean 4 formal verification to test LLM reasoning.
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  R1-Searcher uses two-stage outcome-based RL to train LLMs to invoke external search systems for better reasoning without process rewards or distillation.