Aristotle: IMO-level Automated Theorem Proving
Recognition: 2 theorem links · Lean Theorem
Pith reviewed 2026-05-15 08:46 UTC · model grok-4.3
The pith
An AI system called Aristotle reaches gold-medal performance on the 2025 IMO problems by integrating Lean formal proof search with informal lemma generation and a dedicated geometry solver.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Aristotle integrates a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver. This combination enables complete proofs for the 2025 International Mathematical Olympiad problems at gold-medal-equivalent performance, with state-of-the-art results and favorable scaling properties in automated theorem proving.
What carries the argument
The hybrid system that pairs Lean formal verification with informal reasoning for lemma creation and formalization plus a geometry solver.
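The division of labor described above can be caricatured in a few lines. This is a minimal sketch under stated assumptions, not the paper's implementation; every name below is invented. The structural point is that the informal model only proposes, and only the formal checker (standing in for the Lean kernel) decides what is kept.

```python
"""Hypothetical sketch of an informal-propose / formal-verify loop."""
from dataclasses import dataclass


@dataclass
class Lemma:
    informal: str           # natural-language statement from the informal model
    formal: str = ""        # candidate formalization, if any
    verified: bool = False  # set only by the formal checker


def propose_lemmas(goal: str) -> list[Lemma]:
    # Stand-in for the informal reasoning system.
    return [Lemma(informal=f"helper lemma for: {goal}")]


def formalize(lemma: Lemma) -> Lemma:
    # Stand-in for autoformalization into Lean syntax.
    lemma.formal = f"theorem helper : True := trivial  -- from: {lemma.informal}"
    return lemma


def kernel_check(lemma: Lemma) -> Lemma:
    # Stand-in for the Lean kernel: accept only complete proofs.
    lemma.verified = bool(lemma.formal) and "sorry" not in lemma.formal
    return lemma


def prove(goal: str) -> list[Lemma]:
    # Propose -> formalize -> verify; anything unverified is discarded,
    # so no informal step can leak into the final proof.
    candidates = [kernel_check(formalize(l)) for l in propose_lemmas(goal)]
    return [l for l in candidates if l.verified]


print(len(prove("IMO 2025 Problem 1")))  # → 1
```

The design choice this sketch isolates is the one the abstract leans on: soundness lives entirely in the final checking stage, so the informal components can be arbitrarily creative without compromising correctness.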
If this is right
- Automated theorem provers can now handle full competition math problems at human gold-medal standard.
- Performance scales favorably as the informal and formal components receive more resources.
- Hybrid formal-informal methods outperform pure formal or pure informal approaches on complex proofs.
- Geometry problems become tractable through the dedicated solver component.
Where Pith is reading between the lines
- The approach could extend to assisting with open research problems if scaled further.
- It suggests that future systems might bridge creative insight and rigorous checking in other technical domains.
- Public release of the generated proofs would allow independent checking of the claimed results.
Load-bearing premise
The integration of informal reasoning and formal verification components can reliably produce complete, correct proofs for the full set of 2025 IMO problems without undisclosed human guidance.
What would settle it
An independent run of the full system on the 2025 IMO problems with all output proofs verified as correct in Lean and no manual interventions.
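Such a check is mechanical in Lean: a complete proof compiles with no warnings, while a placeholder elaborates but is flagged, so an independent rebuild of the published artifacts can reject any incomplete proof automatically. A minimal illustration, not taken from the paper:

```lean
-- A complete proof: the kernel accepts the term with no warnings.
theorem sum_comm (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- An incomplete proof: `sorry` closes the goal syntactically, but Lean
-- reports "declaration uses 'sorry'", so a clean independent build with
-- a no-sorry check fails on it.
theorem unfinished (n : Nat) : n * n ≥ n := sorry
```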
read the original abstract
We introduce Aristotle, an AI system that combines formal verification with informal reasoning, achieving gold-medal-equivalent performance on the 2025 International Mathematical Olympiad problems. Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver. Our system demonstrates state-of-the-art performance with favorable scaling properties for automated theorem proving.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Aristotle, an AI system that combines formal verification with informal reasoning to achieve gold-medal-equivalent performance on the 2025 International Mathematical Olympiad problems. It integrates three components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver, while claiming state-of-the-art performance with favorable scaling properties for automated theorem proving.
Significance. If the performance claims hold with reproducible evidence, this would represent a significant step forward in automated theorem proving by showing that hybrid informal-formal pipelines can reach IMO-level difficulty. The architecture could inform future scaling strategies in the field.
major comments (1)
- Abstract: the central claim of gold-medal-equivalent performance on the 2025 IMO problems is presented without any quantitative support, such as the list of problems attempted, per-problem success rates, error analysis, or generated Lean proof artifacts. This absence is load-bearing because the manuscript describes the components only at a high level and supplies no empirical results to distinguish full automation from selective reporting or post-hoc adjustments.
minor comments (1)
- The high-level description of component integration would benefit from additional technical specifics on data flow and failure modes to improve clarity.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address the single major comment below and outline the revisions we will incorporate into the manuscript.
read point-by-point responses
-
Referee: Abstract: the central claim of gold-medal-equivalent performance on the 2025 IMO problems is presented without any quantitative support, such as the list of problems attempted, per-problem success rates, error analysis, or generated Lean proof artifacts. This absence is load-bearing because the manuscript describes the components only at a high level and supplies no empirical results to distinguish full automation from selective reporting or post-hoc adjustments.
Authors: We agree that the abstract, in its current form, is too high-level and does not supply the quantitative details needed to substantiate the central performance claim. In the revised manuscript we will expand the abstract to include concrete metrics, such as the number of 2025 IMO problems attempted and solved together with the resulting score. We will also add a new subsection to the experimental evaluation that enumerates every problem, reports per-problem success or failure, provides a concise error analysis, and supplies links or identifiers for the corresponding Lean proof artifacts. These additions will allow readers to verify that the reported performance reflects systematic, reproducible automation rather than selective reporting.
Revision: yes
Circularity Check
No circularity: empirical performance claim with no derivation chain
full rationale
The paper presents Aristotle as an integrated system achieving gold-medal-equivalent IMO performance via Lean proof search, informal lemma generation, and a geometry solver. This is framed as an empirical outcome on an external benchmark (the 2025 IMO problems), not as a quantity derived from internal equations, fitted parameters, or self-citations. No load-bearing steps reduce by construction to inputs; the abstract and described components contain no self-definitional loops, renamed known results, or ansatzes smuggled in via prior work. The claim stands or falls on external verification of the reported proofs, which is independent of any internal circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math · Correctness of the Lean formal system and its kernel
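This is the standard trust base for any Lean-checked result: however a proof term was found, whether by search, by a language model, or by hand, the small kernel only checks that the term's type matches the stated proposition. A minimal illustration, unrelated to the paper's proofs:

```lean
-- The kernel does not care how this term was produced; it only checks
-- that `rfl` has the type `n + 0 = n`, which holds definitionally.
theorem n_add_zero (n : Nat) : n + 0 = n := rfl
```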
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem:
"Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver."
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem:
"Our formal solutions to five of the six IMO 2025 problems are available on GitHub."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 23 Pith papers
-
Global Product Intersection Sets in Semigroups
Any subset of the natural numbers that contains 1 can be realized as a product intersection set for any family of at least two subsets of a semigroup, and the paper gives the full classification for both arbitrary and...
-
Bipartite Exact Matching in P
A deterministic O(n^6) algorithm for bipartite exact matching is obtained by proving the Affine-Slice Nonvanishing Theorem for all braces via induction on McCuaig's decomposition, with Lean 4 formalization.
-
Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics
Formal Conjectures is a Lean 4 benchmark containing 2615 formalized problems with 1029 open conjectures, designed to evaluate automated mathematical reasoning and proof discovery.
-
LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving
LeanSearch v2 recovers 46.1% of ground-truth premise groups on research-level Mathlib theorems and raises fixed-loop proof success from 4% to 20% via embedding-reranker plus iterative sketch-retrieve-reflect retrieval.
-
AI co-mathematician: Accelerating mathematicians with agentic AI
An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.
-
Self-Improvement for Fast, High-Quality Plan Generation
Self-improvement of a decoder-only transformer yields plans averaging 30% shorter than a source symbolic planner, over 80% optimal where known, with sub-exponential latency scaling.
-
Gaps in Multiplicative Sidon Sets
g(n) ≪_ε n^{ρ+ε} with ρ = (13 - √69)/10 < 0.47 for multiplicative Sidon sets that intersect every interval of length L up to n.
-
Doubly Saturated Ramsey Graphs: A Case Study in Computer-Assisted Mathematical Discovery
A SAT-plus-LLM method discovers infinite families of doubly saturated Ramsey-good graphs, answering Grinstead and Roberts' 1982 question.
-
Yanasse: Finding New Proofs from Deep Vision's Analogies, Part 1
A domain-independent analogy engine transfers Lean tactic patterns from probability to representation theory, producing four new machine-verified proofs.
-
Certified Program Synthesis with a Multi-Modal Verifier
LeetProof achieves higher rates of fully certified program synthesis from natural language by using a multi-modal verifier in Lean to validate specifications via randomized testing and delegate proofs to AI tools, out...
-
Automatic Textbook Formalization
Multi-agent AI system formalizes entire 500-page graduate algebraic combinatorics textbook into Lean, creating 130K lines of code in one week at human-expert cost.
-
Teaching LLMs Program Semantics via Symbolic Execution Traces
Training Qwen3-8B on symbolic execution traces from Soteria improves violation detection in C programs by over 17 points, transfers across five property types, and shows superadditive gains with chain-of-thought.
-
The Network Structure of Mathlib
Network analysis of Mathlib reveals 50.9% coupling between human taxonomies and logical dependencies, median 1.6% import usage by developers, and centrality driven by infrastructure rather than mathematical content.
-
PROMISE: Proof Automation as Structural Imitation of Human Reasoning
PROMISE reframes automated proof generation as stateful search over structural embeddings of proof states, outperforming prior LLM-based systems by up to 26 points on the seL4 benchmark.
-
Automated Conjecture Resolution with Formal Verification
An AI framework combining informal reasoning and formal verification resolves an open commutative algebra problem and produces a Lean 4-checked proof with minimal human input.
-
A Minimal Agent for Automated Theorem Proving
A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.
-
Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs
MathArena is a maintained platform evaluating LLMs across olympiad problems, proofs, research questions, and formal proofs, with GPT-5.5 reaching 98% on 2026 USAMO and 74% on research-level tasks.
-
The cardinality of a set containing the pairwise sums of a fixed number of integers
Any subset A of {1,...,2n} with |A| at least n+1 or n+3 must contain the pairwise sums of three or four distinct integers respectively (both optimal), while |A| at least n plus 120 million guarantees it for five integers.
-
Deep Vision: A Formal Proof of Wolstenholmes Theorem in Lean 4
Wolstenholme's theorem is formally verified in Lean 4 via expansion of a shifted factorial product and vanishing power sums modulo p.
-
Astrolabe: A Content-Addressable Hypergraph for Semantic Knowledge Management
Astrolabe is a content-addressable hypergraph for semantic knowledge management using hash-identified entries, arbitrary-width ordered references, and plugin-extensible records.
-
Toward Evaluation Frameworks for Multi-Agent Scientific AI Systems
This paper discusses challenges in evaluating multi-agent scientific AI systems and proposes strategies like contamination-resistant tasks and multi-turn testing, demonstrated via a novel research ideas dataset and qu...
Reference graph
Works this paper leans on
-
[1]
The Surprising Effectiveness of Test‑Time Training for Few‑Shot Learning, 2024
Ekin Akyürek, Mehul Damani, Adam Zweiger, Linlu Qiu, Han Guo, Jyothish Pari, Yoon Kim, and Jacob Andreas. The Surprising Effectiveness of Test‑Time Training for Few‑Shot Learning, 2024. URL https://arxiv.org/abs/2411.07279
-
[2]
Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. Hindsight Experience Replay. In Advances in Neural Information Processing Systems (NeurIPS) 30, pages 5055–5065, 2017. URL https://papers.nips.cc/paper/7090-hindsight-experience-replay.pdf
work page 2017
-
[3]
Thinking Fast and Slow with Deep Learning and Tree Search
Thomas Anthony, Zheng Tian, and David Barber. Thinking Fast and Slow with Deep Learning and Tree Search. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017), pages 5360–5370, 2017
work page 2017
-
[4]
ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics
Zhangir Azerbayev, Bartosz Piotrowski, Hailey Schoelkopf, Edward W. Ayers, Dragomir Radev, and Jeremy Avigad. ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics, 2023. URL https://arxiv.org/abs/2302.12433
-
[5]
Leonardo de Moura and Sebastian Ullrich
Luoxin Chen, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Kaijing Ma, Cheng Ren, Jiawei Shen, Wenlei Shi, Tong Sun, He Sun, Jiahui Wang, Siran Wang, Zhihong Wang, Chenrui Wei, Shufa Wei, Yonghui Wu, Yuchen Wu, Yihang Xia, Huajian Xin, Fan Yang, Huaiyuan Ying, Hongyi Yuan, Zheng Yuan, Tianyang Zha...
-
[6]
Yuri Chervonyi, Trieu H. Trinh, Miroslav Olšák, Xiaomeng Yang, Hoang Nguyen, Marcelo Menegali, Junehyuk Jung, Vikas Verma, Quoc V. Le, and Thang Luong. Gold‑medalist Performance in Solving Olympiad Geometry with AlphaGeometry2, 2025. URL https://arxiv.org/abs/2502.03544
-
[7]
Continuous upper confidence trees
Adrien Couëtoux, Jean-Baptiste Hoock, Nataliya Sokolovska, Olivier Teytaud, and Nicolas Bonnard. Continuous upper confidence trees. In Proceedings of the 5th International Conference on Learning and Intelligent Optimization (LION’05), pages 433–445, Berlin, Heidelberg, 2011. Springer-Verlag
work page 2011
-
[8]
Improving alphazero using monte-carlo graph search
Johannes Czech, Patrick Korus, and Kristian Kersting. Improving alphazero using monte-carlo graph search. In Proceedings of the 31st International Conference on Automated Planning and Scheduling (ICAPS 2021), pages 103–111, 2021
work page 2021
-
[9]
STP: Self-Play LLM Theorem Provers with Iterative Conjecturing and Proving,
Kefan Dong and Tengyu Ma. STP: Self-Play LLM Theorem Provers with Iterative Conjecturing and Proving,
- [10]
-
[11]
Baldur: Whole-Proof Generation and Repair with Large Language Models
Emily First, Markus N. Rabe, Talia Ringer, and Yuriy Brun. Baldur: Whole-Proof Generation and Repair with Large Language Models. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023), pages 1229–1241. Association for Computing Machinery, 2023
work page 2023
-
[12]
ABEL: Sample Efficient Online Reinforcement Learning for Neural Theorem Proving
Fabian Gloeckle, Jannis Limperg, Gabriel Synnaeve, and Amaury Hayat. ABEL: Sample Efficient Online Reinforcement Learning for Neural Theorem Proving. In NeurIPS 2024 Workshop MATH-AI, 2024. URL https://openreview.net/forum?id=kk3mSjVCUO
work page 2024
-
[13]
AlphaProof: when reinforcement learning meets formal mathematics
Google DeepMind. AlphaProof: when reinforcement learning meets formal mathematics. YouTube video, March
-
[14]
URL https://www.youtube.com/watch?v=TFBzP78Jp6A. Presenter: Thomas Hubert
-
[15]
Jesse Michael Han, Jason Rute, Yuhuai Wu, Edward W. Ayers, and Stanislas Polu. Proof Artifact Co-Training for Theorem Proving with Language Models, 2021. URL https://arxiv.org/abs/2102.06203
-
[16]
IMO2025: Harmonic’s IMO 2025 Results (Problems & Proofs in Lean)
Harmonic. IMO2025: Harmonic’s IMO 2025 Results (Problems & Proofs in Lean). GitHub repository, 2025. URL https://github.com/harmonic-ai/IMO2025. Contains Lean statements and proofs for Problems 1–5 on IMO 2025
work page 2025
-
[17]
Aristotle Achieves Gold Medal Performance at the IMO
Harmonic. Aristotle Achieves Gold Medal Performance at the IMO. Blog post, July 28, 2025. URL https://harmonic.fun/news#blog-post-imo
work page 2025
-
[18]
Harmonic. Running Lean at Scale. Blog post, September 11, 2025. URL https://harmonic.fun/news#blog-post-lean
work page 2025
-
[19]
LISA: Language Models of Isabelle Proofs
Albert Qiaochu Jiang, Wenda Li, Jesse Michael Han, and Yuhuai Wu. LISA: Language Models of Isabelle Proofs. In Proceedings of the 6th Conference on Artificial Intelligence and Theorem Proving (AITP 2021), volume 83 of EPiC Series in Computing, pages 378–392. EasyChair, 2021
work page 2021
-
[20]
THOR: Wielding Hammers to Integrate Language Models and Automated Theorem Provers
Albert Qiaochu Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, and Mateja Jamnik. THOR: Wielding Hammers to Integrate Language Models and Automated Theorem Provers. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022), pages 8360–8373, 2022
work page 2022
-
[21]
Draft, sketch, and prove: Guiding formal theorem provers with informal proofs
Albert Qiaochu Jiang, Sean Welleck, Jin Peng Zhou, Wenda Li, Jiacheng Liu, Mateja Jamnik, Timothée Lacroix, Yuhuai Wu, and Guillaume Lample. Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs. In International Conference on Learning Representations (ICLR), 2023. URL https://arxiv.org/abs/2210.12283
-
[22]
HyperTree Proof Search for Neural Theorem Proving
Guillaume Lample, Marie-Anne Lachaux, Thibaut Lavril, Xavier Martinet, Amaury Hayat, Gabriel Ebner, Aurélien Rodriguez, and Timothée Lacroix. HyperTree Proof Search for Neural Theorem Proving. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022), pages 34651–34664, 2022
work page 2022
-
[23]
Yang Li, Dong Du, Linfeng Song, Chen Li, Weikang Wang, Tao Yang, and Haitao Mi. HunyuanProver: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving, 2024. URL https://arxiv.org/abs/2412.20735
-
[24]
Lean-STaR: Learning to Interleave Thinking and Proving, 2024
Haohan Lin, Zhiqing Sun, Yiming Yang, and Sean Welleck. Lean-STaR: Learning to Interleave Thinking and Proving, 2024. URL https://arxiv.org/abs/2407.10040
-
[25]
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving,
Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, and Chi Jin. Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving,
- [26]
-
[27]
ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data
Xiaoyang Liu, Kangjie Bao, Jiashuo Zhang, Yunqi Liu, Yu Chen, Yuntian Liu, Yang Jiao, and Tao Luo. ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data. 2025. URL https://arxiv.org/abs/2502.05567
-
[28]
Alexander Meiburg. Two lemmas for approxhom. GitHub pull request #251 in teorth/pfr, 2025. URL https://github.com/teorth/pfr/pull/251
work page 2025
-
[29]
Roots of Matrix.charpoly are the eigenvalues
Alexander Meiburg. Roots of Matrix.charpoly are the eigenvalues. GitHub pull request #27118 in leanprover-community/mathlib4, 2025. URL https://github.com/leanprover-community/mathlib4/pull/27118
work page 2025
-
[30]
limsup/liminf of f + g when either f or g tends to zero
Alexander Meiburg. limsup/liminf of f + g when either f or g tends to zero. GitHub pull request #27115 in leanprover-community/mathlib4, 2025. URL https://github.com/leanprover-community/mathlib4/pull/27115
work page 2025
-
[31]
Alexander Meiburg. Niven’s theorem. GitHub pull request #26371 in leanprover-community/mathlib4, 2025. URL https://github.com/leanprover-community/mathlib4/pull/26371
work page 2025
-
[32]
Alexander Meiburg, Leonardo A Lessa, and Rodolfo R. Soldati. Quantum information in Lean, 2025. URL https://github.com/Timeroot/Lean-QuantumInfo
work page 2025
-
[33]
MagnusHammer: A Transformer-Based Approach to Premise Selection, 2023
Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak, Bartosz Piotrowski, Albert Qiaochu Jiang, Jin Peng Zhou, Christian Szegedy, Łukasz Kuciński, Piotr Miłoś, and Yuhuai Wu. MagnusHammer: A Transformer-Based Approach to Premise Selection, 2023. URL https://arxiv.org/abs/2303.04488
-
[34]
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Azim Ospanov, Farzan Farnia, and Roozbeh Yousefzadeh. APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning. 2025. URL https://arxiv.org/abs/2505.05758
-
[35]
Generative Language Modeling for Automated Theorem Proving
Stanislas Polu and Ilya Sutskever. Generative Language Modeling for Automated Theorem Proving, 2020. URL https://arxiv.org/abs/2009.03393
-
[36]
Formal Mathematics Statement Curriculum Learning
Stanislas Polu, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor Babuschkin, and Ilya Sutskever. Formal Mathematics Statement Curriculum Learning. In The Eleventh International Conference on Learning Representations (ICLR 2023), 2023. URL https://openreview.net/forum?id=6CE2L64c0de
work page 2023
-
[37]
Z. Z. Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, Z. F. Wu, Zhibin Gou, Shirong Ma, Hongxuan Tang, Yuxuan Liu, Wenjun Gao, Daya Guo, and Chong Ruan. DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition. 2025. URL https://a...
work page 2025
-
[38]
Christopher D. Rosin. Multi-armed bandits with episode context. Annals of Mathematics and Artificial Intelligence, 61:203–230, 2011. doi: 10.1007/s10472-011-9258-6. URL https://link.springer.com/article/10.1007/s10472-011-9258-6
-
[39]
REAL‑Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning, 2025
Ziju Shen, Naohao Huang, Fanyi Yang, Yutong Wang, Guoxiong Gao, Tianyi Xu, Jiedong Jiang, Wanyi He, Pu Yang, Mengzhou Sun, Haocheng Ju, Peihao Wu, Bryan Dai, and Bin Dong. REAL‑Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning, 2025. URL https://arxiv.org/abs/2505.20613
-
[40]
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362 (6419):1140–1144, 2018. doi: 10....
-
[41]
Analysis I, volume 37 of Texts and Readings in Mathematics
Terence Tao. Analysis I, volume 37 of Texts and Readings in Mathematics. Hindustan Book Agency, New Delhi, India, 3 edition, 2014
work page 2014
-
[42]
URL https://github.com/teorth/analysis
Terence Tao, 2025. URL https://github.com/teorth/analysis
work page 2025
-
[43]
Terence Tao. Some corrections. GitHub pull request #113 in teorth/analysis, 2025. URL https://github.com/teorth/analysis/pull/113
work page 2025
-
[44]
Terence Tao. Machine‐assisted proof. Notices of the American Mathematical Society, 72(1):6–13, 2025
work page 2025
-
[45]
Repository for formalization of the Polynomial Freiman-Ruzsa conjecture, 2024
Terence Tao, Yaël Dillies, et al. Repository for formalization of the Polynomial Freiman-Ruzsa conjecture, 2024. URL https://teorth.github.io/pfr/
work page 2024
-
[46]
An In-Context Learning Agent for Formal Theorem-Proving
Amitayush Thakur, George Tsoukalas, Yeming Wen, Jimmy Xin, and Swarat Chaudhuri. An In-Context Learning Agent for Formal Theorem-Proving. In Proceedings of the 2024 Conference on Language Modeling (COLM 2024),
work page 2024
-
[47]
URL https://openreview.net/forum?id=V7HRrxXUhN
-
[48]
A read-eval-print-loop for Lean 4
The Lean Prover Community. A read-eval-print-loop for Lean 4. GitHub repository, 2023. URL https://github.com/leanprover-community/repl
work page 2023
-
[49]
Trieu H. Trinh, Yuhuai Wu, Quoc V. Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations. Nature, 625(7961):476–482, 2024. doi: 10.1038/s41586-023-06747-5. URL https://www.nature.com/articles/s41586-023-06747-5
-
[50]
PutnamBench: Evaluating Neural Theorem‑Provers on the Putnam Mathematical Competition, 2024
George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding, Michael Jennings, Amitayush Thakur, and Swarat Chaudhuri. PutnamBench: Evaluating Neural Theorem‑Provers on the Putnam Mathematical Competition, 2024. URL https://arxiv.org/abs/2407.11214
-
[51]
DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-Level Value Function
Haiming Wang, Ye Yuan, Zhengying Liu, Jianhao Shen, Yichun Yin, Jing Xiong, Enze Xie, Han Shi, Yujun Li, Lin Li, et al. DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-Level Value Function. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12632–12646, ...
-
[52]
LEGO-Prover: Neural Theorem Proving with Growing Libraries
Haiming Wang, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Heng Liao, and Xiaodan Liang. LEGO-Prover: Neural Theorem Proving with Growing Libraries. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR 2024), 2024. URL https://procee...
work page 2024
-
[53]
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning
Haiming Wang, Mert Unsal, Xiaohan Lin, Mantas Baksys, Junqi Liu, Marco Dos Santos, Flood Sung, Marina Vinyes, Zhenzhe Ying, Zekai Zhu, Jianqiao Lu, Hugues de Saxcé, Bolton Bailey, Chendong Song, Chenjun Xiao, Dehao Zhang, Ebony Zhang, Frederick Pu, Han Zhu, Jiawei Liu, Jonas Bayer, Julien Michel, Longhui Yu, Léo Dreyfus-Schmidt, Lewis Tunstall, Luigi Paga...
-
[54]
David J. Wu. Monte-Carlo Graph Search from First Principles. KataGo GitHub Documentation. URL https://github.com/lightvector/KataGo/blob/master/docs/GraphSearch.md
-
[55]
Autoformalization with Large Language Models
Yuhuai Wu, Albert Qiaochu Jiang, Wenda Li, Markus Rabe, Charles Staats, Mateja Jamnik, and Christian Szegedy. Autoformalization with Large Language Models. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022), pages 32353–32368, 2022
work page 2022
- [56]
-
[57]
Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, and Xiaodan Liang. DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data, 2024. URL https://arxiv.org/abs/2405.14333
-
[58]
Huajian Xin, Z. Ren, Z. Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, Wenjun Gao, Qihao Zhu, Dejian Yang, Zhibin Gou, F. Wu, Z. Fuli Luo, and Chong Ruan. DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search. 2025. URL https://openreview.net/foru...
work page 2025
-
[59]
BFS‑Prover: Scalable Best‑First Tree Search for LLM‑based Automatic Theorem Proving, July 2025
Ran Xin, Chenguang Xi, Jie Yang, Feng Chen, Hang Wu, Xia Xiao, Yifan Sun, Shen Zheng, and Ming Ding. BFS‑Prover: Scalable Best‑First Tree Search for LLM‑based Automatic Theorem Proving, July 2025
work page 2025
-
[60]
Kaiyu Yang, Aidan M. Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, and Anima Anandkumar. LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. In Proceedings of the 2023 Conference on Neural Information Processing Systems Datasets & Benchmarks Track, pages 4668–4685, 2023
work page 2023
-
[61]
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models, 2025
Zhouliang Yu, Ruotian Peng, Keyi Ding, Yizhe Li, Zhongyuan Peng, Minghao Liu, Yifan Zhang, Zheng Yuan, Huajian Xin, Wenhao Huang, Yandong Wen, Ge Zhang, and Weiyang Liu. FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models, 2025. URL https://arxiv.org/abs/2505.02735
-
[62]
LeanAbell-Prover: Posttraining Scaling in Formal Reasoning, 2025
Jingyuan Zhang, Qi Wang, Xingguang Ji, Yahui Liu, Yang Yue, Fuzheng Zhang, Di Zhang, Guorui Zhou, and Kun Gai. LeanAbell-Prover: Posttraining Scaling in Formal Reasoning, 2025. URL https://arxiv.org/abs/2504.06122
-
[63]
miniF2F: a cross-system benchmark for formal Olympiad-level mathematics
Kunhao Zheng, Jesse Michael Han, and Stanislas Polu. miniF2F: a cross-system benchmark for formal Olympiad-level mathematics. International Conference on Learning Representations (ICLR), 2022. URL https://openreview.net/forum?id=9ZPegFuFTFv
work page 2022
-
[64]
Solving Formal Math Problems by Decomposition and Iterative Reflection, 2025
Yichi Zhou, Jianqiu Zhao, Yongxin Zhang, Bohan Wang, Siran Wang, Luoxin Chen, Jiahui Wang, Haowei Chen, Allan Jie, Xinbo Zhang, Haocheng Wang, Luong Trung, Rong Ye, Phan Nhat Hoang, Huishuai Zhang, Peng Sun, and Hang Li. Solving Formal Math Problems by Decomposition and Iterative Reflection, 2025. URL https://arxiv.org/abs/2507.15225
-
[65]
Premise Selection for a Lean Hammer
Thomas Zhu, Joshua Clune, Jeremy Avigad, Albert Qiaochu Jiang, and Sean Welleck. Premise Selection for a Lean Hammer. 2025. URL https://arxiv.org/abs/2506.07477