Aristotle: IMO-level Automated Theorem Proving
Recognition: 2 theorem links · Lean Theorem
Pith reviewed 2026-05-15 08:46 UTC · model grok-4.3
The pith
An AI system called Aristotle reaches gold-medal performance on the 2025 IMO problems by integrating Lean formal proof search with informal lemma generation and a dedicated geometry solver.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Aristotle integrates a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver. This combination enables complete proofs for the 2025 International Mathematical Olympiad problems at gold-medal-equivalent performance, with state-of-the-art results and favorable scaling properties in automated theorem proving.
What carries the argument
The hybrid system that pairs Lean formal verification with informal reasoning for lemma creation and formalization plus a geometry solver.
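The division of labor described above can be caricatured in a few lines. This is a minimal sketch under stated assumptions, not the paper's implementation; every name below is invented. The structural point is that the informal model only proposes, and only the formal checker (standing in for the Lean kernel) decides what is kept.

```python
"""Hypothetical sketch of an informal-propose / formal-verify loop."""
from dataclasses import dataclass


@dataclass
class Lemma:
    informal: str           # natural-language statement from the informal model
    formal: str = ""        # candidate formalization, if any
    verified: bool = False  # set only by the formal checker


def propose_lemmas(goal: str) -> list[Lemma]:
    # Stand-in for the informal reasoning system.
    return [Lemma(informal=f"helper lemma for: {goal}")]


def formalize(lemma: Lemma) -> Lemma:
    # Stand-in for autoformalization into Lean syntax.
    lemma.formal = f"theorem helper : True := trivial  -- from: {lemma.informal}"
    return lemma


def kernel_check(lemma: Lemma) -> Lemma:
    # Stand-in for the Lean kernel: accept only complete proofs.
    lemma.verified = bool(lemma.formal) and "sorry" not in lemma.formal
    return lemma


def prove(goal: str) -> list[Lemma]:
    # Propose -> formalize -> verify; anything unverified is discarded,
    # so no informal step can leak into the final proof.
    candidates = [kernel_check(formalize(l)) for l in propose_lemmas(goal)]
    return [l for l in candidates if l.verified]


print(len(prove("IMO 2025 Problem 1")))  # → 1
```

The design choice this sketch isolates is the one the abstract leans on: soundness lives entirely in the final checking stage, so the informal components can be arbitrarily creative without compromising correctness.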
If this is right
- Automated theorem provers can now handle full competition math problems at human gold-medal standard.
- Performance scales favorably as the informal and formal components receive more resources.
- Hybrid formal-informal methods outperform pure formal or pure informal approaches on complex proofs.
- Geometry problems become tractable through the dedicated solver component.
Where Pith is reading between the lines
- The approach could extend to assisting with open research problems if scaled further.
- It suggests that future systems might bridge creative insight and rigorous checking in other technical domains.
- Public release of the generated proofs would allow independent checking of the claimed results.
Load-bearing premise
The integration of informal reasoning and formal verification components can reliably produce complete, correct proofs for the full set of 2025 IMO problems without undisclosed human guidance.
What would settle it
An independent run of the full system on the 2025 IMO problems with all output proofs verified as correct in Lean and no manual interventions.
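Such a check is mechanical in Lean: a complete proof compiles with no warnings, while a placeholder elaborates but is flagged, so an independent rebuild of the published artifacts can reject any incomplete proof automatically. A minimal illustration, not taken from the paper:

```lean
-- A complete proof: the kernel accepts the term with no warnings.
theorem sum_comm (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- An incomplete proof: `sorry` closes the goal syntactically, but Lean
-- reports "declaration uses 'sorry'", so a clean independent build with
-- a no-sorry check fails on it.
theorem unfinished (n : Nat) : n * n ≥ n := sorry
```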
read the original abstract
We introduce Aristotle, an AI system that combines formal verification with informal reasoning, achieving gold-medal-equivalent performance on the 2025 International Mathematical Olympiad problems. Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver. Our system demonstrates state-of-the-art performance with favorable scaling properties for automated theorem proving.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Aristotle, an AI system that combines formal verification with informal reasoning to achieve gold-medal-equivalent performance on the 2025 International Mathematical Olympiad problems. It integrates three components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver, while claiming state-of-the-art performance with favorable scaling properties for automated theorem proving.
Significance. If the performance claims hold with reproducible evidence, this would represent a significant step forward in automated theorem proving by showing that hybrid informal-formal pipelines can reach IMO-level difficulty. The architecture could inform future scaling strategies in the field.
major comments (1)
- Abstract: the central claim of gold-medal-equivalent performance on the 2025 IMO problems is presented without any quantitative support, such as the list of problems attempted, per-problem success rates, error analysis, or generated Lean proof artifacts. This absence is load-bearing because the manuscript describes the components only at a high level and supplies no empirical results to distinguish full automation from selective reporting or post-hoc adjustments.
minor comments (1)
- The high-level description of component integration would benefit from additional technical specifics on data flow and failure modes to improve clarity.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address the single major comment below and outline the revisions we will incorporate into the manuscript.
read point-by-point responses
-
Referee: Abstract: the central claim of gold-medal-equivalent performance on the 2025 IMO problems is presented without any quantitative support, such as the list of problems attempted, per-problem success rates, error analysis, or generated Lean proof artifacts. This absence is load-bearing because the manuscript describes the components only at a high level and supplies no empirical results to distinguish full automation from selective reporting or post-hoc adjustments.
Authors: We agree that the abstract, in its current form, is too high-level and does not supply the quantitative details needed to substantiate the central performance claim. In the revised manuscript we will expand the abstract to include concrete metrics, such as the number of 2025 IMO problems attempted and solved together with the resulting score. We will also add a new subsection to the experimental evaluation that enumerates every problem, reports per-problem success or failure, provides a concise error analysis, and supplies links or identifiers for the corresponding Lean proof artifacts. These additions will allow readers to verify that the reported performance reflects systematic, reproducible automation rather than selective reporting.
Revision: yes
Circularity Check
No circularity: empirical performance claim with no derivation chain
full rationale
The paper presents Aristotle as an integrated system achieving gold-medal-equivalent IMO performance via Lean proof search, informal lemma generation, and a geometry solver. This is framed as an empirical outcome on an external benchmark (the 2025 IMO problems), not as a quantity derived from internal equations, fitted parameters, or self-citations. No load-bearing steps reduce by construction to inputs; the abstract and described components contain no self-definitional loops, renamed known results, or ansatzes smuggled in via prior work. The claim stands or falls on external verification of the reported proofs, which is independent of any internal circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math · Correctness of the Lean formal system and its kernel
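This is the standard trust base for any Lean-checked result: however a proof term was found, whether by search, by a language model, or by hand, the small kernel only checks that the term's type matches the stated proposition. A minimal illustration, unrelated to the paper's proofs:

```lean
-- The kernel does not care how this term was produced; it only checks
-- that `rfl` has the type `n + 0 = n`, which holds definitionally.
theorem n_add_zero (n : Nat) : n + 0 = n := rfl
```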
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem:
"Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver."
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem:
"Our formal solutions to five of the six IMO 2025 problems are available on GitHub."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 23 Pith papers
-
Global Product Intersection Sets in Semigroups
Any subset of the natural numbers that contains 1 can be realized as a product intersection set for any family of at least two subsets of a semigroup, and the paper gives the full classification for both arbitrary and...
-
Bipartite Exact Matching in P
A deterministic O(n^6) algorithm for bipartite exact matching is obtained by proving the Affine-Slice Nonvanishing Theorem for all braces via induction on McCuaig's decomposition, with Lean 4 formalization.
-
Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics
Formal Conjectures is a Lean 4 benchmark containing 2615 formalized problems with 1029 open conjectures, designed to evaluate automated mathematical reasoning and proof discovery.
-
LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving
LeanSearch v2 recovers 46.1% of ground-truth premise groups on research-level Mathlib theorems and raises fixed-loop proof success from 4% to 20% via embedding-reranker plus iterative sketch-retrieve-reflect retrieval.
-
AI co-mathematician: Accelerating mathematicians with agentic AI
An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.
-
Self-Improvement for Fast, High-Quality Plan Generation
Self-improvement of a decoder-only transformer yields plans averaging 30% shorter than a source symbolic planner, over 80% optimal where known, with sub-exponential latency scaling.
-
Gaps in Multiplicative Sidon Sets
g(n) ≪_ε n^{ρ+ε} with ρ = (13 - √69)/10 < 0.47 for multiplicative Sidon sets that intersect every interval of length L up to n.
-
Doubly Saturated Ramsey Graphs: A Case Study in Computer-Assisted Mathematical Discovery
A SAT-plus-LLM method discovers infinite families of doubly saturated Ramsey-good graphs, answering Grinstead and Roberts' 1982 question.
-
Yanasse: Finding New Proofs from Deep Vision's Analogies, Part 1
A domain-independent analogy engine transfers Lean tactic patterns from probability to representation theory, producing four new machine-verified proofs.
-
Certified Program Synthesis with a Multi-Modal Verifier
LeetProof achieves higher rates of fully certified program synthesis from natural language by using a multi-modal verifier in Lean to validate specifications via randomized testing and delegate proofs to AI tools, out...
-
Automatic Textbook Formalization
Multi-agent AI system formalizes entire 500-page graduate algebraic combinatorics textbook into Lean, creating 130K lines of code in one week at human-expert cost.
-
Teaching LLMs Program Semantics via Symbolic Execution Traces
Training Qwen3-8B on symbolic execution traces from Soteria improves violation detection in C programs by over 17 points, transfers across five property types, and shows superadditive gains with chain-of-thought.
-
The Network Structure of Mathlib
Network analysis of Mathlib reveals 50.9% coupling between human taxonomies and logical dependencies, median 1.6% import usage by developers, and centrality driven by infrastructure rather than mathematical content.
-
PROMISE: Proof Automation as Structural Imitation of Human Reasoning
PROMISE reframes automated proof generation as stateful search over structural embeddings of proof states, outperforming prior LLM-based systems by up to 26 points on the seL4 benchmark.
-
Automated Conjecture Resolution with Formal Verification
An AI framework combining informal reasoning and formal verification resolves an open commutative algebra problem and produces a Lean 4-checked proof with minimal human input.
-
A Minimal Agent for Automated Theorem Proving
A minimal agentic system achieves competitive performance in automated theorem proving with a simpler design and lower cost than state-of-the-art methods.
-
Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs
MathArena is a maintained platform evaluating LLMs across olympiad problems, proofs, research questions, and formal proofs, with GPT-5.5 reaching 98% on 2026 USAMO and 74% on research-level tasks.
-
The cardinality of a set containing the pairwise sums of a fixed number of integers
Any subset A of {1,...,2n} with |A| at least n+1 or n+3 must contain the pairwise sums of three or four distinct integers respectively (both optimal), while |A| at least n plus 120 million guarantees it for five integers.
-
Deep Vision: A Formal Proof of Wolstenholmes Theorem in Lean 4
Wolstenholme's theorem is formally verified in Lean 4 via expansion of a shifted factorial product and vanishing power sums modulo p.
-
Astrolabe: A Content-Addressable Hypergraph for Semantic Knowledge Management
Astrolabe is a content-addressable hypergraph for semantic knowledge management using hash-identified entries, arbitrary-width ordered references, and plugin-extensible records.
-
Toward Evaluation Frameworks for Multi-Agent Scientific AI Systems
This paper discusses challenges in evaluating multi-agent scientific AI systems and proposes strategies like contamination-resistant tasks and multi-turn testing, demonstrated via a novel research ideas dataset and qu...
Reference graph
Works this paper leans on
-
[1]
The Surprising Effectiveness of Test‑Time Training for Few‑Shot Learning, 2024
Ekin Akyürek, Mehul Damani, Adam Zweiger, Linlu Qiu, Han Guo, Jyothish Pari, Yoon Kim, and Jacob Andreas. The Surprising Effectiveness of Test‑Time Training for Few‑Shot Learning, 2024. URL https://arxiv.org/abs/2411.07279
-
[2]
Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. Hindsight Experience Replay. In Advances in Neural Information Processing Systems (NeurIPS) 30, pages 5055–5065, 2017. URL https://papers.nips.cc/paper/7090-hindsight-experience-replay.pdf
work page 2017
-
[3]
Thinking Fast and Slow with Deep Learning and Tree Search
Thomas Anthony, Zheng Tian, and David Barber. Thinking Fast and Slow with Deep Learning and Tree Search. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017), pages 5360–5370, 2017
work page 2017
-
[4]
ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics
Zhangir Azerbayev, Bartosz Piotrowski, Hailey Schoelkopf, Edward W. Ayers, Dragomir Radev, and Jeremy Avigad. ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics, 2023. URL https://arxiv.org/abs/2302.12433
-
[5]
Leonardo de Moura and Sebastian Ullrich
Luoxin Chen, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Kaijing Ma, Cheng Ren, Jiawei Shen, Wenlei Shi, Tong Sun, He Sun, Jiahui Wang, Siran Wang, Zhihong Wang, Chenrui Wei, Shufa Wei, Yonghui Wu, Yuchen Wu, Yihang Xia, Huajian Xin, Fan Yang, Huaiyuan Ying, Hongyi Yuan, Zheng Yuan, Tianyang Zha...
-
[6]
Yuri Chervonyi, Trieu H. Trinh, Miroslav Olšák, Xiaomeng Yang, Hoang Nguyen, Marcelo Menegali, Junehyuk Jung, Vikas Verma, Quoc V. Le, and Thang Luong. Gold‑medalist Performance in Solving Olympiad Geometry with AlphaGeometry2, 2025. URL https://arxiv.org/abs/2502.03544
-
[7]
Continuous upper confidence trees
Adrien Couëtoux, Jean-Baptiste Hoock, Nataliya Sokolovska, Olivier Teytaud, and Nicolas Bonnard. Continuous upper confidence trees. In Proceedings of the 5th International Conference on Learning and Intelligent Optimization (LION’05), pages 433–445, Berlin, Heidelberg, 2011. Springer-Verlag
work page 2011
-
[8]
Improving alphazero using monte-carlo graph search
Johannes Czech, Patrick Korus, and Kristian Kersting. Improving alphazero using monte-carlo graph search. In Proceedings of the 31st International Conference on Automated Planning and Scheduling (ICAPS 2021), pages 103–111, 2021
work page 2021
-
[9]
STP: Self-Play LLM Theorem Provers with Iterative Conjecturing and Proving,
Kefan Dong and Tengyu Ma. STP: Self-Play LLM Theorem Provers with Iterative Conjecturing and Proving,
- [10]
-
[11]
Baldur: Whole-Proof Generation and Repair with Large Language Models
Emily First, Markus N. Rabe, Talia Ringer, and Yuriy Brun. Baldur: Whole-Proof Generation and Repair with Large Language Models. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023), pages 1229–1241. Association for Computing Machinery, 2023
work page 2023
-
[12]
ABEL: Sample Efficient Online Reinforcement Learning for Neural Theorem Proving
Fabian Gloeckle, Jannis Limperg, Gabriel Synnaeve, and Amaury Hayat. ABEL: Sample Efficient Online Reinforcement Learning for Neural Theorem Proving. In NeurIPS 2024 Workshop MATH-AI, 2024. URL https://openreview.net/forum?id=kk3mSjVCUO
work page 2024
-
[13]
AlphaProof: when reinforcement learning meets formal mathematics
Google DeepMind. AlphaProof: when reinforcement learning meets formal mathematics. YouTube video, March
-
[14]
URL https://www.youtube.com/watch?v=TFBzP78Jp6A. Presenter: Thomas Hubert
-
[15]
Jesse Michael Han, Jason Rute, Yuhuai Wu, Edward W. Ayers, and Stanislas Polu. Proof Artifact Co-Training for Theorem Proving with Language Models, 2021. URL https://arxiv.org/abs/2102.06203
-
[16]
IMO2025: Harmonic’s IMO 2025 Results (Problems & Proofs in Lean)
Harmonic. IMO2025: Harmonic’s IMO 2025 Results (Problems & Proofs in Lean). GitHub repository, 2025. URL https://github.com/harmonic-ai/IMO2025. Contains Lean statements and proofs for Problems 1–5 on IMO 2025
work page 2025
-
[17]
Aristotle Achieves Gold Medal Performance at the IMO
Harmonic. Aristotle Achieves Gold Medal Performance at the IMO. Blog post, July 28, 2025. URL https://harmonic.fun/news#blog-post-imo
work page 2025
-
[18]
Harmonic. Running Lean at Scale. Blog post, September 11, 2025. URL https://harmonic.fun/news#blog-post-lean
work page 2025
-
[19]
LISA: Language Models of Isabelle Proofs
Albert Qiaochu Jiang, Wenda Li, Jesse Michael Han, and Yuhuai Wu. LISA: Language Models of Isabelle Proofs. In Proceedings of the 6th Conference on Artificial Intelligence and Theorem Proving (AITP 2021), volume 83 of EPiC Series in Computing, pages 378–392. EasyChair, 2021
work page 2021
-
[20]
THOR: Wielding Hammers to Integrate Language Models and Automated Theorem Provers
Albert Qiaochu Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, and Mateja Jamnik. THOR: Wielding Hammers to Integrate Language Models and Automated Theorem Provers. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022), pages 8360–8373, 2022
work page 2022
-
[21]
Draft, sketch, and prove: Guiding formal theorem provers with informal proofs
Albert Qiaochu Jiang, Sean Welleck, Jin Peng Zhou, Wenda Li, Jiacheng Liu, Mateja Jamnik, Timothée Lacroix, Yuhuai Wu, and Guillaume Lample. Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs. In International Conference on Learning Representations (ICLR), 2023. URL https://arxiv.org/abs/2210.12283
-
[22]
HyperTree Proof Search for Neural Theorem Proving
Guillaume Lample, Marie-Anne Lachaux, Thibaut Lavril, Xavier Martinet, Amaury Hayat, Gabriel Ebner, Aurélien Rodriguez, and Timothée Lacroix. HyperTree Proof Search for Neural Theorem Proving. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022), pages 34651–34664, 2022
work page 2022
-
[23]
Yang Li, Dong Du, Linfeng Song, Chen Li, Weikang Wang, Tao Yang, and Haitao Mi. HunyuanProver: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving, 2024. URL https://arxiv.org/abs/2412.20735
-
[24]
Lean-STaR: Learning to Interleave Thinking and Proving, 2024
Haohan Lin, Zhiqing Sun, Yiming Yang, and Sean Welleck. Lean-STaR: Learning to Interleave Thinking and Proving, 2024. URL https://arxiv.org/abs/2407.10040
-
[25]
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving,
Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, and Chi Jin. Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving,
- [26]
-
[27]
ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data
Xiaoyang Liu, Kangjie Bao, Jiashuo Zhang, Yunqi Liu, Yu Chen, Yuntian Liu, Yang Jiao, and Tao Luo. ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data. 2025. URL https://arxiv.org/abs/2502.05567
-
[28]
Alexander Meiburg. Two lemmas for approxhom. GitHub pull request #251 in teorth/pfr, 2025. URL https://github.com/teorth/pfr/pull/251
work page 2025
-
[29]
Roots of Matrix.charpoly are the eigenvalues
Alexander Meiburg. Roots of Matrix.charpoly are the eigenvalues. GitHub pull request #27118 in leanprover-community/mathlib4, 2025. URL https://github.com/leanprover-community/mathlib4/pull/27118
work page 2025
-
[30]
limsup/liminf of f + g when either f or g tends to zero
Alexander Meiburg. limsup/liminf of f + g when either f or g tends to zero. GitHub pull request #27115 in leanprover-community/mathlib4, 2025. URL https://github.com/leanprover-community/mathlib4/pull/27115
work page 2025
-
[31]
Alexander Meiburg. Niven’s theorem. GitHub pull request #26371 in leanprover-community/mathlib4, 2025. URL https://github.com/leanprover-community/mathlib4/pull/26371
work page 2025
-
[32]
Alexander Meiburg, Leonardo A Lessa, and Rodolfo R. Soldati. Quantum information in Lean, 2025. URL https://github.com/Timeroot/Lean-QuantumInfo
work page 2025
-
[33]
MagnusHammer: A Transformer-Based Approach to Premise Selection, 2023
Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak, Bartosz Piotrowski, Albert Qiaochu Jiang, Jin Peng Zhou, Christian Szegedy, Łukasz Kuciński, Piotr Miłoś, and Yuhuai Wu. MagnusHammer: A Transformer-Based Approach to Premise Selection, 2023. URL https://arxiv.org/abs/2303.04488
-
[34]
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Azim Ospanov, Farzan Farnia, and Roozbeh Yousefzadeh. APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning. 2025. URL https://arxiv.org/abs/2505.05758
-
[35]
Generative Language Modeling for Automated Theorem Proving
Stanislas Polu and Ilya Sutskever. Generative Language Modeling for Automated Theorem Proving, 2020. URL https://arxiv.org/abs/2009.03393
-
[36]
Formal Mathematics Statement Curriculum Learning
Stanislas Polu, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor Babuschkin, and Ilya Sutskever. Formal Mathematics Statement Curriculum Learning. In The Eleventh International Conference on Learning Representations (ICLR 2023), 2023. URL https://openreview.net/forum?id=6CE2L64c0de
work page 2023
-
[37]
Z. Z. Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, Z. F. Wu, Zhibin Gou, Shirong Ma, Hongxuan Tang, Yuxuan Liu, Wenjun Gao, Daya Guo, and Chong Ruan. DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition. 2025. URL https://a...
work page 2025
-
[38]
Christopher D. Rosin. Multi-armed bandits with episode context. Annals of Mathematics and Artificial Intelligence, 61:203–230, 2011. doi: 10.1007/s10472-011-9258-6. URL https://link.springer.com/article/10.1007/s10472-011-9258-6
-
[39]
REAL‑Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning, 2025
Ziju Shen, Naohao Huang, Fanyi Yang, Yutong Wang, Guoxiong Gao, Tianyi Xu, Jiedong Jiang, Wanyi He, Pu Yang, Mengzhou Sun, Haocheng Ju, Peihao Wu, Bryan Dai, and Bin Dong. REAL‑Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning, 2025. URL https://arxiv.org/abs/2505.20613
-
[40]
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362 (6419):1140–1144, 2018. doi: 10....
-
[41]
Analysis I, volume 37 of Texts and Readings in Mathematics
Terence Tao. Analysis I, volume 37 of Texts and Readings in Mathematics. Hindustan Book Agency, New Delhi, India, 3 edition, 2014
work page 2014
-
[42]
URL https://github.com/teorth/analysis
Terence Tao, 2025. URL https://github.com/teorth/analysis
work page 2025
-
[43]
Terence Tao. Some corrections. GitHub pull request #113 in teorth/analysis, 2025. URL https://github.com/teorth/analysis/pull/113
work page 2025
-
[44]
Terence Tao. Machine‐assisted proof. Notices of the American Mathematical Society, 72(1):6–13, 2025
work page 2025
-
[45]
Repository for formalization of the Polynomial Freiman-Ruzsa conjecture, 2024
Terence Tao, Yaël Dillies, et al. Repository for formalization of the Polynomial Freiman-Ruzsa conjecture, 2024. URL https://teorth.github.io/pfr/
work page 2024
-
[46]
An In-Context Learning Agent for Formal Theorem-Proving
Amitayush Thakur, George Tsoukalas, Yeming Wen, Jimmy Xin, and Swarat Chaudhuri. An In-Context Learning Agent for Formal Theorem-Proving. In Proceedings of the 2024 Conference on Language Modeling (COLM 2024),
work page 2024
-
[47]
URL https://openreview.net/forum?id=V7HRrxXUhN
-
[48]
A read-eval-print-loop for Lean 4
The Lean Prover Community. A read-eval-print-loop for Lean 4. GitHub repository, 2023. URL https://github.com/leanprover-community/repl
work page 2023
-
[49]
Trieu H. Trinh, Yuhuai Wu, Quoc V. Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations. Nature, 625(7961):476–482, 2024. doi: 10.1038/s41586-023-06747-5. URL https://www.nature.com/articles/s41586-023-06747-5
-
[50]
PutnamBench: Evaluating Neural Theorem‑Provers on the Putnam Mathematical Competition, 2024
George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding, Michael Jennings, Amitayush Thakur, and Swarat Chaudhuri. PutnamBench: Evaluating Neural Theorem‑Provers on the Putnam Mathematical Competition, 2024. URL https://arxiv.org/abs/2407.11214
-
[51]
DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-Level Value Function
Haiming Wang, Ye Yuan, Zhengying Liu, Jianhao Shen, Yichun Yin, Jing Xiong, Enze Xie, Han Shi, Yujun Li, Lin Li, et al. DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-Level Value Function. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12632–12646, ...
-
[52]
LEGO-Prover: Neural Theorem Proving with Growing Libraries
Haiming Wang, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Heng Liao, and Xiaodan Liang. LEGO-Prover: Neural Theorem Proving with Growing Libraries. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR 2024), 2024. URL https://procee...
work page 2024
-
[53]
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning
Haiming Wang, Mert Unsal, Xiaohan Lin, Mantas Baksys, Junqi Liu, Marco Dos Santos, Flood Sung, Marina Vinyes, Zhenzhe Ying, Zekai Zhu, Jianqiao Lu, Hugues de Saxcé, Bolton Bailey, Chendong Song, Chenjun Xiao, Dehao Zhang, Ebony Zhang, Frederick Pu, Han Zhu, Jiawei Liu, Jonas Bayer, Julien Michel, Longhui Yu, Léo Dreyfus-Schmidt, Lewis Tunstall, Luigi Paga...
-
[54]
David J. Wu. Monte-Carlo Graph Search from First Principles. KataGo GitHub Documentation. URL https://github.com/lightvector/KataGo/blob/master/docs/GraphSearch.md
-
[55]
Autoformalization with Large Language Models
Yuhuai Wu, Albert Qiaochu Jiang, Wenda Li, Markus Rabe, Charles Staats, Mateja Jamnik, and Christian Szegedy. Autoformalization with Large Language Models. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022), pages 32353–32368, 2022
work page 2022
- [56]
-
[57]
Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, and Xiaodan Liang. DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data, 2024. URL https://arxiv.org/abs/2405.14333
-
[58]
Huajian Xin, Z. Ren, Z. Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, Wenjun Gao, Qihao Zhu, Dejian Yang, Zhibin Gou, F. Wu, Z. Fuli Luo, and Chong Ruan. DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search. 2025. URL https://openreview.net/foru...
work page 2025
-
[59]
BFS‑Prover: Scalable Best‑First Tree Search for LLM‑based Automatic Theorem Proving, July 2025
Ran Xin, Chenguang Xi, Jie Yang, Feng Chen, Hang Wu, Xia Xiao, Yifan Sun, Shen Zheng, and Ming Ding. BFS‑Prover: Scalable Best‑First Tree Search for LLM‑based Automatic Theorem Proving, July 2025
work page 2025
-
[60]
Kaiyu Yang, Aidan M. Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, and Anima Anandkumar. LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. In Proceedings of the 2023 Conference on Neural Information Processing Systems Datasets & Benchmarks Track, pages 4668–4685, 2023
work page 2023
-
[61]
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models, 2025
Zhouliang Yu, Ruotian Peng, Keyi Ding, Yizhe Li, Zhongyuan Peng, Minghao Liu, Yifan Zhang, Zheng Yuan, Huajian Xin, Wenhao Huang, Yandong Wen, Ge Zhang, and Weiyang Liu. FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models, 2025. URL https://arxiv.org/abs/2505.02735
-
[62]
LeanAbell-Prover: Posttraining Scaling in Formal Reasoning, 2025
Jingyuan Zhang, Qi Wang, Xingguang Ji, Yahui Liu, Yang Yue, Fuzheng Zhang, Di Zhang, Guorui Zhou, and Kun Gai. LeanAbell-Prover: Posttraining Scaling in Formal Reasoning, 2025. URL https://arxiv.org/abs/2504.06122
-
[63]
miniF2F: a cross-system benchmark for formal Olympiad-level mathematics
Kunhao Zheng, Jesse Michael Han, and Stanislas Polu. miniF2F: a cross-system benchmark for formal Olympiad-level mathematics. International Conference on Learning Representations (ICLR), 2022. URL https://openreview.net/forum?id=9ZPegFuFTFv
work page 2022
-
[64]
Solving Formal Math Problems by Decomposition and Iterative Reflection, 2025
Yichi Zhou, Jianqiu Zhao, Yongxin Zhang, Bohan Wang, Siran Wang, Luoxin Chen, Jiahui Wang, Haowei Chen, Allan Jie, Xinbo Zhang, Haocheng Wang, Luong Trung, Rong Ye, Phan Nhat Hoang, Huishuai Zhang, Peng Sun, and Hang Li. Solving Formal Math Problems by Decomposition and Iterative Reflection, 2025. URL https://arxiv.org/abs/2507.15225
-
[65]
Premise Selection for a Lean Hammer
Thomas Zhu, Joshua Clune, Jeremy Avigad, Albert Qiaochu Jiang, and Sean Welleck. Premise Selection for a Lean Hammer. 2025. URL https://arxiv.org/abs/2506.07477