arxiv: 2601.13209 · v5 · submitted 2026-01-19 · 🧮 math.HO

Recognition: no theorem link

AI for Mathematics: Progress, Challenges, and Prospects

Haocheng Ju , Bin Dong

Authors on Pith no claims yet

Pith reviewed 2026-05-16 13:22 UTC · model grok-4.3

classification 🧮 math.HO

keywords AI for Mathematicsmathematical reasoningproblem-specific modelinggeneral-purpose modelingfoundation modelsAI testbedmathematical discovery

0 comments

The pith

AI for mathematics research splits into specialized task architectures and general foundation models for broader reasoning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews the field of AI for Mathematics by dividing existing work into two complementary directions. One direction designs specialized AI architectures for particular mathematical tasks. The other develops foundation models that support wider reasoning, retrieval, and exploratory processes. Mathematics is presented as a strong testbed because its rigorous structure can drive improvements in general AI reasoning. The authors stress that success should be measured by the ability to discover meaningful results and unified theories, not merely by verifying formal correctness.

Core claim

We categorize existing research into two complementary directions: problem-specific modeling, involving the design of specialized architectures for distinct mathematical tasks, and general-purpose modeling, focusing on foundation models capable of broader reasoning, retrieval, and exploratory workflows. The rigorous nature of mathematics serves as a premier testbed for advancing general reasoning capabilities. The true value of a proof lies in the insights and tools it offers to the broader mathematical landscape rather than formal correctness alone.

What carries the argument

The two complementary directions of problem-specific modeling with specialized architectures and general-purpose modeling with foundation models for reasoning and exploration.

If this is right

AI systems can assist mathematical research through both narrow specialized tools and open-ended exploratory workflows.
Advances in general reasoning follow from treating mathematics as a demanding test environment.
Future systems should prioritize discovery of new results and unified theories over formal proof verification.
Key remaining challenges include scalability limits and integration of data-driven and symbolic methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Success here could transfer to other domains that require rigorous step-by-step reasoning such as physics or computer science.
The focus on discovery suggests AI tools that propose conjectures rather than only checking existing proofs.
If the two modeling directions converge, hybrid systems might emerge that combine task-specific efficiency with broad exploratory power.

Load-bearing premise

The rigorous nature of mathematics serves as a premier testbed for advancing general reasoning capabilities in AI.

What would settle it

Training large AI models on extensive mathematical data and then measuring whether they show clear gains on non-mathematical reasoning benchmarks compared with models trained on other domains would test the testbed claim.

Figures

Figures reproduced from arXiv: 2601.13209 by Bin Dong, Haocheng Ju.

**Figure 2.** Figure 2: Performance of five LLMs across different Peking University (PKU) exams. Left: [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗

**Figure 3.** Figure 3: Example of a proof step generation method and the complete proof it produces. [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗

read the original abstract

AI for Mathematics (AI4Math) has emerged as a distinct field that leverages machine learning to navigate mathematical landscapes historically intractable for early symbolic systems. While mid-20th-century symbolic approaches successfully automated formal logic, they faced severe scalability limitations due to the combinatorial explosion of the search space. The recent integration of data-driven approaches has revitalized this pursuit. In this review, we provide a systematic overview of AI4Math, highlighting its primary focus on developing AI models to support mathematical research. Crucially, we emphasize that this is not merely the application of AI to mathematical activities; it also encompasses the development of stronger AI systems where the rigorous nature of mathematics serves as a premier testbed for advancing general reasoning capabilities. We categorize existing research into two complementary directions: problem-specific modeling, involving the design of specialized architectures for distinct mathematical tasks, and general-purpose modeling, focusing on foundation models capable of broader reasoning, retrieval, and exploratory workflows. We conclude by discussing key challenges and prospects, advocating for AI systems that go beyond facilitating formal correctness to enabling the discovery of meaningful results and unified theories, recognizing that the true value of a proof lies in the insights and tools it offers to the broader mathematical landscape.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a basic survey that introduces a descriptive split between problem-specific and general-purpose AI4Math but adds little original analysis or depth.

read the letter

The key takeaway is that this is a review paper offering a new categorization of AI for mathematics into problem-specific modeling and general-purpose modeling. It pulls together the progress from early symbolic systems to current data-driven methods and makes a reasonable case for why mathematics provides a strong testbed for advancing AI reasoning capabilities. The split into specialized architectures for tasks like theorem proving versus broader foundation models for reasoning and exploration is a clean way to frame the literature. The closing section on challenges and prospects, emphasizing discovery over just correctness, is a fair point. Where it falls short is in depth. The taxonomy feels like a useful lens rather than something rigorously derived, and without detailed case studies or comparisons of the two approaches, it doesn't give readers much to evaluate which direction is more promising. The paper doesn't introduce new results or challenge existing assumptions beyond summarizing them. This kind of overview is most useful for researchers entering the field or looking to see how their work fits into the bigger picture. It could spark good discussion in a reading group focused on AI applications. Since it organizes an active area without obvious errors in its high-level claims, it should go to peer review. I'd recommend sending it for peer review. Reviewers can push for more concrete examples to make the categories more actionable.

Referee Report

1 major / 2 minor

Summary. The manuscript is a survey of AI for Mathematics (AI4Math). It traces the field's development from mid-20th-century symbolic systems limited by combinatorial explosion to recent data-driven methods. The central contribution is a two-way taxonomy that partitions existing work into problem-specific modeling (specialized architectures for distinct tasks) and general-purpose modeling (foundation models supporting broader reasoning, retrieval, and exploratory workflows). The review closes with challenges, prospects, and a call for systems that enable discovery of meaningful results and unified theories rather than formal correctness alone.

Significance. If the taxonomy is adopted, the paper supplies a practical organizing lens for a rapidly expanding literature at the intersection of machine learning and mathematics. By treating mathematics explicitly as a testbed for general reasoning, the survey underscores bidirectional value: AI tools for mathematicians and mathematical rigor as a driver of AI progress. The emphasis on discovery-oriented prospects, rather than verification alone, identifies an ambitious but coherent research direction that could shape funding and collaboration priorities.

major comments (1)

[§2] §2 (taxonomy section): the boundary between 'problem-specific' and 'general-purpose' modeling is presented as complementary yet is not equipped with explicit decision criteria or overlap metrics. Several cited works (e.g., LLM-based theorem provers applied to narrow domains) appear to straddle both categories; without a reproducible classification protocol the taxonomy risks becoming descriptive rather than predictive.

minor comments (2)

[Abstract] Abstract: the term 'exploratory workflows' is introduced without a one-sentence gloss; a parenthetical definition would aid readers who encounter the abstract first.
[Conclusion] Conclusion: the phrase 'unified theories' is invoked as an aspirational goal but is not linked to any concrete open problems (e.g., Langlands program or Millennium Prize problems) that the surveyed methods might address.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive review and recommendation for minor revision. The taxonomy is intended as a high-level organizing lens rather than a rigid classifier, but we agree that additional guidance on boundaries and overlaps will strengthen the presentation.

read point-by-point responses

Referee: [§2] §2 (taxonomy section): the boundary between 'problem-specific' and 'general-purpose' modeling is presented as complementary yet is not equipped with explicit decision criteria or overlap metrics. Several cited works (e.g., LLM-based theorem provers applied to narrow domains) appear to straddle both categories; without a reproducible classification protocol the taxonomy risks becoming descriptive rather than predictive.

Authors: We appreciate the referee's point that the taxonomy in §2 would benefit from more explicit guidance. The two directions are presented as complementary rather than mutually exclusive precisely to accommodate hybrids; the manuscript already notes that many recent systems combine elements of both. To address the concern directly, we will add a short subsection in the revised §2 that supplies classification criteria based on primary modeling objective and resource allocation: problem-specific modeling is characterized by architectures optimized for a narrow task distribution with limited transfer (e.g., custom GNNs or reinforcement-learning agents for specific combinatorial problems), while general-purpose modeling is characterized by large-scale pre-training followed by broad adaptation across tasks. Overlap will be illustrated with concrete examples, including LLM-based theorem provers restricted to narrow domains, which we will label as hybrid cases whose classification depends on whether the dominant contribution is task-specific engineering or the demonstration of general reasoning transfer. This addition will make the taxonomy more operational without altering its core message. revision: yes

Circularity Check

0 steps flagged

Survey paper exhibits no circularity; taxonomy is descriptive

full rationale

This is a literature review that organizes existing AI4Math research into two descriptive categories (problem-specific modeling and general-purpose modeling) drawn from cited prior works. The claim that mathematics serves as a premier testbed for general reasoning is presented as motivating context from the surveyed literature, not as a new derivation or prediction. No equations, fitted parameters, self-referential predictions, or load-bearing self-citations appear; the paper contains no derivations that reduce to its own inputs by construction. The content is self-contained against external benchmarks and draws support from independently cited sources.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The review rests on the domain assumption that mathematics is a premier testbed for AI reasoning and that data-driven approaches have revitalized symbolic methods; no free parameters or invented entities are introduced.

axioms (1)

domain assumption the rigorous nature of mathematics serves as a premier testbed for advancing general reasoning capabilities
Explicitly stated in the abstract as a core emphasis of the field.

pith-pipeline@v0.9.0 · 5503 in / 1033 out tokens · 34292 ms · 2026-05-16T13:22:11.323123+00:00 · methodology

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MathConstraint: Automated Generation of Verified Combinatorial Reasoning Instances for LLMs
cs.LG 2026-05 unverdicted novelty 8.0

MathConstraint generates scalable, automatically verifiable combinatorial problems where LLMs achieve 18.5-66.9% accuracy without tools but roughly double that with solver access.
Just Type It in Isabelle! AI Agents Drafting, Mechanizing, and Generalizing from Human Hints
cs.LO 2026-04 unverdicted novelty 6.0

A metatheoretical specification and Isabelle/HOL formalization of minimal type annotations for rank-one polymorphic terms is given, shown via human-AI collaborative workflows for drafting and mechanizing proofs.
Automated Conjecture Resolution with Formal Verification
cs.LG 2026-04 unverdicted novelty 6.0

An AI framework combining informal reasoning and formal verification resolves an open commutative algebra problem and produces a Lean 4-checked proof with minimal human input.

Reference graph

Works this paper leans on

188 extracted references · 188 canonical work pages · cited by 3 Pith papers · 13 internal anchors

[1]

GPT-4 Technical Report

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Floren- cia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 Technical Report.arXiv preprint arXiv:2303.08774, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[2]

Aristotle: IMO-level Automated Theorem Proving

Tudor Achim, Alex Best, Alberto Bietti, Kevin Der, Math¨ ıs F´ ed´ erico, Sergei Gukov, Daniel Halpern-Leistner, Kirsten Henningsgard, Yury Kudryashov, Alexander Meiburg, et al. Aristotle: IMO-Level Automated Theorem Proving.arXiv preprint arXiv:2510.01346, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[3]

Towards a Mathematics Formalisation Assistant Using Large Language Models

Ayush Agrawal, Siddhartha Gadgil, Navin Goyal, Ashvni Narayanan, and Anand Tadi- patri. Towards a Mathematics Formalisation Assistant Using Large Language Models. arXiv preprint arXiv:2211.07524, 2022

work page arXiv 2022
[4]

Global Lyapunov Functions: A Long-Standing Open Problem in Mathematics, with Symbolic Transformers.Advances in Neural Information Processing Systems, 37:93643–93670, 2024

Alberto Alfarano, Fran¸ cois Charton, and Amaury Hayat. Global Lyapunov Functions: A Long-Standing Open Problem in Mathematics, with Symbolic Transformers.Advances in Neural Information Processing Systems, 37:93643–93670, 2024

work page 2024
[5]

Machine Learning Class Numbers of Real Quadratic Fields.International Journal of Data Science in the Mathematical Sciences, 01(02):107–134, 2023

Malik Amir, Yang-Hui He, Kyu-Hwan Lee, Thomas Oliver, and Eldar Sultanow. Machine Learning Class Numbers of Real Quadratic Fields.International Journal of Data Science in the Mathematical Sciences, 01(02):107–134, 2023

work page 2023
[6]

Solving Inverse Problems Using Data-Driven Models.Acta Numerica, 28:1–174, 2019

Simon Arridge, Peter Maass, Ozan ¨Oktem, and Carola-Bibiane Sch¨ onlieb. Solving Inverse Problems Using Data-Driven Models.Acta Numerica, 28:1–174, 2019

work page 2019
[7]

LeanExplore: A Search Engine for Lean 4 Declarations.arXiv preprint arXiv:2506.11085, 2025

Justin Asher. LeanExplore: A Search Engine for Lean 4 Declarations.arXiv preprint arXiv:2506.11085, 2025

work page arXiv 2025
[8]

From Seeing Why to Checking Everything.https://axiommath.ai/ territory/from-seeing-why-to-checking-everything, 2025

Axiom Math. From Seeing Why to Checking Everything.https://axiommath.ai/ territory/from-seeing-why-to-checking-everything, 2025

work page 2025
[9]

Proofnet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics.arXiv preprint arXiv:2302.12433, 2023

Zhangir Azerbayev, Bartosz Piotrowski, Hailey Schoelkopf, Edward W Ayers, Dragomir Radev, and Jeremy Avigad. Proofnet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics.arXiv preprint arXiv:2302.12433, 2023

work page arXiv 2023
[10]

Jiang, Jia Deng, Stella Biderman, and Sean Welleck

Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen Marcus McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, and Sean Welleck. Llemma: An Open Language Model for Mathematics. InThe Twelfth International Conference on Learning Representations, 2024. 20

work page 2024
[11]

Holist: An Environment for Machine Learning of Higher Order Logic Theorem Proving

Kshitij Bansal, Sarah Loos, Markus Rabe, Christian Szegedy, and Stewart Wilcox. Holist: An Environment for Machine Learning of Higher Order Logic Theorem Proving. InIn- ternational Conference on Machine Learning, pages 454–463. PMLR, 2019

work page 2019
[12]

Effective Brascamp-Lieb inequalities

Timoth´ ee B´ enard and Weikun He. Effective Brascamp–Lieb Inequalities.arXiv preprint arXiv:2511.11091, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[13]

An ML Approach to Resolution of Singularities

Gergely B´ erczi, Honglu Fan, and Mingcong Zeng. An ML Approach to Resolution of Singularities. InTopological, Algebraic and Geometric Learning Workshops 2023, pages 469–487. PMLR, 2023

work page 2023
[14]

Machine Learning Kreuzer–Skarke Calabi–Yau Threefolds.International Journal of Modern Physics A, 40(28):2550097, 2025

Per Berglund, Ben Campbell, and Vishnu Jejjala. Machine Learning Kreuzer–Skarke Calabi–Yau Threefolds.International Journal of Modern Physics A, 40(28):2550097, 2025

work page 2025
[15]

New Calabi–Yau Manifolds from Genetic Algorithms.Physics Letters B, 850:138504, 2024

Per Berglund, Yang-Hui He, Elli Heyes, Edward Hirst, Vishnu Jejjala, and Andre Lukas. New Calabi–Yau Manifolds from Genetic Algorithms.Physics Letters B, 850:138504, 2024

work page 2024
[16]

Berman, Yang-Hui He, and Edward Hirst

David S. Berman, Yang-Hui He, and Edward Hirst. Machine Learning Calabi-Yau Hy- persurfaces.Phys. Rev. D, 105:066002, Mar 2022

work page 2022
[17]

Towards Combinatorial Invariance for Kazhdan-Lusztig Polynomials.Representation The- ory of the American Mathematical Society, 26(37):1145–1191, 2022

Charles Blundell, Lars Buesing, Alex Davies, Petar Veliˇ ckovi´ c, and Geordie Williamson. Towards Combinatorial Invariance for Kazhdan-Lusztig Polynomials.Representation The- ory of the American Mathematical Society, 26(37):1145–1191, 2022

work page 2022
[18]

Murmurations of Modular Forms in the Weight Aspect.Algebra and Number Theory, 2025

Jonathan W Bober, Andrew R Booker, M Lee, and David Lowry-Duda. Murmurations of Modular Forms in the Weight Aspect.Algebra and Number Theory, 2025

work page 2025
[19]

Promising Directions of Machine Learning for Partial Differential Equations.Nature Computational Science, 4(7):483–494, 2024

Steven L Brunton and J Nathan Kutz. Promising Directions of Machine Learning for Partial Differential Equations.Nature Computational Science, 4(7):483–494, 2024

work page 2024
[20]

The Motivic Class of the Space of Genus 0 Maps to the Flag Variety.arXiv preprint arXiv:2601.07222, 2026

Jim Bryan, Bal´ azs Elek, Freddie Manners, George Salafatinos, and Ravi Vakil. The Motivic Class of the Space of Genus 0 Maps to the Flag Variety.arXiv preprint arXiv:2601.07222, 2026

work page arXiv 2026
[21]

Flow-Based Extremal Mathematical Structure Discovery, 2026

Gergely B´ erczi, Baran Hashemi, and Jonas Kl¨ uver. Flow-Based Extremal Mathematical Structure Discovery, 2026

work page 2026
[22]

Machine Learning in the String Landscape.Journal of High Energy Physics, 2017(9):1–36, 2017

Jonathan Carifio, James Halverson, Dmitri Krioukov, and Brent D Nelson. Machine Learning in the String Landscape.Journal of High Energy Physics, 2017(9):1–36, 2017

work page 2017
[23]

Patternboost: Constructions in Mathematics with a Little Help from AI.arXiv preprint arXiv:2411.00566, 2024

Fran¸ cois Charton, Jordan S Ellenberg, Adam Zsolt Wagner, and Geordie Williamson. Patternboost: Constructions in Mathematics with a Little Help from AI.arXiv preprint arXiv:2411.00566, 2024

work page arXiv 2024
[24]

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience.arXiv preprint arXiv:2512.17260, 2025

Jiangjie Chen, Wenxiang Chen, Jiacheng Du, Jinyi Hu, Zhicheng Jiang, Allan Jie, Xiao- ran Jin, Xing Jin, Chenggang Li, Wenlei Shi, Zhihong Wang, Mingxuan Wang, Chenrui Wei, Shufa Wei, Huajian Xin, Fan Yang, Weihao Gao, Zheng Yuan, Tianyang Zhan, Zeyu Zheng, Tianxi Zhou, and Thomas Hanwen Zhu. Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving ...

work page arXiv 2025
[25]

Learning to Optimize: A Primer and A Benchmark.Journal of Machine Learning Research, 23(189):1–59, 2022

Tianlong Chen, Xiaohan Chen, Wuyang Chen, Howard Heaton, Jialin Liu, Zhangyang Wang, and Wotao Yin. Learning to Optimize: A Primer and A Benchmark.Journal of Machine Learning Research, 23(189):1–59, 2022. 21

work page 2022
[26]

Gold-Medalist Performance in Solving Olympiad Geometry with AlphaGeometry2.arXiv preprint arXiv:2502.03544, 2025

Yuri Chervonyi, Trieu H Trinh, Miroslav Olˇ s´ ak, Xiaomeng Yang, Hoang Nguyen, Marcelo Menegali, Junehyuk Jung, Vikas Verma, Quoc V Le, and Thang Luong. Gold-Medalist Performance in Solving Olympiad Geometry with AlphaGeometry2.arXiv preprint arXiv:2502.03544, 2025

work page arXiv 2025
[27]

CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs.arXiv preprint arXiv:2502.18663, 2025

A Chervov, A Soibelman, S Lytkin, I Kiselev, S Fironov, A Lukyanenko, A Dolgorukova, A Ogurtsov, F Petrov, S Krymskii, et al. CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs.arXiv preprint arXiv:2502.18663, 2025

work page arXiv 2025
[28]

A Machine Learning Approach That Beats Large Rubik’s Cubes.arXiv preprint arXiv:2502.13266, 2025

Alexander Chervov, Kirill Khoruzhii, Nikita Bukhal, Jalal Naghiyev, Vladislav Zamkovoy, Ivan Koltsov, Lyudmila Cheldieva, Arsenii Sychev, Arsenii Lenin, Mark Obozov, et al. A Machine Learning Approach That Beats Large Rubik’s Cubes.arXiv preprint arXiv:2502.13266, 2025

work page arXiv 2025
[29]

Automated Reasoning in Geometry.Handbook of automated reasoning, 1:707–749, 2001

Shang-Ching Chou and Xiao-Shan Gao. Automated Reasoning in Geometry.Handbook of automated reasoning, 1:707–749, 2001

work page 2001
[30]

A Deductive Database Ap- proach to Automated Geometry Theorem Proving and Discovering.Journal of Automated Reasoning, 25(3):219–246, 2000

Shang-Ching Chou, Xiao-Shan Gao, and Jing-Zhong Zhang. A Deductive Database Ap- proach to Automated Geometry Theorem Proving and Discovering.Journal of Automated Reasoning, 25(3):219–246, 2000

work page 2000
[31]

D. A. Clarke. Martin Davis. A program for Presburger’s algorithm. Summaries of talks presented at the Summer Institute for Symbolic Logic, Cornell University, 1957, 2nd edn., Communications Research Division, Institute for Defense Analyses, Princeton, N.J., 1960, pp. 215–223.Journal of Symbolic Logic, 31(1):138–138, 1966

work page 1957
[32]

Training Verifiers to Solve Math Word Problems

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training Verifiers to Solve Math Word Problems.arXiv preprint arXiv:2110.14168, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[33]

Evaluating Language Models for Mathematics Through Interactions.Proceedings of the National Academy of Sciences, 121(24):e2318124121, 2024

Katherine M Collins, Albert Q Jiang, Simon Frieder, Lionel Wong, Miri Zilka, Umang Bhatt, Thomas Lukasiewicz, Yuhuai Wu, Joshua B Tenenbaum, William Hart, et al. Evaluating Language Models for Mathematics Through Interactions.Proceedings of the National Academy of Sciences, 121(24):e2318124121, 2024

work page 2024
[34]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities.arXiv preprint arXiv:2507.06261, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[35]

Princeton University Press, 2009

MARIANA COOK.Mathematicians: An Outer View of the Inner World. Princeton University Press, 2009

work page 2009
[36]

Hammer for Coq: Automation for Dependent Type Theory.Journal of Automated Reasoning, 61(1):423–453, 2018

Lukasz Czajka and Cezary Kaliszyk. Hammer for Coq: Automation for Dependent Type Theory.Journal of Automated Reasoning, 61(1):423–453, 2018

work page 2018
[37]

The Signature and Cusp Geometry of Hyperbolic Knots.Geometry & Topology, 28(5):2313–2343, 2024

Alex Davies, Andr´ as Juh´ asz, Marc Lackenby, and Nenad Tomaˇ sev. The Signature and Cusp Geometry of Hyperbolic Knots.Geometry & Topology, 28(5):2313–2343, 2024

work page 2024
[38]

Advancing Mathematics by Guiding Human Intuition with AI.Nature, 600(7887):70–74, 2021

Alex Davies, Petar Veliˇ ckovi´ c, Lars Buesing, Sam Blackwell, Daniel Zheng, Nenad Tomaˇ sev, Richard Tanburn, Peter Battaglia, Charles Blundell, Andr´ as Juh´ asz, et al. Advancing Mathematics by Guiding Human Intuition with AI.Nature, 600(7887):70–74, 2021

work page 2021
[39]

The Prehistory and Early History of Automated Deduction.Automation of Reasoning 1: Classical Papers on Computational Logic 1957-1966, pages 1–28, 1983

Martin Davis. The Prehistory and Early History of Automated Deduction.Automation of Reasoning 1: Classical Papers on Computational Logic 1957-1966, pages 1–28, 1983. 22

work page 1957
[40]

The Lean 4 Theorem Prover and Programming Language

Leonardo de Moura and Sebastian Ullrich. The Lean 4 Theorem Prover and Programming Language. In Andr´ e Platzer and Geoff Sutcliffe, editors,Automated Deduction - CADE 28 - 28th International Conference on Automated Deduction, Virtual Event, July 12-15, 2021, Proceedings, volume 12699 ofLecture Notes in Computer Science, pages 625–635. Springer, 2021

work page 2021
[41]

The Lean Theorem Prover (System Description)

Leonardo Mendon¸ ca de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, and Jakob von Raumer. The Lean Theorem Prover (System Description). In Amy P. Felty and Aart Middeldorp, editors,Automated Deduction - CADE-25 - 25th International Conference on Automated Deduction, Berlin, Germany, August 1-7, 2015, Proceedings, volume 9195 ofLecture Notes in Com...

work page 2015
[42]

On Mathematical Modeling in Image Reconstruction and Beyond

Bin Dong. On Mathematical Modeling in Image Reconstruction and Beyond. InProc. Int. Cong. Math, volume 7, pages 5420–5449, 2022

work page 2022
[43]

Machine Learning Assisted Exploration for Affine Deligne–Lusztig Varieties.Peking Mathematical Journal, pages 1–50, 2024

Bin Dong, Xuhua He, Pengfei Jin, Felix Schremmer, and Qingchao Yu. Machine Learning Assisted Exploration for Affine Deligne–Lusztig Varieties.Peking Mathematical Journal, pages 1–50, 2024

work page 2024
[44]

STP: Self-Play LLM Theorem Provers with Iterative Con- jecturing and Proving

Kefan Dong and Tengyu Ma. STP: Self-Play LLM Theorem Provers with Iterative Con- jecturing and Proving. InForty-second International Conference on Machine Learning, 2025

work page 2025
[45]

Diffusion Models for Cayley Graphs.arXiv preprint arXiv:2503.05558, 2025

Michael R Douglas and Kit Fraser-Taliente. Diffusion Models for Cayley Graphs.arXiv preprint arXiv:2503.05558, 2025

work page arXiv 2025
[46]

Lean LSP MCP: Tools for Agentic Interaction with the Lean Theorem Prover.https://github.com/oOo0oOo/lean-lsp-mcp, 2025

Oliver Dressler. Lean LSP MCP: Tools for Agentic Interaction with the Lean Theorem Prover.https://github.com/oOo0oOo/lean-lsp-mcp, 2025

work page 2025
[47]

Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions.arXiv preprint arXiv:2512.00097, 2025

Boyan Duan, Xiao Liang, Shuai Lu, Yaoxiang Wang, Yelong Shen, Kai-Wei Chang, Ying Nian Wu, Mao Yang, Weizhu Chen, and Yeyun Gong. Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions.arXiv preprint arXiv:2512.00097, 2025

work page arXiv 2025
[48]

Machine Learning for Complete Intersection Calabi- Yau Manifolds: A Methodological Study.Phys

Harold Erbin and Riccardo Finotello. Machine Learning for Complete Intersection Calabi- Yau Manifolds: A Methodological Study.Phys. Rev. D, 103:126014, Jun 2021

work page 2021
[49]

Premise Selection in Natural Language Mathematical Texts

Deborah Ferreira and Andr´ e Freitas. Premise Selection in Natural Language Mathematical Texts. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 7365–7374. Association for Computational Linguistics, 2020

work page 2020
[50]

Baldur: Whole-Proof Gen- eration and Repair with Large Language Models

Emily First, Markus N Rabe, Talia Ringer, and Yuriy Brun. Baldur: Whole-Proof Gen- eration and Repair with Large Language Models. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Soft- ware Engineering, pages 1229–1241, 2023

work page 2023
[51]

Mathematical Capa- bilities of ChatGPT

Simon Frieder, Luca Pinchetti, Chevalier Chevalier, Ryan-Rhys Griffiths, Tommaso Sal- vatori, Thomas Lukasiewicz, Philipp Petersen, and Julius Berner. Mathematical Capa- bilities of ChatGPT. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 27699–27744. Curra...

work page 2023
[52]

Towards Automating Formalisation of Theorem Statements Using Large Language Models

Siddhartha Gadgil, Anand Rao Tadipatri, Ayush Agrawal, Ashvni Narayanan, and Navin Goyal. Towards Automating Formalisation of Theorem Statements Using Large Language Models. In36th Conference on Neural Information Processing Systems (NeurIPS 2022) Workshop on MATH-AI, 2022. 23

work page 2022
[53]

A Semantic Search Engine for Mathlib4

Guoxiong Gao, Haocheng Ju, Jiedong Jiang, Zihan Qin, and Bin Dong. A Semantic Search Engine for Mathlib4. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Findings of the Association for Computational Linguistics: EMNLP 2024, pages 8001– 8013, Miami, Florida, USA, November 2024. Association for Computational Linguistics

work page 2024
[54]

Herald: A Natural Language Annotated Lean 4 Dataset

Guoxiong Gao, Yutong Wang, Jiedong Jiang, Qi Gao, Zihan Qin, Tianyi Xu, and Bin Dong. Herald: A Natural Language Annotated Lean 4 Dataset. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[55]

The Math Retrieval System of ICST for NTCIR-12 MathIR Task

Liangcai Gao, Ke Yuan, Yuehan Wang, Zhuoren Jiang, and Zhi Tang. The Math Retrieval System of ICST for NTCIR-12 MathIR Task. In Noriko Kando, Tetsuya Sakai, and Mark Sanderson, editors,Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, Japan, June 7-10,

work page
[56]

National Institute of Informatics (NII), 2016

work page 2016
[57]

Mathe- matical Exploration and Discovery at Scale.arXiv preprint arXiv:2511.02864, 2025

Bogdan Georgiev, Javier G´ omez-Serrano, Terence Tao, and Adam Zsolt Wagner. Mathe- matical Exploration and Discovery at Scale.arXiv preprint arXiv:2511.02864, 2025

work page arXiv 2025
[58]

Frontiermath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI.arXiv preprint arXiv:2411.04872, 2024

Elliot Glazer, Ege Erdil, Tamay Besiroglu, Diego Chicharro, Evan Chen, Alex Gunning, Caroline Falkman Olsson, Jean-Stanislas Denain, Anson Ho, Emily de Oliveira Santos, et al. Frontiermath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI.arXiv preprint arXiv:2411.04872, 2024

work page arXiv 2024
[59]

Reinforcement Learning for Hierarchical Proof Generation in Lean 4

Fabian Gloeckle, Alex Gu, Gabriel Synnaeve, and Amaury Hayat. Reinforcement Learning for Hierarchical Proof Generation in Lean 4. InThe 5th Workshop on Mathematical Reasoning and AI at NeurIPS 2025, 2025

work page 2025
[60]

ABEL: Sam- ple Efficient Online Reinforcement Learning for Neural Theorem Proving

Fabian Gloeckle, Jannis Limperg, Gabriel Synnaeve, and Amaury Hayat. ABEL: Sam- ple Efficient Online Reinforcement Learning for Neural Theorem Proving. InThe 4th Workshop on Mathematical Reasoning and AI at NeurIPS’24, 2024

work page 2024
[61]

On Formally Undecidable Propositions of Principia Mathematica and Related Systems

Kurt G¨ odel. On Formally Undecidable Propositions of Principia Mathematica and Related Systems. 1931

work page 1931
[62]

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solv- ing

Zhibin Gou, Zhihong Shao, Yeyun Gong, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen, et al. ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solv- ing. InThe Twelfth International Conference on Learning Representations

work page
[63]

Searching for Ribbons with Machine Learning.Machine Learning: Science and Technology, 6(2):025065, jun 2025

Sergei Gukov, James Halverson, Ciprian Manolescu, and Fabian Ruehle. Searching for Ribbons with Machine Learning.Machine Learning: Science and Technology, 6(2):025065, jun 2025

work page 2025
[64]

Machine Learning BPS Spectra and the Gap Conjecture.Phys

Sergei Gukov and Rak-Kyeong Seong. Machine Learning BPS Spectra and the Gap Conjecture.Phys. Rev. D, 110:046016, Aug 2024

work page 2024
[65]

DeepSeek-R1 Incentivizes Reasoning in LLMs Through Reinforcement Learning.Nature, 645(8081):633–638, 2025

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. DeepSeek-R1 Incentivizes Reasoning in LLMs Through Reinforcement Learning.Nature, 645(8081):633–638, 2025

work page 2025
[66]

Can Transformers Do Enumerative Geometry? InThe Thirteenth International Conference on Learning Representations, 2025

Baran Hashemi, Roderic Guigo Corominas, and Alessandro Giacchetto. Can Transformers Do Enumerative Geometry? InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[67]

Murmurations of Elliptic Curves.Experimental Mathematics, 34(3):528–540, 2025

Yang-Hui He, Kyu-Hwan Lee, Thomas Oliver, and Alexey Pozdnyakov. Murmurations of Elliptic Curves.Experimental Mathematics, 34(3):528–540, 2025. 24

work page 2025
[68]

FGeo-TP: A Language Model-Enhanced Solver for Euclidean Geometry Problems.Symmetry, 16(4):421, 2024

Yiming He, Jia Zou, Xiaokai Zhang, Na Zhu, and Tuo Leng. FGeo-TP: A Language Model-Enhanced Solver for Euclidean Geometry Problems.Symmetry, 16(4):421, 2024

work page 2024
[69]

Measuring Mathematical Problem Solving With the MATH Dataset.NeurIPS, 2021

Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring Mathematical Problem Solving With the MATH Dataset.NeurIPS, 2021

work page 2021
[70]

Sine Qua Non for Large Theory Reasoning

Kryˇ stof Hoder and Andrei Voronkov. Sine Qua Non for Large Theory Reasoning. In International Conference on Automated Deduction, pages 299–314. Springer, 2011

work page 2011
[71]

GamePad: A Learning Environment for Theorem Proving

Daniel Huang, Prafulla Dhariwal, Dawn Song, and Ilya Sutskever. Gamepad: A Learning Environment for Theorem Proving.arXiv preprint arXiv:1806.00608, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[72]

FormaRL: Enhancing Autoformalization with No Labeled Data

Yanxing Huang, Xinling Jin, Sijie Liang, Fuwen Luo, Peng Li, and Yang Liu. FormaRL: Enhancing Autoformalization with No Labeled Data. InSecond Conference on Language Modeling, 2025

work page 2025
[73]

Winning Gold at IMO 2025 with a Model-Agnostic Verification-and-Refinement Pipeline.arXiv preprint arXiv:2507.15855, 2025

Yichen Huang and Lin F Yang. Winning Gold at IMO 2025 with a Model-Agnostic Verification-and-Refinement Pipeline.arXiv preprint arXiv:2507.15855, 2025

work page arXiv 2025
[74]

Olympiad-Level Formal Mathematical Reasoning with Reinforcement Learning.Nature, pages 1–3, 2025

Thomas Hubert, Rishi Mehta, Laurent Sartran, Mikl´ os Z Horv´ ath, Goran ˇZuˇ zi´ c, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Masoom, et al. Olympiad-Level Formal Mathematical Reasoning with Reinforcement Learning.Nature, pages 1–3, 2025

work page 2025
[75]

Alemi, Niklas E´ en, Fran¸ cois Chollet, and Josef Urban

Geoffrey Irving, Christian Szegedy, Alexander A. Alemi, Niklas E´ en, Fran¸ cois Chollet, and Josef Urban. DeepMath: Deep Sequence Models for Premise Selection.Advances in Neural Information Processing Systems (NeurIPS), 29, 2016

work page 2016
[76]

Point Convergence of Nesterov’s Accelerated Gradient Method: An AI-Assisted Proof.arXiv preprint arXiv:2510.23513, 2025

Uijeong Jang and Ernest K Ryu. Point Convergence of Nesterov’s Accelerated Gradient Method: An AI-Assisted Proof.arXiv preprint arXiv:2510.23513, 2025

work page arXiv 2025
[77]

Multilingual Mathematical Autoformal- ization.arXiv preprint arXiv:2311.03755, 2023

Albert Q Jiang, Wenda Li, and Mateja Jamnik. Multilingual Mathematical Autoformal- ization.arXiv preprint arXiv:2311.03755, 2023

work page arXiv 2023
[78]

Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers.Advances in Neural Information Processing Systems, 35:8360–8373, 2022

Albert Qiaochu Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzyg´ o´ zd´ z, Piotr Mi lo´ s, Yuhuai Wu, and Mateja Jamnik. Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers.Advances in Neural Information Processing Systems, 35:8360–8373, 2022

work page 2022
[79]

Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs

Albert Qiaochu Jiang, Sean Welleck, Jin Peng Zhou, Timothee Lacroix, Jiacheng Liu, Wenda Li, Mateja Jamnik, Guillaume Lample, and Yuhuai Wu. Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs. InThe Eleventh International Conference on Learning Representations, 2023

work page 2023
[80]

FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels.arXiv preprint arXiv:2511.02872, 2025

Jiedong Jiang, Wanyi He, Yuefeng Wang, Guoxiong Gao, Yongle Hu, Jingting Wang, Nailing Guan, Peihao Wu, Chunbo Dai, Liang Xiao, et al. FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels.arXiv preprint arXiv:2511.02872, 2025

work page arXiv 2025

Showing first 80 references.