Recognition: no theorem link
AI for Mathematics: Progress, Challenges, and Prospects
Pith reviewed 2026-05-16 13:22 UTC · model grok-4.3
The pith
AI for mathematics research splits into specialized task architectures and general foundation models for broader reasoning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We categorize existing research into two complementary directions: problem-specific modeling, involving the design of specialized architectures for distinct mathematical tasks, and general-purpose modeling, focusing on foundation models capable of broader reasoning, retrieval, and exploratory workflows. The rigorous nature of mathematics serves as a premier testbed for advancing general reasoning capabilities. The true value of a proof lies in the insights and tools it offers to the broader mathematical landscape rather than formal correctness alone.
What carries the argument
The two complementary directions of problem-specific modeling with specialized architectures and general-purpose modeling with foundation models for reasoning and exploration.
If this is right
- AI systems can assist mathematical research through both narrow specialized tools and open-ended exploratory workflows.
- Advances in general reasoning follow from treating mathematics as a demanding test environment.
- Future systems should prioritize discovery of new results and unified theories over formal proof verification.
- Key remaining challenges include scalability limits and integration of data-driven and symbolic methods.
Where Pith is reading between the lines
- Success here could transfer to other domains that require rigorous step-by-step reasoning such as physics or computer science.
- The focus on discovery suggests AI tools that propose conjectures rather than only checking existing proofs.
- If the two modeling directions converge, hybrid systems might emerge that combine task-specific efficiency with broad exploratory power.
Load-bearing premise
The rigorous nature of mathematics serves as a premier testbed for advancing general reasoning capabilities in AI.
What would settle it
Training large AI models on extensive mathematical data and then measuring whether they show clear gains on non-mathematical reasoning benchmarks compared with models trained on other domains would test the testbed claim.
Figures
read the original abstract
AI for Mathematics (AI4Math) has emerged as a distinct field that leverages machine learning to navigate mathematical landscapes historically intractable for early symbolic systems. While mid-20th-century symbolic approaches successfully automated formal logic, they faced severe scalability limitations due to the combinatorial explosion of the search space. The recent integration of data-driven approaches has revitalized this pursuit. In this review, we provide a systematic overview of AI4Math, highlighting its primary focus on developing AI models to support mathematical research. Crucially, we emphasize that this is not merely the application of AI to mathematical activities; it also encompasses the development of stronger AI systems where the rigorous nature of mathematics serves as a premier testbed for advancing general reasoning capabilities. We categorize existing research into two complementary directions: problem-specific modeling, involving the design of specialized architectures for distinct mathematical tasks, and general-purpose modeling, focusing on foundation models capable of broader reasoning, retrieval, and exploratory workflows. We conclude by discussing key challenges and prospects, advocating for AI systems that go beyond facilitating formal correctness to enabling the discovery of meaningful results and unified theories, recognizing that the true value of a proof lies in the insights and tools it offers to the broader mathematical landscape.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a survey of AI for Mathematics (AI4Math). It traces the field's development from mid-20th-century symbolic systems limited by combinatorial explosion to recent data-driven methods. The central contribution is a two-way taxonomy that partitions existing work into problem-specific modeling (specialized architectures for distinct tasks) and general-purpose modeling (foundation models supporting broader reasoning, retrieval, and exploratory workflows). The review closes with challenges, prospects, and a call for systems that enable discovery of meaningful results and unified theories rather than formal correctness alone.
Significance. If the taxonomy is adopted, the paper supplies a practical organizing lens for a rapidly expanding literature at the intersection of machine learning and mathematics. By treating mathematics explicitly as a testbed for general reasoning, the survey underscores bidirectional value: AI tools for mathematicians and mathematical rigor as a driver of AI progress. The emphasis on discovery-oriented prospects, rather than verification alone, identifies an ambitious but coherent research direction that could shape funding and collaboration priorities.
major comments (1)
- [§2] §2 (taxonomy section): the boundary between 'problem-specific' and 'general-purpose' modeling is presented as complementary yet is not equipped with explicit decision criteria or overlap metrics. Several cited works (e.g., LLM-based theorem provers applied to narrow domains) appear to straddle both categories; without a reproducible classification protocol the taxonomy risks becoming descriptive rather than predictive.
minor comments (2)
- [Abstract] Abstract: the term 'exploratory workflows' is introduced without a one-sentence gloss; a parenthetical definition would aid readers who encounter the abstract first.
- [Conclusion] Conclusion: the phrase 'unified theories' is invoked as an aspirational goal but is not linked to any concrete open problems (e.g., Langlands program or Millennium Prize problems) that the surveyed methods might address.
Simulated Author's Rebuttal
We thank the referee for the constructive review and recommendation for minor revision. The taxonomy is intended as a high-level organizing lens rather than a rigid classifier, but we agree that additional guidance on boundaries and overlaps will strengthen the presentation.
read point-by-point responses
-
Referee: [§2] §2 (taxonomy section): the boundary between 'problem-specific' and 'general-purpose' modeling is presented as complementary yet is not equipped with explicit decision criteria or overlap metrics. Several cited works (e.g., LLM-based theorem provers applied to narrow domains) appear to straddle both categories; without a reproducible classification protocol the taxonomy risks becoming descriptive rather than predictive.
Authors: We appreciate the referee's point that the taxonomy in §2 would benefit from more explicit guidance. The two directions are presented as complementary rather than mutually exclusive precisely to accommodate hybrids; the manuscript already notes that many recent systems combine elements of both. To address the concern directly, we will add a short subsection in the revised §2 that supplies classification criteria based on primary modeling objective and resource allocation: problem-specific modeling is characterized by architectures optimized for a narrow task distribution with limited transfer (e.g., custom GNNs or reinforcement-learning agents for specific combinatorial problems), while general-purpose modeling is characterized by large-scale pre-training followed by broad adaptation across tasks. Overlap will be illustrated with concrete examples, including LLM-based theorem provers restricted to narrow domains, which we will label as hybrid cases whose classification depends on whether the dominant contribution is task-specific engineering or the demonstration of general reasoning transfer. This addition will make the taxonomy more operational without altering its core message. revision: yes
Circularity Check
Survey paper exhibits no circularity; taxonomy is descriptive
full rationale
This is a literature review that organizes existing AI4Math research into two descriptive categories (problem-specific modeling and general-purpose modeling) drawn from cited prior works. The claim that mathematics serves as a premier testbed for general reasoning is presented as motivating context from the surveyed literature, not as a new derivation or prediction. No equations, fitted parameters, self-referential predictions, or load-bearing self-citations appear; the paper contains no derivations that reduce to its own inputs by construction. The content is self-contained against external benchmarks and draws support from independently cited sources.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption the rigorous nature of mathematics serves as a premier testbed for advancing general reasoning capabilities
Forward citations
Cited by 3 Pith papers
-
MathConstraint: Automated Generation of Verified Combinatorial Reasoning Instances for LLMs
MathConstraint generates scalable, automatically verifiable combinatorial problems where LLMs achieve 18.5-66.9% accuracy without tools but roughly double that with solver access.
-
Just Type It in Isabelle! AI Agents Drafting, Mechanizing, and Generalizing from Human Hints
A metatheoretical specification and Isabelle/HOL formalization of minimal type annotations for rank-one polymorphic terms is given, shown via human-AI collaborative workflows for drafting and mechanizing proofs.
-
Automated Conjecture Resolution with Formal Verification
An AI framework combining informal reasoning and formal verification resolves an open commutative algebra problem and produces a Lean 4-checked proof with minimal human input.
Reference graph
Works this paper leans on
-
[1]
Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Floren- cia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 Technical Report.arXiv preprint arXiv:2303.08774, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[2]
Aristotle: IMO-level Automated Theorem Proving
Tudor Achim, Alex Best, Alberto Bietti, Kevin Der, Math¨ ıs F´ ed´ erico, Sergei Gukov, Daniel Halpern-Leistner, Kirsten Henningsgard, Yury Kudryashov, Alexander Meiburg, et al. Aristotle: IMO-Level Automated Theorem Proving.arXiv preprint arXiv:2510.01346, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[3]
Towards a Mathematics Formalisation Assistant Using Large Language Models
Ayush Agrawal, Siddhartha Gadgil, Navin Goyal, Ashvni Narayanan, and Anand Tadi- patri. Towards a Mathematics Formalisation Assistant Using Large Language Models. arXiv preprint arXiv:2211.07524, 2022
-
[4]
Alberto Alfarano, Fran¸ cois Charton, and Amaury Hayat. Global Lyapunov Functions: A Long-Standing Open Problem in Mathematics, with Symbolic Transformers.Advances in Neural Information Processing Systems, 37:93643–93670, 2024
work page 2024
-
[5]
Malik Amir, Yang-Hui He, Kyu-Hwan Lee, Thomas Oliver, and Eldar Sultanow. Machine Learning Class Numbers of Real Quadratic Fields.International Journal of Data Science in the Mathematical Sciences, 01(02):107–134, 2023
work page 2023
-
[6]
Solving Inverse Problems Using Data-Driven Models.Acta Numerica, 28:1–174, 2019
Simon Arridge, Peter Maass, Ozan ¨Oktem, and Carola-Bibiane Sch¨ onlieb. Solving Inverse Problems Using Data-Driven Models.Acta Numerica, 28:1–174, 2019
work page 2019
-
[7]
LeanExplore: A Search Engine for Lean 4 Declarations.arXiv preprint arXiv:2506.11085, 2025
Justin Asher. LeanExplore: A Search Engine for Lean 4 Declarations.arXiv preprint arXiv:2506.11085, 2025
-
[8]
Axiom Math. From Seeing Why to Checking Everything.https://axiommath.ai/ territory/from-seeing-why-to-checking-everything, 2025
work page 2025
-
[9]
Zhangir Azerbayev, Bartosz Piotrowski, Hailey Schoelkopf, Edward W Ayers, Dragomir Radev, and Jeremy Avigad. Proofnet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics.arXiv preprint arXiv:2302.12433, 2023
-
[10]
Jiang, Jia Deng, Stella Biderman, and Sean Welleck
Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen Marcus McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, and Sean Welleck. Llemma: An Open Language Model for Mathematics. InThe Twelfth International Conference on Learning Representations, 2024. 20
work page 2024
-
[11]
Holist: An Environment for Machine Learning of Higher Order Logic Theorem Proving
Kshitij Bansal, Sarah Loos, Markus Rabe, Christian Szegedy, and Stewart Wilcox. Holist: An Environment for Machine Learning of Higher Order Logic Theorem Proving. InIn- ternational Conference on Machine Learning, pages 454–463. PMLR, 2019
work page 2019
-
[12]
Effective Brascamp-Lieb inequalities
Timoth´ ee B´ enard and Weikun He. Effective Brascamp–Lieb Inequalities.arXiv preprint arXiv:2511.11091, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[13]
An ML Approach to Resolution of Singularities
Gergely B´ erczi, Honglu Fan, and Mingcong Zeng. An ML Approach to Resolution of Singularities. InTopological, Algebraic and Geometric Learning Workshops 2023, pages 469–487. PMLR, 2023
work page 2023
-
[14]
Per Berglund, Ben Campbell, and Vishnu Jejjala. Machine Learning Kreuzer–Skarke Calabi–Yau Threefolds.International Journal of Modern Physics A, 40(28):2550097, 2025
work page 2025
-
[15]
New Calabi–Yau Manifolds from Genetic Algorithms.Physics Letters B, 850:138504, 2024
Per Berglund, Yang-Hui He, Elli Heyes, Edward Hirst, Vishnu Jejjala, and Andre Lukas. New Calabi–Yau Manifolds from Genetic Algorithms.Physics Letters B, 850:138504, 2024
work page 2024
-
[16]
Berman, Yang-Hui He, and Edward Hirst
David S. Berman, Yang-Hui He, and Edward Hirst. Machine Learning Calabi-Yau Hy- persurfaces.Phys. Rev. D, 105:066002, Mar 2022
work page 2022
-
[17]
Charles Blundell, Lars Buesing, Alex Davies, Petar Veliˇ ckovi´ c, and Geordie Williamson. Towards Combinatorial Invariance for Kazhdan-Lusztig Polynomials.Representation The- ory of the American Mathematical Society, 26(37):1145–1191, 2022
work page 2022
-
[18]
Murmurations of Modular Forms in the Weight Aspect.Algebra and Number Theory, 2025
Jonathan W Bober, Andrew R Booker, M Lee, and David Lowry-Duda. Murmurations of Modular Forms in the Weight Aspect.Algebra and Number Theory, 2025
work page 2025
-
[19]
Steven L Brunton and J Nathan Kutz. Promising Directions of Machine Learning for Partial Differential Equations.Nature Computational Science, 4(7):483–494, 2024
work page 2024
-
[20]
Jim Bryan, Bal´ azs Elek, Freddie Manners, George Salafatinos, and Ravi Vakil. The Motivic Class of the Space of Genus 0 Maps to the Flag Variety.arXiv preprint arXiv:2601.07222, 2026
-
[21]
Flow-Based Extremal Mathematical Structure Discovery, 2026
Gergely B´ erczi, Baran Hashemi, and Jonas Kl¨ uver. Flow-Based Extremal Mathematical Structure Discovery, 2026
work page 2026
-
[22]
Machine Learning in the String Landscape.Journal of High Energy Physics, 2017(9):1–36, 2017
Jonathan Carifio, James Halverson, Dmitri Krioukov, and Brent D Nelson. Machine Learning in the String Landscape.Journal of High Energy Physics, 2017(9):1–36, 2017
work page 2017
-
[23]
Fran¸ cois Charton, Jordan S Ellenberg, Adam Zsolt Wagner, and Geordie Williamson. Patternboost: Constructions in Mathematics with a Little Help from AI.arXiv preprint arXiv:2411.00566, 2024
-
[24]
Jiangjie Chen, Wenxiang Chen, Jiacheng Du, Jinyi Hu, Zhicheng Jiang, Allan Jie, Xiao- ran Jin, Xing Jin, Chenggang Li, Wenlei Shi, Zhihong Wang, Mingxuan Wang, Chenrui Wei, Shufa Wei, Huajian Xin, Fan Yang, Weihao Gao, Zheng Yuan, Tianyang Zhan, Zeyu Zheng, Tianxi Zhou, and Thomas Hanwen Zhu. Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving ...
-
[25]
Tianlong Chen, Xiaohan Chen, Wuyang Chen, Howard Heaton, Jialin Liu, Zhangyang Wang, and Wotao Yin. Learning to Optimize: A Primer and A Benchmark.Journal of Machine Learning Research, 23(189):1–59, 2022. 21
work page 2022
-
[26]
Yuri Chervonyi, Trieu H Trinh, Miroslav Olˇ s´ ak, Xiaomeng Yang, Hoang Nguyen, Marcelo Menegali, Junehyuk Jung, Vikas Verma, Quoc V Le, and Thang Luong. Gold-Medalist Performance in Solving Olympiad Geometry with AlphaGeometry2.arXiv preprint arXiv:2502.03544, 2025
-
[27]
A Chervov, A Soibelman, S Lytkin, I Kiselev, S Fironov, A Lukyanenko, A Dolgorukova, A Ogurtsov, F Petrov, S Krymskii, et al. CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs.arXiv preprint arXiv:2502.18663, 2025
-
[28]
A Machine Learning Approach That Beats Large Rubik’s Cubes.arXiv preprint arXiv:2502.13266, 2025
Alexander Chervov, Kirill Khoruzhii, Nikita Bukhal, Jalal Naghiyev, Vladislav Zamkovoy, Ivan Koltsov, Lyudmila Cheldieva, Arsenii Sychev, Arsenii Lenin, Mark Obozov, et al. A Machine Learning Approach That Beats Large Rubik’s Cubes.arXiv preprint arXiv:2502.13266, 2025
-
[29]
Automated Reasoning in Geometry.Handbook of automated reasoning, 1:707–749, 2001
Shang-Ching Chou and Xiao-Shan Gao. Automated Reasoning in Geometry.Handbook of automated reasoning, 1:707–749, 2001
work page 2001
-
[30]
Shang-Ching Chou, Xiao-Shan Gao, and Jing-Zhong Zhang. A Deductive Database Ap- proach to Automated Geometry Theorem Proving and Discovering.Journal of Automated Reasoning, 25(3):219–246, 2000
work page 2000
-
[31]
D. A. Clarke. Martin Davis. A program for Presburger’s algorithm. Summaries of talks presented at the Summer Institute for Symbolic Logic, Cornell University, 1957, 2nd edn., Communications Research Division, Institute for Defense Analyses, Princeton, N.J., 1960, pp. 215–223.Journal of Symbolic Logic, 31(1):138–138, 1966
work page 1957
-
[32]
Training Verifiers to Solve Math Word Problems
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training Verifiers to Solve Math Word Problems.arXiv preprint arXiv:2110.14168, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[33]
Katherine M Collins, Albert Q Jiang, Simon Frieder, Lionel Wong, Miri Zilka, Umang Bhatt, Thomas Lukasiewicz, Yuhuai Wu, Joshua B Tenenbaum, William Hart, et al. Evaluating Language Models for Mathematics Through Interactions.Proceedings of the National Academy of Sciences, 121(24):e2318124121, 2024
work page 2024
-
[34]
Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities.arXiv preprint arXiv:2507.06261, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[35]
Princeton University Press, 2009
MARIANA COOK.Mathematicians: An Outer View of the Inner World. Princeton University Press, 2009
work page 2009
-
[36]
Lukasz Czajka and Cezary Kaliszyk. Hammer for Coq: Automation for Dependent Type Theory.Journal of Automated Reasoning, 61(1):423–453, 2018
work page 2018
-
[37]
The Signature and Cusp Geometry of Hyperbolic Knots.Geometry & Topology, 28(5):2313–2343, 2024
Alex Davies, Andr´ as Juh´ asz, Marc Lackenby, and Nenad Tomaˇ sev. The Signature and Cusp Geometry of Hyperbolic Knots.Geometry & Topology, 28(5):2313–2343, 2024
work page 2024
-
[38]
Advancing Mathematics by Guiding Human Intuition with AI.Nature, 600(7887):70–74, 2021
Alex Davies, Petar Veliˇ ckovi´ c, Lars Buesing, Sam Blackwell, Daniel Zheng, Nenad Tomaˇ sev, Richard Tanburn, Peter Battaglia, Charles Blundell, Andr´ as Juh´ asz, et al. Advancing Mathematics by Guiding Human Intuition with AI.Nature, 600(7887):70–74, 2021
work page 2021
-
[39]
Martin Davis. The Prehistory and Early History of Automated Deduction.Automation of Reasoning 1: Classical Papers on Computational Logic 1957-1966, pages 1–28, 1983. 22
work page 1957
-
[40]
The Lean 4 Theorem Prover and Programming Language
Leonardo de Moura and Sebastian Ullrich. The Lean 4 Theorem Prover and Programming Language. In Andr´ e Platzer and Geoff Sutcliffe, editors,Automated Deduction - CADE 28 - 28th International Conference on Automated Deduction, Virtual Event, July 12-15, 2021, Proceedings, volume 12699 ofLecture Notes in Computer Science, pages 625–635. Springer, 2021
work page 2021
-
[41]
The Lean Theorem Prover (System Description)
Leonardo Mendon¸ ca de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, and Jakob von Raumer. The Lean Theorem Prover (System Description). In Amy P. Felty and Aart Middeldorp, editors,Automated Deduction - CADE-25 - 25th International Conference on Automated Deduction, Berlin, Germany, August 1-7, 2015, Proceedings, volume 9195 ofLecture Notes in Com...
work page 2015
-
[42]
On Mathematical Modeling in Image Reconstruction and Beyond
Bin Dong. On Mathematical Modeling in Image Reconstruction and Beyond. InProc. Int. Cong. Math, volume 7, pages 5420–5449, 2022
work page 2022
-
[43]
Bin Dong, Xuhua He, Pengfei Jin, Felix Schremmer, and Qingchao Yu. Machine Learning Assisted Exploration for Affine Deligne–Lusztig Varieties.Peking Mathematical Journal, pages 1–50, 2024
work page 2024
-
[44]
STP: Self-Play LLM Theorem Provers with Iterative Con- jecturing and Proving
Kefan Dong and Tengyu Ma. STP: Self-Play LLM Theorem Provers with Iterative Con- jecturing and Proving. InForty-second International Conference on Machine Learning, 2025
work page 2025
-
[45]
Diffusion Models for Cayley Graphs.arXiv preprint arXiv:2503.05558, 2025
Michael R Douglas and Kit Fraser-Taliente. Diffusion Models for Cayley Graphs.arXiv preprint arXiv:2503.05558, 2025
-
[46]
Oliver Dressler. Lean LSP MCP: Tools for Agentic Interaction with the Lean Theorem Prover.https://github.com/oOo0oOo/lean-lsp-mcp, 2025
work page 2025
-
[47]
Boyan Duan, Xiao Liang, Shuai Lu, Yaoxiang Wang, Yelong Shen, Kai-Wei Chang, Ying Nian Wu, Mao Yang, Weizhu Chen, and Yeyun Gong. Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions.arXiv preprint arXiv:2512.00097, 2025
-
[48]
Machine Learning for Complete Intersection Calabi- Yau Manifolds: A Methodological Study.Phys
Harold Erbin and Riccardo Finotello. Machine Learning for Complete Intersection Calabi- Yau Manifolds: A Methodological Study.Phys. Rev. D, 103:126014, Jun 2021
work page 2021
-
[49]
Premise Selection in Natural Language Mathematical Texts
Deborah Ferreira and Andr´ e Freitas. Premise Selection in Natural Language Mathematical Texts. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 7365–7374. Association for Computational Linguistics, 2020
work page 2020
-
[50]
Baldur: Whole-Proof Gen- eration and Repair with Large Language Models
Emily First, Markus N Rabe, Talia Ringer, and Yuriy Brun. Baldur: Whole-Proof Gen- eration and Repair with Large Language Models. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Soft- ware Engineering, pages 1229–1241, 2023
work page 2023
-
[51]
Mathematical Capa- bilities of ChatGPT
Simon Frieder, Luca Pinchetti, Chevalier Chevalier, Ryan-Rhys Griffiths, Tommaso Sal- vatori, Thomas Lukasiewicz, Philipp Petersen, and Julius Berner. Mathematical Capa- bilities of ChatGPT. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 27699–27744. Curra...
work page 2023
-
[52]
Towards Automating Formalisation of Theorem Statements Using Large Language Models
Siddhartha Gadgil, Anand Rao Tadipatri, Ayush Agrawal, Ashvni Narayanan, and Navin Goyal. Towards Automating Formalisation of Theorem Statements Using Large Language Models. In36th Conference on Neural Information Processing Systems (NeurIPS 2022) Workshop on MATH-AI, 2022. 23
work page 2022
-
[53]
A Semantic Search Engine for Mathlib4
Guoxiong Gao, Haocheng Ju, Jiedong Jiang, Zihan Qin, and Bin Dong. A Semantic Search Engine for Mathlib4. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Findings of the Association for Computational Linguistics: EMNLP 2024, pages 8001– 8013, Miami, Florida, USA, November 2024. Association for Computational Linguistics
work page 2024
-
[54]
Herald: A Natural Language Annotated Lean 4 Dataset
Guoxiong Gao, Yutong Wang, Jiedong Jiang, Qi Gao, Zihan Qin, Tianyi Xu, and Bin Dong. Herald: A Natural Language Annotated Lean 4 Dataset. InThe Thirteenth International Conference on Learning Representations, 2025
work page 2025
-
[55]
The Math Retrieval System of ICST for NTCIR-12 MathIR Task
Liangcai Gao, Ke Yuan, Yuehan Wang, Zhuoren Jiang, and Zhi Tang. The Math Retrieval System of ICST for NTCIR-12 MathIR Task. In Noriko Kando, Tetsuya Sakai, and Mark Sanderson, editors,Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, Japan, June 7-10,
-
[56]
National Institute of Informatics (NII), 2016
work page 2016
-
[57]
Mathe- matical Exploration and Discovery at Scale.arXiv preprint arXiv:2511.02864, 2025
Bogdan Georgiev, Javier G´ omez-Serrano, Terence Tao, and Adam Zsolt Wagner. Mathe- matical Exploration and Discovery at Scale.arXiv preprint arXiv:2511.02864, 2025
-
[58]
Elliot Glazer, Ege Erdil, Tamay Besiroglu, Diego Chicharro, Evan Chen, Alex Gunning, Caroline Falkman Olsson, Jean-Stanislas Denain, Anson Ho, Emily de Oliveira Santos, et al. Frontiermath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI.arXiv preprint arXiv:2411.04872, 2024
-
[59]
Reinforcement Learning for Hierarchical Proof Generation in Lean 4
Fabian Gloeckle, Alex Gu, Gabriel Synnaeve, and Amaury Hayat. Reinforcement Learning for Hierarchical Proof Generation in Lean 4. InThe 5th Workshop on Mathematical Reasoning and AI at NeurIPS 2025, 2025
work page 2025
-
[60]
ABEL: Sam- ple Efficient Online Reinforcement Learning for Neural Theorem Proving
Fabian Gloeckle, Jannis Limperg, Gabriel Synnaeve, and Amaury Hayat. ABEL: Sam- ple Efficient Online Reinforcement Learning for Neural Theorem Proving. InThe 4th Workshop on Mathematical Reasoning and AI at NeurIPS’24, 2024
work page 2024
-
[61]
On Formally Undecidable Propositions of Principia Mathematica and Related Systems
Kurt G¨ odel. On Formally Undecidable Propositions of Principia Mathematica and Related Systems. 1931
work page 1931
-
[62]
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solv- ing
Zhibin Gou, Zhihong Shao, Yeyun Gong, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen, et al. ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solv- ing. InThe Twelfth International Conference on Learning Representations
-
[63]
Sergei Gukov, James Halverson, Ciprian Manolescu, and Fabian Ruehle. Searching for Ribbons with Machine Learning.Machine Learning: Science and Technology, 6(2):025065, jun 2025
work page 2025
-
[64]
Machine Learning BPS Spectra and the Gap Conjecture.Phys
Sergei Gukov and Rak-Kyeong Seong. Machine Learning BPS Spectra and the Gap Conjecture.Phys. Rev. D, 110:046016, Aug 2024
work page 2024
-
[65]
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. DeepSeek-R1 Incentivizes Reasoning in LLMs Through Reinforcement Learning.Nature, 645(8081):633–638, 2025
work page 2025
-
[66]
Baran Hashemi, Roderic Guigo Corominas, and Alessandro Giacchetto. Can Transformers Do Enumerative Geometry? InThe Thirteenth International Conference on Learning Representations, 2025
work page 2025
-
[67]
Murmurations of Elliptic Curves.Experimental Mathematics, 34(3):528–540, 2025
Yang-Hui He, Kyu-Hwan Lee, Thomas Oliver, and Alexey Pozdnyakov. Murmurations of Elliptic Curves.Experimental Mathematics, 34(3):528–540, 2025. 24
work page 2025
-
[68]
FGeo-TP: A Language Model-Enhanced Solver for Euclidean Geometry Problems.Symmetry, 16(4):421, 2024
Yiming He, Jia Zou, Xiaokai Zhang, Na Zhu, and Tuo Leng. FGeo-TP: A Language Model-Enhanced Solver for Euclidean Geometry Problems.Symmetry, 16(4):421, 2024
work page 2024
-
[69]
Measuring Mathematical Problem Solving With the MATH Dataset.NeurIPS, 2021
Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring Mathematical Problem Solving With the MATH Dataset.NeurIPS, 2021
work page 2021
-
[70]
Sine Qua Non for Large Theory Reasoning
Kryˇ stof Hoder and Andrei Voronkov. Sine Qua Non for Large Theory Reasoning. In International Conference on Automated Deduction, pages 299–314. Springer, 2011
work page 2011
-
[71]
GamePad: A Learning Environment for Theorem Proving
Daniel Huang, Prafulla Dhariwal, Dawn Song, and Ilya Sutskever. Gamepad: A Learning Environment for Theorem Proving.arXiv preprint arXiv:1806.00608, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[72]
FormaRL: Enhancing Autoformalization with No Labeled Data
Yanxing Huang, Xinling Jin, Sijie Liang, Fuwen Luo, Peng Li, and Yang Liu. FormaRL: Enhancing Autoformalization with No Labeled Data. InSecond Conference on Language Modeling, 2025
work page 2025
-
[73]
Yichen Huang and Lin F Yang. Winning Gold at IMO 2025 with a Model-Agnostic Verification-and-Refinement Pipeline.arXiv preprint arXiv:2507.15855, 2025
-
[74]
Olympiad-Level Formal Mathematical Reasoning with Reinforcement Learning.Nature, pages 1–3, 2025
Thomas Hubert, Rishi Mehta, Laurent Sartran, Mikl´ os Z Horv´ ath, Goran ˇZuˇ zi´ c, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Masoom, et al. Olympiad-Level Formal Mathematical Reasoning with Reinforcement Learning.Nature, pages 1–3, 2025
work page 2025
-
[75]
Alemi, Niklas E´ en, Fran¸ cois Chollet, and Josef Urban
Geoffrey Irving, Christian Szegedy, Alexander A. Alemi, Niklas E´ en, Fran¸ cois Chollet, and Josef Urban. DeepMath: Deep Sequence Models for Premise Selection.Advances in Neural Information Processing Systems (NeurIPS), 29, 2016
work page 2016
-
[76]
Uijeong Jang and Ernest K Ryu. Point Convergence of Nesterov’s Accelerated Gradient Method: An AI-Assisted Proof.arXiv preprint arXiv:2510.23513, 2025
-
[77]
Multilingual Mathematical Autoformal- ization.arXiv preprint arXiv:2311.03755, 2023
Albert Q Jiang, Wenda Li, and Mateja Jamnik. Multilingual Mathematical Autoformal- ization.arXiv preprint arXiv:2311.03755, 2023
-
[78]
Albert Qiaochu Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzyg´ o´ zd´ z, Piotr Mi lo´ s, Yuhuai Wu, and Mateja Jamnik. Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers.Advances in Neural Information Processing Systems, 35:8360–8373, 2022
work page 2022
-
[79]
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
Albert Qiaochu Jiang, Sean Welleck, Jin Peng Zhou, Timothee Lacroix, Jiacheng Liu, Wenda Li, Mateja Jamnik, Guillaume Lample, and Yuhuai Wu. Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs. InThe Eleventh International Conference on Learning Representations, 2023
work page 2023
-
[80]
Jiedong Jiang, Wanyi He, Yuefeng Wang, Guoxiong Gao, Yongle Hu, Jingting Wang, Nailing Guan, Peihao Wu, Chunbo Dai, Liang Xiao, et al. FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels.arXiv preprint arXiv:2511.02872, 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.