pith. machine review for the scientific record. sign in

arxiv: 2601.13209 · v5 · submitted 2026-01-19 · 🧮 math.HO

Recognition: no theorem link

AI for Mathematics: Progress, Challenges, and Prospects

Authors on Pith no claims yet

Pith reviewed 2026-05-16 13:22 UTC · model grok-4.3

classification 🧮 math.HO
keywords AI for Mathematicsmathematical reasoningproblem-specific modelinggeneral-purpose modelingfoundation modelsAI testbedmathematical discovery
0
0 comments X

The pith

AI for mathematics research splits into specialized task architectures and general foundation models for broader reasoning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews the field of AI for Mathematics by dividing existing work into two complementary directions. One direction designs specialized AI architectures for particular mathematical tasks. The other develops foundation models that support wider reasoning, retrieval, and exploratory processes. Mathematics is presented as a strong testbed because its rigorous structure can drive improvements in general AI reasoning. The authors stress that success should be measured by the ability to discover meaningful results and unified theories, not merely by verifying formal correctness.

Core claim

We categorize existing research into two complementary directions: problem-specific modeling, involving the design of specialized architectures for distinct mathematical tasks, and general-purpose modeling, focusing on foundation models capable of broader reasoning, retrieval, and exploratory workflows. The rigorous nature of mathematics serves as a premier testbed for advancing general reasoning capabilities. The true value of a proof lies in the insights and tools it offers to the broader mathematical landscape rather than formal correctness alone.

What carries the argument

The two complementary directions of problem-specific modeling with specialized architectures and general-purpose modeling with foundation models for reasoning and exploration.

If this is right

  • AI systems can assist mathematical research through both narrow specialized tools and open-ended exploratory workflows.
  • Advances in general reasoning follow from treating mathematics as a demanding test environment.
  • Future systems should prioritize discovery of new results and unified theories over formal proof verification.
  • Key remaining challenges include scalability limits and integration of data-driven and symbolic methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Success here could transfer to other domains that require rigorous step-by-step reasoning such as physics or computer science.
  • The focus on discovery suggests AI tools that propose conjectures rather than only checking existing proofs.
  • If the two modeling directions converge, hybrid systems might emerge that combine task-specific efficiency with broad exploratory power.

Load-bearing premise

The rigorous nature of mathematics serves as a premier testbed for advancing general reasoning capabilities in AI.

What would settle it

Training large AI models on extensive mathematical data and then measuring whether they show clear gains on non-mathematical reasoning benchmarks compared with models trained on other domains would test the testbed claim.

Figures

Figures reproduced from arXiv: 2601.13209 by Bin Dong, Haocheng Ju.

Figure 1
Figure 1. Figure 1: Landscape of AI4Math research directions. AI4Math research can be broadly divided [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Performance of five LLMs across different Peking University (PKU) exams. Left: [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Example of a proof step generation method and the complete proof it produces. [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗
read the original abstract

AI for Mathematics (AI4Math) has emerged as a distinct field that leverages machine learning to navigate mathematical landscapes historically intractable for early symbolic systems. While mid-20th-century symbolic approaches successfully automated formal logic, they faced severe scalability limitations due to the combinatorial explosion of the search space. The recent integration of data-driven approaches has revitalized this pursuit. In this review, we provide a systematic overview of AI4Math, highlighting its primary focus on developing AI models to support mathematical research. Crucially, we emphasize that this is not merely the application of AI to mathematical activities; it also encompasses the development of stronger AI systems where the rigorous nature of mathematics serves as a premier testbed for advancing general reasoning capabilities. We categorize existing research into two complementary directions: problem-specific modeling, involving the design of specialized architectures for distinct mathematical tasks, and general-purpose modeling, focusing on foundation models capable of broader reasoning, retrieval, and exploratory workflows. We conclude by discussing key challenges and prospects, advocating for AI systems that go beyond facilitating formal correctness to enabling the discovery of meaningful results and unified theories, recognizing that the true value of a proof lies in the insights and tools it offers to the broader mathematical landscape.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript is a survey of AI for Mathematics (AI4Math). It traces the field's development from mid-20th-century symbolic systems limited by combinatorial explosion to recent data-driven methods. The central contribution is a two-way taxonomy that partitions existing work into problem-specific modeling (specialized architectures for distinct tasks) and general-purpose modeling (foundation models supporting broader reasoning, retrieval, and exploratory workflows). The review closes with challenges, prospects, and a call for systems that enable discovery of meaningful results and unified theories rather than formal correctness alone.

Significance. If the taxonomy is adopted, the paper supplies a practical organizing lens for a rapidly expanding literature at the intersection of machine learning and mathematics. By treating mathematics explicitly as a testbed for general reasoning, the survey underscores bidirectional value: AI tools for mathematicians and mathematical rigor as a driver of AI progress. The emphasis on discovery-oriented prospects, rather than verification alone, identifies an ambitious but coherent research direction that could shape funding and collaboration priorities.

major comments (1)
  1. [§2] §2 (taxonomy section): the boundary between 'problem-specific' and 'general-purpose' modeling is presented as complementary yet is not equipped with explicit decision criteria or overlap metrics. Several cited works (e.g., LLM-based theorem provers applied to narrow domains) appear to straddle both categories; without a reproducible classification protocol the taxonomy risks becoming descriptive rather than predictive.
minor comments (2)
  1. [Abstract] Abstract: the term 'exploratory workflows' is introduced without a one-sentence gloss; a parenthetical definition would aid readers who encounter the abstract first.
  2. [Conclusion] Conclusion: the phrase 'unified theories' is invoked as an aspirational goal but is not linked to any concrete open problems (e.g., Langlands program or Millennium Prize problems) that the surveyed methods might address.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive review and recommendation for minor revision. The taxonomy is intended as a high-level organizing lens rather than a rigid classifier, but we agree that additional guidance on boundaries and overlaps will strengthen the presentation.

read point-by-point responses
  1. Referee: [§2] §2 (taxonomy section): the boundary between 'problem-specific' and 'general-purpose' modeling is presented as complementary yet is not equipped with explicit decision criteria or overlap metrics. Several cited works (e.g., LLM-based theorem provers applied to narrow domains) appear to straddle both categories; without a reproducible classification protocol the taxonomy risks becoming descriptive rather than predictive.

    Authors: We appreciate the referee's point that the taxonomy in §2 would benefit from more explicit guidance. The two directions are presented as complementary rather than mutually exclusive precisely to accommodate hybrids; the manuscript already notes that many recent systems combine elements of both. To address the concern directly, we will add a short subsection in the revised §2 that supplies classification criteria based on primary modeling objective and resource allocation: problem-specific modeling is characterized by architectures optimized for a narrow task distribution with limited transfer (e.g., custom GNNs or reinforcement-learning agents for specific combinatorial problems), while general-purpose modeling is characterized by large-scale pre-training followed by broad adaptation across tasks. Overlap will be illustrated with concrete examples, including LLM-based theorem provers restricted to narrow domains, which we will label as hybrid cases whose classification depends on whether the dominant contribution is task-specific engineering or the demonstration of general reasoning transfer. This addition will make the taxonomy more operational without altering its core message. revision: yes

Circularity Check

0 steps flagged

Survey paper exhibits no circularity; taxonomy is descriptive

full rationale

This is a literature review that organizes existing AI4Math research into two descriptive categories (problem-specific modeling and general-purpose modeling) drawn from cited prior works. The claim that mathematics serves as a premier testbed for general reasoning is presented as motivating context from the surveyed literature, not as a new derivation or prediction. No equations, fitted parameters, self-referential predictions, or load-bearing self-citations appear; the paper contains no derivations that reduce to its own inputs by construction. The content is self-contained against external benchmarks and draws support from independently cited sources.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The review rests on the domain assumption that mathematics is a premier testbed for AI reasoning and that data-driven approaches have revitalized symbolic methods; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption the rigorous nature of mathematics serves as a premier testbed for advancing general reasoning capabilities
    Explicitly stated in the abstract as a core emphasis of the field.

pith-pipeline@v0.9.0 · 5503 in / 1033 out tokens · 34292 ms · 2026-05-16T13:22:11.323123+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MathConstraint: Automated Generation of Verified Combinatorial Reasoning Instances for LLMs

    cs.LG 2026-05 unverdicted novelty 8.0

    MathConstraint generates scalable, automatically verifiable combinatorial problems where LLMs achieve 18.5-66.9% accuracy without tools but roughly double that with solver access.

  2. Just Type It in Isabelle! AI Agents Drafting, Mechanizing, and Generalizing from Human Hints

    cs.LO 2026-04 unverdicted novelty 6.0

    A metatheoretical specification and Isabelle/HOL formalization of minimal type annotations for rank-one polymorphic terms is given, shown via human-AI collaborative workflows for drafting and mechanizing proofs.

  3. Automated Conjecture Resolution with Formal Verification

    cs.LG 2026-04 unverdicted novelty 6.0

    An AI framework combining informal reasoning and formal verification resolves an open commutative algebra problem and produces a Lean 4-checked proof with minimal human input.

Reference graph

Works this paper leans on

188 extracted references · 188 canonical work pages · cited by 3 Pith papers · 13 internal anchors

  1. [1]

    GPT-4 Technical Report

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Floren- cia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 Technical Report.arXiv preprint arXiv:2303.08774, 2023

  2. [2]

    Aristotle: IMO-level Automated Theorem Proving

    Tudor Achim, Alex Best, Alberto Bietti, Kevin Der, Math¨ ıs F´ ed´ erico, Sergei Gukov, Daniel Halpern-Leistner, Kirsten Henningsgard, Yury Kudryashov, Alexander Meiburg, et al. Aristotle: IMO-Level Automated Theorem Proving.arXiv preprint arXiv:2510.01346, 2025

  3. [3]

    Towards a Mathematics Formalisation Assistant Using Large Language Models

    Ayush Agrawal, Siddhartha Gadgil, Navin Goyal, Ashvni Narayanan, and Anand Tadi- patri. Towards a Mathematics Formalisation Assistant Using Large Language Models. arXiv preprint arXiv:2211.07524, 2022

  4. [4]

    Global Lyapunov Functions: A Long-Standing Open Problem in Mathematics, with Symbolic Transformers.Advances in Neural Information Processing Systems, 37:93643–93670, 2024

    Alberto Alfarano, Fran¸ cois Charton, and Amaury Hayat. Global Lyapunov Functions: A Long-Standing Open Problem in Mathematics, with Symbolic Transformers.Advances in Neural Information Processing Systems, 37:93643–93670, 2024

  5. [5]

    Machine Learning Class Numbers of Real Quadratic Fields.International Journal of Data Science in the Mathematical Sciences, 01(02):107–134, 2023

    Malik Amir, Yang-Hui He, Kyu-Hwan Lee, Thomas Oliver, and Eldar Sultanow. Machine Learning Class Numbers of Real Quadratic Fields.International Journal of Data Science in the Mathematical Sciences, 01(02):107–134, 2023

  6. [6]

    Solving Inverse Problems Using Data-Driven Models.Acta Numerica, 28:1–174, 2019

    Simon Arridge, Peter Maass, Ozan ¨Oktem, and Carola-Bibiane Sch¨ onlieb. Solving Inverse Problems Using Data-Driven Models.Acta Numerica, 28:1–174, 2019

  7. [7]

    LeanExplore: A Search Engine for Lean 4 Declarations.arXiv preprint arXiv:2506.11085, 2025

    Justin Asher. LeanExplore: A Search Engine for Lean 4 Declarations.arXiv preprint arXiv:2506.11085, 2025

  8. [8]

    From Seeing Why to Checking Everything.https://axiommath.ai/ territory/from-seeing-why-to-checking-everything, 2025

    Axiom Math. From Seeing Why to Checking Everything.https://axiommath.ai/ territory/from-seeing-why-to-checking-everything, 2025

  9. [9]

    Proofnet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics.arXiv preprint arXiv:2302.12433, 2023

    Zhangir Azerbayev, Bartosz Piotrowski, Hailey Schoelkopf, Edward W Ayers, Dragomir Radev, and Jeremy Avigad. Proofnet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics.arXiv preprint arXiv:2302.12433, 2023

  10. [10]

    Jiang, Jia Deng, Stella Biderman, and Sean Welleck

    Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen Marcus McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, and Sean Welleck. Llemma: An Open Language Model for Mathematics. InThe Twelfth International Conference on Learning Representations, 2024. 20

  11. [11]

    Holist: An Environment for Machine Learning of Higher Order Logic Theorem Proving

    Kshitij Bansal, Sarah Loos, Markus Rabe, Christian Szegedy, and Stewart Wilcox. Holist: An Environment for Machine Learning of Higher Order Logic Theorem Proving. InIn- ternational Conference on Machine Learning, pages 454–463. PMLR, 2019

  12. [12]

    Effective Brascamp-Lieb inequalities

    Timoth´ ee B´ enard and Weikun He. Effective Brascamp–Lieb Inequalities.arXiv preprint arXiv:2511.11091, 2025

  13. [13]

    An ML Approach to Resolution of Singularities

    Gergely B´ erczi, Honglu Fan, and Mingcong Zeng. An ML Approach to Resolution of Singularities. InTopological, Algebraic and Geometric Learning Workshops 2023, pages 469–487. PMLR, 2023

  14. [14]

    Machine Learning Kreuzer–Skarke Calabi–Yau Threefolds.International Journal of Modern Physics A, 40(28):2550097, 2025

    Per Berglund, Ben Campbell, and Vishnu Jejjala. Machine Learning Kreuzer–Skarke Calabi–Yau Threefolds.International Journal of Modern Physics A, 40(28):2550097, 2025

  15. [15]

    New Calabi–Yau Manifolds from Genetic Algorithms.Physics Letters B, 850:138504, 2024

    Per Berglund, Yang-Hui He, Elli Heyes, Edward Hirst, Vishnu Jejjala, and Andre Lukas. New Calabi–Yau Manifolds from Genetic Algorithms.Physics Letters B, 850:138504, 2024

  16. [16]

    Berman, Yang-Hui He, and Edward Hirst

    David S. Berman, Yang-Hui He, and Edward Hirst. Machine Learning Calabi-Yau Hy- persurfaces.Phys. Rev. D, 105:066002, Mar 2022

  17. [17]

    Towards Combinatorial Invariance for Kazhdan-Lusztig Polynomials.Representation The- ory of the American Mathematical Society, 26(37):1145–1191, 2022

    Charles Blundell, Lars Buesing, Alex Davies, Petar Veliˇ ckovi´ c, and Geordie Williamson. Towards Combinatorial Invariance for Kazhdan-Lusztig Polynomials.Representation The- ory of the American Mathematical Society, 26(37):1145–1191, 2022

  18. [18]

    Murmurations of Modular Forms in the Weight Aspect.Algebra and Number Theory, 2025

    Jonathan W Bober, Andrew R Booker, M Lee, and David Lowry-Duda. Murmurations of Modular Forms in the Weight Aspect.Algebra and Number Theory, 2025

  19. [19]

    Promising Directions of Machine Learning for Partial Differential Equations.Nature Computational Science, 4(7):483–494, 2024

    Steven L Brunton and J Nathan Kutz. Promising Directions of Machine Learning for Partial Differential Equations.Nature Computational Science, 4(7):483–494, 2024

  20. [20]

    The Motivic Class of the Space of Genus 0 Maps to the Flag Variety.arXiv preprint arXiv:2601.07222, 2026

    Jim Bryan, Bal´ azs Elek, Freddie Manners, George Salafatinos, and Ravi Vakil. The Motivic Class of the Space of Genus 0 Maps to the Flag Variety.arXiv preprint arXiv:2601.07222, 2026

  21. [21]

    Flow-Based Extremal Mathematical Structure Discovery, 2026

    Gergely B´ erczi, Baran Hashemi, and Jonas Kl¨ uver. Flow-Based Extremal Mathematical Structure Discovery, 2026

  22. [22]

    Machine Learning in the String Landscape.Journal of High Energy Physics, 2017(9):1–36, 2017

    Jonathan Carifio, James Halverson, Dmitri Krioukov, and Brent D Nelson. Machine Learning in the String Landscape.Journal of High Energy Physics, 2017(9):1–36, 2017

  23. [23]

    Patternboost: Constructions in Mathematics with a Little Help from AI.arXiv preprint arXiv:2411.00566, 2024

    Fran¸ cois Charton, Jordan S Ellenberg, Adam Zsolt Wagner, and Geordie Williamson. Patternboost: Constructions in Mathematics with a Little Help from AI.arXiv preprint arXiv:2411.00566, 2024

  24. [24]

    Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience.arXiv preprint arXiv:2512.17260, 2025

    Jiangjie Chen, Wenxiang Chen, Jiacheng Du, Jinyi Hu, Zhicheng Jiang, Allan Jie, Xiao- ran Jin, Xing Jin, Chenggang Li, Wenlei Shi, Zhihong Wang, Mingxuan Wang, Chenrui Wei, Shufa Wei, Huajian Xin, Fan Yang, Weihao Gao, Zheng Yuan, Tianyang Zhan, Zeyu Zheng, Tianxi Zhou, and Thomas Hanwen Zhu. Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving ...

  25. [25]

    Learning to Optimize: A Primer and A Benchmark.Journal of Machine Learning Research, 23(189):1–59, 2022

    Tianlong Chen, Xiaohan Chen, Wuyang Chen, Howard Heaton, Jialin Liu, Zhangyang Wang, and Wotao Yin. Learning to Optimize: A Primer and A Benchmark.Journal of Machine Learning Research, 23(189):1–59, 2022. 21

  26. [26]

    Gold-Medalist Performance in Solving Olympiad Geometry with AlphaGeometry2.arXiv preprint arXiv:2502.03544, 2025

    Yuri Chervonyi, Trieu H Trinh, Miroslav Olˇ s´ ak, Xiaomeng Yang, Hoang Nguyen, Marcelo Menegali, Junehyuk Jung, Vikas Verma, Quoc V Le, and Thang Luong. Gold-Medalist Performance in Solving Olympiad Geometry with AlphaGeometry2.arXiv preprint arXiv:2502.03544, 2025

  27. [27]

    CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs.arXiv preprint arXiv:2502.18663, 2025

    A Chervov, A Soibelman, S Lytkin, I Kiselev, S Fironov, A Lukyanenko, A Dolgorukova, A Ogurtsov, F Petrov, S Krymskii, et al. CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs.arXiv preprint arXiv:2502.18663, 2025

  28. [28]

    A Machine Learning Approach That Beats Large Rubik’s Cubes.arXiv preprint arXiv:2502.13266, 2025

    Alexander Chervov, Kirill Khoruzhii, Nikita Bukhal, Jalal Naghiyev, Vladislav Zamkovoy, Ivan Koltsov, Lyudmila Cheldieva, Arsenii Sychev, Arsenii Lenin, Mark Obozov, et al. A Machine Learning Approach That Beats Large Rubik’s Cubes.arXiv preprint arXiv:2502.13266, 2025

  29. [29]

    Automated Reasoning in Geometry.Handbook of automated reasoning, 1:707–749, 2001

    Shang-Ching Chou and Xiao-Shan Gao. Automated Reasoning in Geometry.Handbook of automated reasoning, 1:707–749, 2001

  30. [30]

    A Deductive Database Ap- proach to Automated Geometry Theorem Proving and Discovering.Journal of Automated Reasoning, 25(3):219–246, 2000

    Shang-Ching Chou, Xiao-Shan Gao, and Jing-Zhong Zhang. A Deductive Database Ap- proach to Automated Geometry Theorem Proving and Discovering.Journal of Automated Reasoning, 25(3):219–246, 2000

  31. [31]

    D. A. Clarke. Martin Davis. A program for Presburger’s algorithm. Summaries of talks presented at the Summer Institute for Symbolic Logic, Cornell University, 1957, 2nd edn., Communications Research Division, Institute for Defense Analyses, Princeton, N.J., 1960, pp. 215–223.Journal of Symbolic Logic, 31(1):138–138, 1966

  32. [32]

    Training Verifiers to Solve Math Word Problems

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training Verifiers to Solve Math Word Problems.arXiv preprint arXiv:2110.14168, 2021

  33. [33]

    Evaluating Language Models for Mathematics Through Interactions.Proceedings of the National Academy of Sciences, 121(24):e2318124121, 2024

    Katherine M Collins, Albert Q Jiang, Simon Frieder, Lionel Wong, Miri Zilka, Umang Bhatt, Thomas Lukasiewicz, Yuhuai Wu, Joshua B Tenenbaum, William Hart, et al. Evaluating Language Models for Mathematics Through Interactions.Proceedings of the National Academy of Sciences, 121(24):e2318124121, 2024

  34. [34]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities.arXiv preprint arXiv:2507.06261, 2025

  35. [35]

    Princeton University Press, 2009

    MARIANA COOK.Mathematicians: An Outer View of the Inner World. Princeton University Press, 2009

  36. [36]

    Hammer for Coq: Automation for Dependent Type Theory.Journal of Automated Reasoning, 61(1):423–453, 2018

    Lukasz Czajka and Cezary Kaliszyk. Hammer for Coq: Automation for Dependent Type Theory.Journal of Automated Reasoning, 61(1):423–453, 2018

  37. [37]

    The Signature and Cusp Geometry of Hyperbolic Knots.Geometry & Topology, 28(5):2313–2343, 2024

    Alex Davies, Andr´ as Juh´ asz, Marc Lackenby, and Nenad Tomaˇ sev. The Signature and Cusp Geometry of Hyperbolic Knots.Geometry & Topology, 28(5):2313–2343, 2024

  38. [38]

    Advancing Mathematics by Guiding Human Intuition with AI.Nature, 600(7887):70–74, 2021

    Alex Davies, Petar Veliˇ ckovi´ c, Lars Buesing, Sam Blackwell, Daniel Zheng, Nenad Tomaˇ sev, Richard Tanburn, Peter Battaglia, Charles Blundell, Andr´ as Juh´ asz, et al. Advancing Mathematics by Guiding Human Intuition with AI.Nature, 600(7887):70–74, 2021

  39. [39]

    The Prehistory and Early History of Automated Deduction.Automation of Reasoning 1: Classical Papers on Computational Logic 1957-1966, pages 1–28, 1983

    Martin Davis. The Prehistory and Early History of Automated Deduction.Automation of Reasoning 1: Classical Papers on Computational Logic 1957-1966, pages 1–28, 1983. 22

  40. [40]

    The Lean 4 Theorem Prover and Programming Language

    Leonardo de Moura and Sebastian Ullrich. The Lean 4 Theorem Prover and Programming Language. In Andr´ e Platzer and Geoff Sutcliffe, editors,Automated Deduction - CADE 28 - 28th International Conference on Automated Deduction, Virtual Event, July 12-15, 2021, Proceedings, volume 12699 ofLecture Notes in Computer Science, pages 625–635. Springer, 2021

  41. [41]

    The Lean Theorem Prover (System Description)

    Leonardo Mendon¸ ca de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, and Jakob von Raumer. The Lean Theorem Prover (System Description). In Amy P. Felty and Aart Middeldorp, editors,Automated Deduction - CADE-25 - 25th International Conference on Automated Deduction, Berlin, Germany, August 1-7, 2015, Proceedings, volume 9195 ofLecture Notes in Com...

  42. [42]

    On Mathematical Modeling in Image Reconstruction and Beyond

    Bin Dong. On Mathematical Modeling in Image Reconstruction and Beyond. InProc. Int. Cong. Math, volume 7, pages 5420–5449, 2022

  43. [43]

    Machine Learning Assisted Exploration for Affine Deligne–Lusztig Varieties.Peking Mathematical Journal, pages 1–50, 2024

    Bin Dong, Xuhua He, Pengfei Jin, Felix Schremmer, and Qingchao Yu. Machine Learning Assisted Exploration for Affine Deligne–Lusztig Varieties.Peking Mathematical Journal, pages 1–50, 2024

  44. [44]

    STP: Self-Play LLM Theorem Provers with Iterative Con- jecturing and Proving

    Kefan Dong and Tengyu Ma. STP: Self-Play LLM Theorem Provers with Iterative Con- jecturing and Proving. InForty-second International Conference on Machine Learning, 2025

  45. [45]

    Diffusion Models for Cayley Graphs.arXiv preprint arXiv:2503.05558, 2025

    Michael R Douglas and Kit Fraser-Taliente. Diffusion Models for Cayley Graphs.arXiv preprint arXiv:2503.05558, 2025

  46. [46]

    Lean LSP MCP: Tools for Agentic Interaction with the Lean Theorem Prover.https://github.com/oOo0oOo/lean-lsp-mcp, 2025

    Oliver Dressler. Lean LSP MCP: Tools for Agentic Interaction with the Lean Theorem Prover.https://github.com/oOo0oOo/lean-lsp-mcp, 2025

  47. [47]

    Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions.arXiv preprint arXiv:2512.00097, 2025

    Boyan Duan, Xiao Liang, Shuai Lu, Yaoxiang Wang, Yelong Shen, Kai-Wei Chang, Ying Nian Wu, Mao Yang, Weizhu Chen, and Yeyun Gong. Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions.arXiv preprint arXiv:2512.00097, 2025

  48. [48]

    Machine Learning for Complete Intersection Calabi- Yau Manifolds: A Methodological Study.Phys

    Harold Erbin and Riccardo Finotello. Machine Learning for Complete Intersection Calabi- Yau Manifolds: A Methodological Study.Phys. Rev. D, 103:126014, Jun 2021

  49. [49]

    Premise Selection in Natural Language Mathematical Texts

    Deborah Ferreira and Andr´ e Freitas. Premise Selection in Natural Language Mathematical Texts. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 7365–7374. Association for Computational Linguistics, 2020

  50. [50]

    Baldur: Whole-Proof Gen- eration and Repair with Large Language Models

    Emily First, Markus N Rabe, Talia Ringer, and Yuriy Brun. Baldur: Whole-Proof Gen- eration and Repair with Large Language Models. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Soft- ware Engineering, pages 1229–1241, 2023

  51. [51]

    Mathematical Capa- bilities of ChatGPT

    Simon Frieder, Luca Pinchetti, Chevalier Chevalier, Ryan-Rhys Griffiths, Tommaso Sal- vatori, Thomas Lukasiewicz, Philipp Petersen, and Julius Berner. Mathematical Capa- bilities of ChatGPT. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 27699–27744. Curra...

  52. [52]

    Towards Automating Formalisation of Theorem Statements Using Large Language Models

    Siddhartha Gadgil, Anand Rao Tadipatri, Ayush Agrawal, Ashvni Narayanan, and Navin Goyal. Towards Automating Formalisation of Theorem Statements Using Large Language Models. In36th Conference on Neural Information Processing Systems (NeurIPS 2022) Workshop on MATH-AI, 2022. 23

  53. [53]

    A Semantic Search Engine for Mathlib4

    Guoxiong Gao, Haocheng Ju, Jiedong Jiang, Zihan Qin, and Bin Dong. A Semantic Search Engine for Mathlib4. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Findings of the Association for Computational Linguistics: EMNLP 2024, pages 8001– 8013, Miami, Florida, USA, November 2024. Association for Computational Linguistics

  54. [54]

    Herald: A Natural Language Annotated Lean 4 Dataset

    Guoxiong Gao, Yutong Wang, Jiedong Jiang, Qi Gao, Zihan Qin, Tianyi Xu, and Bin Dong. Herald: A Natural Language Annotated Lean 4 Dataset. InThe Thirteenth International Conference on Learning Representations, 2025

  55. [55]

    The Math Retrieval System of ICST for NTCIR-12 MathIR Task

    Liangcai Gao, Ke Yuan, Yuehan Wang, Zhuoren Jiang, and Zhi Tang. The Math Retrieval System of ICST for NTCIR-12 MathIR Task. In Noriko Kando, Tetsuya Sakai, and Mark Sanderson, editors,Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, Japan, June 7-10,

  56. [56]

    National Institute of Informatics (NII), 2016

  57. [57]

    Mathe- matical Exploration and Discovery at Scale.arXiv preprint arXiv:2511.02864, 2025

    Bogdan Georgiev, Javier G´ omez-Serrano, Terence Tao, and Adam Zsolt Wagner. Mathe- matical Exploration and Discovery at Scale.arXiv preprint arXiv:2511.02864, 2025

  58. [58]

    Frontiermath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI.arXiv preprint arXiv:2411.04872, 2024

    Elliot Glazer, Ege Erdil, Tamay Besiroglu, Diego Chicharro, Evan Chen, Alex Gunning, Caroline Falkman Olsson, Jean-Stanislas Denain, Anson Ho, Emily de Oliveira Santos, et al. Frontiermath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI.arXiv preprint arXiv:2411.04872, 2024

  59. [59]

    Reinforcement Learning for Hierarchical Proof Generation in Lean 4

    Fabian Gloeckle, Alex Gu, Gabriel Synnaeve, and Amaury Hayat. Reinforcement Learning for Hierarchical Proof Generation in Lean 4. InThe 5th Workshop on Mathematical Reasoning and AI at NeurIPS 2025, 2025

  60. [60]

    ABEL: Sam- ple Efficient Online Reinforcement Learning for Neural Theorem Proving

    Fabian Gloeckle, Jannis Limperg, Gabriel Synnaeve, and Amaury Hayat. ABEL: Sam- ple Efficient Online Reinforcement Learning for Neural Theorem Proving. InThe 4th Workshop on Mathematical Reasoning and AI at NeurIPS’24, 2024

  61. [61]

    On Formally Undecidable Propositions of Principia Mathematica and Related Systems

    Kurt G¨ odel. On Formally Undecidable Propositions of Principia Mathematica and Related Systems. 1931

  62. [62]

    ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solv- ing

    Zhibin Gou, Zhihong Shao, Yeyun Gong, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen, et al. ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solv- ing. InThe Twelfth International Conference on Learning Representations

  63. [63]

    Searching for Ribbons with Machine Learning.Machine Learning: Science and Technology, 6(2):025065, jun 2025

    Sergei Gukov, James Halverson, Ciprian Manolescu, and Fabian Ruehle. Searching for Ribbons with Machine Learning.Machine Learning: Science and Technology, 6(2):025065, jun 2025

  64. [64]

    Machine Learning BPS Spectra and the Gap Conjecture.Phys

    Sergei Gukov and Rak-Kyeong Seong. Machine Learning BPS Spectra and the Gap Conjecture.Phys. Rev. D, 110:046016, Aug 2024

  65. [65]

    DeepSeek-R1 Incentivizes Reasoning in LLMs Through Reinforcement Learning.Nature, 645(8081):633–638, 2025

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. DeepSeek-R1 Incentivizes Reasoning in LLMs Through Reinforcement Learning.Nature, 645(8081):633–638, 2025

  66. [66]

    Can Transformers Do Enumerative Geometry? InThe Thirteenth International Conference on Learning Representations, 2025

    Baran Hashemi, Roderic Guigo Corominas, and Alessandro Giacchetto. Can Transformers Do Enumerative Geometry? InThe Thirteenth International Conference on Learning Representations, 2025

  67. [67]

    Murmurations of Elliptic Curves.Experimental Mathematics, 34(3):528–540, 2025

    Yang-Hui He, Kyu-Hwan Lee, Thomas Oliver, and Alexey Pozdnyakov. Murmurations of Elliptic Curves.Experimental Mathematics, 34(3):528–540, 2025. 24

  68. [68]

    FGeo-TP: A Language Model-Enhanced Solver for Euclidean Geometry Problems.Symmetry, 16(4):421, 2024

    Yiming He, Jia Zou, Xiaokai Zhang, Na Zhu, and Tuo Leng. FGeo-TP: A Language Model-Enhanced Solver for Euclidean Geometry Problems.Symmetry, 16(4):421, 2024

  69. [69]

    Measuring Mathematical Problem Solving With the MATH Dataset.NeurIPS, 2021

    Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring Mathematical Problem Solving With the MATH Dataset.NeurIPS, 2021

  70. [70]

    Sine Qua Non for Large Theory Reasoning

    Kryˇ stof Hoder and Andrei Voronkov. Sine Qua Non for Large Theory Reasoning. In International Conference on Automated Deduction, pages 299–314. Springer, 2011

  71. [71]

    GamePad: A Learning Environment for Theorem Proving

    Daniel Huang, Prafulla Dhariwal, Dawn Song, and Ilya Sutskever. Gamepad: A Learning Environment for Theorem Proving.arXiv preprint arXiv:1806.00608, 2018

  72. [72]

    FormaRL: Enhancing Autoformalization with No Labeled Data

    Yanxing Huang, Xinling Jin, Sijie Liang, Fuwen Luo, Peng Li, and Yang Liu. FormaRL: Enhancing Autoformalization with No Labeled Data. InSecond Conference on Language Modeling, 2025

  73. [73]

    Winning Gold at IMO 2025 with a Model-Agnostic Verification-and-Refinement Pipeline.arXiv preprint arXiv:2507.15855, 2025

    Yichen Huang and Lin F Yang. Winning Gold at IMO 2025 with a Model-Agnostic Verification-and-Refinement Pipeline.arXiv preprint arXiv:2507.15855, 2025

  74. [74]

    Olympiad-Level Formal Mathematical Reasoning with Reinforcement Learning.Nature, pages 1–3, 2025

    Thomas Hubert, Rishi Mehta, Laurent Sartran, Mikl´ os Z Horv´ ath, Goran ˇZuˇ zi´ c, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Masoom, et al. Olympiad-Level Formal Mathematical Reasoning with Reinforcement Learning.Nature, pages 1–3, 2025

  75. [75]

    Alemi, Niklas E´ en, Fran¸ cois Chollet, and Josef Urban

    Geoffrey Irving, Christian Szegedy, Alexander A. Alemi, Niklas E´ en, Fran¸ cois Chollet, and Josef Urban. DeepMath: Deep Sequence Models for Premise Selection.Advances in Neural Information Processing Systems (NeurIPS), 29, 2016

  76. [76]

    Point Convergence of Nesterov’s Accelerated Gradient Method: An AI-Assisted Proof.arXiv preprint arXiv:2510.23513, 2025

    Uijeong Jang and Ernest K Ryu. Point Convergence of Nesterov’s Accelerated Gradient Method: An AI-Assisted Proof.arXiv preprint arXiv:2510.23513, 2025

  77. [77]

    Multilingual Mathematical Autoformal- ization.arXiv preprint arXiv:2311.03755, 2023

    Albert Q Jiang, Wenda Li, and Mateja Jamnik. Multilingual Mathematical Autoformal- ization.arXiv preprint arXiv:2311.03755, 2023

  78. [78]

    Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers.Advances in Neural Information Processing Systems, 35:8360–8373, 2022

    Albert Qiaochu Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzyg´ o´ zd´ z, Piotr Mi lo´ s, Yuhuai Wu, and Mateja Jamnik. Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers.Advances in Neural Information Processing Systems, 35:8360–8373, 2022

  79. [79]

    Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs

    Albert Qiaochu Jiang, Sean Welleck, Jin Peng Zhou, Timothee Lacroix, Jiacheng Liu, Wenda Li, Mateja Jamnik, Guillaume Lample, and Yuhuai Wu. Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs. InThe Eleventh International Conference on Learning Representations, 2023

  80. [80]

    FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels.arXiv preprint arXiv:2511.02872, 2025

    Jiedong Jiang, Wanyi He, Yuefeng Wang, Guoxiong Gao, Yongle Hu, Jingting Wang, Nailing Guan, Peihao Wu, Chunbo Dai, Liang Xiao, et al. FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels.arXiv preprint arXiv:2511.02872, 2025

Showing first 80 references.