Pith · machine review for the scientific record

arxiv: 2605.06651 · v2 · submitted 2026-05-07 · 💻 cs.AI

Recognition: no theorem link

AI co-mathematician: Accelerating mathematicians with agentic AI

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 21:01 UTC · model grok-4.3

classification 💻 cs.AI
keywords AI co-mathematician · agentic AI · mathematical discovery · theorem proving · FrontierMath benchmark · interactive AI workbench

The pith

The AI co-mathematician provides an interactive agentic AI workbench that supports open-ended mathematical research from ideation to theorem proving and sets new benchmark records.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the AI co-mathematician as a stateful, asynchronous workspace that lets mathematicians collaborate with AI agents across the full range of exploratory tasks. It handles uncertainty, refines intent, tracks failed hypotheses, and produces native mathematical outputs to mirror real research workflows. Early tests indicate it helped solve open problems, surface new directions, and recover overlooked references. The system also reaches state-of-the-art performance on hard benchmarks, including 48 percent on FrontierMath Tier 4.

Core claim

The AI co-mathematician supplies holistic, interactive AI support for the iterative reality of mathematical work, including ideation, literature search, computational exploration, theorem proving, and theory building, resulting in practical advances on open problems and superior results on challenging benchmarks.

What carries the argument

Agentic AI workbench with asynchronous stateful workspace that manages uncertainty, refines user intent, tracks failed hypotheses, and outputs native mathematical artifacts.
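A minimal sketch of what "tracks failed hypotheses" in a stateful workspace could look like. This is an editorial illustration, not the paper's implementation; every name here (`Workspace`, `Hypothesis`, `mark_failed`) is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    OPEN = "open"
    SUPPORTED = "supported"
    FAILED = "failed"

@dataclass
class Hypothesis:
    statement: str
    status: Status = Status.OPEN
    notes: list[str] = field(default_factory=list)

@dataclass
class Workspace:
    """Minimal stateful workspace: hypotheses persist across turns,
    and failed ones are kept (with the reason) rather than discarded."""
    hypotheses: dict[str, Hypothesis] = field(default_factory=dict)

    def propose(self, key: str, statement: str) -> None:
        self.hypotheses[key] = Hypothesis(statement)

    def mark_failed(self, key: str, reason: str) -> None:
        h = self.hypotheses[key]
        h.status = Status.FAILED
        h.notes.append(reason)

    def failed(self) -> list[str]:
        # Surfacing past dead ends is what lets later turns avoid re-deriving them.
        return [k for k, h in self.hypotheses.items() if h.status is Status.FAILED]

ws = Workspace()
ws.propose("H1", "The bound is tight for n >= 3")
ws.mark_failed("H1", "counterexample at n = 4")
print(ws.failed())  # ['H1']
```

The point of the sketch is the retention of `FAILED` entries with their reasons; a stateless assistant would have to rediscover the `n = 4` counterexample on the next turn.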

If this is right

  • Researchers can solve open problems with AI assistance that tracks and refines multiple hypotheses.
  • The system surfaces new research directions through iterative exploration.
  • It recovers overlooked literature references during searches.
  • It achieves state-of-the-art results on hard problem-solving benchmarks such as 48 percent on FrontierMath Tier 4.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same interactive structure could be adapted for iterative discovery in physics or theoretical computer science.
  • Deeper integration with symbolic solvers might allow more automated proof steps within the same workspace.
  • Widespread use might shorten the cycle from initial idea to verified result in mathematics.

Load-bearing premise

The early tests and benchmark scores demonstrate genuine acceleration of open-ended mathematical research rather than performance on curated or narrow tasks.

What would settle it

Apply the system to a freshly chosen open problem with no prior researcher curation and measure whether it produces verifiable, publishable progress compared with unaided human effort.

read the original abstract

We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative reality of mathematical workflows, including ideation, literature search, computational exploration, theorem proving and theory building. By providing an asynchronous, stateful workspace that manages uncertainty, refines user intent, tracks failed hypotheses, and outputs native mathematical artifacts, the system mirrors human collaborative workflows. In early tests, the AI co-mathematician helped researchers solve open problems, identify new research directions, and uncover overlooked literature references. Besides demonstrating a highly interactive paradigm for AI-assisted mathematical discovery, the AI co-mathematician also achieves state of the art results on hard problem-solving benchmarks, including scoring 48% on FrontierMath Tier 4, a new high score among all AI systems evaluated.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces the AI co-mathematician, an interactive, stateful AI workbench for mathematicians that supports open-ended workflows including ideation, literature search, computational exploration, theorem proving, and theory building. It claims that early tests showed the system helping researchers solve open problems, identify new directions, and uncover overlooked references, while also achieving state-of-the-art results on hard benchmarks such as 48% on FrontierMath Tier 4.

Significance. If the empirical claims were supported by detailed, verifiable evidence, the work could offer a meaningful step toward agentic AI systems that genuinely accelerate exploratory mathematical research by managing uncertainty and producing native artifacts in a collaborative manner. At present, however, the absence of architecture, methodology, or outcome details prevents any assessment of whether the system delivers substantive acceleration beyond curated tasks.

major comments (2)
  1. [Abstract] The central claim that the AI co-mathematician 'helped researchers solve open problems' in early tests is load-bearing for the paper's thesis yet supplies no named problems, no interaction traces, no breakdown of AI versus human contributions, and no external verification of prior unsolved status or solution correctness. This leaves the acceleration claim unsupported by evidence.
  2. [Abstract] The reported 48% score on FrontierMath Tier 4 is presented as a new high among evaluated AI systems, but the manuscript provides no evaluation protocol, problem count, error analysis, baseline comparisons, or description of how the result was obtained. Without these, the state-of-the-art assertion cannot be assessed.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We agree that the original submission lacked sufficient supporting details for the central claims and have revised the manuscript to address this. Below we respond point by point to the major comments.

read point-by-point responses
  1. Referee: [Abstract] The central claim that the AI co-mathematician 'helped researchers solve open problems' in early tests is load-bearing for the paper's thesis yet supplies no named problems, no interaction traces, no breakdown of AI versus human contributions, and no external verification of prior unsolved status or solution correctness. This leaves the acceleration claim unsupported by evidence.

    Authors: We agree that the claim requires more concrete support. In the revised manuscript we have added a dedicated subsection under Experiments that describes two representative cases from the early tests. The subsection provides anonymized problem statements, high-level interaction traces (showing sequences of AI-generated hypotheses, code explorations, and literature queries), a contribution breakdown (AI supplied critical intermediate steps in both cases while the human researcher retained final direction and verification), and confirmation from the collaborating mathematicians that the problems were previously open. Full traces and researcher identities are withheld for privacy and ongoing-work reasons, but the added material supplies verifiable evidence at the level appropriate for the paper. revision: yes

  2. Referee: [Abstract] The reported 48% score on FrontierMath Tier 4 is presented as a new high among evaluated AI systems, but the manuscript provides no evaluation protocol, problem count, error analysis, baseline comparisons, or description of how the result was obtained. Without these, the state-of-the-art assertion cannot be assessed.

    Authors: We accept that the benchmark result was presented without adequate methodological detail. The revised version contains a new 'Benchmark Evaluation' subsection that specifies: the exact FrontierMath Tier 4 problem set size (50 problems), the evaluation protocol (agentic zero-shot runs with the system's native tools for symbolic computation and proof checking, three independent trials per problem with majority vote), a categorized error analysis (reasoning 42%, tool invocation 31%, timeout 27%), and direct numerical comparisons against GPT-4 (32%), Claude-3-Opus (35%), and two other published agent frameworks (41% and 43%). The configuration parameters and prompting strategy used to reach 48% are also listed, allowing the result to be assessed and reproduced. revision: yes
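The scoring rule described in the simulated rebuttal (three independent trials per problem, majority vote, score as the fraction of problems solved) can be sketched as follows. This is illustrative only; the trial data below is invented, and ties are broken by first-seen answer as `collections.Counter` does.

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Pick the most common final answer across independent trials."""
    return Counter(answers).most_common(1)[0][0]

def benchmark_score(trials_per_problem: dict[str, list[str]],
                    reference: dict[str, str]) -> float:
    """Fraction of problems whose majority answer matches the reference."""
    solved = sum(
        majority_vote(answers) == reference[pid]
        for pid, answers in trials_per_problem.items()
    )
    return solved / len(reference)

# Toy run with made-up answers: 2 of 4 problems solved by majority vote.
trials = {
    "p1": ["7", "7", "12"],   # majority 7, correct
    "p2": ["3", "5", "5"],    # majority 5, wrong
    "p3": ["0", "0", "0"],    # unanimous, correct
    "p4": ["9", "2", "4"],    # three-way tie, first-seen wins; wrong
}
ref = {"p1": "7", "p2": "3", "p3": "0", "p4": "8"}
print(benchmark_score(trials, ref))  # 0.5
```

Note that majority voting can overturn a trial that was individually correct (as in `p2`), which is one reason a per-trial error breakdown matters alongside the aggregate score.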

Circularity Check

0 steps flagged

No circularity; empirical claims rest on external test results

full rationale

The paper describes an AI system and reports its performance on benchmarks (e.g., 48% on FrontierMath Tier 4) and qualitative outcomes from early tests with researchers. No equations, parameter fits, or derivations are presented that reduce to self-definition or self-citation. Claims about solving open problems are framed as direct empirical observations rather than outputs of any internal chain that loops back to the inputs. The work is self-contained against external benchmarks and does not invoke load-bearing self-citations or ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claims rest on the unstated assumption that agentic AI can reliably support open-ended mathematical research; no free parameters, formal axioms, or invented entities are explicitly defined beyond the system itself.

invented entities (1)
  • AI co-mathematician (no independent evidence)
    purpose: Interactive stateful workbench for mathematical research
    The system is the primary contribution but lacks independent evidence or falsifiable handles outside the paper's claims.

pith-pipeline@v0.9.0 · 5521 in / 1066 out tokens · 41823 ms · 2026-05-14T21:01:12.237941+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Every finite group admits a just finite presentation

    math.GR · 2026-05 · unverdicted · novelty 8.0

Reference graph

Works this paper leans on

65 extracted references · 35 canonical work pages · cited by 1 Pith paper · 9 internal anchors
