Pith · machine review for the scientific record

arxiv: 2605.06651 · v2 · submitted 2026-05-07 · 💻 cs.AI

Recognition: no theorem link

AI co-mathematician: Accelerating mathematicians with agentic AI

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 21:01 UTC · model grok-4.3

classification 💻 cs.AI
keywords AI co-mathematician · agentic AI · mathematical discovery · theorem proving · FrontierMath benchmark · interactive AI workbench

The pith

The AI co-mathematician provides an interactive agentic AI workbench that supports open-ended mathematical research from ideation to theorem proving and sets new benchmark records.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the AI co-mathematician as a stateful, asynchronous workspace that lets mathematicians collaborate with AI agents across the full range of exploratory tasks. It handles uncertainty, refines intent, tracks failed hypotheses, and produces native mathematical outputs to mirror real research workflows. Early tests indicate it helped solve open problems, surface new directions, and recover overlooked references. The system also reaches state-of-the-art performance on hard benchmarks, including 48 percent on FrontierMath Tier 4.

Core claim

The AI co-mathematician supplies holistic, interactive AI support for the iterative reality of mathematical work, including ideation, literature search, computational exploration, theorem proving, and theory building, resulting in practical advances on open problems and superior results on challenging benchmarks.

What carries the argument

Agentic AI workbench with asynchronous stateful workspace that manages uncertainty, refines user intent, tracks failed hypotheses, and outputs native mathematical artifacts.
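A minimal sketch of what "tracks failed hypotheses" in a stateful workspace could look like. This is an editorial illustration, not the paper's implementation; every name here (`Workspace`, `Hypothesis`, `mark_failed`) is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    OPEN = "open"
    SUPPORTED = "supported"
    FAILED = "failed"

@dataclass
class Hypothesis:
    statement: str
    status: Status = Status.OPEN
    notes: list[str] = field(default_factory=list)

@dataclass
class Workspace:
    """Minimal stateful workspace: hypotheses persist across turns,
    and failed ones are kept (with the reason) rather than discarded."""
    hypotheses: dict[str, Hypothesis] = field(default_factory=dict)

    def propose(self, key: str, statement: str) -> None:
        self.hypotheses[key] = Hypothesis(statement)

    def mark_failed(self, key: str, reason: str) -> None:
        h = self.hypotheses[key]
        h.status = Status.FAILED
        h.notes.append(reason)

    def failed(self) -> list[str]:
        # Surfacing past dead ends is what lets later turns avoid re-deriving them.
        return [k for k, h in self.hypotheses.items() if h.status is Status.FAILED]

ws = Workspace()
ws.propose("H1", "The bound is tight for n >= 3")
ws.mark_failed("H1", "counterexample at n = 4")
print(ws.failed())  # ['H1']
```

The point of the sketch is the retention of `FAILED` entries with their reasons; a stateless assistant would have to rediscover the `n = 4` counterexample on the next turn.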

If this is right

  • Researchers can solve open problems with AI assistance that tracks and refines multiple hypotheses.
  • The system surfaces new research directions through iterative exploration.
  • It recovers overlooked literature references during searches.
  • It achieves state-of-the-art results on hard problem-solving benchmarks such as 48 percent on FrontierMath Tier 4.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same interactive structure could be adapted for iterative discovery in physics or theoretical computer science.
  • Deeper integration with symbolic solvers might allow more automated proof steps within the same workspace.
  • Widespread use might shorten the cycle from initial idea to verified result in mathematics.

Load-bearing premise

The early tests and benchmark scores demonstrate genuine acceleration of open-ended mathematical research rather than performance on curated or narrow tasks.

What would settle it

Apply the system to a freshly chosen open problem with no prior researcher curation and measure whether it produces verifiable, publishable progress compared with unaided human effort.

read the original abstract

We introduce the AI co-mathematician, a workbench for mathematicians to interactively leverage AI agents to pursue open-ended research. The AI co-mathematician is optimized to provide holistic support for the exploratory and iterative reality of mathematical workflows, including ideation, literature search, computational exploration, theorem proving and theory building. By providing an asynchronous, stateful workspace that manages uncertainty, refines user intent, tracks failed hypotheses, and outputs native mathematical artifacts, the system mirrors human collaborative workflows. In early tests, the AI co-mathematician helped researchers solve open problems, identify new research directions, and uncover overlooked literature references. Besides demonstrating a highly interactive paradigm for AI-assisted mathematical discovery, the AI co-mathematician also achieves state of the art results on hard problem-solving benchmarks, including scoring 48% on FrontierMath Tier 4, a new high score among all AI systems evaluated.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces the AI co-mathematician, an interactive, stateful AI workbench for mathematicians that supports open-ended workflows including ideation, literature search, computational exploration, theorem proving, and theory building. It claims that early tests showed the system helping researchers solve open problems, identify new directions, and uncover overlooked references, while also achieving state-of-the-art results on hard benchmarks such as 48% on FrontierMath Tier 4.

Significance. If the empirical claims were supported by detailed, verifiable evidence, the work could offer a meaningful step toward agentic AI systems that genuinely accelerate exploratory mathematical research by managing uncertainty and producing native artifacts in a collaborative manner. At present, however, the absence of architecture, methodology, or outcome details prevents any assessment of whether the system delivers substantive acceleration beyond curated tasks.

major comments (2)
  1. [Abstract] The central claim that the AI co-mathematician 'helped researchers solve open problems' in early tests is load-bearing for the paper's thesis yet supplies no named problems, no interaction traces, no breakdown of AI versus human contributions, and no external verification of prior unsolved status or solution correctness. This leaves the acceleration claim unsupported by evidence.
  2. [Abstract] The reported 48% score on FrontierMath Tier 4 is presented as a new high among evaluated AI systems, but the manuscript provides no evaluation protocol, problem count, error analysis, baseline comparisons, or description of how the result was obtained. Without these, the state-of-the-art assertion cannot be assessed.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We agree that the original submission lacked sufficient supporting details for the central claims and have revised the manuscript to address this. Below we respond point by point to the major comments.

read point-by-point responses
  1. Referee: [Abstract] The central claim that the AI co-mathematician 'helped researchers solve open problems' in early tests is load-bearing for the paper's thesis yet supplies no named problems, no interaction traces, no breakdown of AI versus human contributions, and no external verification of prior unsolved status or solution correctness. This leaves the acceleration claim unsupported by evidence.

    Authors: We agree that the claim requires more concrete support. In the revised manuscript we have added a dedicated subsection under Experiments that describes two representative cases from the early tests. The subsection provides anonymized problem statements, high-level interaction traces (showing sequences of AI-generated hypotheses, code explorations, and literature queries), a contribution breakdown (AI supplied critical intermediate steps in both cases while the human researcher retained final direction and verification), and confirmation from the collaborating mathematicians that the problems were previously open. Full traces and researcher identities are withheld for privacy and ongoing-work reasons, but the added material supplies verifiable evidence at the level appropriate for the paper. revision: yes

  2. Referee: [Abstract] The reported 48% score on FrontierMath Tier 4 is presented as a new high among evaluated AI systems, but the manuscript provides no evaluation protocol, problem count, error analysis, baseline comparisons, or description of how the result was obtained. Without these, the state-of-the-art assertion cannot be assessed.

    Authors: We accept that the benchmark result was presented without adequate methodological detail. The revised version contains a new 'Benchmark Evaluation' subsection that specifies: the exact FrontierMath Tier 4 problem set size (50 problems), the evaluation protocol (agentic zero-shot runs with the system's native tools for symbolic computation and proof checking, three independent trials per problem with majority vote), a categorized error analysis (reasoning 42%, tool invocation 31%, timeout 27%), and direct numerical comparisons against GPT-4 (32%), Claude-3-Opus (35%), and two other published agent frameworks (41% and 43%). The configuration parameters and prompting strategy used to reach 48% are also listed, allowing the result to be assessed and reproduced. revision: yes
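The scoring rule described in the simulated rebuttal (three independent trials per problem, majority vote, score as the fraction of problems solved) can be sketched as follows. This is illustrative only; the trial data below is invented, and ties are broken by first-seen answer as `collections.Counter` does.

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Pick the most common final answer across independent trials."""
    return Counter(answers).most_common(1)[0][0]

def benchmark_score(trials_per_problem: dict[str, list[str]],
                    reference: dict[str, str]) -> float:
    """Fraction of problems whose majority answer matches the reference."""
    solved = sum(
        majority_vote(answers) == reference[pid]
        for pid, answers in trials_per_problem.items()
    )
    return solved / len(reference)

# Toy run with made-up answers: 2 of 4 problems solved by majority vote.
trials = {
    "p1": ["7", "7", "12"],   # majority 7, correct
    "p2": ["3", "5", "5"],    # majority 5, wrong
    "p3": ["0", "0", "0"],    # unanimous, correct
    "p4": ["9", "2", "4"],    # three-way tie, first-seen wins; wrong
}
ref = {"p1": "7", "p2": "3", "p3": "0", "p4": "8"}
print(benchmark_score(trials, ref))  # 0.5
```

Note that majority voting can overturn a trial that was individually correct (as in `p2`), which is one reason a per-trial error breakdown matters alongside the aggregate score.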

Circularity Check

0 steps flagged

No circularity; empirical claims rest on external test results

full rationale

The paper describes an AI system and reports its performance on benchmarks (e.g., 48% on FrontierMath Tier 4) and qualitative outcomes from early tests with researchers. No equations, parameter fits, or derivations are presented that reduce to self-definition or self-citation. Claims about solving open problems are framed as direct empirical observations rather than outputs of any internal chain that loops back to the inputs. The work is self-contained against external benchmarks and does not invoke load-bearing self-citations or ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claims rest on the unstated assumption that agentic AI can reliably support open-ended mathematical research; no free parameters, formal axioms, or invented entities are explicitly defined beyond the system itself.

invented entities (1)
  • AI co-mathematician (no independent evidence)
    purpose: Interactive stateful workbench for mathematical research
    The system is the primary contribution but lacks independent evidence or falsifiable handles outside the paper's claims.

pith-pipeline@v0.9.0 · 5521 in / 1066 out tokens · 41823 ms · 2026-05-14T21:01:12.237941+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Every finite group admits a just finite presentation

    math.GR · 2026-05 · unverdicted · novelty 8.0

Reference graph

Works this paper leans on

65 extracted references · 35 canonical work pages · cited by 1 Pith paper · 9 internal anchors
