pith. machine review for the scientific record. sign in

arxiv: 2604.25155 · v1 · submitted 2026-04-28 · 📡 eess.SP

Recognition: unknown

Rethinking Wireless Communications through Formal Mathematical AI Reasoning

Authors on Pith no claims yet

Pith reviewed 2026-05-07 15:39 UTC · model grok-4.3

classification 📡 eess.SP
keywords wireless communicationsformal AI reasoningmathematical verificationtheorem derivationAI discoverycommunication theorynext-generation networks
0
0 comments X

The pith

Wireless communications is a uniquely structured domain for formal AI mathematical reasoning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The growing complexity of next-generation wireless systems requires increasingly sophisticated mathematical analysis that strains human expert capabilities. This paper argues that wireless communications possesses a mathematical structure that makes it particularly well-suited for formal AI reasoning techniques, including theorem proving and large language model derivations. The authors propose a three-layer framework consisting of verification of existing results, derivation of new theorems, and discovery of novel insights to systematically advance the field. A sympathetic reader would see potential in using AI to scale the creation and validation of wireless theory beyond manual processes. If the claim holds, it points toward hybrid human-AI workflows for establishing mathematical knowledge in communication systems.

Core claim

Wireless communications is a uniquely structured domain for formal AI reasoning. We propose a three-layer framework of verification, derivation, and discovery to rethink how wireless mathematical knowledge is established.

What carries the argument

The three-layer framework of verification (confirming existing theorems), derivation (generating new results from known premises), and discovery (identifying new mathematical structures) for wireless communications mathematics.

If this is right

  • Existing wireless theorems can be systematically verified for correctness by AI systems.
  • New mathematical models for advanced systems such as 6G can be derived more efficiently from known principles.
  • Novel theoretical insights and structures can be discovered through AI exploration of the domain.
  • Mathematical analysis in wireless theory can scale beyond the limits of individual human experts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same layered structure might transfer to other structured engineering domains with rich mathematical foundations, such as control systems or optical communications.
  • AI discovery layers could surface unexpected cross-connections between different wireless techniques that human researchers have not yet linked.
  • Combining the framework with numerical simulators would allow empirical checks of AI-derived derivations in realistic channel conditions.

Load-bearing premise

Wireless communications possesses a uniquely structured mathematical domain that makes formal AI reasoning particularly effective and applicable without loss of domain insight or accuracy.

What would settle it

An AI implementation of the framework that fails to correctly verify a standard result such as the Shannon capacity formula for an AWGN channel or produces a derivation contradicting established wireless theory.

Figures

Figures reproduced from arXiv: 2604.25155 by Abbas Jamalipour, Changyuan Zhao, Dong In Kim, Dusit Niyato, Jiacheng Wang, Shiwen Mao, Xianbin Wang, Zan Li.

Figure 1
Figure 1. Figure 1: The rise of AI mathematical reasoning research. Annual number of publications indexed in Web of Science containing view at source ↗
Figure 2
Figure 2. Figure 2: From wireless mathematical problems to AI-assisted reasoning. (a) Wireless communication theory involves diverse view at source ↗
Figure 3
Figure 3. Figure 3: A role-based LLM reasoning framework for Cram view at source ↗
Figure 5
Figure 5. Figure 5: Step-level diagnostics of LLM reasoning for scenario view at source ↗
read the original abstract

Mathematical analysis has long underpinned wireless communication theory, yet the growing complexity of next-generation systems demands increasingly sophisticated reasoning from domain experts. Recent advances in AI mathematical reasoning, from formal theorem proving to large language model (LLM)-based derivation, offer a promising but largely unexplored path forward. Here we argue that wireless communications is a uniquely structured domain for formal AI reasoning, and propose a three-layer framework of verification, derivation, and discovery to rethink how wireless mathematical knowledge is established.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper argues that mathematical analysis has long underpinned wireless communication theory but that growing complexity in next-generation systems requires more sophisticated reasoning. It claims that wireless communications is a uniquely structured domain for formal AI reasoning and proposes a three-layer framework consisting of verification, derivation, and discovery to rethink how wireless mathematical knowledge is established.

Significance. If the proposed framework could be realized with concrete implementations, it might accelerate the verification and discovery of complex wireless theorems, offering a new methodology for handling the mathematical demands of advanced communication systems. The manuscript, however, advances only a high-level conceptual position without any worked examples, formal derivations, or empirical validation, so its potential significance remains speculative at present.

major comments (2)
  1. [Abstract] Abstract and central argument: the claim that wireless communications is a 'uniquely structured domain' for formal AI reasoning is presented without any comparative analysis to other mathematically rich fields (e.g., control theory or quantum information), which is load-bearing for the proposal's motivation.
  2. [Framework description] Proposed three-layer framework: no concrete application of the verification, derivation, or discovery layers is demonstrated on any specific wireless theorem, equation, or problem, leaving the framework's practicality and domain-specific advantages unassessed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and describe the revisions we will make to the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract and central argument: the claim that wireless communications is a 'uniquely structured domain' for formal AI reasoning is presented without any comparative analysis to other mathematically rich fields (e.g., control theory or quantum information), which is load-bearing for the proposal's motivation.

    Authors: We agree that the uniqueness claim would be strengthened by explicit comparison. In the revised manuscript we will add a short subsection in the introduction that contrasts wireless communications with control theory and quantum information. The comparison will highlight domain-specific features such as the combination of continuous-time stochastic processes, finite-blocklength information-theoretic bounds, and multi-objective resource allocation under hardware constraints, which collectively create a structured setting particularly amenable to formal verification and derivation. This addition will clarify the motivation without altering the core argument. revision: yes

  2. Referee: [Framework description] Proposed three-layer framework: no concrete application of the verification, derivation, or discovery layers is demonstrated on any specific wireless theorem, equation, or problem, leaving the framework's practicality and domain-specific advantages unassessed.

    Authors: The manuscript is intentionally positioned as a conceptual position paper. To directly address the request for concrete grounding, we will insert a new section containing three brief, self-contained illustrative examples: (i) verification of the AWGN capacity formula using a formal theorem-proving sketch, (ii) derivation of a simple diversity-multiplexing tradeoff bound, and (iii) a discovery-style hypothesis generation for a new relationship in massive MIMO under imperfect CSI. These examples will remain at a level consistent with the paper's scope and will not include full code or exhaustive proofs, but they will demonstrate how each layer operates on a familiar wireless result. Full empirical validation and large-scale implementations are acknowledged as future work. revision: partial

Circularity Check

0 steps flagged

No circularity: conceptual position paper without derivations or self-referential reductions

full rationale

The manuscript is a high-level perspective piece that argues wireless communications is uniquely suited to formal AI reasoning and proposes a three-layer framework (verification, derivation, discovery). No equations, derivations, fitted parameters, or load-bearing self-citations appear in the text. The central claim is an assertion about domain structure rather than a result obtained by reducing to prior inputs or definitions within the paper itself. As a result, no step reduces by construction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are stated or implied in the abstract; the work is purely conceptual.

pith-pipeline@v0.9.0 · 5386 in / 947 out tokens · 46670 ms · 2026-05-07T15:39:39.288529+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

93 extracted references · 30 canonical work pages · 6 internal anchors

  1. [1]

    A mathematical theory of communication,

    C. E. Shannon, “A mathematical theory of communication,”The Bell system technical journal, vol. 27, no. 3, pp. 379–423, 1948

  2. [2]

    Massive MIMO networks: Spectral, energy, and hardware efficiency,

    E. Bj ¨ornson, J. Hoydis, and L. Sanguinetti, “Massive MIMO networks: Spectral, energy, and hardware efficiency,”Foundations and Trends ˆW in Signal Processing, vol. 11, no. 3-4, pp. 154–655, 2017

  3. [3]

    Couillet and M

    R. Couillet and M. Debbah,Random matrix methods for wireless communications. Cambridge University Press, 2011

  4. [4]

    A tractable approach to coverage and rate in cellular networks,

    J. G. Andrews, F. Baccelli, and R. K. Ganti, “A tractable approach to coverage and rate in cellular networks,”IEEE Transactions on communications, vol. 59, no. 11, pp. 3122–3134, 2011

  5. [5]

    Haenggi,Stochastic geometry for wireless networks

    M. Haenggi,Stochastic geometry for wireless networks. Cambridge University Press, 2013. 11

  6. [6]

    Integrated sensing and communications: Toward dual-functional wire- less networks for 6G and beyond,

    F. Liu, Y . Cui, C. Masouros, J. Xu, T. X. Han, Y . C. Eldar, and S. Buzzi, “Integrated sensing and communications: Toward dual-functional wire- less networks for 6G and beyond,”IEEE journal on selected areas in communications, vol. 40, no. 6, pp. 1728–1767, 2022

  7. [7]

    A survey on fundamental limits of integrated sensing and communication,

    A. Liu, Z. Huang, M. Li, Y . Wan, W. Li, T. X. Han, C. Liu, R. Du, D. K. P. Tan, J. Luet al., “A survey on fundamental limits of integrated sensing and communication,”IEEE Communications Surveys & Tutorials, vol. 24, no. 2, pp. 994–1034, 2022

  8. [8]

    Stochastic geometry and random graphs for the analysis and design of wireless networks,

    M. Haenggi, J. G. Andrews, F. Baccelli, O. Dousse, and M. Franceschetti, “Stochastic geometry and random graphs for the analysis and design of wireless networks,”IEEE journal on selected areas in communications, vol. 27, no. 7, pp. 1029–1046, 2009

  9. [9]

    An introduction to convex optimization for communications and signal processing,

    Z.-Q. Luo and W. Yu, “An introduction to convex optimization for communications and signal processing,”IEEE Journal on selected areas in communications, vol. 24, no. 8, pp. 1426–1438, 2006

  10. [10]

    A survey of recent advances in optimization methods for wireless communications,

    Y .-F. Liu, T.-H. Chang, M. Hong, Z. Wu, A. M.-C. So, E. A. Jorswieck, and W. Yu, “A survey of recent advances in optimization methods for wireless communications,”IEEE Journal on Selected Areas in Communications, vol. 42, no. 11, pp. 2992–3031, 2024

  11. [11]

    Tse and P

    D. Tse and P. Viswanath,Fundamentals of wireless communication. Cambridge university press, 2005

  12. [12]

    T. M. Cover,Elements of information theory. John Wiley & Sons, 1999

  13. [13]

    The lean 4 theorem prover and program- ming language,

    L. d. Moura and S. Ullrich, “The lean 4 theorem prover and program- ming language,” inInternational Conference on Automated Deduction. Springer, 2021, pp. 625–635

  14. [14]

    Bertot and P

    Y . Bertot and P. Cast ´eran,Interactive theorem proving and program development: Coq’Art: the calculus of inductive constructions. Springer Science & Business Media, 2013

  15. [15]

    Hypertree proof search for neural theorem proving,

    G. Lample, T. Lacroix, M.-A. Lachaux, A. Rodriguez, A. Hayat, T. Lavril, G. Ebner, and X. Martinet, “Hypertree proof search for neural theorem proving,”Advances in neural information processing systems, vol. 35, pp. 26 337–26 349, 2022

  16. [16]

    Deepseek-prover: Advancing theorem proving in llms through large-scale synthetic data.arXiv preprint arXiv:2405.14333, 2024

    H. Xin, D. Guo, Z. Shao, Z. Ren, Q. Zhu, B. Liu, C. Ruan, W. Li, and X. Liang, “Deepseek-prover: Advancing theorem proving in llms through large-scale synthetic data,”arXiv preprint arXiv:2405.14333, 2024

  17. [17]

    Tora: A tool-integrated reasoning agent for mathematical problem solving.arXiv preprint arXiv:2309.17452,

    Z. Gou, Z. Shao, Y . Gong, Y . Shen, Y . Yang, M. Huang, N. Duan, and W. Chen, “Tora: A tool-integrated reasoning agent for mathematical problem solving,”arXiv preprint arXiv:2309.17452, 2023

  18. [18]

    Alphaintegrator: Trans- former action search for symbolic integration proofs,

    M. ¨Unsal, T. Gehr, and M. Vechev, “Alphaintegrator: Trans- former action search for symbolic integration proofs,”arXiv preprint arXiv:2410.02666, 2024

  19. [19]

    Solving olympiad geometry without human demonstrations,

    T. H. Trinh, Y . Wu, Q. V . Le, H. He, and T. Luong, “Solving olympiad geometry without human demonstrations,”Nature, vol. 625, no. 7995, pp. 476–482, 2024

  20. [20]

    Olympiad- level formal mathematical reasoning with reinforcement learning,

    T. Hubert, R. Mehta, L. Sartran, M. Z. Horv ´ath, G. ˇZuˇzi´c, E. Wieser, A. Huang, J. Schrittwieser, Y . Schroecker, H. Masoomet al., “Olympiad- level formal mathematical reasoning with reinforcement learning,”Na- ture, pp. 1–3, 2025

  21. [21]

    Jiang, Jia Deng, Stella Biderman, and Sean Welleck

    Z. Azerbayev, H. Schoelkopf, K. Paster, M. D. Santos, S. McAleer, A. Q. Jiang, J. Deng, S. Biderman, and S. Welleck, “Llemma: An open language model for mathematics,”arXiv preprint arXiv:2310.10631, 2023

  22. [22]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, H. Zhang, M. Zhang, Y . Li, Y . Wuet al., “Deepseekmath: Pushing the limits of mathematical reasoning in open language models,”arXiv preprint arXiv:2402.03300, 2024

  23. [23]

    Solv- ing quantitative reasoning problems with language models,

    A. Lewkowycz, A. Andreassen, D. Dohan, E. Dyer, H. Michalewski, V . Ramasesh, A. Slone, C. Anil, I. Schlag, T. Gutman-Soloet al., “Solv- ing quantitative reasoning problems with language models,”Advances in neural information processing systems, vol. 35, pp. 3843–3857, 2022

  24. [24]

    Measuring Mathematical Problem Solving With the MATH Dataset

    D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, and J. Steinhardt, “Measuring mathematical problem solving with the math dataset,”arXiv preprint arXiv:2103.03874, 2021

  25. [25]

    Olympiadbench: A challenging benchmark for promot- ing agi with olympiad-level bilingual multimodal scientific problems,

    C. He, R. Luo, Y . Bai, S. Hu, Z. Thai, J. Shen, J. Hu, X. Han, Y . Huang, Y . Zhanget al., “Olympiadbench: A challenging benchmark for promot- ing agi with olympiad-level bilingual multimodal scientific problems,” inProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 3828– 3850

  26. [26]

    MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

    K. Zheng, J. M. Han, and S. Polu, “Minif2f: a cross-system benchmark for formal olympiad-level mathematics,”arXiv preprint arXiv:2109.00110, 2021

  27. [27]

    Wireless- mathbench: A mathematical modeling benchmark for llms in wireless communications,

    X. Li, M. Liu, L. Wei, J. An, M. A. Debbah, and C. Yuen, “Wireless- mathbench: A mathematical modeling benchmark for llms in wireless communications,” inFindings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 10 984–11 009

  28. [28]

    AI-driven research in pure mathematics and theoretical physics,

    Y .-H. He, “AI-driven research in pure mathematics and theoretical physics,”Nature Reviews Physics, vol. 6, no. 9, pp. 546–553, 2024

  29. [29]

    J. G. Proakis and M. Salehi,Digital communications. McGraw-hill New York, 2001, vol. 4

  30. [30]

    M. K. Simon and M.-S. Alouini,Digital communication over fading channels. John Wiley & Sons, 2004

  31. [31]

    How much training is needed in multiple-antenna wireless links?

    B. Hassibi and B. M. Hochwald, “How much training is needed in multiple-antenna wireless links?”IEEE Transactions on Information Theory, vol. 49, no. 4, pp. 951–963, 2003

  32. [32]

    Statistical signal processing: estimation theory

    S. M. Kay, “Statistical signal processing: estimation theory.”

  33. [33]

    Optimal multiuser trans- mit beamforming: A difficult problem with a simple solution structure [lecture notes],

    E. Bj ¨ornson, M. Bengtsson, and B. Ottersten, “Optimal multiuser trans- mit beamforming: A difficult problem with a simple solution structure [lecture notes],”IEEE Signal Processing Magazine, vol. 31, no. 4, pp. 142–148, 2014

  34. [34]

    An iteratively weighted mmse approach to distributed sum-utility maximization for a mimo interfering broadcast channel,

    Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, “An iteratively weighted mmse approach to distributed sum-utility maximization for a mimo interfering broadcast channel,”IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4331–4340, 2011

  35. [35]

    Capacity of multi-antenna gaussian channels,

    E. Telatar, “Capacity of multi-antenna gaussian channels,”European transactions on telecommunications, vol. 10, no. 6, pp. 585–595, 1999

  36. [36]

    On limits of wireless communications in a fading environment when using multiple antennas,

    G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,”Wireless personal communications, vol. 6, no. 3, pp. 311–335, 1998

  37. [37]

    Channel coding rate in the finite blocklength regime,

    Y . Polyanskiy, H. V . Poor, and S. Verd ´u, “Channel coding rate in the finite blocklength regime,”IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2307–2359, 2010

  38. [38]

    The wire-tap channel,

    A. D. Wyner, “The wire-tap channel,”Bell system technical journal, vol. 54, no. 8, pp. 1355–1387, 1975

  39. [39]

    Liu and W

    R. Liu and W. Trappe,Securing wireless communications at the physical layer. Springer, 2010, vol. 7

  40. [40]

    A simple and general parameterization quantifying performance in fading channels,

    Z. Wang and G. B. Giannakis, “A simple and general parameterization quantifying performance in fading channels,”IEEE Transactions on Communications, vol. 51, no. 8, pp. 1389–1398, 2003

  41. [41]

    Modeling and analysis of k-tier downlink heterogeneous cellular networks,

    H. S. Dhillon, R. K. Ganti, F. Baccelli, and J. G. Andrews, “Modeling and analysis of k-tier downlink heterogeneous cellular networks,”IEEE Journal on Selected Areas in Communications, vol. 30, no. 3, pp. 550– 560, 2012

  42. [42]

    Dual methods for nonconvex spectrum optimization of multicarrier systems,

    W. Yu and R. Lui, “Dual methods for nonconvex spectrum optimization of multicarrier systems,”IEEE Transactions on communications, vol. 54, no. 7, pp. 1310–1322, 2006

  43. [43]

    A tutorial on decomposition methods for network utility maximization,

    D. P. Palomar and M. Chiang, “A tutorial on decomposition methods for network utility maximization,”IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1439–1451, 2006

  44. [44]

    Srikant and L

    R. Srikant and L. Ying,Communication networks: An optimization, control and stochastic networks perspective. Cambridge University Press, 2014

  45. [45]

    Interference alignment and degrees of freedom of thek-user interference channel,

    V . R. Cadambe and S. A. Jafar, “Interference alignment and degrees of freedom of thek-user interference channel,”IEEE transactions on information theory, vol. 54, no. 8, pp. 3425–3441, 2008

  46. [46]

    Dynamic spectrum management: Complexity and duality,

    Z.-Q. Luo and S. Zhang, “Dynamic spectrum management: Complexity and duality,”IEEE journal of selected topics in signal processing, vol. 2, no. 1, pp. 57–73, 2008

  47. [47]

    Neely,Stochastic network optimization with application to commu- nication and queueing systems

    M. Neely,Stochastic network optimization with application to commu- nication and queueing systems. Morgan & Claypool Publishers, 2010

  48. [48]

    Effective capacity: a wireless link model for support of quality of service,

    D. Wu and R. Negi, “Effective capacity: a wireless link model for support of quality of service,”IEEE Transactions on wireless commu- nications, vol. 2, no. 4, pp. 630–643, 2003

  49. [49]

    Noncooperative cellular wireless with unlimited num- bers of base station antennas,

    T. L. Marzetta, “Noncooperative cellular wireless with unlimited num- bers of base station antennas,”IEEE transactions on wireless communi- cations, vol. 9, no. 11, pp. 3590–3600, 2010

  50. [50]

    El Gamal and Y .-H

    A. El Gamal and Y .-H. Kim,Network information theory. Cambridge university press, 2011

  51. [51]

    Capacity of fading channels with channel side information,

    A. J. Goldsmith and P. P. Varaiya, “Capacity of fading channels with channel side information,”IEEE transactions on information theory, vol. 43, no. 6, pp. 1986–1992, 2002

  52. [52]

    Baccelli and B

    F. Baccelli and B. Błaszczyszyn,Stochastic geometry and wireless networks. Now Publishers Inc, 2009

  53. [53]

    A survey of deep learning for mathematical reasoning,

    P. Lu, L. Qiu, W. Yu, S. Welleck, and K.-W. Chang, “A survey of deep learning for mathematical reasoning,” inProceedings of the 61st annual meeting of the association for computational linguistics (volume 1: long papers), 2023, pp. 14 605–14 631

  54. [54]

    org/abs/2412.16075

    K. Yang, G. Poesia, J. He, W. Li, K. Lauter, S. Chaudhuri, and D. Song, “Formal mathematical reasoning: A new frontier in ai,”arXiv preprint arXiv:2412.16075, 2024. 12

  55. [55]

    A survey on mathematical reasoning and optimization with large language models,

    A. Forootani, “A survey on mathematical reasoning and optimization with large language models,”arXiv preprint arXiv:2503.17726, 2025

  56. [56]

    Leandojo: Theorem proving with retrieval-augmented language models,

    K. Yang, A. Swope, A. Gu, R. Chalamala, P. Song, S. Yu, S. Godil, R. J. Prenger, and A. Anandkumar, “Leandojo: Theorem proving with retrieval-augmented language models,”Advances in Neural Information Processing Systems, vol. 36, pp. 21 573–21 612, 2023

  57. [57]

    Goedel-prover: A frontier model for open-source automated theorem proving.arXiv preprint arXiv:2502.07640, 2025

    Y . Lin, S. Tang, B. Lyu, J. Wu, H. Lin, K. Yang, J. Li, M. Xia, D. Chen, S. Aroraet al., “Goedel-prover: A frontier model for open- source automated theorem proving,”arXiv preprint arXiv:2502.07640, 2025

  58. [58]

    Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

    H. Wang, M. Unsal, X. Lin, M. Baksys, J. Liu, M. D. Santos, F. Sung, M. Vinyes, Z. Ying, Z. Zhuet al., “Kimina-prover preview: Towards large formal reasoning models with reinforcement learning,”arXiv preprint arXiv:2504.11354, 2025

  59. [59]

    Mathsensei: a tool-augmented large language model for mathematical reasoning,

    D. Das, D. Banerjee, S. Aditya, and A. Kulkarni, “Mathsensei: a tool-augmented large language model for mathematical reasoning,” in Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024, pp. 942–966

  60. [60]

    MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

    L. Yu, W. Jiang, H. Shi, J. Yu, Z. Liu, Y . Zhang, J. T. Kwok, Z. Li, A. Weller, and W. Liu, “Metamath: Bootstrap your own mathematical questions for large language models,”arXiv preprint arXiv:2309.12284, 2023

  61. [61]

    Math-shepherd: Verify and reinforce llms step-by-step without human annotations,

    P. Wang, L. Li, Z. Shao, R. Xu, D. Dai, Y . Li, D. Chen, Y . Wu, and Z. Sui, “Math-shepherd: Verify and reinforce llms step-by-step without human annotations,” inProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 9426–9439

  62. [62]

    Nipkow, M

    T. Nipkow, M. Wenzel, and L. C. Paulson,Isabelle/HOL: a proof assistant for higher-order logic. Springer, 2002

  63. [63]

    Growing mathlib: maintenance of a large scale math- ematical library,

    A. Baanen, M. R. Ballard, J. Commelin, B. G.-g. Chen, M. Rothgang, and D. Testa, “Growing mathlib: maintenance of a large scale math- ematical library,” inInternational Conference on Intelligent Computer Mathematics. Springer, 2025, pp. 51–70

  64. [64]

    Generative lan- guage modeling for automated theorem proving.arXiv preprint arXiv:2009.03393, 2020

    S. Polu and I. Sutskever, “Generative language modeling for automated theorem proving,”arXiv preprint arXiv:2009.03393, 2020

  65. [65]

    LISA: Language models of isabelle proofs,

    A. Q. Jiang, W. Li, J. M. Han, and Y . Wu, “LISA: Language models of isabelle proofs,” in6th Conference on Artificial Intelligence and Theorem Proving, 2021, pp. 378–392

  66. [66]

    Ege Erdil and Tamay Besiroglu

    K. Dong and T. Ma, “Stp: Self-play llm theorem provers with iterative conjecturing and proving,”arXiv preprint arXiv:2502.00212, 2025

  67. [67]

    Abel: Sample efficient online reinforcement learning for neural theorem proving,

    F. Gloeckle, J. Limperg, G. Synnaeve, and A. Hayat, “Abel: Sample efficient online reinforcement learning for neural theorem proving,” in The 4th Workshop on Mathematical Reasoning and AI at NeurIPS’24, 2024

  68. [68]

    arXiv preprint arXiv:2310.00656 , year=

    H. Wang, H. Xin, C. Zheng, L. Li, Z. Liu, Q. Cao, Y . Huang, J. Xiong, H. Shi, E. Xieet al., “Lego-prover: Neural theorem proving with growing libraries,”arXiv preprint arXiv:2310.00656, 2023

  69. [69]

    Jiang et al.Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs

    A. Q. Jiang, S. Welleck, J. P. Zhou, W. Li, J. Liu, M. Jamnik, T. Lacroix, Y . Wu, and G. Lample, “Draft, sketch, and prove: Guiding formal the- orem provers with informal proofs,”arXiv preprint arXiv:2210.12283, 2022

  70. [70]

    Proving olympiad algebraic inequal- ities without human demonstrations,

    C. Wei, M. Sun, and W. Wang, “Proving olympiad algebraic inequal- ities without human demonstrations,”Advances in Neural Information Processing Systems, vol. 37, pp. 82 811–82 822, 2024

  71. [71]

    MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning

    E. Karpas, O. Abend, Y . Belinkov, B. Lenz, O. Lieber, N. Ratner, Y . Shoham, H. Bata, Y . Levine, K. Leyton-Brownet al., “Mrkl systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning,”arXiv preprint arXiv:2205.00445, 2022

  72. [72]

    Linc: A neurosymbolic approach for logical reasoning by combining language models with first-order logic provers,

    T. Olausson, A. Gu, B. Lipkin, C. Zhang, A. Solar-Lezama, J. Tenen- baum, and R. Levy, “Linc: A neurosymbolic approach for logical reasoning by combining language models with first-order logic provers,” inProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 5153–5176

  73. [73]

    Large language models meet symbolic provers for logical reasoning evaluation,

    C. Qi, R. Ma, B. Li, H. Du, B. Hui, J. Wu, Y . Laili, and C. He, “Large language models meet symbolic provers for logical reasoning evaluation,”arXiv preprint arXiv:2502.06563, 2025

  74. [74]

    Pal: Program-aided language models,

    L. Gao, A. Madaan, S. Zhou, U. Alon, P. Liu, Y . Yang, J. Callan, and G. Neubig, “Pal: Program-aided language models,” inInternational conference on machine learning. PMLR, 2023, pp. 10 764–10 799

  75. [75]

    Satlm: Satisfiability-aided language models using declarative prompting,

    X. Ye, Q. Chen, I. Dillig, and G. Durrett, “Satlm: Satisfiability-aided language models using declarative prompting,”Advances in Neural Information Processing Systems, vol. 36, pp. 45 548–45 580, 2023

  76. [76]

    Chain-of-thought prompting elicits reasoning in large language models,

    J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V . Le, D. Zhouet al., “Chain-of-thought prompting elicits reasoning in large language models,”Advances in neural information processing systems, vol. 35, pp. 24 824–24 837, 2022

  77. [77]

    To cot or not to cot? chain- of-thought helps mainly on math and symbolic reasoning.arXiv preprint arXiv:2409.12183, 2024

    Z. Sprague, F. Yin, J. D. Rodriguez, D. Jiang, M. Wadhwa, P. Singhal, X. Zhao, X. Ye, K. Mahowald, and G. Durrett, “To cot or not to cot? chain-of-thought helps mainly on math and symbolic reasoning,”arXiv preprint arXiv:2409.12183, 2024

  78. [78]

    Injecting numerical reasoning skills into language models,

    M. Geva, A. Gupta, and J. Berant, “Injecting numerical reasoning skills into language models,” inProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 946–958

  79. [79]

    Injecting numerical reasoning skills into knowledge base question answering models,

    Y . Feng, J. Zhang, X. Zhang, L. Liu, C. Li, and H. Chen, “Injecting numerical reasoning skills into knowledge base question answering models,”arXiv preprint arXiv:2112.06109, 2021

  80. [80]

    Self-Consistency Improves Chain of Thought Reasoning in Language Models

    X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdh- ery, and D. Zhou, “Self-consistency improves chain of thought reasoning in language models,”arXiv preprint arXiv:2203.11171, 2022

Showing first 80 references.