pAI/MSc: ML Theory Research with Humans on the Loop
Pith reviewed 2026-05-09 23:57 UTC · model grok-4.3
The pith
A modular multi-agent system reduces the human steering needed to produce ML theory manuscripts by orders of magnitude.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
pAI/MSc is a customizable, modular multi-agent system that, given a hypothesis, produces a literature-grounded, mathematically established, experimentally supported, and submission-oriented manuscript draft with orders of magnitude less human steering than traditional workflows.
What carries the argument
pAI/MSc's modular multi-agent architecture, which distributes tasks across specialized agents for literature retrieval, mathematical reasoning, code execution for experiments, and text generation, all under human supervision.
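The paper supplies no implementation details, so the architecture it describes can only be sketched. The pipeline below is a minimal illustration of that division of labor under the stated assumptions; every class, function, and stage name here is hypothetical, not pAI/MSc's actual API.

```python
# Illustrative sketch only: pAI/MSc publishes no code, so all names here
# are hypothetical stand-ins for the roles the abstract describes.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Draft:
    hypothesis: str
    sections: dict = field(default_factory=dict)

def literature_agent(draft: Draft) -> Draft:
    # Would query a scholarly index for related work grounding the hypothesis.
    draft.sections["related_work"] = f"Survey for: {draft.hypothesis}"
    return draft

def math_agent(draft: Draft) -> Draft:
    # Would attempt theorem statement and proof construction.
    draft.sections["theory"] = "Candidate theorem + proof sketch"
    return draft

def experiment_agent(draft: Draft) -> Draft:
    # Would generate and execute experiment code, attaching results.
    draft.sections["experiments"] = "Synthetic benchmark results"
    return draft

def writer_agent(draft: Draft) -> Draft:
    # Assembles the accumulated sections into a manuscript body.
    draft.sections["manuscript"] = "\n\n".join(draft.sections.values())
    return draft

def human_gate(stage: str, draft: Draft) -> Draft:
    # Human-on-the-loop checkpoint: a person approves, edits, or rejects.
    print(f"[review gate] {stage}: {len(draft.sections)} section(s) so far")
    return draft

PIPELINE: list[tuple[str, Callable[[Draft], Draft]]] = [
    ("literature", literature_agent),
    ("theory", math_agent),
    ("experiments", experiment_agent),
    ("writing", writer_agent),
]

def run(hypothesis: str) -> Draft:
    draft = Draft(hypothesis)
    for stage, agent in PIPELINE:
        draft = human_gate(stage, agent(draft))
    return draft

result = run("Learning curves of MLEs are monotone")
print(sorted(result.sections))
```

The design point the abstract emphasizes is that each stage is a swappable module and every stage output passes a human checkpoint, rather than the system running end to end unattended.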
Load-bearing premise
That current large language models and agent-coordination techniques can execute literature review, mathematical proof construction, and experimental validation accurately and reliably, with only minimal human correction.
What would settle it
Running the system on a well-known ML theory hypothesis and having domain experts review the output draft for citation accuracy, mathematical correctness, and experimental validity, to determine whether it meets submission standards without extensive revision.
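Such an expert review reduces to a conjunctive acceptance test: the draft must clear every criterion, since a single hallucinated citation or broken proof sinks submission readiness. A minimal sketch, assuming a hypothetical 0-to-1 scoring rubric and threshold (neither appears in the paper):

```python
# Hypothetical acceptance rubric; criteria names mirror the review text,
# the scoring scale and threshold are illustrative assumptions.
CRITERIA = ("citation_accuracy", "mathematical_correctness", "experimental_validity")

def passes_submission_bar(scores: dict[str, float], threshold: float = 0.9) -> bool:
    """All criteria must clear the threshold; one failure rejects the draft."""
    return all(scores[c] >= threshold for c in CRITERIA)

reviewer_scores = {"citation_accuracy": 0.95,
                   "mathematical_correctness": 0.80,
                   "experimental_validity": 0.92}
print(passes_submission_bar(reviewer_scores))  # math score below bar
```

The conjunctive form matters: averaging the criteria would let strong experiments mask an invalid proof, which is exactly the failure mode expert review is meant to catch.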
Original abstract
We present pAI/MSc, an open-source, customizable, modular multi-agent system for academic research workflows. Our goal is not autonomous scientific ideation, nor fully automated research. It is narrower and more practical: to reduce by orders of magnitude the human steering required to turn a specified hypothesis into a literature-grounded, mathematically established, experimentally supported, submission-oriented manuscript draft. pAI/MSc is built with a current emphasis on machine learning theory and adjacent quantitative fields.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents pAI/MSc, an open-source, customizable, modular multi-agent system for academic research workflows with emphasis on machine learning theory. The stated goal is to reduce by orders of magnitude the human steering required to convert a specified hypothesis into a literature-grounded, mathematically established, experimentally supported, submission-oriented manuscript draft while keeping humans in the loop; the manuscript describes the system architecture but supplies no implementation details, experiments, or metrics.
Significance. If the claimed reduction in human intervention were demonstrated while preserving output quality, the system could meaningfully increase research throughput in quantitative fields. The open-source and modular design is a strength that would support reproducibility and extension by the community. In its current form, however, the manuscript offers only a high-level system description without evidence, so any significance assessment remains prospective.
major comments (1)
- Abstract: The central claim of an 'orders of magnitude' reduction in human steering for literature grounding, mathematical establishment, experiment design, and manuscript assembly is unsupported by any quantitative data, logged intervention counts, user studies, baseline comparisons, or worked examples. This renders the claim an untested design goal rather than a demonstrated property of the system.
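The logged intervention counts the comment asks for would be straightforward to collect. A sketch of what that instrumentation could look like, under stated assumptions: the event schema, the baseline count, and the two-orders-of-magnitude bar are all hypothetical, since the paper reports no such data.

```python
# Hypothetical intervention-logging schema for quantifying human steering.
import time
from dataclasses import dataclass, field

@dataclass
class InterventionLog:
    events: list = field(default_factory=list)

    def record(self, stage: str, kind: str) -> None:
        # kind: e.g. "edit", "reject", "redirect" (illustrative taxonomy).
        self.events.append({"t": time.time(), "stage": stage, "kind": kind})

    def count(self) -> int:
        return len(self.events)

def reduction_factor(baseline_interventions: int, system_interventions: int) -> float:
    """'Orders of magnitude' would require a factor of at least 100 (two orders)."""
    return baseline_interventions / max(system_interventions, 1)

log = InterventionLog()
log.record("theory", "edit")
log.record("experiments", "reject")
print(reduction_factor(baseline_interventions=500, system_interventions=log.count()))
```

Reporting the baseline (interventions in a traditional workflow) alongside the system's count would turn the abstract's claim into a falsifiable measurement.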
minor comments (1)
- The manuscript would benefit from a dedicated section detailing the agent roles, communication protocols, and customization interfaces, as these are referenced only at a high level in the abstract.
Simulated Author's Rebuttal
Thank you for reviewing our manuscript on pAI/MSc. We appreciate your assessment of its potential significance and agree that additional clarification is needed regarding the system's claimed capabilities. We provide a point-by-point response to the major comment below.
Point-by-point responses
-
Referee: Abstract: The central claim of an 'orders of magnitude' reduction in human steering for literature grounding, mathematical establishment, experiment design, and manuscript assembly is unsupported by any quantitative data, logged intervention counts, user studies, baseline comparisons, or worked examples. This renders the claim an untested design goal rather than a demonstrated property of the system.
Authors: We thank the referee for this observation. The manuscript indeed presents pAI/MSc primarily as a system architecture and design, with the reduction in human steering stated as the core objective enabled by its customizable multi-agent framework. No quantitative evaluations, such as intervention counts or user studies, are included because the current work focuses on describing the system rather than evaluating its performance. We will revise the abstract and relevant sections to explicitly characterize the 'orders of magnitude' reduction as a design goal and intended benefit, rather than a demonstrated result. Additionally, we will expand on implementation details, provide worked examples of the workflow where possible, and outline plans for future empirical validation to address this concern.
revision: yes
Circularity Check
No derivations, predictions, or equations; as a system-description paper, it presents no circularity.
full rationale
The manuscript is a descriptive account of an open-source multi-agent architecture for research assistance. It states design goals (reducing human steering by orders of magnitude) but supplies no equations, fitted parameters, uniqueness theorems, self-citations used as load-bearing premises, or renamings of empirical patterns. The central claim is an untested assertion about future performance rather than a derivation that reduces to its own inputs. No load-bearing step matches any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Mathematical discoveries from program search with large language models
Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, Pushmeet Kohli, and Alhussein Fawzi. Mathematical discoveries from program search with large language models. Nature, 625(7995):468–475, 2024. doi: 10.1038/s41586-023-06924...
-
[2]
Funsearch
Google DeepMind. Funsearch. GitHub repository, 2023. URL https://github.com/google-deepmind/funsearch. Repository accompanying the FunSearch Nature paper; accessed 2026-03-21
-
[3]
Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, and Matej Balog. Alphaevolve: A coding agent for scientific and al... doi: 10.48550/arXiv.2506.13131, 2025
-
[4]
Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, and Adam Zsolt Wagner. Mathematical exploration and discovery at scale. arXiv preprint arXiv:2511.02864, 2025. doi: 10.48550/arXiv.2511.02864. URL https://arxiv.org/abs/2511.02864
-
[5]
Mathematical problem repository for alphaevolve
Google DeepMind. Mathematical problem repository for alphaevolve. GitHub repository, 2025. URL https://github.com/google-deepmind/alphaevolve_repository_of_problems. Repository accompanying the Mathematical exploration and discovery at scale preprint; accessed 2026-03-21
-
[6]
Reinforced Generation of Combinatorial Structures: Ramsey Numbers
Ansh Nagda, Prabhakar Raghavan, and Abhradeep Thakurta. Reinforced generation of combinatorial structures: Ramsey numbers. arXiv preprint arXiv:2603.09172, 2026. doi: 10.48550/arXiv.2603.09172. URL https://arxiv.org/abs/2603.09172
-
[7]
Donald E. Knuth. Claude's cycles. Informal note / PDF on Knuth's preprints page, February 2026. URL https://cs.stanford.edu/~knuth/papers/claude-cycles.pdf. Dated 2026-02-28; revised 2026-03-16
-
[8]
The story of Erdős problem #1026
Terence Tao. The story of Erdős problem #1026. Blog post on What's New, December 2025. URL https://terrytao.wordpress.com/2025/12/08/the-story-of-erdos-problem-126/. Published 2025-12-08
-
[9]
Mohammed Abouzaid, Andrew J. Blumberg, Martin Hairer, Joe Kileel, Tamara G. Kolda, Paul D. Nelson, Daniel Spielman, Nikhil Srivastava, Rachel Ward, Shmuel Weinberger, and Lauren Williams. First proof. arXiv preprint arXiv:2602.05192, 2026. doi: 10.48550/arXiv.2602.05192. URL https://arxiv.org/abs/2602.05192
-
[10]
First batch
First Proof Project. First batch. Project website, February 2026. URL https://1stproof.org/first-batch.html. First-batch page; site lists February 2026 release context; accessed 2026-03-21
-
[11]
Our first proof submissions
OpenAI. Our first proof submissions. OpenAI research page, February 2026. URL https://openai.com/index/first-proof-submissions/. Published 2026-02-20
-
[12]
Wenlin Zhang and Haobo Ma. Lean 4 formal verification of 8/10 #1stproof problems: Complete proofs with ai–human pipeline, partial qed for q4 & q6. Zenodo preprint, February 2026. URL https://zenodo.org/records/18635744. Created 2026-02-13. Zenodo also lists a second record with the same title and metadata at DOI 10.5281/zenodo.18635110
-
[13]
Advancing science and math with gpt-5.2
OpenAI. Advancing science and math with gpt-5.2. OpenAI publication, December 2025. URL https://openai.com/index/gpt-5-2-for-science-and-math. Published 2025-12-11
-
[14]
On learning-curve monotonicity for maximum likelihood estimators
Mark Sellke and Steven Yin. On learning-curve monotonicity for maximum likelihood estimators. arXiv preprint arXiv:2512.10220, 2025. doi: 10.48550/arXiv.2512.10220. URL https://arxiv.org/abs/2512.10220
-
[15]
Introducing gauss, an agent for autoformalization
Math, Inc. Introducing gauss, an agent for autoformalization. Company blog post, n.d. URL https://www.math.inc/gauss. Undated page; accessed 2026-03-21
-
[16]
Strong pnt
Math, Inc. Strong pnt. Project page, n.d. URL https://math-inc.github.io/strongpnt/. Undated page; accessed 2026-03-21
-
[17]
strongpnt
Math, Inc. strongpnt. GitHub repository, n.d. URL https://github.com/math-inc/strongpnt. Repository for the Strong PNT formalization; accessed 2026-03-21
-
[18]
Gauss – an agentic formalization of the prime number theorem
Jared Duker Lichtman. Gauss – an agentic formalization of the prime number theorem. Fields Institute talk page, October 2025. URL https://www.fields.utoronto.ca/talks/Gauss-agentic-formalization-Prime-Number-Theorem. Talk date: 2025-10-28
-
[19]
Nat Sothanaphan. Resolution of erdős problem #728: a writeup of aristotle's lean proof. arXiv preprint arXiv:2601.07421, 2026. doi: 10.48550/arXiv.2601.07421. URL https://arxiv.org/abs/2601.07421
-
[20]
Today marks a momentous milestone for ai and mathematics
Harmonic. Today marks a momentous milestone for ai and mathematics. X post, January 2026. URL https://x.com/HarmonicMath/status/2008693723413225814. Posted 2026-01-06; dynamic-source metadata should be rechecked before camera-ready copy if cited in the main text
-
[21]
Thomas F. Bloom. Erdős problem #728. ErdosProblems.com entry, January 2026. URL https://www.erdosproblems.com/728. Page last edited 2026-01-06; accessed 2026-03-21
-
[22]
Thomas F. Bloom. Erdős problem #729. ErdosProblems.com entry, January 2026. URL https://www.erdosproblems.com/729. Page last edited 2026-01-11; accessed 2026-03-21
-
[23]
Thomas F. Bloom. Erdős problem #397. ErdosProblems.com entry, January 2026. URL https://www.erdosproblems.com/397. Page last edited 2026-01-12; accessed 2026-03-21
-
[24]
Erdős problem database
teorth. Erdős problem database. GitHub repository, n.d. URL https://github.com/teorth/erdosproblems. Repository README accessed 2026-03-21
-
[25]
gpt-5 has solved an unsolved mathematical problem,
GIGAZINE. An OpenAI researcher posted that “gpt-5 has solved an unsolved mathematical problem,” but it turned out that the problem had already been solved, leading to ridicule from rival developers, including Google DeepMind CEO Demis Hassabis. News article, October 2025. URL https://gigazine.net/gsc_news/en/20251020-openai-researcher-announced-gpt-5-ma...
-
[26]
Embarrassing: OpenAI claims that ChatGPT solves math problems, but that is not true
Erwin Vogelaar. Gênant: Openai beweert dat chatgpt wiskundeproblemen oplost, maar dat klopt niet [Embarrassing: OpenAI claims that ChatGPT solves math problems, but that is not true]. Bright.nl news article (in Dutch), October 2025. URL https://www.bright.nl/nieuws/1703437/g-nant-openai-beweert-dat-chatgpt-wiskundeproblemen-oplost-maar-dat-klopt-niet.html. Published 2025-10-20; accessed 2026-03-21
-
[27]
Vibe physics: The AI grad student
Matthew D. Schwartz. Vibe physics: The AI grad student. Anthropic Science Blog, March 2026. URL https://www.anthropic.com/research/vibe-physics. Accessed 2026-03-24
-
[28]
Matthew D. Schwartz. Resummation of the c-parameter sudakov shoulder using effective field theory. arXiv preprint arXiv:2601.02484, 2026. doi: 10.48550/arXiv.2601.02484. URL https://arxiv.org/abs/2601.02484
-
[29]
Zachary C. Lipton and Jacob Steinhardt. Troubling trends in machine learning scholarship. Queue, 17(1), 2019. doi: 10.1145/3317287.3328534. URL https://doi.org/10.1145/3317287.3328534. ACM Queue article; multiple secondary indexes report pages 45–77, but page/article-number formatting varies across services, so pages are omitted here deliberately.
-
[30]
troubling trends in machine learning scholarship
Andrew Gelman. “troubling trends in machine learning scholarship”. Statistical Modeling, Causal Inference, and Social Science blog, September 2019. URL https://statmodeling.stat.columbia.edu/2019/09/30/troubling-trends-in-machine-learning-scholarship/. Blog commentary pointing to Lipton and Steinhardt and discussing hype, “provably” language, and advert...
-
[31]
Improving reproducibility in machine learning research (A report from the NeurIPS 2019 reproducibility program)
Joelle Pineau, Philippe Vincent-Lamarre, Koustuv Sinha, Vincent Larivière, Alina Beygelzimer, Florence d'Alché-Buc, Emily Fox, and Hugo Larochelle. Improving reproducibility in machine learning research (A report from the NeurIPS 2019 reproducibility program). Journal of Machine Learning Research, 22(164):1–20, 2021. URL https://www.jmlr.org/papers/v22/20...
-
[32]
Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou, Franziska Sofia Hafner, Harry Mayne, Jan Batzner, Negar Foroutan, Chris Schmitz, Karolina Korgul, Hunar Batra, Oishi Deb, Emma Beharry, Cornelius Emde, Thomas Foster, Anna Gausen, María Grandury, Simeng Han, Valentin Hofmann, Lujain Ibrahim, Hazel Kim, Hannah Rose Kirk, Fangru Lin, Gabrielle Kaili...
-
[33]
Study identifies weaknesses in how AI systems are evaluated
Oxford Internet Institute. Study identifies weaknesses in how AI systems are evaluated. Press release, November 2025. URL https://www.oii.ox.ac.uk/news-events/study-identifies-weaknesses-in-how-ai-systems-are-evaluated/. Press release accompanying the benchmark-validity study; includes quoted claims about unclear definitions, weak methods, and misleadin...
-
[34]
Hidden technical debt in machine learning systems
D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems 28, pages 2503–2511, 2015. URL https://papers.nips.cc/paper/5656-hidden-technical-debt-in-mac...
-
[35]
Deep reinforcement learning that matters
Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep reinforcement learning that matters. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, pages 3207–3214, 2018. doi: 10.1609/AAAI.V32I1.11694. URL https://doi.org/10.1609/AAAI.V32I1.11694
-
[36]
Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations
Nick McGreivy and Ammar Hakim. Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations. Nature Machine Intelligence, 6(10):1256–1269, 2024. doi: 10.1038/s42256-024-00897-5. URL https://doi.org/10.1038/s42256-024-00897-5
-
[37]
Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard S. Zemel, Wieland Brendel, Matthias Bethge, and Felix A. Wichmann. Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11):665–673, 2020. doi: 10.1038/s42256-020-00257-z. URL https://doi.org/10.1038/s42256-020-00257-z
-
[38]
Dennis Ulmer, Christian Hardmeier, and Jes Frellsen. deep-significance — easy and meaningful statistical significance testing in the age of neural networks, 2022. URL https://doi.org/10.48550/arXiv.2204.06815. arXiv preprint; also listed as a contribution to the ML Evaluation Standards Workshop at ICLR 2022 in institutional repositories.
-
[39]
A causal perspective on dataset bias in machine learning for medical imaging
Charles Jones, Daniel C. Castro, Fabio De Sousa Ribeiro, Ozan Oktay, Melissa McCradden, and Ben Glocker. A causal perspective on dataset bias in machine learning for medical imaging. Nature Machine Intelligence, 6:138–146, 2024. doi: 10.1038/s42256-024-00797-8. URL https://doi.org/10.1038/s42256-024-00797-8
-
[40]
Alex Broadbent and Thomas Grote. Can robots do epidemiology? machine learning, causal inference, and predicting the outcomes of public health interventions. Philosophy & Technology, 35:14, 2022. doi: 10.1007/s13347-022-00509-3. URL https://doi.org/10.1007/s13347-022-00509-3. Springer presents this as volume 35, article number 14; issue and expanded page-ran...
-
[41]
Gary S. Collins and Karel G. M. Moons. Reporting of artificial intelligence prediction models. The Lancet, 393(10181):1577–1579, 2019. doi: 10.1016/S0140-6736(19)30037-6. URL https://doi.org/10.1016/S0140-6736(19)30037-6
-
[42]
AI Snake Oil
Liz Fuller-Wright. “AI Snake Oil”: A Conversation with Princeton AI Experts Arvind Narayanan and Sayash Kapoor. Princeton University News, December 2024. URL https://www.princeton.edu/news/2024/12/18/ai-snake-oil-conversation-princeton-ai-experts-arvind-narayanan-and-sayash-kapoor. Interview/article quoting Narayanan and Kapoor on AI that does not work a...
-
[43]
Autoresearch
Andrej Karpathy. Autoresearch. GitHub repository, 2026. URL https://github.com/karpathy/autoresearch/blob/master/program.md. Repository documentation in program.md; accessed 2026-03-26
-
[44]
Mlagentbench: Evaluating language agents on machine learning experimentation
Qian Huang, Jian Vora, Percy Liang, and Jure Leskovec. Mlagentbench: Evaluating language agents on machine learning experimentation. In Forty-first International Conference on Machine Learning, 2024. URL https://openreview.net/forum?id=1Fs1LvjYQW
-
[45]
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The AI scientist: Towards fully automated open-ended scientific discovery. arXiv preprint arXiv:2408.06292, 2024. doi: 10.48550/arXiv.2408.06292. URL https://arxiv.org/abs/2408.06292
-
[46]
Agent laboratory: Using LLM agents as research assistants
Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, and Emad Barsoum. Agent laboratory: Using LLM agents as research assistants. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors, Findings of the Association for Computational Linguistics: EMNLP 2025, pag...
-
[47]
Mlr-bench: Evaluating AI agents on open-ended machine learning research
Hui Chen, Miao Xiong, Yujie Lu, Wei Han, Ailin Deng, Yufei He, Jiaying Wu, Yibo Li, Yue Liu, and Bryan Hooi. Mlr-bench: Evaluating AI agents on open-ended machine learning research. arXiv preprint arXiv:2505.19955, 2025. doi: 10.48550/arXiv.2505.19955. URL https://arxiv.org/abs/2505.19955
-
[48]
Qing Ke, Emilio Ferrara, Filippo Radicchi, and Alessandro Flammini. Defining and identifying sleeping beauties in science. Proceedings of the National Academy of Sciences, 112(24):7426–7431, 2015. doi: 10.1073/pnas.1424329112. URL https://www.pnas.org/doi/10.1073/pnas.1424329112
-
[49]
Bibliometrics: The Leiden manifesto for research metrics
Diana Hicks, Paul Wouters, Ludo Waltman, Sarah de Rijcke, and Ismael Rafols. Bibliometrics: The Leiden manifesto for research metrics. Nature, 520(7548):429–431, 2015. doi: 10.1038/520429a. URL https://www.nature.com/articles/520429a
-
[50]
The metric tide: Report of the independent review of the role of metrics in research assessment and management
James Wilsdon, Liz Allen, Eleonora Belfiore, Philip Campbell, Stephen Curry, Steven Hill, Richard Jones, Jude Hill, Roger Kain, Ben Johnson, Simon Kerridge, Jane Tinkler, Mike Thelwall, Paul Wouters, and Ian Viney. The metric tide: Report of the independent review of the role of metrics in research assessment and management. Technical report, Higher Ed...
-
[51]
URL https://hdl.handle.net/10779/uos.23418680
-
[52]
Michael Fire and Carlos Guestrin. Over-optimization of academic publishing metrics: Observing goodhart's law in action. GigaScience, 8(6):giz053, 2019. doi: 10.1093/gigascience/giz053. URL https://doi.org/10.1093/gigascience/giz053
-
[53]
Improving factuality and reasoning in language models through multiagent debate
Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. In Forty-first International Conference on Machine Learning, 2024
-
[54]
Citebench: A benchmark for scientific citation text generation, 2022
Martin Funkquist, Ilia Kuznetsov, Yufang Hou, and Iryna Gurevych. Citebench: A benchmark for scientific citation text generation, 2022. URL https://arxiv.org/abs/2212.09577. Using the arXiv submission year; later bibliographic records may surface under 2023 metadata updates
-
[55]
Chatcite: LLM agent with human workflow guidance for comparative literature summary, 2024
Yutong Li, Lu Chen, Aiwei Liu, Kai Yu, and Lijie Wen. Chatcite: LLM agent with human workflow guidance for comparative literature summary, 2024. URL https://arxiv.org/abs/2403.02574
-
[56]
Scholarcopilot: Training large language models for academic writing with accurate citations, 2025
Yubo Wang, Xueguang Ma, Ping Nie, Huaye Zeng, Zhiheng Lyu, Yuxuan Zhang, Benjamin Schneider, Yi Lu, Xiang Yue, and Wenhu Chen. Scholarcopilot: Training large language models for academic writing with accurate citations, 2025. URL https://arxiv.org/abs/2504.00824
-
[57]
Surveygen: Quality-aware scientific survey generation with large language models, 2025
Tong Bao, Mir Tafseer Nayeem, Davood Rafiei, and Chengzhi Zhang. Surveygen: Quality-aware scientific survey generation with large language models, 2025. URL https://arxiv.org/abs/2508.17647
-
[58]
Overleafcopilot: Empowering academic writing in Overleaf with large language models, 2024
Haomin Wen, Zhenjie Wei, Yan Lin, Jiyuan Wang, Yuxuan Liang, and Huaiyu Wan. Overleafcopilot: Empowering academic writing in Overleaf with large language models, 2024. URL https://arxiv.org/abs/2403.09733
-
[59]
Paperdebugger: A plugin-based multi-agent system for in-editor academic writing, review, and editing, 2025
Junyi Hou, Huikai Andre Lin, Nuo Chen, Yiwei Gong, and Bingsheng He. Paperdebugger: A plugin-based multi-agent system for in-editor academic writing, review, and editing, 2025. URL https://arxiv.org/abs/2512.02589
-
[60]
Autonomous LLM-driven research – from data to human-verifiable research papers
Tal Ifargan, Lukas Hafner, Maor Kern, Ori Alcalay, and Roy Kishony. Autonomous LLM-driven research — from data to human-verifiable research papers. NEJM AI, 2(1), 2025. doi: 10.1056/AIoa2400555. URL https://ai.nejm.org/doi/10.1056/AIoa2400555
-
[61]
The AI scientist: Towards fully automated open-ended scientific discovery, 2024
Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The AI scientist: Towards fully automated open-ended scientific discovery, 2024. URL https://arxiv.org/abs/2408.06292
-
[62]
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search
Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha. The AI scientist-v2: Workshop-level automated scientific discovery via agentic tree search, 2025. URL https://arxiv.org/abs/2504.08066
-
[63]
Agent laboratory: Using LLM agents as research assistants
Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, and Emad Barsoum. Agent laboratory: Using LLM agents as research assistants. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 5977–6043. Association for Computational Linguistics, 2025. doi: 10.18653/v1/2025.fi...
-
[64]
Cycleresearcher: Improving automated research via automated review
Yixuan Weng, Minjun Zhu, Guangsheng Bao, Hongbo Zhang, Jindong Wang, Yue Zhang, and Linyi Yang. Cycleresearcher: Improving automated research via automated review, 2024. URL https://arxiv.org/abs/2411.00816. First submitted in 2024; later revised in 2025.
-
[65]
Jiabin Tang, Lianghao Xia, Zhonghang Li, and Chao Huang. AI-researcher: Autonomous scientific innovation, 2025. URL https://arxiv.org/abs/2505.18705
-
[66]
Samuel Schmidgall and Michael Moor. Agentrxiv: Towards collaborative autonomous research, 2025. URL https://arxiv.org/abs/2503.18102
-
[67]
Build your personalized research group: A multiagent framework for continual and interactive science automation,
Ed Li, Junyu Ren, Xintian Pan, Cat Yan, Chuanhao Li, Dirk Bergemann, and Zhuoran Yang. Build your personalized research group: A multiagent framework for continual and interactive science automation,
- [68]
-
[69]
Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, Khaled Saab, Dan Popovici, Jacob Blum, Fan Zhang, Katherine Chou, Avinatan Hassidim, Burak Gokturk, Amin Vahdat, Pushmeet Kohli, Yossi Matias, Andrew Carroll, Kavita Kulkarni, Nenad Tomasev, Yuan Guan, Vi...
-
[70]
InternAgent Team, Bo Zhang, Shiyang Feng, Xiangchao Yan, Jiakang Yuan, Runmin Ma, Yusong Hu, Zhiyin Yu, Xiaohan He, Songtao Huang, Shaowei Hou, Zheng Nie, Zhilong Wang, Jinyao Liu, Tianshuo Peng, Peng Ye, Dongzhan Zhou, Shufei Zhang, Xiaosong Wang, Yilan Zhang, Meng Li, Zhongying Tu, Xiangyu Yue, Wangli Ouyang, Bowen Zhou, and Lei Bai. Internagent: When a...
-
[71]
Shiyang Feng, Runmin Ma, Xiangchao Yan, Yue Fan, Yusong Hu, Songtao Huang, Shuaiyu Zhang, Zongsheng Cao, Tianshuo Peng, Jiakang Yuan, Zijie Guo, Zhijie Zhong, Shangheng Du, Weida Wang, Jinxin Shi, Yuhao Zhou, Xiaohan He, Zhiyin Yu, Fangchen Yu, Qihao Zheng, Jiamin Wu, Mianxin Liu, Chi Zhang, Shaowei Hou, Shuya Li, Yankai Jiang, Wenjie Lou, Lilong Wang, Zi...
-
[72]
Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, and Matej Balog. Alphaevolve: A coding agent for scientific and al... arXiv, 2025
-
[73]
Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, and Adam Zsolt Wagner. Mathematical exploration and discovery at scale, 2025. URL https://arxiv.org/abs/2511.02864
-
[74]
Deepinnovator: Triggering the innovative capabilities of LLMs
Tianyu Fan, Fengji Zhang, Yuxiang Zheng, Bei Chen, Xinyao Niu, Chengen Huang, Junyang Lin, and Chao Huang. Deepinnovator: Triggering the innovative capabilities of LLMs. arXiv preprint arXiv:2602.18920, 2026. URL https://arxiv.org/abs/2602.18920
-
[75]
Lukas Weidener, Marko Brkić, Phillip Lee, Martin Karlsson, Kevin Noessler, and Paul Kohlhaas. From agent-only social networks to autonomous scientific research: Lessons from OpenClaw and Moltbook, and the architecture of ClawdLab and Beach. Science, 2026. URL https://arxiv.org/abs/2602.19810
-
[76]
S2orc: The semantic scholar open research corpus
Kyle Lo, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Daniel Weld. S2orc: The semantic scholar open research corpus. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4969–4983. Association for Computational Linguistics, 2020. doi: 10.18653/v1/2020.acl-main.447. URL https://aclanthology.org/2020.acl-main.447/
-
[77]
Jason Priem, Heather Piwowar, and Richard Orr. Openalex: A fully-open index of scholarly works, authors, venues, institutions, and concepts, 2022. URL https://arxiv.org/abs/2205.01833
-
[78]
Enabling large language models to generate text with citations
Tianyu Gao, Howard Yen, Jiatong Yu, and Danqi Chen. Enabling large language models to generate text with citations. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6465–6488. Association for Computational Linguistics, 2023. doi: 10.18653/v1/2023.emnlp-main.398. URL https://aclanthology.org/2023.emnlp-main.398/
-
[79]
Michael D. Skarlinski, Sam Cox, Jon M. Laurent, James D. Braza, Michaela Hinks, Michael J. Hammerling, Manvitha Ponnapati, Samuel G. Rodriques, and Andrew D. White. Language agents achieve superhuman synthesis of scientific knowledge, 2024. URL https://arxiv.org/abs/2409.13740
-
[80]
Elicit: AI for scientific research, n.d
Elicit. Elicit: AI for scientific research, n.d. URL https://orion.elicit.com/. Undated product site; accessed 2026-03-21