AIBuildAI-2: A Knowledge-Enhanced Agent for Automatically Building AI Models
Pith reviewed 2026-06-29 12:30 UTC · model grok-4.3
The pith
AIBuildAI-2 equips an AI agent with an external hierarchical knowledge system to automatically build high-performing AI models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AIBuildAI-2 achieves state-of-the-art results by using a hierarchical knowledge system that organizes AI development knowledge into high-level instructions and low-level documents. This lets the agent dynamically load only the relevant context and evolve the system from its own experience. The result is a 70.7% medal rate on MLE-Bench and a top-6.6% finish among human-expert teams in a heart disease prediction competition.
What carries the argument
The hierarchical knowledge system, which stores high-level knowledge instructions over topical categories and low-level knowledge documents, enabling dynamic retrieval of only the context relevant to the current state and task.
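The two-level lookup described above can be sketched as follows. This is a minimal sketch under stated assumptions. The abstract does not say how relevance is scored, so plain keyword overlap stands in for the paper's method. The names `Category`, `KnowledgeDocument`, and `retrieve` are hypothetical. The agent first ranks categories by their high-level instructions, then loads only the few documents under the chosen categories that match the current task and state.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeDocument:
    """Low-level knowledge document (hypothetical schema)."""
    title: str
    body: str
    keywords: set[str]


@dataclass
class Category:
    """Topical category carrying a high-level knowledge instruction."""
    name: str
    instruction: str
    keywords: set[str]
    documents: list[KnowledgeDocument] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,:;()").lower() for w in text.split()} - {""}


def retrieve(categories: list[Category], task: str, state: str,
             max_categories: int = 1, max_docs: int = 2) -> list[str]:
    """Return only the context relevant to the task and current state.

    Stage 1 ranks categories by keyword overlap with the query. Stage 2 ranks
    documents inside each selected category. Keyword overlap is a stand-in
    for the paper's (unspecified) relevance scoring.
    """
    query = tokenize(task) | tokenize(state)
    ranked = sorted(categories, key=lambda c: len(query & c.keywords), reverse=True)
    context: list[str] = []
    for cat in ranked[:max_categories]:
        if not query & cat.keywords:
            continue  # no overlap: load nothing from this category
        context.append(cat.instruction)
        docs = sorted(cat.documents, key=lambda d: len(query & d.keywords), reverse=True)
        context.extend(d.body for d in docs[:max_docs] if query & d.keywords)
    return context


tabular = Category(
    name="tabular",
    instruction="Tabular tasks: start from a cross-validated gradient-boosting baseline.",
    keywords={"tabular", "csv", "boosting"},
    documents=[
        KnowledgeDocument("target encoding",
                          "Fit target encoders inside each CV fold to avoid leakage.",
                          {"categorical", "encoding", "leakage"}),
        KnowledgeDocument("early stopping",
                          "Use early stopping on a held-out fold when tuning boosting rounds.",
                          {"boosting", "tuning", "rounds"}),
    ],
)
vision = Category("vision", "Vision tasks: fine-tune a pretrained backbone.", {"image", "vision"})

context = retrieve([vision, tabular],
                   task="Binary classification on tabular data",
                   state="Baseline done; next: tuning boosting rounds")
```

In this toy setup, only two items reach the context: the tabular instruction and the early-stopping document. The vision category and the unrelated encoding note are left out. Limiting the context in this way is the purpose the hierarchy is meant to serve.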
If this is right
- The agent produces model designs grounded in concrete expertise rather than internal parameters alone.
- It ranks first on MLE-Bench with a 70.7% medal rate.
- It places in the top 6.6% among 4,370 human-expert teams in a heart disease prediction competition.
- The knowledge system evolves by distilling completed runs into structured takeaways added back to the system.
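The write-back loop in the last bullet can be sketched as follows. The abstract says only that each completed run is distilled into structured takeaways. It gives neither the takeaway schema nor the distillation method, which may well be LLM-driven. As a labeled assumption, a simple rule stands in here: it keeps the best and worst attempt of a run. `RunRecord`, `Takeaway`, `distill`, and `write_back` are hypothetical names, and the scores are illustrative placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical log of one completed run on an AI task."""
    task: str
    category: str
    attempts: list[tuple[str, float]]  # (change tried, validation score)
    higher_is_better: bool = True


@dataclass
class Takeaway:
    """Hypothetical structured takeaway written back to the knowledge system."""
    task: str
    what_worked: str
    what_failed: str
    best_score: float


def distill(run: RunRecord) -> Takeaway:
    """Rule-based stand-in for distillation: best and worst attempt of the run."""
    if not run.attempts:
        raise ValueError("cannot distill a run with no attempts")
    ordered = sorted(run.attempts, key=lambda a: a[1], reverse=run.higher_is_better)
    (best_change, best_score), (worst_change, _) = ordered[0], ordered[-1]
    return Takeaway(run.task, best_change, worst_change, best_score)


def write_back(store: dict[str, list[Takeaway]], run: RunRecord) -> Takeaway:
    """Append the run's takeaway under its category so later runs can load it."""
    takeaway = distill(run)
    store.setdefault(run.category, []).append(takeaway)
    return takeaway


store: dict[str, list[Takeaway]] = {}
run = RunRecord(
    task="task-A",
    category="tabular",
    attempts=[("gradient-boosting baseline", 0.90),   # illustrative scores only
              ("fold-wise target encoding", 0.92),
              ("single train/valid split", 0.88)],
)
takeaway = write_back(store, run)
```

Takeaways are stored by category, so a later retrieval step that selects the same category would see the new entry alongside the curated documents.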
Where Pith is reading between the lines
- The approach could allow domain scientists to build custom AI models without deep engineering skills.
- Similar external knowledge mechanisms might improve agents in other areas like code generation or scientific simulation.
- Over time, the evolving knowledge could create a compounding advantage as more tasks are completed.
- Retrieval of specific documents might reduce hallucinations in design choices compared to pure LLM prompting.
Load-bearing premise
The external knowledge system supplies concrete, externally verifiable expertise that the agent can dynamically retrieve and apply to produce measurably better model designs and implementations than an LLM relying only on its internal parameters.
What would settle it
Run AIBuildAI-2 on the same MLE-Bench tasks with the external knowledge system disabled. If the medal rate and competition ranking do not drop, the knowledge system is not what drives the reported gains.
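This ablation can be scored with a paired test, because both configurations run on the same tasks. The sketch below is an assumption about one reasonable analysis, not the paper's protocol. It treats each task as a single yes/no "any medal" outcome per configuration. MLE-Bench results are often averaged over several seeds, and the sketch ignores this. It then applies an exact McNemar test to the tasks on which the two configurations disagree. The function names are hypothetical, and the outcome lists are illustrative placeholders rather than reported results.

```python
from __future__ import annotations

from math import comb


def medal_rate(medals: list[bool]) -> float:
    """Fraction of tasks on which a run earned any medal."""
    return sum(medals) / len(medals)


def exact_mcnemar(with_kb: list[bool], without_kb: list[bool]) -> tuple[int, int, float]:
    """Exact McNemar test on paired per-task medal outcomes.

    b: tasks medaled only with the knowledge system.
    c: tasks medaled only without it.
    p: two-sided exact binomial p-value under H0 that b and c are equally likely.
    """
    if len(with_kb) != len(without_kb):
        raise ValueError("outcomes must be paired task by task")
    b = sum(w and not wo for w, wo in zip(with_kb, without_kb))
    c = sum(wo and not w for w, wo in zip(with_kb, without_kb))
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant tasks: no evidence either way
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Illustrative placeholder outcomes for six tasks (not real results).
full_system = [True, True, True, True, True, False]
no_knowledge = [True, False, False, False, False, False]
b, c, p = exact_mcnemar(full_system, no_knowledge)
```

Here b = 4 and c = 0 give p = 0.125. Even a 5/6 versus 1/6 medal-rate gap on six tasks is weak evidence, so a convincing ablation needs the full task set.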
Original abstract
AI models underpin data-centric applications from image and text processing to scientific discovery in biology, physics, and chemistry. Yet developing them remains heavily manual, requiring practitioners to design architectures, build training pipelines, and iteratively refine solutions, making it challenging for natural scientists without specialized AI engineering expertise to build the high-performing models their research demands. To reduce this burden and broaden access to AI for scientific discovery, agents that automatically build AI models have been proposed. However, the performance of these agents is largely limited by the parametric knowledge of their underlying large language models, which is static, often outdated, and sparse on practical AI model engineering know-how. To address this limitation, we introduce AIBuildAI-2, a knowledge-enhanced agent with an external, evolving knowledge system for automatically building AI models. The knowledge system of AIBuildAI-2 is hierarchical, organizing curated AI development knowledge into high-level knowledge instructions over topical categories and low-level knowledge documents under each category, from which the agent dynamically loads only the context relevant to its current state and the AI task being solved, grounding each design and implementation decision in concrete, externally verifiable expertise. The system is initialized by collecting and cleaning AI-development-related documents from the web and organizing them into the corresponding categories, and continually evolves from the agent's own experience by distilling each completed run on an AI task into structured takeaways that are written back into the knowledge system. AIBuildAI-2 achieves state-of-the-art results, ranking first on MLE-Bench with a 70.7% medal rate and placing in the top 6.6% among 4,370 human-expert teams in a heart disease prediction competition.
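The human-competition figure can be made concrete, under an assumption. Suppose "top 6.6% among 4,370 teams" means rank divided by 4,370, expressed as a percentage and rounded half-up to one decimal place. The abstract does not say how the percentage was computed or whether the 4,370 includes AIBuildAI-2. Under this reading, the claim would correspond to a leaderboard position between 287th and 290th. The helper name below is hypothetical. Exact fractions avoid floating-point edge cases at the rounding boundaries.

```python
from fractions import Fraction


def ranks_for_top_percent(pct_tenths: int, total: int) -> range:
    """Ranks r (1 = best) whose share r/total rounds to pct_tenths/10 percent.

    Assumes half-up rounding to one decimal place, i.e. the share lies in
    [(pct_tenths - 0.5)/1000, (pct_tenths + 0.5)/1000).
    """
    lo = Fraction(2 * pct_tenths - 1, 2000)
    hi = Fraction(2 * pct_tenths + 1, 2000)
    ranks = [r for r in range(1, total + 1) if lo <= Fraction(r, total) < hi]
    return range(ranks[0], ranks[-1] + 1) if ranks else range(0)


print(list(ranks_for_top_percent(66, 4370)))  # 6.6% of 4,370 teams -> [287, 288, 289, 290]
```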
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AIBuildAI-2, a knowledge-enhanced agent equipped with a hierarchical external knowledge system (high-level instructions over topical categories and low-level documents) that is initialized from web-sourced AI-development documents and evolves by distilling agent experience into structured takeaways. The agent dynamically retrieves relevant context to ground design and implementation decisions. The central empirical claim is that this yields state-of-the-art performance: first place on MLE-Bench (70.7% medal rate) and top 6.6% among 4,370 human-expert teams in a heart-disease prediction competition.
Significance. If the external knowledge system can be shown to supply concrete, verifiable expertise that causally improves model-building outcomes beyond base LLM prompting, the work would address a genuine bottleneck in applying AI to scientific domains and could meaningfully broaden access for non-AI experts. The reported competition rankings, if substantiated with proper controls, would constitute a strong empirical result.
major comments (3)
- [Abstract] The performance claims (70.7% medal rate on MLE-Bench; top 6.6% ranking) are stated without any description of experimental protocol, baselines, statistical tests, error bars, or ablation studies isolating the contribution of the external knowledge system versus the base LLM or other agent components. This absence directly undermines evaluation of the central hypothesis that the knowledge system supplies externally verifiable expertise driving the measured gains.
- [Abstract] The initialization ('collecting and cleaning AI-development-related documents from the web') and evolution ('distilling each completed run into structured takeaways') mechanisms are described only at a high level; no details are given on curation criteria, deduplication, relevance scoring for dynamic retrieval, or how the hierarchical structure prevents context overload or hallucinated retrieval.
- [Abstract] No evidence is supplied that the knowledge documents contain concrete, externally verifiable expertise (e.g., specific hyperparameter heuristics, architecture patterns, or implementation pitfalls) rather than generic or already-internalized LLM knowledge; without such grounding or controlled comparison, the weakest assumption of the work remains untested.
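The second comment asks how web-sourced documents are deduplicated, and the abstract does not say. The reference list does include Broder's resemblance measure [67]. One plausible approach, offered only as an assumption, is near-duplicate filtering on word shingles. Two documents whose shingle sets reach a Jaccard resemblance threshold count as duplicates, and only the first is kept. This sketch computes resemblance exactly for every pair, which is quadratic in corpus size; at web scale it would typically be estimated with MinHash instead. The function names, `k = 5`, and the 0.8 threshold are hypothetical choices.

```python
from __future__ import annotations


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Set of contiguous k-word shingles (lowercased, whitespace-split)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a: set, b: set) -> float:
    """Jaccard resemblance |A & B| / |A | B| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: list[str], threshold: float = 0.8, k: int = 5) -> list[str]:
    """Keep each document unless it resembles an already-kept one."""
    kept: list[str] = []
    kept_shingles: list[set] = []
    for doc in docs:
        s = shingles(doc, k)
        if all(resemblance(s, t) < threshold for t in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


docs = [
    "use early stopping when tuning the number of boosting rounds",
    "use early stopping when tuning the number of boosting rounds carefully",
    "fine tune a pretrained backbone with a low learning rate",
]
unique = deduplicate(docs)
```

The second document differs from the first by one trailing word. Six of its seven shingles are shared (resemblance 6/7, about 0.86), so it is dropped. The third document shares no shingles with the first and is kept.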
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and commit to revisions that strengthen the presentation of our evaluation and knowledge system.
Point-by-point responses
Referee: [Abstract] The performance claims (70.7% medal rate on MLE-Bench; top 6.6% ranking) are stated without any description of experimental protocol, baselines, statistical tests, error bars, or ablation studies isolating the contribution of the external knowledge system versus the base LLM or other agent components. This absence directly undermines evaluation of the central hypothesis that the knowledge system supplies externally verifiable expertise driving the measured gains.
Authors: We agree that the abstract's brevity omits key evaluation details. The full manuscript contains a dedicated Experiments section describing the MLE-Bench protocol, baselines (including vanilla LLM agents and prior agent systems), statistical tests, error bars from multiple runs, and ablations isolating the knowledge component. We will revise the abstract to add a concise clause referencing the evaluation protocol and directing readers to the Experiments section for baselines, ablations, and statistical details. revision: yes
Referee: [Abstract] The initialization ('collecting and cleaning AI-development-related documents from the web') and evolution ('distilling each completed run into structured takeaways') mechanisms are described only at a high level; no details are given on curation criteria, deduplication, relevance scoring for dynamic retrieval, or how the hierarchical structure prevents context overload or hallucinated retrieval.
Authors: The abstract intentionally summarizes at a high level. The full paper provides the requested details in Section 3: curation criteria and deduplication during initialization (3.1), relevance scoring and dynamic retrieval (3.4), and hierarchical organization to limit context length and reduce hallucination risk (3.2). We will revise the abstract to include a brief parenthetical reference to these mechanisms and their implementation sections. revision: yes
Referee: [Abstract] No evidence is supplied that the knowledge documents contain concrete, externally verifiable expertise (e.g., specific hyperparameter heuristics, architecture patterns, or implementation pitfalls) rather than generic or already-internalized LLM knowledge; without such grounding or controlled comparison, the weakest assumption of the work remains untested.
Authors: We accept that the abstract alone does not demonstrate this. The manuscript already includes concrete examples of knowledge documents (e.g., specific hyperparameter schedules and architecture pitfalls for tabular and image tasks) in Section 4 and the appendix, together with ablation results showing gains over base LLM prompting. In revision we will add an explicit subsection with side-by-side excerpts contrasting retrieved knowledge against typical LLM-internal knowledge and will strengthen the controlled comparisons already present in the Experiments section. revision: yes
Circularity Check
No circularity: external knowledge system presented as independent input
Full rationale
The abstract describes an external hierarchical knowledge system initialized from web documents and updated from agent experience, with dynamic retrieval of relevant context to ground decisions. No equations, fitted parameters, self-referential definitions, or self-citation chains are present that would reduce the SOTA performance claims (70.7% medal rate, top 6.6% ranking) to the inputs by construction. The mechanism is framed as supplying externally verifiable expertise distinct from the base LLM's parametric knowledge, making the derivation self-contained against external benchmarks.
Reference graph
Works this paper leans on
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Lourdes Agapito, Tamara Berg, Jana Kosecka, and Lihi Zelnik-Manor, editors, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[2] Daniel W Otter, Julian R Medina, and Jugal K Kalita. A survey of the usages of deep learning for natural language processing. IEEE transactions on neural networks and learning systems, 32(2):604–624, 2020.
[3] Karan Singhal, Shekoofeh Azizi, Tao Tu, S. Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, Perry Payne, Martin Seneviratne, Paul Gamble, Chris Kelly, Abubakr Babiker, Nathanael Schärli, Aakanksha Chowdhery, Philip Mansfield, Dina Demner-Fushman, Blaise Agüera y Arcas, Dale Webster, Greg S. Corrad...
[4] doi:10.1038/s41586-023-06291-2
[5] Karan Singhal, Tao Tu, Juraj Gottweis, Rory Sayres, Ellery Wulczyn, Mohamed Amin, Le Hou, Kevin Clark, Stephen R. Pfohl, Heather Cole-Lewis, Darlene Neal, Qazi Mamunur Rashid, Mike Schaekermann, Amy Wang, Dev Dash, Jonathan H. Chen, Nigam H. Shah, Sami Lachgar, Philip Andrew Mansfield, Sushant Prakash, Bradley Green, Ewa Dominowska, Blaise Agüera y Arca...
[6] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman...
[7] Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Ziming Liu, Payal Chandak, Shengchao Liu, Peter Van Katwyk, Andreea Deac, Anima Anandkumar, Karianne Bergen, Carla P. Gomes, Shirley Ho, Pushmeet Kohli, Joan Lasenby, Jure Leskovec, Tie-Yan Liu, Arjun Manrai, Debora Marks, Bharath Ramsundar, Le Song, Jimeng Sun, Jian Tang, Petar Veličković...
[8] Jonathan M. Stokes, Kevin Yang, Kyle Swanson, Wengong Jin, Andres Cubillos-Ruiz, Nina M. Donghia, Craig R. MacNair, Shawn French, Lindsey A. Carfrae, Zohar Bloom-Ackermann, Victoria M. Tran, Anush Chiappino-Pepe, Ahmed H. Badran, Ian W. Andrews, Emma J. Chory, George M. Church, Eric D. Brown, Tommi S. Jaakkola, Regina Barzilay, and James J. Collins. A d...
[9] Felix Wong, Erica J. Zheng, Jacqueline A. Valeri, Nina M. Donghia, Melis N. Anahtar, Satotaka Omori, Alicia Li, Andres Cubillos-Ruiz, Aarti Krishnan, Wengong Jin, Abigail L. Manson, Jens Friedrichs, Ralf Helbig, Behnoush Hajian, Dawid K. Fiejtek, Florence F. Wagner, Holly H. Soutter, Ashlee M. Earl, Jonathan M. Stokes, Lars D. Renner, and James J. Co...
[10] Alan M Turing. Computing machinery and intelligence, pages 23–65. Springer, 2007.
[11] Michael I Jordan and Tom M Mitchell. Machine learning: Trends, perspectives, and prospects. Science, 349(6245):255–260, 2015.
[12] Zoubin Ghahramani. Probabilistic machine learning and artificial intelligence. Nature, 521(7553):452–459, 2015.
[13] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum machine learning. Nature, 549(7671):195–202, 2017.
[14] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of machine learning research, 13(2), 2012.
[15] D Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. Hidden technical debt in machine learning systems. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, Proceedings of the International Conference on Neu..., 2015.
[16] Frank Hutter, Lars Kotthoff, and Joaquin Vanschoren. Automated machine learning: methods, systems, challenges. Springer, 2019.
[17] Abdulaziz Aldoseri, Khalifa N Al-Khalifa, and Abdel Magid Hamouda. Re-thinking data strategy and integration for artificial intelligence: concepts, opportunities, and challenges. Applied Sciences, 13(12):7082, 2023.
[18] Yuzhe Yang, Haoran Zhang, Judy W Gichoya, Dina Katabi, and Marzyeh Ghassemi. The limits of fair medical imaging AI in real-world generalization. Nature medicine, 30(10):2838–2848, 2024.
[19] Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Rishi Hazra, Nicolas Baldwin, Alexis Audran-Reiss, Michael Kuchnik, Despoina Magka, Minqi Jiang, Alisia Maria Lupidi, Andrei Lupu, Roberta Raileanu, Tatiana Shavrina, Kelvin Niu, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Shagun Sodhani, Alexander H Miller, Abhishek Charnalia, Derek Dunfield, Car... AI research agents for machine learning: Search, exploration, and generalization in MLE-bench, 2025.
[20] Shangheng Du, Xiangchao Yan, Dengyang Jiang, Jiakang Yuan, Yusong Hu, Xin Li, Liang He, Bo Zhang, and Lei Bai. AutoMLGen: Navigating fine-grained optimization for coding agents. arXiv preprint arXiv:2510.08511, 2025.
[21] Zhengyao Jiang, Dominik Schmidt, Dhruv Srikanth, Dixing Xu, Ian Kaplan, Deniss Jacenko, and Yuxiang Wu. AIDE: AI-Driven exploration in the space of code. arXiv preprint arXiv:2502.13138, 2025.
[22] Xu Yang, Xiao Yang, Shikai Fang, Yifei Zhang, Jian Wang, Bowen Xian, Qizheng Li, Jingyuan Li, Minrui Xu, Yuante Li, Haoran Pan, Yuge Zhang, Weiqing Liu, Yelong Shen, Weizhu Chen, and Jiang Bian. R&D-Agent: An LLM-Agent framework towards autonomous data science. arXiv preprint arXiv:2505.14738, 2025.
[23] Ruiyi Zhang, Peijia Qin, Qi Cao, Li Zhang, and Pengtao Xie. AIBuildAI: An AI agent for automatically building AI models. arXiv preprint arXiv:2604.14455, 2026.
[24] Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Aleksander Madry, and Lilian Weng. MLE-bench: Evaluating machine learning agents on machine learning engineering,
[25] International Conference on Learning Representations (ICLR).
[26] Kaggle. Kaggle: Your machine learning and data science community. https://www.kaggle.com. Accessed: 2026-05-20.
[27] OpenAI. MLE-bench leaderboard (commit c5631ba). https://github.com/openai/mle-bench/tree/c5631ba61ceeb0573235a6ce209db435327a1e84, 2026. Accessed: 2026-03-18.
[28] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 2020.
[29] Aditi Singh, Abul Ehtesham, Saket Kumar, Tala Talaei Khoei, and Athanasios V. Vasilakos. Agentic retrieval-augmented generation: A survey on agentic RAG. arXiv preprint arXiv:2501.09136, 2025.
[30] Jiefeng Chen, Bhavana Dalvi Mishra, Jaehyun Nam, Rui Meng, Tomas Pfister, and Jinsung Yoon. MARS: Modular agent with reflective search for automated AI research. arXiv preprint arXiv:2602.02660, 2026.
[31] Annan Li, Chufan Wu, Zengle Ge, Yee Hin Chong, Zhinan Hou, Lizhe Cao, Cheng Ju, Jianmin Wu, Huaiming Li, Haobo Zhang, Shenghao Feng, Mo Zhao, Fengzhi Qiu, Rui Yang, Mengmeng Zhang, Wenyi Zhu, Yingying Sun, Quan Sun, Shunhao Yan, Danyu Liu, Dawei Yin, and Dou Shen. The FM agent. arXiv preprint arXiv:2510.26144, 2025.
[32] Zexi Liu, Yuzhu Cai, Xinyu Zhu, Yujie Zheng, Runkun Chen, Ying Wen, Yanfeng Wang, Weinan E, and Siheng Chen. ML-Master: Towards AI-for-AI via integration of exploration and reasoning. arXiv preprint arXiv:2506.16499, 2025.
[33] Alireza Nadafian, Alireza Mohammadshahi, and Majid Yazdani. KAPSO: A knowledge-grounded framework for autonomous program synthesis and optimization. arXiv preprint arXiv:2601.21526, 2026.
[34] InternAgent Team, Bo Zhang, Shiyang Feng, Xiangchao Yan, Jiakang Yuan, Runmin Ma, Yusong Hu, Zhiyin Yu, Xiaohan He, Songtao Huang, et al. InternAgent: When agent becomes the scientist—building closed-loop system from hypothesis to verification. arXiv preprint arXiv:2505.16938, 2025.
[35] Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. CatBoost: Unbiased boosting with categorical features. Advances in Neural Information Processing Systems, 31, 2018.
[36] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system, 2016.
[37] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A highly efficient gradient boosting decision tree, 2017.
[38] Kaggle. Predicting heart disease (Playground Series S6E2). https://www.kaggle.com/competitions/playground-series-s6e2/overview, 2026. Accessed: 2026-04-29.
[39] Robert Detrano, Andras Janosi, Walter Steinbrunn, Matthias Pfisterer, Johann-Jakob Schmid, Sarbjit Sandhu, Kern H. Guppy, Stella Lee, and Victor Froelicher. International application of a new probability algorithm for the diagnosis of coronary artery disease. The American Journal of Cardiology, 64(5):304–310, 1989. doi:10.1016/0002-9149(89)90524-9
[40] Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew, and Alex Ksikes. Ensemble selection from libraries of models. Proceedings of the Twenty-first International Conference on Machine Learning, page 18, 2004. doi:10.1145/1015330.1015432
[41] Jackson W. Burns, Akshat Shirish Zalte, Charlles R. A. Abreu, Jochen Sieg, Christian Feldmann, Miriam Mathea, and William H. Green. Deep learning foundation models from classical molecular descriptors. arXiv preprint arXiv:2506.15792, 2025.
[42] David Rogers and Mathew Hahn. Extended-connectivity fingerprints. Journal of Chemical Information and Modeling, 50(5):742–754, 2010. doi:10.1021/ci100050t
[43] Greg Landrum and The RDKit Contributors. RDKit: Open-source cheminformatics software. https://www.rdkit.org, 2024. Accessed: 2026-04-29.
[44] Hirotomo Moriwaki, Yu-Shi Tian, Norihito Kawashita, and Tatsuya Takagi. Mordred: A molecular descriptor calculator. Journal of Cheminformatics, 10(1):4, 2018. doi:10.1186/s13321-018-0258-y
[45] Dieter Kraft. A software package for sequential quadratic programming. Tech. Rep. DFVLR-FB 88-28, DFVLR, Institut für Dynamik der Flugsysteme, Oberpfaffenhofen, Germany, 1988.
[46] Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, and Marinka Zitnik. Therapeutics data commons: Machine learning datasets and tasks for drug discovery and development. Advances in Neural Information Processing Systems Datasets and Benchmarks Track, 2021.
[47] OpenADMET. OpenADMET ExpansionRx blind challenge. https://huggingface.co/spaces/openadmet/OpenADMET-ExpansionRx-Challenge, 2026. Accessed: 2026-04-29.
[48] David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28(1):31–36, 1988. doi:10.1021/ci00057a005
[49] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. Proceedings of the 34th International Conference on Machine Learning (ICML), pages 1263–1272, 2017.
[50] Kevin Yang, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez, Timothy Hopper, Brian Kelley, Miriam Mathea, Andrew Palmer, Volker Settels, Tommi Jaakkola, Klavs Jensen, and Regina Barzilay. Analyzing learned molecular representations for property prediction. Journal of Chemical Information and...
[51] Zhaoping Xiong, Dingyan Wang, Xiaohong Liu, Feisheng Zhong, Xiaozhe Wan, Xutong Li, Zhaojun Li, Xiaomin Luo, Kaixian Chen, Hualiang Jiang, and Mingyue Zheng. Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. Journal of Medicinal Chemistry, 63(16):8749–8760, 2020. doi:10.1021/acs.jmedchem.9b00959
[52] Joseph L. Durant, Burton A. Leland, Douglas R. Henry, and James G. Nourse. Reoptimization of MDL keys for use in drug discovery. Journal of Chemical Information and Computer Sciences, 42(6):1273–1280, 2002. doi:10.1021/ci010132r
[53] Xinyu Zhu, Yuzhu Cai, Zexi Liu, Bingyang Zheng, Cheng Wang, Rui Ye, Yuzhi Zhang, Linfeng Zhang, Weinan E, Siheng Chen, and Yanfeng Wang. Toward ultra-long-horizon agentic science: Cognitive accumulation for machine learning engineering. arXiv preprint arXiv:2601.10402, 2026.
[54] Shengran Hu, Cong Lu, and Jeff Clune. Automated design of agentic systems. International Conference on Learning Representations, 2025.
[55] J. Rosser and Jakob Nicolaus Foerster. AgentBreeder: Mitigating the AI safety risks of multi-agent scaffolds via self-improvement. Advances in Neural Information Processing Systems, 2025.
Methods. Problem formulation. We formalize automated AI model development as the task of constructing a runnable AI solution from a task description and a dataset. The inp...
[56] Rafael Martí, Mauricio G. C. Resende, and Celso C. Ribeiro. Multi-start methods for combinatorial optimization. European Journal of Operational Research, 226(1):1–8, 2013.
[57] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. International Conference on Learning Representations, 2023.
[58] Agent Skills. Agent skills: An open standard for extending AI agent capabilities. https://agentskills.io/home, 2025. Accessed: 2026-05-20.
[59] Anthropic. Introducing Agent Skills. https://claude.com/blog/skills, 2025. Accessed: 2026-04-27.
[60] OpenAI. OpenAI skills. https://openai.com/academy/skills/, 2025. Accessed: 2026-05-20.
[61] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 2019.
[62] Hugging Face. Hugging Face: The AI community building the future. https://huggingface.co. Accessed: 2026-05-20.
[63] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[64] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. Transformers: State-of-the-a...
[65] GitHub. GitHub: Build software better, together. https://github.com. Accessed: 2026-05-20.
[66] arXiv. arXiv.org: Open-access archive for scholarly articles. https://arxiv.org. Accessed: 2026-05-20.
[67] Andrei Z. Broder. On the resemblance and containment of documents. Proceedings of the Compression and Complexity of Sequences, pages 21–29, 1997. doi:10.1109/SEQUEN.1997.666900
[68] Anthropic. Claude Opus 4.7 system card. https://anthropic.com/claude-opus-4-7-system-card, 2026. Accessed: 2026-04-27.