Optimizing Expert-Designed Crystal Graph Networks for Band-Gap Prediction with an Autonomous LLM Research Loop

Boris I. Yakobson; Chenmu Zhang

arXiv: 2606.29717 · v1 · pith:J4PSAILTnew · submitted 2026-06-29 · cond-mat.mtrl-sci · cs.AI · cs.LG

Pith reviewed 2026-06-30 05:46 UTC · model grok-4.3

classification cond-mat.mtrl-sci · cs.AI · cs.LG

keywords crystal graph networks · band-gap prediction · MatBench benchmark · LLM coding agent · autonomous optimization · materials machine learning · space-group embedding · message passing

The pith

An autonomous LLM coding agent built the most accurate MatBench band-gap model among those trained without external pretraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that a general-purpose coding agent can autonomously refine crystal graph networks to achieve the highest accuracy on the MatBench band-gap task among models trained without external pretraining. It outperforms all seventeen previously reported expert-designed models on a benchmark of more than 100,000 crystals. The agent reached this result by adding element-pair features to message-passing edges and incorporating crystal space-group embeddings, both of which are established techniques. The work therefore shows that LLM-driven loops can optimize expert machine-learning architectures for materials property prediction while also mapping the practical boundaries of such automation.

Core claim

On the MatBench band-gap benchmark (>100k crystals), a general-purpose coding agent autonomously built the most accurate model trained without external pretraining, ahead of all seventeen expert-designed models reported for the task. The agent reached this performance by implementing known methods: element-pair features on each message-passing edge and a crystal space-group embedding. The study both validates that LLM-agent autonomous research can optimize an expert-designed machine learning model for material property prediction and examines the limitations of the approach.
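The two ingredients named above can be illustrated with a minimal sketch. It is not the paper's implementation: the embedding width, the Gaussian distance expansion, the mean-pool readout, and every function name below are assumptions chosen for illustration, and the random vectors stand in for parameters that a real model would learn.

```python
import math
import random

random.seed(0)

EMBED_DIM = 4           # hypothetical embedding width
NUM_SPACE_GROUPS = 230  # crystallographic space groups are numbered 1..230


def random_vector(dim):
    """Stand-in for a learnable embedding row (randomly initialised here)."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


# One row per space group; a trained model would fit these by gradient descent.
space_group_table = {sg: random_vector(EMBED_DIM) for sg in range(1, NUM_SPACE_GROUPS + 1)}

# Pair rows are created lazily and keyed on the unordered pair of atomic numbers.
pair_table = {}


def pair_embedding(z_i, z_j):
    """Return the same vector for (z_i, z_j) and (z_j, z_i)."""
    key = (min(z_i, z_j), max(z_i, z_j))
    if key not in pair_table:
        pair_table[key] = random_vector(EMBED_DIM)
    return pair_table[key]


def gaussian_distance_expansion(distance, centers, width=0.5):
    """Expand a scalar distance on a set of Gaussian basis functions."""
    return [math.exp(-((distance - c) ** 2) / (2 * width ** 2)) for c in centers]


def edge_features(z_i, z_j, distance, centers=(1.0, 2.0, 3.0, 4.0)):
    """Edge feature = distance expansion concatenated with the element-pair embedding."""
    return gaussian_distance_expansion(distance, centers) + pair_embedding(z_i, z_j)


def crystal_readout(node_vectors, space_group):
    """Mean-pool node vectors, then append the space-group embedding."""
    dim = len(node_vectors[0])
    pooled = [sum(v[k] for v in node_vectors) / len(node_vectors) for k in range(dim)]
    return pooled + space_group_table[space_group]


# Illustrative call: a Ga-As edge (Z = 31, 33) in zinc-blende GaAs (space group 216).
edge = edge_features(31, 33, 2.45)
crystal = crystal_readout([[1.0, 3.0], [3.0, 5.0]], 216)
```

Keying the pair table on the unordered pair keeps the edge feature independent of which atom is treated as the sender, which is one plausible design choice among several.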

What carries the argument

Autonomous LLM research loop that iteratively codes, trains, and evaluates crystal graph networks for band-gap regression.
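A keep-best version of such a loop can be sketched as follows. This is an assumption about the general shape of the procedure, not the agent described in the paper: `propose` stands in for the LLM editing the code, `train_and_validate` stands in for a full training run, and the toy functions and the `hidden_dim` setting are hypothetical.

```python
import random


def run_research_loop(propose, train_and_validate, baseline_config, num_experiments):
    """Keep-best loop: accept a candidate only if it lowers the validation MAE."""
    best_config = baseline_config
    best_mae = train_and_validate(baseline_config)
    history = [best_mae]  # lowest validation MAE reached after each experiment
    for _ in range(num_experiments):
        candidate = propose(best_config)     # agent edits the current best model
        mae = train_and_validate(candidate)  # train, then score on validation data
        if mae < best_mae:
            best_config, best_mae = candidate, mae
        history.append(best_mae)
    return best_config, best_mae, history


_rng = random.Random(0)


def toy_propose(config):
    """Hypothetical stand-in for the agent: nudge one hyperparameter."""
    new_config = dict(config)
    new_config["hidden_dim"] = max(8, config["hidden_dim"] + _rng.choice([-16, 16]))
    return new_config


def toy_train_and_validate(config):
    """Hypothetical stand-in for training: pretend MAE (eV) is lowest at hidden_dim 128."""
    return 0.20 + abs(config["hidden_dim"] - 128) / 1000.0


best_config, best_mae, history = run_research_loop(
    toy_propose, toy_train_and_validate, {"hidden_dim": 64}, num_experiments=20
)
```

The `history` list is a running minimum, so it can only stay flat or decrease as experiments accumulate.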

If this is right

The same agent loop can be applied to other MatBench tasks such as formation energy or elasticity prediction.
Element-pair edge features and space-group embeddings become standard additions to crystal graph architectures.
Autonomous coding agents reduce the human effort needed to reach state-of-the-art performance on fixed materials benchmarks.
Limitations identified in the agent loop will guide the design of more reliable autonomous research systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The result implies that many incremental gains in materials ML may already be reachable by systematic recombination of known components rather than novel inventions.
If the loop generalizes, similar agents could be deployed on private or newly collected crystal datasets without requiring a large team of domain experts.
The work leaves open whether the agent would discover genuinely new architectural motifs when the search space is expanded beyond current crystal-graph conventions.

Load-bearing premise

The agent's model was evaluated under identical conditions to the 17 expert models on the public benchmark, with no hidden advantages from the agent's implementation details or data handling.

What would settle it

Re-running the agent's final model on the public MatBench split and finding that its accuracy falls below at least one of the seventeen expert baselines.
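The check described above reduces to a small computation, sketched here under the assumption that the benchmark score is the mean of per-fold mean absolute errors. The fold data and the baseline names and values are hypothetical placeholders, not numbers from the paper or the leaderboard.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """MAE over one held-out fold."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def benchmark_mae(folds):
    """Average the per-fold MAE; `folds` is a list of (y_true, y_pred) pairs."""
    return mean(mean_absolute_error(t, p) for t, p in folds)


def baselines_beating(candidate_mae, baseline_maes):
    """Names of baselines with a strictly lower MAE than the candidate."""
    return sorted(name for name, mae in baseline_maes.items() if mae < candidate_mae)


# Hypothetical band gaps (eV) for two folds and two made-up baselines.
folds = [([0.0, 1.0, 2.0], [0.5, 1.0, 1.5]), ([3.0, 1.0], [2.0, 1.0])]
candidate = benchmark_mae(folds)
winners = baselines_beating(candidate, {"expert_a": 0.40, "expert_b": 0.45})
```

A non-empty `winners` list on the public split would contradict the headline claim; an empty list would be consistent with it.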

Figures

Figures reproduced from arXiv: 2606.29717 by Boris I. Yakobson, Chenmu Zhang.

**Figure 2.** The lowest validation MAE reached (eV, solid black line) against experiment

**Figure 3.** The best model’s mean absolute error (gold stars) relative to coGN’s, on

**Figure 4.** The lowest held-out MAE reached, against experiment number, for two runs on

Original abstract

Predicting a material's properties from its structure is a central, fast-advancing problem in computational materials science. A decade of work has produced standard public benchmarks and many published machine-learning models for the task (Dunn et al., 2020). The task's fixed metric and these baselines make it a natural setting for autonomous agent research (Karpathy, 2026). On the MatBench band-gap benchmark (>100k crystals), a general-purpose coding agent autonomously built the most accurate model trained without external pretraining, ahead of all seventeen expert-designed models reported for the task. A closer analysis shows it reached this by implementing known methods: either already standard in crystal neural-network models, or borrowed from other areas of machine learning. The contributing implementations include element-pair features on each message-passing edge and a crystal space-group embedding. The work not only demonstrates that LLM-agent autonomous research can optimize an expert-designed machine learning model for material property prediction, but also investigates the limitations of such autonomous research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

An LLM agent built a top MatBench band-gap model by adding standard element-pair edge features and a space-group embedding, but the abstract leaves the exact benchmark protocol unconfirmed.

The main thing to know is that the paper shows a general-purpose coding agent autonomously producing a crystal graph model that beats all 17 cited expert baselines on the MatBench band-gap task (over 100k crystals), without any external pretraining. The authors are clear that the edge came from already-known pieces: element-pair features on message-passing edges and a space-group embedding.

What the work does reasonably well is document the agent's concrete changes and then step back to discuss the limits of autonomous loops. It avoids claiming invention of new architecture and instead treats the result as a case study in what current agents can recombine from the literature. That honesty is useful for anyone tracking how LLM agents perform on real scientific modeling tasks.

The soft spot is the evaluation setup. The abstract states the model was ahead on the public benchmark but supplies no explicit confirmation that the identical MatBench train/test splits, normalization, or evaluation script were used. If the agent's code introduced even minor differences in data handling or featurization, the headline comparison weakens. The paper also gives little detail on prompt engineering, training hyperparameters, or variance across runs, which matters for an empirical claim like this.

This is for readers working on graph networks in materials or on LLM agents for automated science. Someone already following MatBench or crystal GNNs might want the case study for the specific tweaks that worked; broader claims about autonomous research acceleration are scoped to one task. It deserves peer review because the benchmark result is concrete enough for referees to check the protocol and methods section directly.

Referee Report

2 major / 2 minor

Summary. The manuscript describes an autonomous LLM-based coding agent that optimizes an expert-designed crystal graph neural network for band-gap prediction. On the MatBench benchmark (>100k crystals), the resulting model—incorporating element-pair features on message-passing edges and crystal space-group embeddings—is reported to outperform all 17 previously published expert-designed models when trained without external pretraining. The work also examines the limitations of such autonomous optimization loops.

Significance. If the performance edge is shown to arise solely from the architectural choices under identical benchmark conditions, the result would illustrate that general-purpose coding agents can autonomously discover and implement known but effective techniques from crystal networks and broader ML literature, thereby demonstrating a viable path for accelerating model development in materials informatics while also surfacing practical constraints of agent-driven research.

major comments (2)

[Abstract / Results] Abstract and main results section: the headline claim that the agent-built model is 'ahead of all seventeen expert-designed models' on the public MatBench band-gap benchmark is load-bearing for the paper's central contribution, yet the manuscript supplies no explicit statement or supplementary table confirming that the identical train/test splits, normalization, missing-value handling, and evaluation script from the MatBench repository were used. Any deviation would prevent attribution of the accuracy gain to the autonomous loop rather than implementation differences.
[Methods] Methods and experimental details: the abstract states that the agent 'autonomously built' the model by implementing element-pair edge features and space-group embedding, but the manuscript provides no details on the exact agent prompts, the base architecture before modification, training hyperparameters, statistical validation (e.g., multiple random seeds or cross-validation), or ablation studies isolating the contribution of each added component. These omissions make it impossible to assess whether the reported superiority is robust.

minor comments (2)

[Results] The manuscript should include a dedicated table or supplementary note listing the 17 baseline models with their reported MatBench metrics for direct comparison.
[Methods] Notation for the crystal graph (e.g., definition of edge features and space-group embedding) should be introduced with an equation or diagram in the methods section to improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the emphasis on reproducibility and methodological transparency. Both major comments identify areas where additional explicit statements and details will strengthen the manuscript; we will incorporate revisions to address them directly.

Referee: [Abstract / Results] Abstract and main results section: the headline claim that the agent-built model is 'ahead of all seventeen expert-designed models' on the public MatBench band-gap benchmark is load-bearing for the paper's central contribution, yet the manuscript supplies no explicit statement or supplementary table confirming that the identical train/test splits, normalization, missing-value handling, and evaluation script from the MatBench repository were used. Any deviation would prevent attribution of the accuracy gain to the autonomous loop rather than implementation differences.

Authors: We agree that an explicit confirmation of benchmark fidelity is required to attribute performance differences to the autonomous loop. In the revised manuscript we will add a paragraph in the Methods section stating that the official MatBench train/test splits, normalization, missing-value handling, and evaluation scripts were followed exactly as released in the MatBench repository. We will also add a supplementary table that tabulates the key benchmark parameters and confirms adherence to the public protocol.
Revision: yes
Referee: [Methods] Methods and experimental details: the abstract states that the agent 'autonomously built' the model by implementing element-pair edge features and space-group embedding, but the manuscript provides no details on the exact agent prompts, the base architecture before modification, training hyperparameters, statistical validation (e.g., multiple random seeds or cross-validation), or ablation studies isolating the contribution of each added component. These omissions make it impossible to assess whether the reported superiority is robust.

Authors: We acknowledge the current manuscript is missing these implementation details. The revised Methods section will be expanded to report: the precise prompts supplied to the coding agent, the base architecture prior to any modifications, all training hyperparameters, performance statistics across multiple random seeds, and ablation experiments that isolate the contribution of the element-pair edge features and the space-group embedding. These additions will enable readers to evaluate robustness directly.
Revision: yes
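The seed statistics and ablations promised in this response could be tabulated along the following lines. This is a sketch only: the variant names and every MAE value are hypothetical placeholders, not results from the manuscript.

```python
from statistics import mean, stdev


def summarize(maes):
    """Mean and sample standard deviation of MAE across random seeds."""
    return mean(maes), stdev(maes)


def ablation_table(results, full_key="full"):
    """For each variant, report mean, std, and the MAE change relative to the full model."""
    full_mean, _ = summarize(results[full_key])
    table = {}
    for name, maes in results.items():
        m, s = summarize(maes)
        table[name] = {"mean": m, "std": s, "delta_vs_full": m - full_mean}
    return table


# Hypothetical per-seed test MAEs (eV); three seeds per variant.
results = {
    "full": [0.10, 0.12, 0.11],
    "no_pair_features": [0.13, 0.15, 0.14],
    "no_space_group": [0.12, 0.12, 0.15],
}
table = ablation_table(results)
```

A positive `delta_vs_full` that is large compared with the seed-to-seed standard deviation would suggest the removed component contributes to the result; a delta within the noise would not.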

Circularity Check

0 steps flagged

No circularity: empirical benchmark result on external public data

The paper's central claim is an empirical performance comparison on the public MatBench band-gap benchmark (>100k crystals), where an LLM agent autonomously produced a model outperforming 17 expert baselines. No mathematical derivation, fitted parameters, or self-referential equations are presented. The result rests on external public benchmark data and reported model performances rather than internal definitions or self-citation chains. This matches the default expectation of a self-contained empirical finding with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical ML application study with no mathematical derivations, free parameters, or new physical axioms introduced; the central claim rests entirely on benchmark performance comparison.

Reference graph

Works this paper leans on

36 extracted references · 21 canonical work pages

[1] Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, and Shyue Ping Ong. Graph networks as a universal machine learning framework for molecules and crystals. Chemistry of Materials, 31(9): 3564–3572, 2019. doi:10.1021/acs.chemmater.9b01294

[2] Kamal Choudhary and Brian DeCost. Atomistic line graph neural network for improved materials property predictions. npj Computational Materials, 7(1): 185, 2021. doi:10.1038/s41524-021-00650-1

[3] Pierre-Paul De Breuck, Geoffroy Hautier, and Gian-Marco Rignanese. Materials property prediction for limited datasets enabled by feature selection and joint learning with modnet. npj Computational Materials, 7(1): 83, 2021. doi:10.1038/s41524-021-00552-2

[4] Hongwei Du, Jiamin Wang, Jian Hui, Lanting Zhang, and Hong Wang. Densegnn: universal and scalable deeper graph neural networks for high-performance property prediction in crystals and molecules. npj Computational Materials, 10(1): 292, 2024. doi:10.1038/s41524-024-01444-x

[5] Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, and Anubhav Jain. Benchmarking materials property prediction methods: the matbench test set and automatminer reference algorithm. npj Computational Materials, 6(1): 138, 2020. doi:10.1038/s41524-020-00406-3

[6] Johannes Gasteiger, Shankari Giri, Johannes T. Margraf, and Stephan Günnemann. Fast and uncertainty-aware directional message passing for non-equilibrium molecules, 2020. NeurIPS 2020 Machine Learning for Molecules Workshop

[7] Rogério Almeida Gouvêa, Pierre-Paul De Breuck, Tatiane Pretto, Gian-Marco Rignanese, and Marcos José Leite Santos. Combining feature-based approaches with graph neural networks and symbolic regression for synergistic performance and interpretability. npj Computational Materials, 12(1): 67, January 2026. doi:10.1038/s41524-025-01938-2

[8] Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. In European Conference on Computer Vision (ECCV), pp. 646–661. Springer, 2016. doi:10.1007/978-3-319-46493-0_39

[9] Hongshuo Huang, Rishikesh Magar, Changwen Xu, and Amir Barati Farimani. Materials informatics transformer: A language model for interpretable materials properties prediction, 2023

[10] Achintha Ihalage and Yang Hao. Formula graph self-attention network for representation-domain independent materials discovery. Advanced Science, 9(18): 2200164, 2022. doi:10.1002/advs.202200164

[11] Yuichi Inoue, Kou Misaki, Yuki Imajuku, So Kuroki, Taishi Nakamura, and Takuya Akiba. Wider or deeper? scaling LLM inference-time compute with adaptive branching tree search. In Advances in Neural Information Processing Systems (NeurIPS), 2025

[12] Shuyi Jia, Chao Zhang, and Victor Fung. LLMatDesign: Autonomous materials discovery with large language models, 2024

[13] Zhengyao Jiang, Dominik Schmidt, Dhruv Srikanth, Dixing Xu, Ian Kaplan, Deniss Jacenko, and Yuxiang Wu. AIDE: AI-driven exploration in the space of code, 2025

[14] Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. Swe-bench: Can language models resolve real-world github issues? In International Conference on Learning Representations (ICLR), 2024

[15] Andrej Karpathy. autoresearch. https://github.com/karpathy/autoresearch, March 2026

[16] Myeonghun Lee, Taehyun Park, and Kyoungmin Min. Matini-net: Versatile material informatics research framework for feature engineering and deep neural network design. Journal of Chemical Information and Modeling, 64(23): 8770–8783, 2024. doi:10.1021/acs.jcim.4c01676

[17] Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, Thomas Hubert, Peter Choy, Cyprien de Masson d'Autume, Igor Babuschkin, Xinyun Chen, Po-Sen Huang, Johannes Welbl, Sven Gowal, Alexey Cherepanov, James Molloy, Daniel J. Mankowitz, Esme Sutherland Robson, P... 2022. doi:10.1126/science.abq1158

[18] Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The ai scientist: Towards fully automated open-ended scientific discovery, 2024

[19] Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, and Matej Balog. AlphaEvolve: A coding agent for scientific a... 2025

[20] Sadman Sadeed Omee, Steph-Yves Louis, Nihang Fu, Lai Wei, Sourin Dey, Rongzhi Dong, Qinyang Li, and Jianjun Hu. Scalable deeper graph neural networks for high-performance materials property prediction. Patterns, 3(5): 100491, 2022. doi:10.1016/j.patter.2022.100491

[21] Ali Ramlaoui, Alexandre Duval, Hannah Bull, Victor Schmidt, Hugues Talbot, Fragkiskos D. Malliaros, and Joseph Musielewicz. TriForces: Augmenting atomistic GNNs for transferable representations, May 2026. Accepted at ICML 2026

[22] Janosh Riebesell, Rhys E. A. Goodall, Philipp Benner, Yuan Chiang, Bowen Deng, Gerbrand Ceder, Mark Asta, Alpha A. Lee, Anubhav Jain, and Kristin A. Persson. A framework to evaluate machine learning crystal stability predictions. Nature Machine Intelligence, 7(6): 836–847, June 2025. doi:10.1038/s42256-025-01055-1

[23] Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, Pushmeet Kohli, and Alhussein Fawzi. Mathematical discoveries from program search with large language models. Nature, 625(7995): 468–475, 2024. doi:10.1038/s41586-023-06924-6

[24] Robin Ruff, Patrick Reiser, Jan Stühmer, and Pascal Friederich. Connectivity optimized nested line graph networks for crystal structures. Digital Discovery, 3(3): 594–601, 2024. doi:10.1039/d4dd00018h

[25] K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller. SchNet – a deep learning architecture for molecules and materials. The Journal of Chemical Physics, 148(24): 241722, 2018. doi:10.1063/1.5019779

[26] Kristof T. Schütt, Oliver T. Unke, and Michael Gastegger. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139, pp. 9377–9388. PMLR, 2021

[27] Nima Shoghi, Adeesh Kolluru, John R. Kitchin, Zachary W. Ulissi, C. Lawrence Zitnick, and Brandon M. Wood. From molecules to materials: Pre-training large generalizable models for atomic property prediction. In The Twelfth International Conference on Learning Representations (ICLR), 2024. URL https://openreview.net/forum?id=PfPnugdxup

[28] Nathan J. Szymanski, Bernardus Rendy, Yuxing Fei, Rishi E. Kumar, Tanjin He, David Milsted, Matthew J. McDermott, Max Gallant, Ekin Dogus Cubuk, Amil Merchant, Haegyeom Kim, Anubhav Jain, Christopher J. Bartel, Kristin Persson, Yan Zeng, and Gerbrand Ceder. An autonomous laboratory for the accelerated synthesis of novel materials. Nature, 624(7990): ... 2023. doi:10.1038/s41586-023-06734-w

[29] Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Rishi Hazra, Nicolas Baldwin, Alexis Audran-Reiss, Michael Kuchnik, Despoina Magka, Minqi Jiang, Alisia Maria Lupidi, Andrei Lupu, Roberta Raileanu, Kelvin Niu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Shagun Sodhani, Alexander H. Miller, Abhishek Charnalia, Derek Dunfield, Car... 2025

[30] Anthony Yu-Tung Wang, Steven K. Kauwe, Ryan J. Murdock, and Taylor D. Sparks. Compositionally restricted attention-based network for materials property predictions. npj Computational Materials, 7(1): 77, 2021. doi:10.1038/s41524-021-00545-1

[31] Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, and Sheng Gong. Crystograph: A comprehensive predictive model for crystal material properties and the benchmark. Battery Energy, 4(4): e70004, 2025. doi:10.1002/bte2.70004

[32] Logan Ward, Ankit Agrawal, Alok Choudhary, and Christopher Wolverton. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials, 2(1): 16028, 2016. doi:10.1038/npjcompumats.2016.28

[33] Tian Xie and Jeffrey C. Grossman. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical Review Letters, 120(14): 145301, 2018. doi:10.1103/PhysRevLett.120.145301

[34] Changwen Xu, Shang Zhu, and Venkatasubramanian Viswanathan. CLOUD: A scalable and physics-informed foundation model for crystal representation learning. Nature Communications, 17(1): 4074, 2026. doi:10.1038/s41467-026-70467-3

[35] Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha. The AI scientist-v2: Workshop-level automated scientific discovery via agentic tree search, April 2025

[36] John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: Agent-computer interfaces enable automated software engineering, 2024. NeurIPS 2024

[1] [1]

Graph networks as a universal machine learning framework for molecules and crystals

Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, and Shyue Ping Ong. Graph networks as a universal machine learning framework for molecules and crystals. Chemistry of Materials, 31 0 (9): 0 3564--3572, 2019. doi:10.1021/acs.chemmater.9b01294

work page doi:10.1021/acs.chemmater.9b01294 2019

[2] [2]

Atomistic line graph neural network for improved materials property predictions

Kamal Choudhary and Brian DeCost. Atomistic line graph neural network for improved materials property predictions. npj Computational Materials, 7 0 (1): 0 185, 2021. doi:10.1038/s41524-021-00650-1

work page doi:10.1038/s41524-021-00650-1 2021

[3] [3]

Materials property prediction for limited datasets enabled by feature selection and joint learning with modnet

Pierre-Paul De Breuck, Geoffroy Hautier, and Gian-Marco Rignanese. Materials property prediction for limited datasets enabled by feature selection and joint learning with modnet. npj Computational Materials, 7 0 (1): 0 83, 2021. doi:10.1038/s41524-021-00552-2

work page doi:10.1038/s41524-021-00552-2 2021

[4] [4]

Densegnn: universal and scalable deeper graph neural networks for high-performance property prediction in crystals and molecules

Hongwei Du, Jiamin Wang, Jian Hui, Lanting Zhang, and Hong Wang. Densegnn: universal and scalable deeper graph neural networks for high-performance property prediction in crystals and molecules. npj Computational Materials, 10 0 (1): 0 292, 2024. doi:10.1038/s41524-024-01444-x

work page doi:10.1038/s41524-024-01444-x 2024

[5] [5]

Benchmarking materials property prediction methods: the matbench test set and automatminer reference algorithm

Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, and Anubhav Jain. Benchmarking materials property prediction methods: the matbench test set and automatminer reference algorithm. npj Computational Materials, 6 0 (1): 0 138, 2020. doi:10.1038/s41524-020-00406-3

work page doi:10.1038/s41524-020-00406-3 2020

[6] [6]

Margraf, and Stephan G \"u nnemann

Johannes Gasteiger, Shankari Giri, Johannes T. Margraf, and Stephan G \"u nnemann. Fast and uncertainty-aware directional message passing for non-equilibrium molecules, 2020. NeurIPS 2020 Machine Learning for Molecules Workshop

2020

[7] [7]

Combining feature-based approaches with graph neural networks and symbolic regression for synergistic performance and interpretability

Rog \'e rio Almeida Gouv \^e a, Pierre-Paul De Breuck, Tatiane Pretto, Gian-Marco Rignanese, and Marcos Jos \'e Leite Santos. Combining feature-based approaches with graph neural networks and symbolic regression for synergistic performance and interpretability. npj Computational Materials, 12 0 (1): 0 67, January 2026. doi:10.1038/s41524-025-01938-2

work page doi:10.1038/s41524-025-01938-2 2026

[8] [8]

Weinberger

Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. In European Conference on Computer Vision (ECCV), pp.\ 646--661. Springer, 2016. doi:10.1007/978-3-319-46493-0_39

work page doi:10.1007/978-3-319-46493-0_39 2016

[9] [9]

Materials informatics transformer: A language model for interpretable materials properties prediction, 2023

Hongshuo Huang, Rishikesh Magar, Changwen Xu, and Amir Barati Farimani. Materials informatics transformer: A language model for interpretable materials properties prediction, 2023

2023

[10] [10]

Formula graph self-attention network for representation-domain independent materials discovery

Achintha Ihalage and Yang Hao. Formula graph self-attention network for representation-domain independent materials discovery. Advanced Science, 9 0 (18): 0 2200164, 2022. doi:10.1002/advs.202200164

work page doi:10.1002/advs.202200164 2022

[11] [11]

Wider or deeper? scaling LLM inference-time compute with adaptive branching tree search

Yuichi Inoue, Kou Misaki, Yuki Imajuku, So Kuroki, Taishi Nakamura, and Takuya Akiba. Wider or deeper? scaling LLM inference-time compute with adaptive branching tree search. In Advances in Neural Information Processing Systems (NeurIPS), 2025

2025

[12] [12]

LLMatDesign : Autonomous materials discovery with large language models, 2024

Shuyi Jia, Chao Zhang, and Victor Fung. LLMatDesign : Autonomous materials discovery with large language models, 2024

2024

[13] [13]

AIDE : AI -driven exploration in the space of code, 2025

Zhengyao Jiang, Dominik Schmidt, Dhruv Srikanth, Dixing Xu, Ian Kaplan, Deniss Jacenko, and Yuxiang Wu. AIDE : AI -driven exploration in the space of code, 2025

2025

[14] Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. SWE-bench: Can language models resolve real-world GitHub issues? In International Conference on Learning Representations (ICLR), 2024

[15] Andrej Karpathy. autoresearch. https://github.com/karpathy/autoresearch, March 2026

[16] Myeonghun Lee, Taehyun Park, and Kyoungmin Min. Matini-net: Versatile material informatics research framework for feature engineering and deep neural network design. Journal of Chemical Information and Modeling, 64(23):8770--8783, 2024. doi:10.1021/acs.jcim.4c01676

[17] Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, Thomas Hubert, Peter Choy, Cyprien de Masson d'Autume, Igor Babuschkin, Xinyun Chen, Po-Sen Huang, Johannes Welbl, Sven Gowal, Alexey Cherepanov, James Molloy, Daniel J. Mankowitz, Esme Sutherland Robson, P... doi:10.1126/science.abq1158, 2022

[18] Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The AI scientist: Towards fully automated open-ended scientific discovery, 2024

[19] Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, and Matej Balog. AlphaEvolve: A coding agent for scientific a... 2025

[20] Sadman Sadeed Omee, Steph-Yves Louis, Nihang Fu, Lai Wei, Sourin Dey, Rongzhi Dong, Qinyang Li, and Jianjun Hu. Scalable deeper graph neural networks for high-performance materials property prediction. Patterns, 3(5):100491, 2022. doi:10.1016/j.patter.2022.100491

[21] Ali Ramlaoui, Alexandre Duval, Hannah Bull, Victor Schmidt, Hugues Talbot, Fragkiskos D. Malliaros, and Joseph Musielewicz. TriForces: Augmenting atomistic GNNs for transferable representations, May 2026. Accepted at ICML 2026

[22] Janosh Riebesell, Rhys E. A. Goodall, Philipp Benner, Yuan Chiang, Bowen Deng, Gerbrand Ceder, Mark Asta, Alpha A. Lee, Anubhav Jain, and Kristin A. Persson. A framework to evaluate machine learning crystal stability predictions. Nature Machine Intelligence, 7(6):836--847, June 2025. doi:10.1038/s42256-025-01055-1

[23] Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, Pushmeet Kohli, and Alhussein Fawzi. Mathematical discoveries from program search with large language models. Nature, 625(7995):468--475, 2024. doi:10.1038/s41586-023-06924-6

[24] Robin Ruff, Patrick Reiser, Jan Stühmer, and Pascal Friederich. Connectivity optimized nested line graph networks for crystal structures. Digital Discovery, 3(3):594--601, 2024. doi:10.1039/d4dd00018h

[25] K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, and K.-R. Müller. SchNet -- a deep learning architecture for molecules and materials. The Journal of Chemical Physics, 148(24):241722, 2018. doi:10.1063/1.5019779

[26] Kristof T. Schütt, Oliver T. Unke, and Michael Gastegger. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139, pp. 9377--9388. PMLR, 2021

[27] Nima Shoghi, Adeesh Kolluru, John R. Kitchin, Zachary W. Ulissi, C. Lawrence Zitnick, and Brandon M. Wood. From molecules to materials: Pre-training large generalizable models for atomic property prediction. In The Twelfth International Conference on Learning Representations (ICLR), 2024. URL https://openreview.net/forum?id=PfPnugdxup

[28] Nathan J. Szymanski, Bernardus Rendy, Yuxing Fei, Rishi E. Kumar, Tanjin He, David Milsted, Matthew J. McDermott, Max Gallant, Ekin Dogus Cubuk, Amil Merchant, Haegyeom Kim, Anubhav Jain, Christopher J. Bartel, Kristin Persson, Yan Zeng, and Gerbrand Ceder. An autonomous laboratory for the accelerated synthesis of novel materials. Nature, 624(7990):... doi:10.1038/s41586-023-06734-w, 2023

[29] Edan Toledo, Karen Hambardzumyan, Martin Josifoski, Rishi Hazra, Nicolas Baldwin, Alexis Audran-Reiss, Michael Kuchnik, Despoina Magka, Minqi Jiang, Alisia Maria Lupidi, Andrei Lupu, Roberta Raileanu, Kelvin Niu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Shagun Sodhani, Alexander H. Miller, Abhishek Charnalia, Derek Dunfield, Carole-Jean Wu, Pontus Stenetorp, Nicola Cancedda, Jakob Nicolaus Foerster, and Yoram Bachrach. ... 2025

[30] Anthony Yu-Tung Wang, Steven K. Kauwe, Ryan J. Murdock, and Taylor D. Sparks. Compositionally restricted attention-based network for materials property predictions. npj Computational Materials, 7(1):77, 2021. doi:10.1038/s41524-021-00545-1

[31] Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, and Sheng Gong. Crystograph: A comprehensive predictive model for crystal material properties and the benchmark. Battery Energy, 4(4):e70004, 2025. doi:10.1002/bte2.70004

[32] Logan Ward, Ankit Agrawal, Alok Choudhary, and Christopher Wolverton. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials, 2(1):16028, 2016. doi:10.1038/npjcompumats.2016.28

[33] Tian Xie and Jeffrey C. Grossman. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical Review Letters, 120(14):145301, 2018. doi:10.1103/PhysRevLett.120.145301

[34] Changwen Xu, Shang Zhu, and Venkatasubramanian Viswanathan. CLOUD: A scalable and physics-informed foundation model for crystal representation learning. Nature Communications, 17(1):4074, 2026. doi:10.1038/s41467-026-70467-3

[35] Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha. The AI scientist-v2: Workshop-level automated scientific discovery via agentic tree search, April 2025

[36] John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: Agent-computer interfaces enable automated software engineering, 2024. NeurIPS 2024