arXiv: 2410.12771 · v2 · pith:7R5FTE2Lnew · submitted 2024-10-16 · cond-mat.mtrl-sci · cs.AI · physics.comp-ph

Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models

Pith reviewed 2026-05-23 18:55 UTC · model grok-4.3

classification: cond-mat.mtrl-sci · cs.AI · physics.comp-ph
keywords: inorganic materials · density functional theory · machine learning models · materials discovery · open dataset · EquiformerV2 · formation energy prediction · stability prediction

The pith

The OMat24 dataset and EquiformerV2 models achieve state-of-the-art predictions of material stability and formation energies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces the Open Materials 2024 dataset consisting of over 110 million density functional theory calculations on inorganic materials, designed to capture structural and compositional diversity. It also releases pre-trained EquiformerV2 models that reach new performance levels on the Matbench Discovery benchmark. These models predict ground-state stability with an F1 score above 0.9 and formation energies with an accuracy of 20 meV per atom. The open availability of both the data and models is intended to lower barriers for researchers working on AI-driven materials discovery for applications such as climate solutions and advanced electronics. The authors also investigate the effects of model scale, denoising tasks, and fine-tuning across related datasets.

Core claim

We release the OMat24 dataset with more than 110 million DFT calculations focused on inorganic materials and a set of EquiformerV2 models. These models attain state-of-the-art results on the Matbench Discovery leaderboard, predicting ground-state stability to an F1 score exceeding 0.9 and formation energies to 20 meV/atom accuracy. We examine how model size, auxiliary denoising objectives, and fine-tuning influence performance on OMat24, MPtraj, and Alexandria datasets. The public release of the dataset and models supports community progress in AI for materials science.
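As a rough illustration of how the two headline metrics are usually computed, the sketch below derives an F1 score for the "stable" class and a mean absolute error in meV/atom from reference and predicted energies above the convex hull. It assumes a material counts as stable when its energy above hull is at or below 0 eV/atom. That threshold, the function names, and the toy numbers are illustrative assumptions, not values or code taken from the paper.

```python
from typing import Sequence


def stability_f1(reference: Sequence[float],
                 predicted: Sequence[float],
                 threshold: float = 0.0) -> float:
    """F1 score for the 'stable' class.

    A structure is treated as stable when its energy above hull (eV/atom)
    is <= threshold. The 0.0 default is an assumption for illustration.
    """
    tp = fp = fn = 0
    for ref, pred in zip(reference, predicted):
        ref_stable = ref <= threshold
        pred_stable = pred <= threshold
        if ref_stable and pred_stable:
            tp += 1          # correctly predicted stable
        elif pred_stable:
            fp += 1          # predicted stable, actually unstable
        elif ref_stable:
            fn += 1          # predicted unstable, actually stable
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mae_mev_per_atom(reference: Sequence[float],
                     predicted: Sequence[float]) -> float:
    """Mean absolute error, converted from eV/atom to meV/atom."""
    total = sum(abs(r - p) for r, p in zip(reference, predicted))
    return 1000.0 * total / len(reference)


# Made-up energies above hull in eV/atom, for demonstration only.
reference = [-0.05, 0.02, -0.01, 0.10, 0.00]
predicted = [-0.03, -0.01, -0.02, 0.08, 0.03]

print(f"F1  = {stability_f1(reference, predicted):.3f}")
print(f"MAE = {mae_mev_per_atom(reference, predicted):.1f} meV/atom")
```

The two numbers measure different things: a small average energy error can still flip the stability label of structures that sit close to the hull, so both are reported.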

What carries the argument

The EquiformerV2 models trained on the large-scale OMat24 dataset of over 110 million DFT calculations that emphasize diversity in structure and composition.

If this is right

  • High-accuracy stability predictions allow more reliable identification of viable new materials.
  • 20 meV/atom energy accuracy improves estimates of thermodynamic favorability for synthesis.
  • Open pre-trained models lower the computational barrier for applying AI to materials problems.
  • Performance gains from larger models and fine-tuning suggest paths for further improvement.
  • Exploration across multiple datasets indicates the approach generalizes beyond the primary training set.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The dataset may support training models for additional material properties like conductivity or hardness if extended.
  • Community use could lead to hybrid models combining this data with experimental observations.
  • Similar large-scale open datasets might accelerate progress in related fields such as catalysis or battery materials.
  • Reduced reliance on closed datasets could standardize benchmarks in the field.

Load-bearing premise

The DFT calculations used to generate the OMat24 dataset accurately represent the true physical properties of the materials and the sampled configurations provide sufficient structural and compositional diversity for the models to generalize beyond the training data.

What would settle it

Experimental validation showing that materials predicted as stable by the models are actually unstable when synthesized and tested in the lab, or vice versa.

Original abstract

The ability to discover new materials with desirable properties is critical for numerous applications from helping mitigate climate change to advances in next generation computing hardware. AI has the potential to accelerate materials discovery and design by more effectively exploring the chemical space compared to other computational methods or by trial-and-error. While substantial progress has been made on AI for materials data, benchmarks, and models, a barrier that has emerged is the lack of publicly available training data and open pre-trained models. To address this, we present a Meta FAIR release of the Open Materials 2024 (OMat24) large-scale open dataset and an accompanying set of pre-trained models. OMat24 contains over 110 million density functional theory (DFT) calculations focused on structural and compositional diversity. Our EquiformerV2 models achieve state-of-the-art performance on the Matbench Discovery leaderboard and are capable of predicting ground-state stability and formation energies to an F1 score above 0.9 and an accuracy of 20 meV/atom, respectively. We explore the impact of model size, auxiliary denoising objectives, and fine-tuning on performance across a range of datasets including OMat24, MPtraj, and Alexandria. The open release of the OMat24 dataset and models enables the research community to build upon our efforts and drive further advancements in AI-assisted materials science.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the Open Materials 2024 (OMat24) dataset containing over 110 million DFT calculations on inorganic materials with emphasis on structural and compositional diversity, together with pre-trained EquiformerV2 models. These models are stated to reach state-of-the-art performance on the external Matbench Discovery leaderboard, attaining F1 scores above 0.9 for ground-state stability classification and 20 meV/atom accuracy for formation-energy regression. The work further examines the effects of model scale, auxiliary denoising objectives, and fine-tuning across OMat24, MPtraj, and Alexandria, and releases both the dataset and models publicly.

Significance. If the reported performance numbers hold under the stated evaluation protocol, the contribution is significant because it supplies a large-scale, openly released training resource and foundation models that lower the barrier for subsequent work in AI-assisted materials discovery. Explicit credit is due for the public release of both data and weights and for the use of an external benchmark that reduces circularity risk.

major comments (2)
  1. [Results] Results section (performance tables/figures): the claim of state-of-the-art performance on Matbench Discovery is central, yet the manuscript does not provide a side-by-side comparison table listing the exact metrics (F1, MAE, etc.) of the immediately preceding leaderboard entries; without this, the magnitude of improvement cannot be assessed quantitatively.
  2. [Methods] Dataset construction / Methods: the assertion that the 110 million structures provide sufficient diversity for generalization beyond the training distribution is load-bearing for the reported transfer performance, but the sampling protocol (composition, space-group coverage, relaxation criteria) is described only at high level; a quantitative diversity metric or coverage analysis is required.
minor comments (2)
  1. [Abstract] Abstract: the numerical claims (F1 > 0.9, 20 meV/atom) should be accompanied by the precise evaluation split or leaderboard version used.
  2. [Figures] Figure captions: several performance plots lack error bars or the number of independent runs, reducing interpretability of the model-size and fine-tuning ablations.
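The error-bar request in the last minor comment amounts to reporting a mean, a spread, and the number of independent runs for each ablation point. A minimal sketch using only the Python standard library follows. The helper name and the per-seed values are hypothetical placeholders, not results from the paper.

```python
import statistics
from typing import Sequence, Tuple


def summarize_runs(values: Sequence[float]) -> Tuple[float, float, int]:
    """Return (mean, sample standard deviation, number of runs).

    With a single run the spread is undefined, so 0.0 is returned instead.
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std, len(values)


# Hypothetical MAE values (meV/atom) from three training seeds.
seed_results = [21.0, 20.0, 22.0]

mean, std, n = summarize_runs(seed_results)
print(f"MAE = {mean:.1f} +/- {std:.1f} meV/atom (n = {n})")
```

Stating n next to the spread matters because a standard deviation from two or three seeds is itself a noisy estimate.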

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their positive assessment of the work and for the constructive suggestions that will improve the clarity of the manuscript. We address each major comment below.

Point-by-point responses
  1. Referee: [Results] Results section (performance tables/figures): the claim of state-of-the-art performance on Matbench Discovery is central, yet the manuscript does not provide a side-by-side comparison table listing the exact metrics (F1, MAE, etc.) of the immediately preceding leaderboard entries; without this, the magnitude of improvement cannot be assessed quantitatively.

    Authors: We agree that a side-by-side comparison table would allow readers to quantitatively assess the improvement. In the revised manuscript we will add a table in the Results section that reports F1, MAE, and other key metrics for the top entries on the public Matbench Discovery leaderboard together with our EquiformerV2 results. This table will be placed immediately before or after the current performance claims so that the magnitude of the advance is transparent.
    Revision: yes

  2. Referee: [Methods] Dataset construction / Methods: the assertion that the 110 million structures provide sufficient diversity for generalization beyond the training distribution is load-bearing for the reported transfer performance, but the sampling protocol (composition, space-group coverage, relaxation criteria) is described only at high level; a quantitative diversity metric or coverage analysis is required.

    Authors: We acknowledge that the current Methods description of the sampling protocol remains high-level. In the revision we will expand this section with quantitative diversity metrics, including elemental composition histograms, space-group coverage statistics, and a direct comparison of structural and compositional coverage against MP and Alexandria. We will also report the distribution of relaxation criteria and any filtering steps applied, thereby providing the requested coverage analysis.
    Revision: yes
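One simple form an elemental composition histogram could take is sketched below: count how many structures contain each element, then summarize the spread with the Shannon entropy of that distribution. This is an illustrative sketch, not the authors' analysis. The function names, the regex-based formula parsing, and the four example formulas are assumptions made for the demonstration.

```python
import math
import re
from collections import Counter
from typing import Iterable

# Matches element symbols: one uppercase letter, optionally one lowercase.
_ELEMENT = re.compile(r"[A-Z][a-z]?")


def element_histogram(formulas: Iterable[str]) -> Counter:
    """Count, per element, the number of formulas that contain it."""
    counts: Counter = Counter()
    for formula in formulas:
        # set() so an element is counted once per formula.
        counts.update(set(_ELEMENT.findall(formula)))
    return counts


def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy (in nats) of the normalized element counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)


# Toy formulas for demonstration only, not drawn from any dataset.
formulas = ["NaCl", "Fe2O3", "LiFePO4", "MgO"]

hist = element_histogram(formulas)
print(hist.most_common())
print(f"entropy = {shannon_entropy(hist):.3f} nats "
      f"(maximum for {len(hist)} elements: {math.log(len(hist)):.3f})")
```

An entropy close to the maximum indicates that elements are represented evenly. A low value flags a dataset dominated by a few elements, which is the kind of gap a coverage comparison against MP or Alexandria would expose.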

Circularity Check

0 steps flagged

No significant circularity

The paper's primary claims involve releasing a new DFT-generated dataset (OMat24, >110M calculations) and reporting EquiformerV2 model performance on the external Matbench Discovery leaderboard (F1>0.9, 20 meV/atom accuracy). These metrics are computed against independent benchmark labels rather than the paper's own fitted quantities or self-generated targets. No load-bearing derivation reduces by construction to a fitted parameter, self-citation chain, or ansatz smuggled from prior author work; the evaluation is benchmark-external and the dataset release is a direct contribution without internal circular reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The paper's contributions rest on the assumption that DFT is a suitable method for generating training data and that the sampled configurations represent the relevant chemical space for generalization.

free parameters (1)
  • EquiformerV2 model architecture and training hyperparameters
    Model size, auxiliary objectives, and fine-tuning choices are explored to achieve reported performance but specific values are not detailed in the abstract.
axioms (1)
  • [domain assumption] Density functional theory (DFT) provides reliable approximations for material properties.
    All data is generated using DFT calculations.

Forward citations

Cited by 30 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SLayerGen: a Crystal Generative Model for all Space and Layer Groups

    cond-mat.mtrl-sci 2026-05 unverdicted novelty 8.0

    SLayerGen generates crystals invariant to any space or layer group via autoregressive lattice and Wyckoff sampling plus equivariant diffusion, achieving gains over bulk models on diperiodic materials after correcting ...

  2. JanusPipe: Efficient Pipeline Parallel Training for Machine Learning Interatomic Potentials

    cs.DC 2026-05 unverdicted novelty 7.0

    JanusPipe introduces SymFold and WaveK to enable efficient 3D-parallel training for conservative MLIPs, reporting 1.51x and 1.45x average throughput gains over 1F1B and Hanayo baselines on 32 GPUs.

  3. Lang2MLIP: End-to-End Language-to-Machine Learning Interatomic Potential Development with Autonomous Agentic Workflows

    cs.LG 2026-05 unverdicted novelty 7.0

    Lang2MLIP is an LLM multi-agent framework that automates end-to-end development of machine learning interatomic potentials from natural language input for heterogeneous materials systems.

  4. Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials

    cs.DC 2026-04 conditional novelty 7.0

    MatRIS-MoE and Janus enable efficient exascale training of billion-parameter universal interatomic potentials by addressing second-order derivative computation and communication overheads.

  5. Atomistic Machine Learning with Irreducible Cartesian Natural Tensors

    cond-mat.mtrl-sci 2025-10 unverdicted novelty 7.0

    CarNet develops irreducible Cartesian natural tensors and an equivariant model that matches leading spherical-tensor performance for ML interatomic potentials and high-rank tensor predictions like elastic constants.

  6. Teachers that teach the irrelevant: Pre-training machine learned interaction potentials with classical force fields for robust molecular dynamics simulations

    physics.chem-ph 2025-09 unverdicted novelty 7.0

    Pre-training ML interaction potentials on classical force fields followed by ab initio fine-tuning produces stable and accurate molecular dynamics simulations for gas-phase molecules, liquid water, and hydrogen combustion.

  7. JanusPipe: Efficient Pipeline Parallel Training for Machine Learning Interatomic Potentials

    cs.DC 2026-05 unverdicted novelty 6.0

    JanusPipe is a new 3D-parallel training system for conservative MLIPs that uses SymFold and WaveK to achieve 1.51x and 1.45x average throughput gains over 1F1B and Hanayo on 32 GPUs.

  8. CrystalREPA: Transferring Physical Priors from Universal MLIPs to Crystal Generative Models

    cond-mat.mtrl-sci 2026-05 unverdicted novelty 6.0

    CrystalREPA closes the representation gap between crystal generators and universal MLIPs via contrastive alignment, yielding more stable and valid generated crystals while revealing that MLIP teacher quality is better...

  9. Compact SO(3) Equivariant Atomistic Foundation Models via Structural Pruning

    cs.LG 2026-05 unverdicted novelty 6.0

    Structural pruning of SO(3) equivariant atomistic models from large checkpoints yields 1.5-4x fewer parameters and 2.5-4x less pre-training compute than small models trained from scratch, while outperforming them on m...

  10. MatterSim-MT: A multi-task foundation model for in silico materials characterization

    cond-mat.mtrl-sci 2026-05 unverdicted novelty 6.0

    MatterSim-MT is a foundation model pretrained on over 35 million first-principles structures that predicts material structure, dynamics, and thermodynamics while enabling multi-task simulations of phonon splitting, fe...

  11. Density diversity in training data governs thermodynamic transferability of machine learning interatomic potentials

    physics.chem-ph 2026-05 unverdicted novelty 6.0

    Density diversity in training data is the key factor for making machine learning interatomic potentials transferable across thermodynamic states, outperforming temperature diversity.

  12. VibroML: an automated toolkit for high-throughput vibrational analysis and dynamic instability remediation of crystalline materials using machine-learned potentials

    cond-mat.mtrl-sci 2026-04 unverdicted novelty 6.0

    VibroML automates remediation of dynamic instabilities in crystalline materials by combining MLIPs with genetic algorithms for polymorph search, finite-temperature MD validation, and compositional alloying to yield st...

  13. Errors that matter: Uncertainty-aware universal machine-learning potentials calibrated on experiments

    physics.chem-ph 2026-04 conditional novelty 6.0

    PET-UAFD ensemble of ML potentials, calibrated on experimental cohesive energies and moduli, matches experimental accuracy on liquid properties and supplies uncertainty estimates via the PET-EXP protocol.

  14. Agentic Fusion of Large Atomic and Language Models to Accelerate Superconductor Discovery

    cs.LG 2026-04 unverdicted novelty 6.0

    An agentic framework fusing large atomic and language models rediscovers 66 known superconductors and guides experimental verification of four new ones with transition temperatures from 2.5 K to 6.5 K.

  15. AI-Driven Expansion and Application of the Alexandria Database

    cond-mat.mtrl-sci 2025-12 accept novelty 6.0

    A combined generative model, ML potential, and graph neural network pipeline expands the Alexandria database by 1.3 million DFT-validated compounds with 99% success near the convex hull and releases training data for ...

  16. An experimentally validated end-to-end framework for operando modeling of intrinsically complex metallosilicates

    cond-mat.mtrl-sci 2025-12 conditional novelty 6.0

    An end-to-end framework combining domain separation, lightweight ML potentials, and de novo in silico synthesis enables quantitative atomistic modeling of mesoporous metallosilicates that matches experimental densitie...

  17. Machine Learning Phonon Spectra for Fast and Accurate Optical Lineshapes of Defects

    cond-mat.mtrl-sci 2025-08 unverdicted novelty 6.0

    Machine learning interatomic potentials fine-tuned on first-principles relaxation data accurately reproduce phonon spectra and optical lineshapes for defects, matching explicit calculations and experiments.

  18. Systematic Fine-Tuning of MACE Interatomic Potentials for Catalysis

    physics.chem-ph 2026-05 conditional novelty 5.0

    Fine-tuned MACE MLIPs achieve lower mean absolute errors on catalytic reaction energies and barriers than from-scratch models, with a large fine-tuned model performing best on both metallic and oxide systems including...

  19. OptiMat Alloys: a FAIR, living database of multi-principal element alloys enabled by a conversational agent

    cond-mat.mtrl-sci 2026-04 unverdicted novelty 5.0

    OptiMat Alloys is a conversational AI system that maintains a living FAIR database of multi-principal element alloy calculations and enables natural-language, on-demand computations with built-in uncertainty checks.

  20. Accuracy and Efficiency Benchmarks of Pretrained Machine Learning Potentials for Molecular Simulations

    physics.chem-ph 2026-01 unverdicted novelty 5.0

    Benchmarks of 15 MLIPs show parameter count and training set size correlate with accuracy, architecture drives speed and memory, and explicit Coulomb terms provide no benefit.

  21. Comparing the latent features of universal machine-learning interatomic potentials

    physics.chem-ph 2025-12 unverdicted novelty 5.0

    Different uMLIPs encode chemical space in distinct ways, with high cross-model feature reconstruction errors, and fine-tuning preserves strong pre-training bias in the latent features.

  22. Tailored Vapor Deposition Unlocks Large-Grain, Wafer-Scale Epitaxial Growth of 2D Magnetic CrCl3

    cond-mat.mtrl-sci 2025-05 unverdicted novelty 5.0

    Centimeter-scale epitaxial growth of phase-pure crystalline 2D CrCl3 films achieved on mica via controlled physical vapor transport with innovations in light management, high carrier-gas flow, and moisture control.

  23. Position: Graph Condensation Needs a Reset -- Move Beyond Full-dataset Training and Model-Dependence

    cs.LG 2026-05 conditional novelty 4.0

    The paper claims current graph condensation approaches are flawed due to full-dataset training requirements, high overhead, poor generalization, and misleading evaluation metrics, calling for a reset toward lightweigh...

  24. Assessing foundational atomistic models for iron alloys under Earth's core conditions

    physics.geo-ph 2026-05 unverdicted novelty 4.0

    Foundational atomistic models reproduce some structural and dynamical properties of iron alloys under core conditions but none consistently match first-principles benchmarks due to missing explicit treatment of therma...

  25. Accurate and Efficient Interatomic Potentials for Dislocations in InP

    cond-mat.mtrl-sci 2026-04 unverdicted novelty 4.0

    New ACE and MACE potentials for InP achieve at most 4% error on partial dislocation formation energies versus DFT, outperforming literature models by factors of 4-12 while being computationally faster.

  26. Machine Learning Interatomic Potentials for Million-Atom Simulations of Multicomponent Alloys

    cond-mat.mtrl-sci 2026-04 unverdicted novelty 4.0

    GRACE MLIPs train faster and predict alloy properties more accurately than NEP, but NEP's 60-fold speed advantage enables reliable million-atom simulations of shock propagation when paired with ensemble uncertainty qu...

  27. Comparing fine-tuning strategies of MACE machine learning force field for modeling Li-ion diffusion in LiF for batteries

    cond-mat.mtrl-sci 2025-10 conditional novelty 4.0

    MACE-MPA-0 predicts Li diffusion Ea of 0.22 eV in LiF, fine-tuned version with 300 points gives 0.20 eV, close to DeePMD reference of 0.24 eV, using far less training data.

  28. Atomistic Modeling of Chemical Disorder in Materials: Bridging Classical Methods and AI-Assisted Approaches

    cond-mat.mtrl-sci 2026-05 unverdicted novelty 3.0

    A review of classical and AI-assisted methods for modeling chemical disorder in atomistic simulations of alloys and complex materials.

  29. Position: Graph Condensation Needs a Reset -- Move Beyond Full-dataset Training and Model-Dependence

    cs.LG 2026-05 unverdicted novelty 3.0

    Graph condensation methods must move beyond full-dataset training and model dependence toward lightweight, architecture-agnostic designs to achieve practical efficiency in GNNs.

  30. Inverse Design of Inorganic Compounds with Generative AI

    physics.chem-ph 2026-04 unverdicted novelty 2.0

    A review of generative AI for inverse design of inorganic compounds, analyzing adaptations for their complexity in composition, geometry, symmetry, and electronic structure, with discussion of future benchmarks and sy...

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · cited by 28 Pith papers · 2 internal anchors

  1. [1]

    Lawrence Zitnick, and Zachary Ulissi

    Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, and Zachary Ulissi. Open catalyst 2020 (oc20) dataset and community challenges.ACS Catalysis, 11(10):6059–6072, 2021

  2. [2]

    An introduction to electrocatalyst design using machine learning for renewable energy storage.arXiv preprint arXiv:2010.09435, 2020

    C Lawrence Zitnick, Lowik Chanussot, Abhishek Das, Siddharth Goyal, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Thibaut Lavril, Aini Palizhati, Morgane Riviere, et al. An introduction to electrocatalyst design using machine learning for renewable energy storage.arXiv preprint arXiv:2010.09435, 2020

  3. [3]

    Robust and synthesizable photocatalysts for co2 reduction: a data-driven materials discovery.Nature Communications, 10(1):443, 2019

    Arunima K Singh, Joseph H Montoya, John M Gregoire, and Kristin A Persson. Robust and synthesizable photocatalysts for co2 reduction: a data-driven materials discovery.Nature Communications, 10(1):443, 2019

  4. [4]

    Brabson, Abhishek Das, Zachary Ulissi, Matt Uyttendaele, Andrew J

    Anuroop Sriram, Sihoon Choi, Xiaohan Yu, Logan M. Brabson, Abhishek Das, Zachary Ulissi, Matt Uyttendaele, Andrew J. Medford, and David S. Sholl. The open dac 2023 dataset and challenges for sorbent discovery in direct air capture.ACS Central Science, 10(5):923–941, 2024

  5. [5]

    Recent advances and applications of deep learning methods in materials science.npj Computational Materials, 8(1):59, 2022

    Kamal Choudhary, Brian DeCost, Chi Chen, Anubhav Jain, Francesca Tavazza, Ryan Cohn, Cheol Woo Park, Alok Choudhary, Ankit Agrawal, Simon JL Billinge, et al. Recent advances and applications of deep learning methods in materials science.npj Computational Materials, 8(1):59, 2022

  6. [6]

    Ceder, Y

    G. Ceder, Y. M. Chiang, D. R. Sadoway, M. K. Aydinol, Y. I. Jang, and B. Huang. Identification of cathode materials for lithium batteries guided by first-principles calculations.Nature, 392(6677):694–696, 1998

  7. [7]

    Tibbitt, Christopher B

    Mark W. Tibbitt, Christopher B. Rodell, Jason A. Burdick, and Kristi S. Anseth. Progress in material design for biomedical applications.Proceedings of the National Academy of Sciences, 112(47):14444–14451, 2015

  8. [8]

    Computational materials science and chemistry: Accelerating discovery and innovation through simulation-based engineering and science

    George Crabtree, Sharon Glotzer, Bill McCurdy, and Jim Roberto. Computational materials science and chemistry: Accelerating discovery and innovation through simulation-based engineering and science. Technical report, United States, 2010

  9. [9]

    Accelerated discovery of 3d printing materials using data-driven multiobjective optimization.Science advances, 7(42):eabf7435, 2021

    Timothy Erps, Michael Foshey, Mina Konaković Luković, Wan Shou, Hanns Hagen Goetzke, Herve Dietsch, Klaus Stoll, Bernhard von Vacano, and Wojciech Matusik. Accelerated discovery of 3d printing materials using data-driven multiobjective optimization.Science advances, 7(42):eabf7435, 2021

  10. [10]

    High-throughput computational materials screening and discovery of optoelectronic semiconductors.WIREs Computational Molecular Science, 11(1):e1489, 2021

    Shulin Luo, Tianshu Li, Xinjiang Wang, Muhammad Faizan, and Lijun Zhang. High-throughput computational materials screening and discovery of optoelectronic semiconductors.WIREs Computational Molecular Science, 11(1):e1489, 2021

  11. [11]

    Pogue, Alexander New, Kyle McElroy, Nam Q

    Elizabeth A. Pogue, Alexander New, Kyle McElroy, Nam Q. Le, Michael J. Pekala, Ian McCue, Eddie Gienger, Janna Domenico, Elizabeth Hedrick, Tyrel M. McQueen, Brandon Wilfong, Christine D. Piatko, Christopher R. Ratto, Andrew Lennon, Christine Chung, Timothy Montalbano, Gregory Bassen, and Christopher D. Stiles. Closed-loop superconducting materials discov...

  12. [12]

    A universal graph deep learning interatomic potential for the periodic table

    Chi Chen and Shyue Ping Ong. A universal graph deep learning interatomic potential for the periodic table. Nature Computational Science, 2(11):718–728, 2022

  13. [13]

    Atomistic line graph neural network for improved materials property predictions

    Kamal Choudhary and Brian DeCost. Atomistic line graph neural network for improved materials property predictions. npj Computational Materials, 7(1):185, 2021

  14. [14]

    Ilyes Batatia, David Peter Kovacs, Gregor N. C. Simm, Christoph Ortner, and Gabor Csanyi. MACE: Higher order equivariant message passing neural networks for fast and accurate force fields. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors,Advances in Neural Information Processing Systems, 2022. 11

  15. [15]

    The design space of e (3)-equivariant atom-centered interatomic potentials.arXiv preprint arXiv:2205.06643, 2022

    Ilyes Batatia, Simon Batzner, Dávid Péter Kovács, Albert Musaelian, Gregor NC Simm, Ralf Drautz, Christoph Ortner, Boris Kozinsky, and Gábor Csányi. The design space of e (3)-equivariant atom-centered interatomic potentials.arXiv preprint arXiv:2205.06643, 2022

  16. [16]

    Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E

    Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P. Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E. Smidt, and Boris Kozinsky. E(3)-equivariant graph neural networks for data- efficient and accurate interatomic potentials.Nature Communications, 13(1):2453, 2022

  17. [17]

    Battaglia

    Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, and Peter W. Battaglia. Learning to simulate complex physics with graph networks. In Proceedings of the 37th International Conference on Machine Learning, ICML’20. JMLR.org, 2020

  18. [18]

    Lawrence Zitnick, and Abhishek Das

    Johannes Gasteiger, Muhammed Shuaibi, Anuroop Sriram, Stephan Günnemann, Zachary Ward Ulissi, C. Lawrence Zitnick, and Abhishek Das. Gemnet-OC: Developing graph neural networks for large and diverse molecular simulation datasets.Transactions on Machine Learning Research, 2022

  19. [19]

    Reducing so (3) convolutions to so (2) for efficient equivariant gnns

    Saro Passaro and C Lawrence Zitnick. Reducing so (3) convolutions to so (2) for efficient equivariant gnns. In International Conference on Machine Learning, pages 27420–27438. Proceedings of Machine Learning Research, 2023

  20. [20]

    Equiformerv2: Improved equivariant transformer for scaling to higher-degree representations,

    Yi-Lun Liao, Brandon Wood, Abhishek Das, and Tess Smidt. Equiformerv2: Improved equivariant transformer for scaling to higher-degree representations.arXiv preprint arXiv:2306.12059, 2023

  21. [21]

    Scalable parallel algorithm for graph neural network interatomic potentials in molecular dynamics simulations.Journal of Chemical Theory and Computation, 2024

    Yutack Park, Jaesun Kim, Seungwoo Hwang, and Seungwu Han. Scalable parallel algorithm for graph neural network interatomic potentials in molecular dynamics simulations.Journal of Chemical Theory and Computation, 2024

  22. [22]

    MatterSim: A Deep Learning Atomistic Model Across Elements, Temperatures and Pressures

    Han Yang, Chenxi Hu, Yichi Zhou, Xixian Liu, Yu Shi, Jielan Li, Guanzhi Li, Zekun Chen, Shuizhou Chen, Claudio Zeni, et al. Mattersim: A deep learning atomistic model across elements, temperatures and pressures. arXiv preprint arXiv:2405.04967, 2024

  23. [23]

    orb-models github repository

    Orbital Materials. orb-models github repository. https:// github.com/ orbital-materials/ orb-models, 2024

  24. [24]

    Markland

    Peter Eastman, Pavan Kumar Behara, David L Dotson, Raimondas Galvelis, John E Herr, Josh T Horton, Yuezhi Mao, John D Chodera, Benjamin P Pritchard, Yuanqing Wang, Gianni De Fabritiis, and Thomas E. Markland. Spice, a dataset of drug-like molecules and peptides for training machine learning potentials. Scientific Data, 10(1):11, 2023

  25. [25]

    The ani-1ccx and ani-1x data sets, coupled-cluster and density functional theory properties for molecules

    Justin S Smith, Roman Zubatyuk, Benjamin Nebgen, Nicholas Lubbers, Kipton Barros, Adrian E Roitberg, Olexandr Isayev, and Sergei Tretiak. The ani-1ccx and ani-1x data sets, coupled-cluster and density functional theory properties for molecules. Scientific Data, 7(1):134, 2020

  26. [26]

    The open catalyst 2022 (oc22) dataset and challenges for oxide electrocatalysts

    Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, et al. The open catalyst 2022 (oc22) dataset and challenges for oxide electrocatalysts. ACS Catalysis, 13(5):3066–3084, 2023

  27. [27]

    Benchmarking materials property prediction methods: the matbench test set and automatminer reference algorithm

    Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, and Anubhav Jain. Benchmarking materials property prediction methods: the matbench test set and automatminer reference algorithm. npj Computational Materials, 6(1):138, 2020

  28. [28]

    A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, and K. A. Persson. The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials, 1(1):011002, 2013

  29. [29]

    Chgnet as a pretrained universal neural network potential for charge-informed atomistic modelling

    Bowen Deng, Peichen Zhong, KyuJung Jun, Janosh Riebesell, Kevin Han, Christopher J Bartel, and Gerbrand Ceder. Chgnet as a pretrained universal neural network potential for charge-informed atomistic modelling. Nature Machine Intelligence, 5(9):1031–1041, 2023

  30. [30]

    Matbench discovery–an evaluation framework for machine learning crystal stability prediction

    Janosh Riebesell, Rhys EA Goodall, Anubhav Jain, Philipp Benner, Kristin A Persson, and Alpha A Lee. Matbench discovery–an evaluation framework for machine learning crystal stability prediction. arXiv preprint arXiv:2308.14920, 2023

  31. [32]

    Improving machine-learning models in materials science through large datasets

    Jonathan Schmidt, Tiago FT Cerqueira, Aldo H Romero, Antoine Loew, Fabian Jäger, Hai-Chen Wang, Silvana Botti, and Miguel AL Marques. Improving machine-learning models in materials science through large datasets. Materials Today Physics, page 101560, 2024

  32. [33]

    A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, et al. Commentary: The materials project: A materials genome approach to accelerating materials innovation. APL Materials, 1(1), 2013

  33. [35]

    Differentiable sampling of molecular geometries with uncertainty-based adversarial attacks

    Daniel Schwalbe-Koda, Aik Rui Tan, and Rafael Gómez-Bombarelli. Differentiable sampling of molecular geometries with uncertainty-based adversarial attacks. Nature Communications, 12(1):5104, 2021

  34. [36]

    Exploiting redundancy in large materials datasets for efficient machine learning with less data

    Kangming Li, Daniel Persaud, Kamal Choudhary, Brian DeCost, Michael Greenwood, and Jason Hattrick-Simpers. Exploiting redundancy in large materials datasets for efficient machine learning with less data. Nature Communications, 14(1):7283, 2023

  35. [37]

    Ab initio molecular-dynamics simulation of the liquid-metal–amorphous-semiconductor transition in germanium

    Georg Kresse and Jürgen Hafner. Ab initio molecular-dynamics simulation of the liquid-metal–amorphous-semiconductor transition in germanium. Physical Review B, 49(20):14251–14269, 1994

  36. [38]

    Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set

    Georg Kresse and Jürgen Furthmüller. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Physical Review B, 54(16):11169–11186, 1996

  37. [39]

    Python materials genomics (pymatgen): A robust, open-source python library for materials analysis

    Shyue Ping Ong, William Davidson Richards, Anubhav Jain, Geoffroy Hautier, Michael Kocher, Shreyas Cholia, Dan Gunter, Vincent L Chevrier, Kristin A Persson, and Gerbrand Ceder. Python materials genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science, 68:314–319, 2013

  38. [40]

    Issue #2968: Add support for new crystal structure prediction method

    Materials Project. Issue #2968: Add support for new crystal structure prediction method. https://github.com/materialsproject/pymatgen/issues/2968, 2023

  39. [41]

    Issue #3016: Add support for new crystal structure prediction method

    Materials Project. Issue #3016: Add support for new crystal structure prediction method. https://github.com/materialsproject/pymatgen/issues/3016, 2023

  40. [42]

    Generalized gradient approximation made simple

    John P Perdew, Kieron Burke, and Matthias Ernzerhof. Generalized gradient approximation made simple. Physical Review Letters, 77(18):3865, 1996

  41. [43]

    Restoring the density-gradient expansion for exchange in solids and surfaces

    John P Perdew, Adrienn Ruzsinszky, Gábor I Csonka, Oleg A Vydrov, Gustavo E Scuseria, Lucian A Constantin, Xiaolan Zhou, and Kieron Burke. Restoring the density-gradient expansion for exchange in solids and surfaces. Physical Review Letters, 100(13):136406, 2008

  42. [44]

    Strongly constrained and appropriately normed semilocal density functional

    Jianwei Sun, Adrienn Ruzsinszky, and John P Perdew. Strongly constrained and appropriately normed semilocal density functional. Physical Review Letters, 115(3):036402, 2015

  43. [45]

    Accurate and numerically efficient r2scan meta-generalized gradient approximation

    James W Furness, Aaron D Kaplan, Jinliang Ning, John P Perdew, and Jianwei Sun. Accurate and numerically efficient r2scan meta-generalized gradient approximation. The Journal of Physical Chemistry Letters, 11(19):8208–8215, 2020

  44. [46]

    Predicting stable crystalline compounds using chemical similarity

    Hai-Chen Wang, Silvana Botti, and Miguel AL Marques. Predicting stable crystalline compounds using chemical similarity. npj Computational Materials, 7(1):12, 2021

  45. [47]

    The aflow library of crystallographic prototypes: part 1

    Michael J Mehl, David Hicks, Cormac Toher, Ohad Levy, Robert M Hanson, Gus Hart, and Stefano Curtarolo. The aflow library of crystallographic prototypes: part 1. Computational Materials Science, 136:S1–S828, 2017

  46. [48]

    Rapid discovery of stable materials by coordinate-free coarse graining

    Rhys EA Goodall, Abhijith S Parackal, Felix A Faber, Rickard Armiento, and Alpha A Lee. Rapid discovery of stable materials by coordinate-free coarse graining. Science Advances, 8(30):eabn4117, 2022

  47. [49]

    Graph networks as a universal machine learning framework for molecules and crystals

    Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, and Shyue Ping Ong. Graph networks as a universal machine learning framework for molecules and crystals. Chemistry of Materials, 31(9):3564–3572, 2019

  48. [50]

    A foundation model for atomistic materials chemistry

    Ilyes Batatia, Philipp Benner, Yuan Chiang, Alin M Elena, Dávid P Kovács, Janosh Riebesell, Xavier R Advincula, Mark Asta, William J Baldwin, Noam Bernstein, et al. A foundation model for atomistic materials chemistry. arXiv preprint arXiv:2401.00096, 2023

  49. [51]

    Scaling deep learning for materials discovery

    Amil Merchant, Simon Batzner, Samuel S Schoenholz, Muratahan Aykol, Gowoon Cheon, and Ekin Dogus Cubuk. Scaling deep learning for materials discovery. Nature, 624(7990):80–85, 2023

  50. [52]

    A critical examination of compound stability predictions from machine-learned formation energies

    Christopher J Bartel, Amalie Trewartha, Qi Wang, Alexander Dunn, Anubhav Jain, and Gerbrand Ceder. A critical examination of compound stability predictions from machine-learned formation energies. npj Computational Materials, 6(1):97, 2020

  51. [53]

    A hitchhiker’s guide to geometric gnns for 3d atomic systems

    Alexandre Duval, Simon V Mathis, Chaitanya K Joshi, Victor Schmidt, Santiago Miret, Fragkiskos D Malliaros, Taco Cohen, Pietro Lio, Yoshua Bengio, and Michael Bronstein. A hitchhiker’s guide to geometric gnns for 3d atomic systems. arXiv preprint arXiv:2312.07511, 2023

  52. [54]

    The oc20 leaderboard

    Meta Fundamental AI Research and collaborators. The oc20 leaderboard. https://opencatalystproject.org/leaderboard.html, 2024

  53. [55]

    Generalizing denoising to non-equilibrium structures improves equivariant force fields

    Yi-Lun Liao, Tess Smidt, and Abhishek Das. Generalizing denoising to non-equilibrium structures improves equivariant force fields. arXiv preprint arXiv:2403.09549, 2024

  54. [56]

    The omat24 dataset

    Meta Fundamental AI Research. The omat24 dataset. https://huggingface.co/datasets/fairchem/OMAT24, 2024

  55. [57]

    The fair chemistry (fairchem) model repository

    Meta Fundamental AI Research and Collaborators. The fair chemistry (fairchem) model repository. https://github.com/FAIR-Chem/fairchem, 2024

  56. [58]

    The omat24 trained model checkpoints

    Meta Fundamental AI Research. The omat24 trained model checkpoints. https://huggingface.co/fairchem/OMAT24, 2024

  57. [59]

    Review of computational approaches to predict the thermodynamic stability of inorganic solids

    Christopher J Bartel. Review of computational approaches to predict the thermodynamic stability of inorganic solids. Journal of Materials Science, 57(23):10475–10498, 2022

  58. [60]

    The thermodynamic scale of inorganic crystalline metastability

    Wenhao Sun, Stephen T Dacek, Shyue Ping Ong, Geoffroy Hautier, Anubhav Jain, William D Richards, Anthony C Gamst, Kristin A Persson, and Gerbrand Ceder. The thermodynamic scale of inorganic crystalline metastability. Science Advances, 2(11):e1600225, 2016

  59. [61]

    Ryan Kingsbury, Ayush S Gupta, Christopher J Bartel, Jason M Munro, Shyam Dwaraknath, Matthew Horton, and Kristin A Persson. Performance comparison of r2-scan and scan metagga density functionals for solid materials via an automated, high-throughput computational workflow. Physical Review Materials, 6(1):013801, 2022

  60. [62]

    Materials project database versions

    The Materials Project. Materials project database versions. https://docs.materialsproject.org/changes/database-versions, 2024

  61. [63]

    How robust are modern graph neural network potentials in long and hot molecular dynamics simulations?

    Sina Stocker, Johannes Gasteiger, Florian Becker, Stephan Günnemann, and Johannes T Margraf. How robust are modern graph neural network potentials in long and hot molecular dynamics simulations? Machine Learning: Science and Technology, 3(4):045010, 2022

  62. [64]

    Forces are not enough: Benchmark and critical evaluation for machine learning force fields with molecular simulations

    Xiang Fu, Zhenghao Wu, Wujie Wang, Tian Xie, Sinan Keten, Rafael Gomez-Bombarelli, and Tommi Jaakkola. Forces are not enough: Benchmark and critical evaluation for machine learning force fields with molecular simulations. arXiv preprint arXiv:2210.07237, 2022

  63. [65]

    S. Raja, I. Amin, F. Pedregosa, and A. S. Krishnapriyan. Stability-aware training of neural network interatomic potentials with differentiable boltzmann estimators. arXiv preprint arXiv:2402.13984, 2024

  64. [66]

    Crystal toolkit: A web app framework to improve usability and accessibility of materials science research algorithms

    Matthew Horton, Jimmy-Xuan Shen, Jordan Burns, Orion Cohen, François Chabbey, Alex M Ganose, Rishabh Guha, Patrick Huck, Hamming Howard Li, Matthew McDermott, et al. Crystal toolkit: A web app framework to improve usability and accessibility of materials science research algorithms. arXiv preprint arXiv:2302.06147, 2023

  65. [67]

    Janosh Riebesell, Haoyu Yang, Rhys Goodall, and Sterling G. Baird. Pymatviz: visualization toolkit for materials informatics, 2022. 10.5281/zenodo.7486816 - https://github.com/janosh/pymatviz.

Appendix A Dataset statistics

Figure 4 shows histograms for the number of atoms per structure in the sub-datasets that make up the OMat24 dataset. Similarly fig...