Regression with Large Language Models for Materials and Molecular Property Prediction

Dane Morgan; Hamed Mahdavi; Lane E. Schultz; Maciej P. Polak; Ryan Jacobs; Vasant Honavar

arxiv: 2409.06080 · v2 · submitted 2024-09-09 · cond-mat.mtrl-sci · cs.LG

Ryan Jacobs, Maciej P. Polak, Lane E. Schultz, Hamed Mahdavi, Vasant Honavar, Dane Morgan

Pith reviewed 2026-05-23 21:08 UTC · model grok-4.3

classification cond-mat.mtrl-sci · cs.LG

keywords large language models · molecular property prediction · materials property prediction · QM9 dataset · SMILES representation · regression · LLaMA 3 · fine-tuning

The pith

LLaMA 3 fine-tuned on SMILES strings rivals random forests for QM9 molecular property regression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether large language models can perform numerical regression for molecular and materials properties when given only simple composition strings as input. It fine-tunes LLaMA 3 exclusively on the generative loss and reports that the resulting model yields useful predictions on multiple QM9 properties that match the accuracy of random forest and fully connected neural network baselines. On a set of 28 materials properties the same approach produces results comparable to random forest models that rely on elemental descriptors. The work also shows LLaMA 3 outperforming GPT-3.5 and GPT-4o under identical conditions. These findings indicate that generative language models may be repurposed for quantitative scientific tasks without custom architectures or detailed structural data.

Core claim

LLaMA 3, when fine-tuned using the SMILES representation of molecules and only the generative loss, provides useful regression results which can rival standard materials property prediction models like random forest or fully connected neural networks on the QM9 dataset. On 28 materials properties it supplies comparable though slightly worse accuracy relative to random forest and elemental descriptors when given only compound chemical descriptions. Errors remain 5-10 times higher than those of state-of-the-art models that receive atom types and coordinates.

What carries the argument

Fine-tuning LLaMA 3 on the generative loss with SMILES strings or chemical composition descriptions as sole inputs for property-value regression.
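One way to picture this setup, as a minimal sketch: each training example is a text pair in which the prompt carries the SMILES string and the completion carries the property value written out as digits, so the ordinary next-token loss is the only training signal. The prompt wording, the function name `make_example`, the four-decimal rounding, and the numeric value are all illustrative assumptions, not the format used in the paper.

```python
def make_example(smiles: str, prop_name: str, value: float, decimals: int = 4) -> dict:
    """Build one prompt/completion pair for generative fine-tuning.

    The template below is hypothetical; the paper's actual prompt is not shown here.
    """
    prompt = f"What is the {prop_name} of the molecule with SMILES {smiles}?"
    # The target is plain text: the model learns to emit these digits token by token.
    completion = f"{value:.{decimals}f}"
    return {"prompt": prompt, "completion": completion}


# The property value here is made up for illustration.
example = make_example("CCO", "HOMO-LUMO gap", 0.2731)
print(example)
```

Fixing the number of decimals keeps every target the same textual shape, which is one plausible way to make the digit sequence easier to learn; whether the authors did this is not stated in this summary.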

If this is right

Language models can serve as drop-in regressors for physical properties when supplied with string representations alone.
Standard generative training objectives can implicitly encode quantitative structure-property relationships.
Materials and molecular datasets expressed as SMILES or composition strings become directly usable for LLM-based prediction.
LLaMA 3 predictions on these tasks improve on those of GPT-3.5 and GPT-4o tested under the same conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same string-only fine-tuning recipe could be tested on other string-representable scientific domains such as protein sequences or crystal prototypes.
Hybrid pipelines that feed LLM outputs as priors into atomistic models might reduce the accuracy gap to coordinate-based methods.
If next-token prediction already captures numerical trends, further gains may come from larger context windows rather than new loss functions.

Load-bearing premise

Fine-tuning solely on the generative loss with only composition-based string inputs is sufficient to learn accurate numerical property predictions without requiring more granular structural representations or regression-specific objectives.

What would settle it

A direct comparison on the QM9 dataset, using identical train/test splits, in which LLaMA 3 errors stay more than five times larger than those of a random forest baseline even after fine-tuning.
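The test described above reduces to a ratio of mean absolute errors on a shared test set. The sketch below shows that arithmetic under stated assumptions: MAE is taken as the error metric, the function names are hypothetical, and the numbers are toy values invented for illustration, not results from the paper.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def error_ratio(y_true, llm_pred, baseline_pred):
    """MAE of the LLM divided by MAE of the baseline on the same test set."""
    return mae(y_true, llm_pred) / mae(y_true, baseline_pred)


# Toy numbers, invented for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
llm = [1.5, 2.5, 2.5, 4.5]
rf = [1.25, 1.75, 3.25, 3.75]

ratio = error_ratio(y_true, llm, rf)
# A ratio above 5 would count against the parity claim.
print(ratio, ratio > 5.0)
```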

Original abstract

We demonstrate the ability of large language models (LLMs) to perform material and molecular property regression tasks, a significant deviation from the conventional LLM use case. We benchmark the Large Language Model Meta AI (LLaMA) 3 on several molecular properties in the QM9 dataset and 28 materials properties. Only composition-based input strings are used as the model input and we fine tune on only the generative loss. We broadly find that LLaMA 3, when fine-tuned using the SMILES representation of molecules, provides useful regression results which can rival standard materials property prediction models like random forest or fully connected neural networks on the QM9 dataset. Not surprisingly, LLaMA 3 errors are 5-10x higher than those of the state-of-the-art models that were trained using far more granular representation of molecules (e.g., atom types and their coordinates) for the same task. Similarly, LLaMA 3 provides comparable, although slightly worse, accuracy relative to random forest and elemental descriptors when using just compound chemical description on our set of 28 materials properties. Interestingly, LLaMA 3 provides improved predictions compared to GPT-3.5 and GPT-4o. This work highlights the versatility of LLMs, suggesting that LLM-like generative models can potentially transcend their traditional applications to tackle complex physical phenomena, thus paving the way for future research and applications in chemistry, materials science and other scientific domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Fine-tuned LLaMA 3 matches random forest on QM9 property regression from SMILES strings but trails structure-aware models by a wide margin.

The core result here is that LLaMA 3, fine-tuned only on the generative loss with SMILES or composition strings, produces regression numbers on QM9 that sit roughly in line with random forest and fully connected nets. On the 28 materials properties it is a bit behind random forest with elemental descriptors but ahead of the GPT models tested. That is the actual new piece: a concrete benchmark set for this model and input style on these tasks, not a general claim about LLMs replacing physics-based methods.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that LLaMA 3, fine-tuned exclusively on the generative (next-token prediction) loss using only composition-based string inputs (SMILES for molecules, chemical descriptions for materials), can perform regression for molecular properties on the QM9 dataset and 28 materials properties. It reports that the resulting accuracies rival those of random forest and fully connected neural networks on QM9 while remaining 5-10x worse than coordinate-based SOTA models, and that LLaMA 3 outperforms GPT-3.5 and GPT-4o on the materials tasks.

Significance. If the empirical benchmarks hold under rigorous verification, the work would demonstrate that standard generative fine-tuning of an LLM can yield usable numerical property predictions from minimal string inputs, thereby extending LLM applicability to regression tasks in chemistry and materials science without custom regression heads or structural coordinates. This would be a concrete, falsifiable illustration of the versatility of generative models for physical-property tasks.

major comments (3)

[Methods] Methods (fine-tuning procedure): The central claim that 'fine tune on only the generative loss' produces accurate regression rests on an unstated assumption that next-token prediction on formatted target strings will implicitly minimize numerical error. No details are given on output parsing (e.g., regex extraction, handling of units or scientific notation), temperature during inference, or whether any auxiliary regression loss or value-head was used; without these, it is impossible to rule out that reported performance arises from mode collapse to mean values or post-hoc averaging rather than genuine property learning.
[Results] Results (QM9 benchmarks): The claim that LLaMA 3 'rivals' random forest and FCNN is load-bearing for the abstract's main conclusion, yet the manuscript provides no information on the precise train/test splits, whether the baselines used identical splits or the same SMILES-derived features, or any statistical comparison (error bars, p-values). This omission directly affects whether the reported parity is reproducible or an artifact of experimental setup.
[Materials properties] Materials properties section: The input representation is described only as 'compound chemical description.' Without an explicit example or enumeration of the string format (element counts, stoichiometry notation, etc.), it is unclear how the input granularity compares to the 'elemental descriptors' used by the random-forest baseline, undermining the direct comparability asserted in the abstract.
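The mode-collapse concern raised in the Methods comment can be checked mechanically. A minimal sketch, assuming the held-out targets and the parsed model predictions are already available as lists of floats: compare the model's MAE with that of a constant predictor that always returns the training mean, and look at how much the predictions vary. The function name `collapse_report` and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def collapse_report(train_targets, test_targets, predictions):
    """Compare model MAE with a constant predictor and report prediction spread."""
    constant = mean(train_targets)  # the value a collapsed model would output
    model_mae = mean(abs(t - p) for t, p in zip(test_targets, predictions))
    constant_mae = mean(abs(t - constant) for t in test_targets)
    return {
        "model_mae": model_mae,
        "constant_mae": constant_mae,
        "prediction_std": pstdev(predictions),
        "target_std": pstdev(test_targets),
    }


# Toy data, invented for illustration: these predictions have collapsed to the mean.
report = collapse_report(
    train_targets=[1.0, 2.0, 3.0, 4.0, 5.0],
    test_targets=[1.0, 3.0, 5.0],
    predictions=[3.0, 3.0, 3.0],
)
print(report)
```

A model MAE close to the constant-predictor MAE, together with a prediction spread far below the target spread, would suggest collapse rather than learned structure.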

minor comments (2)

[Abstract] The abstract states 'we broadly find' without quantifying the number of properties or runs; a concise table summarizing all QM9 targets and the 28 materials properties would improve clarity.
[Figures] Figure captions and axis labels should explicitly state the error metric (MAE, RMSE, etc.) and units for each property to allow immediate comparison with literature values.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which have helped us improve the clarity and reproducibility of the manuscript. We address each major comment below and have revised the paper accordingly to incorporate additional methodological details, experimental specifications, and input examples.

Referee: [Methods] Methods (fine-tuning procedure): The central claim that 'fine tune on only the generative loss' produces accurate regression rests on an unstated assumption that next-token prediction on formatted target strings will implicitly minimize numerical error. No details are given on output parsing (e.g., regex extraction, handling of units or scientific notation), temperature during inference, or whether any auxiliary regression loss or value-head was used; without these, it is impossible to rule out that reported performance arises from mode collapse to mean values or post-hoc averaging rather than genuine property learning.

Authors: We agree that the original Methods section lacked sufficient detail on these aspects. In the revised manuscript, we have added a new subsection detailing the fine-tuning and inference procedure. This includes: the use of regex-based parsing to extract numerical values from generated text (with explicit handling for scientific notation and units), inference performed at temperature 0.0 for deterministic outputs, and explicit confirmation that only the standard generative next-token prediction loss was used with no auxiliary regression loss or value head. These additions clarify that the reported performance stems from the fine-tuned model's learned associations rather than post-processing artifacts.

revision: yes

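The regex-based parsing mentioned in this response might look like the following. This is a sketch under stated assumptions, not the authors' code: the pattern, the name `parse_value`, and the policy of taking the first number and ignoring any trailing unit are all illustrative choices.

```python
import re

# Optional sign, digits with an optional decimal part, optional exponent.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_value(generated: str):
    """Return the first number found in the generated text as a float, or None."""
    match = NUMBER.search(generated)
    return float(match.group()) if match else None


print(parse_value("The band gap is 1.23e-2 eV"))  # scientific notation, unit ignored
print(parse_value("-0.2731 Ha"))                  # signed decimal, unit ignored
print(parse_value("no number here"))              # unparseable output yields None
```

Returning `None` for unparseable generations keeps failures visible, so they can be counted and reported rather than silently replaced by a default value.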
Referee: [Results] Results (QM9 benchmarks): The claim that LLaMA 3 'rivals' random forest and FCNN is load-bearing for the abstract's main conclusion, yet the manuscript provides no information on the precise train/test splits, whether the baselines used identical splits or the same SMILES-derived features, or any statistical comparison (error bars, p-values). This omission directly affects whether the reported parity is reproducible or an artifact of experimental setup.

Authors: We acknowledge this omission and have revised the Results section to specify that the QM9 experiments employed the standard train/test splits from the original QM9 dataset publication. The random forest and FCNN baselines were re-implemented using identical splits and SMILES-derived features for direct comparability. We now include error bars computed over multiple independent runs and note the absence of statistically significant differences where the performances are comparable, supporting the reproducibility of the parity claim.

revision: yes

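Error bars over independent runs, as described in this response, can be computed along these lines. The sketch assumes MAE as the metric and the sample standard deviation as the error bar; the function names and the toy predictions are hypothetical and do not come from the paper.

```python
from statistics import mean, stdev


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def summarize_runs(y_true, runs):
    """Mean and sample standard deviation of MAE across independent runs."""
    scores = [mae(y_true, preds) for preds in runs]
    return mean(scores), stdev(scores)


# Toy data, invented for illustration: three runs evaluated on one test set.
y_true = [1.0, 2.0, 3.0]
runs = [
    [1.1, 2.1, 2.9],
    [0.8, 2.2, 3.2],
    [1.0, 2.3, 3.0],
]
avg, spread = summarize_runs(y_true, runs)
print(f"MAE = {avg:.3f} +/- {spread:.3f}")
```

Two models whose intervals overlap heavily cannot be separated on this evidence alone; a paired test on per-sample errors would be a stronger comparison.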
Referee: [Materials properties] Materials properties section: The input representation is described only as 'compound chemical description.' Without an explicit example or enumeration of the string format (element counts, stoichiometry notation, etc.), it is unclear how the input granularity compares to the 'elemental descriptors' used by the random-forest baseline, undermining the direct comparability asserted in the abstract.

Authors: We have revised the Materials properties section to include explicit examples of the compound chemical description strings (e.g., formats incorporating element counts and stoichiometry such as 'Compound with 3 atoms of Fe and 4 atoms of O'). This addition enables readers to directly compare the input granularity to the elemental descriptors in the random forest baseline and supports the comparability asserted in the abstract.

revision: yes
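A description string of the kind quoted in this response could be generated from a chemical formula as sketched below. The function name `describe` is hypothetical, and the parser is deliberately limited to simple formulas with integer counts (no parentheses, fractions, or charges); the actual string format used in the paper may differ.

```python
import re


def describe(formula: str) -> str:
    """Turn a simple formula such as 'Fe3O4' into a plain-text description."""
    # An element symbol followed by an optional integer count.
    parts = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    phrases = []
    for symbol, count in parts:
        n = int(count) if count else 1  # a missing count means one atom
        noun = "atom" if n == 1 else "atoms"
        phrases.append(f"{n} {noun} of {symbol}")
    return "Compound with " + " and ".join(phrases)


print(describe("Fe3O4"))  # Compound with 3 atoms of Fe and 4 atoms of O
```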

Circularity Check

0 steps flagged

No circularity: purely empirical benchmarking against external baselines.

full rationale

The paper reports fine-tuning LLaMA 3 on generative loss using SMILES/composition strings, then measures regression performance on QM9 and 28 materials properties via direct comparison to independent models (random forest, FCNN, GPT variants). No derivation chain, fitted parameters renamed as predictions, self-citations, or ansatzes exist; all claims rest on held-out test metrics and external SOTA references. The central result is falsifiable by re-running the benchmarks and does not reduce to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the empirical observation that generative fine-tuning on composition strings yields competitive regression; it assumes composition text encodes enough signal for the task and that standard LLM training suffices without new loss terms or architectures.

axioms (1)

domain assumption: Composition-based input strings contain sufficient information to support useful property regression when the model is fine-tuned on generative loss.
Invoked by the choice to use only SMILES or chemical descriptions without atomic coordinates or graphs.

pith-pipeline@v0.9.0 · 5806 in / 1308 out tokens · 32684 ms · 2026-05-23T21:08:31.319262+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context
cs.CL 2026-04 unverdicted novelty 7.0

Quantile tokens inserted into LLM inputs combined with neighbor retrieval enable direct prediction of full distributions, yielding lower MAPE and narrower intervals than baselines on Airbnb and StackSample tasks.
Scale-Dependent Input Representation and Confidence Estimation for LLMs in Materials Property Prediction
cond-mat.mtrl-sci 2026-05 conditional novelty 5.0

Larger LLMs handle detailed crystal descriptions better than small ones, and mean negative log-likelihood of predicted numbers tracks prediction error after fine-tuning.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · cited by 2 Pith papers

[1]

ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction

(1) Chithrananda, S.; Grand, G.; Ramsundar, B. ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction. ArXiv 2020, abs/2010.09885. (2) Ahmad, W.; Simon, E.; Chithrananda, S.; Grand, G.; Ramsundar, B. ChemBERTa-2: Towards Chemical Foundation Models

work page arXiv 2020
[2]

SMILES-BERT

(4) Wang, S.; Guo, Y.; Wang, Y.; Sun, H.; Huang, J. SMILES-BERT. In Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics; ACM: New York, NY, USA, 2019; pp 429–436. https://doi.org/10.1145/3307339.3342186. (5) Irwin, R.; Dimitriadis, S.; He, J.; Bjerrum, E. J. Chemformer: A Pre-Trained Tran...

work page doi:10.1145/3307339.3342186 2019
[3]

ReLM: Leveraging Language Models for Enhanced Chemical Reaction Prediction

(10) Shi, Y.; Zhang, A.; Zhang, E.; Liu, Z.; Wang, X. ReLM: Leveraging Language Models for Enhanced Chemical Reaction Prediction. In Findings of the Association for Computational Linguistics: EMNLP 2023; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp 5506–5520. https://doi.org/10.18653/v1/2023.findings-emnlp.366. (11) Pei, Q.; ...

work page doi:10.18653/v1/2023.findings-emnlp.366 2023
[4]

InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery

https://arxiv.org/abs/2401.14818. (13) Cao, H.; Liu, Z.; Lu, X.; Yao, Y.; Li, Y. InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery

work page arXiv
[5]

LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset

(14) Yu, B.; Baker, F. N.; Chen, Z.; Ning, X.; Sun, H. LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset. ArXiv 2024, abs/2402.09391. (15) Sadeghi, S.; Bui, A.; Forooghi, A.; Lu, J.; Ngom, A. Can Large Language Models Understand Molecules? BMC Bioinformatics 2024, 25 (1),

work page arXiv 2024
[6]

Leveraging Large Language Models for Predictive Chemistry

https://doi.org/10.1186/s12859-024-05847-x. (16) Jablonka, K. M.; Schwaller, P.; Ortega-Guerrero, A.; Smit, B. Leveraging Large Language Models for Predictive Chemistry. Nat Mach Intell 2024, 6 (2), 161–169. https://doi.org/10.1038/s42256-023-00788-1. (17) Ramakrishnan, R.; Dral, P. O.; Rupp, M.; Von Lilienfeld, O. A. Quantum Chemistry Structures and P...

work page doi:10.1186/s12859-024-05847-x 2024
[7]

Machine Learning Bandgaps of Double Perovskites

https://doi.org/10.1088/1367-2630/16/1/015018. (24) Pilania, G.; Mannodi-Kanakkithodi, A.; Uberuaga, B. P.; Ramprasad, R.; Gubernatis, J. E.; Lookman, T. Machine Learning Bandgaps of Double Perovskites. Sci Rep 2016, 6 (1), 19375. https://doi.org/10.1038/srep19375. (25) de Jong, M.; Chen, W.; Angsten, T.; Jain, A.; Notestine, R.; Gamst, A.; Sluiter, M.; ...

work page doi:10.1088/1367-2630/16/1/015018 2016
[8]

A Machine Learning-Based Alloy Design System to Facilitate the Rational Design of High Entropy Alloys with Enhanced Hardness

https://doi.org/10.1038/s41524-020-00440-1. (27) Yang, C.; Ren, C.; Jia, Y.; Wang, G.; Li, M.; Lu, W. A Machine Learning-Based Alloy Design System to Facilitate the Rational Design of High Entropy Alloys with Enhanced Hardness. Acta Mater 2022, 222, 117431. https://doi.org/10.1016/j.actamat.2021.117431. (28) Hargreaves, C. J.; Gaultois, M. W.; Daniels...

work page doi:10.1038/s41524-020-00440-1 2022
[9]

Metallic Glasses and their Properties

https://doi.org/10.1038/s41524-022-00951-z. (29) Voyles, P.; Schultz, L.; Morgan, D.; Francis, C.; Afflerbach, B.; Hakeem, A. Metallic Glasses and their Properties. https://foundry-ml.org/#/datasets/10.18126%2F7yg1-osf2 (accessed 2024-02-20). (30) Polak, M. P.; Morgan, D. Extracting Accurate Materials Data from Research Papers with Conversational Langua...

work page doi:10.1038/s41524-022-00951-z 2024
[10]

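High-Throughput DFT Calculations of Formation Energy, Stability and Oxygen Vacancy Formation Energy of ABO3 Perovskites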

https://doi.org/10.1038/s41467-024-45914-8. (31) Emery, A. A.; Wolverton, C. High-Throughput DFT Calculations of Formation Energy, Stability and Oxygen Vacancy Formation Energy of ABO3 Perovskites. Sci Data 2017, 4 (1), 170153. https://doi.org/10.1038/sdata.2017.153. (32) Castelli, I. E.; Olsen, T.; Datta, S.; Landis, D. D.; Dahl, S.; Thygesen, K. S.; Jac...

work page doi:10.1038/s41467-024-45914-8 2017
[11]

A Universal Framework for Accurate and Efficient Geometric Deep Learning of Molecular Systems

(41) Zhang, S.; Liu, Y.; Xie, L. A Universal Framework for Accurate and Efficient Geometric Deep Learning of Molecular Systems. Sci Rep 2023, 13, 19171. https://doi.org/10.1038/s41598-023-46382-8. (42) Pinheiro, G. A.; Mucelini, J.; Soares, M. D.; Prati, R. C.; Da Silva, J. L. F.; Quiles, M. G. Machine Learning Prediction of Nine Molecula...

work page doi:10.1038/s41598-023-46382-8 2023
[12]

Self-Referencing Embedded Strings (SELFIES): A 100% Robust Molecular String Representation

(44) Krenn, M.; Häse, F.; Nigam, A.; Friederich, P.; Aspuru-Guzik, A. Self-Referencing Embedded Strings (SELFIES): A 100% Robust Molecular String Representation. Mach Learn Sci Technol 2020, 1 (4), 045024. https://doi.org/10.1088/2632-2153/aba947. (45) Jacobs, R.; Schultz, L.; Scourtas, A.; Schmidt, K. J.; Price-Skelly, O.; Engler, W. Machine Learning Ma...

work page doi:10.1088/2632-2153/aba947 2020
[13]

(46) Goodall, R. E. A.; Lee, A. A. Predicting Materials Properties without Crystal Structure: Deep Representation Learning from Stoichiometry. Nat Commun 2020, 11 (1),

work page 2020
[14]

https://doi.org/10.1038/s41467-020-19964-7

work page doi:10.1038/s41467-020-19964-7
