arxiv: 1906.09427 · v1 · pith:SDPY76KTnew · submitted 2019-06-22 · 💻 cs.LG · stat.ML

Alchemy: A Quantum Chemistry Dataset for Benchmarking AI Models

Pith reviewed 2026-05-25 18:25 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords Alchemy dataset · quantum chemistry · graph neural networks · molecular properties · machine learning benchmark · GDB MedChem · organic molecules

The pith

Alchemy supplies quantum properties for 119,487 molecules to benchmark graph neural networks in chemistry.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Alchemy dataset of 119,487 organic molecules sampled from the GDB MedChem database, each with up to 14 heavy atoms and annotated with 12 quantum mechanical properties. It expands the size and diversity of prior molecular datasets used for machine learning. The authors benchmark current graph neural network models on Alchemy and report that the added data helps validate and advance those models for chemistry tasks. Readers who apply machine learning to molecular property prediction should care because the larger resource allows more rigorous testing than the smaller existing collections permit.

Core claim

Alchemy comprises 12 quantum mechanical properties of 119,487 organic molecules with up to 14 heavy atoms sampled from the GDB MedChem database. Extensive benchmarks of the state-of-the-art graph neural network models on Alchemy clearly manifest the usefulness of new data in validating and developing machine learning models for chemistry and material science.

What carries the argument

The Alchemy dataset of molecular quantum properties used to benchmark graph neural networks.

If this is right

  • Graph neural network models can be validated at larger scale and with greater molecular diversity than before.
  • Machine learning models for predicting quantum properties gain a new resource for training and testing.
  • The launched contest draws additional researchers to develop models using the dataset.
  • Molecules generated after the initial 119,487 samples further enlarge the resource available for ongoing work.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Alchemy could become a standard benchmark that replaces or augments smaller datasets such as QM9 for routine model evaluation.
  • Architectures whose performance improves markedly with the added volume may be preferred for scaling to larger chemical systems.
  • The sampling from GDB MedChem may allow models trained on Alchemy to transfer more readily to medicinal chemistry applications than models trained only on narrower sets.

Load-bearing premise

The molecules sampled from the GDB MedChem database supply sufficient additional diversity and relevance beyond existing smaller datasets to meaningfully advance model development.

What would settle it

Benchmarks in which the same graph neural network models exhibit identical relative performance and no new validation insights on Alchemy versus prior smaller datasets would falsify the usefulness claim.
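
One hedged way to make this test concrete is to compare how the same models rank on two datasets. The sketch below computes a Spearman rank correlation over per-model errors; a value of 1.0 would mean the ranking is identical on both datasets. The model names and error values are hypothetical placeholders, not results from the paper, and the closed-form formula assumes there are no tied errors.

```python
def ranks(values):
    """Return 1-based ranks (1 = smallest value). Assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation for two equal-length lists without ties."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(a)
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical placeholder errors (lower is better); NOT values from the paper.
models = ["model_a", "model_b", "model_c", "model_d"]
mae_on_prior_dataset = [0.30, 0.20, 0.25, 0.40]
mae_on_alchemy = [0.35, 0.22, 0.45, 0.50]

rho = spearman(mae_on_prior_dataset, mae_on_alchemy)
print(f"rank correlation across datasets: {rho:.2f}")
```

A correlation near 1.0 across many models would be evidence for the falsifying scenario described above; a clearly lower value would suggest the new dataset separates models differently.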

Figures

Figures reproduced from arXiv: 1906.09427 by Benben Liao, Chang-Yu Hsieh, Chee-Kong Lee, Guangyong Chen, Jie Tang, Jiezhong Qiu, Pengfei Chen, Qiming Sun, Renjie Liao, Richard Zemel, Shengyu Zhang, Weiwen Liu.

Figure 1: An example for molecule CC(O)C(C)C(=O)NC=O, a sample from Alchemy dataset. (Adjacent paper text captured with the figure, truncated in extraction: "We first used OpenBabel (O’Boyle et al., 2011) to parse SMILES string and built the ...")

Figure 2: Histogram of Alchemy’s running time. Each step takes 0.17/4.13/21.11 hours on average, ... (caption truncated in extraction). Panel x-axes: HF/STO-3G (hours), DFT (hours), Property (hours), Total (hours); y-axis: Histogram.

Figure 3: A pairplot for calculated properties. (Adjacent paper text captured with the figure, truncated in extraction: "In both benchmarks, we use the Alchemy dataset which contains 119,487 molecules. All node and edge features used in the benchmarking experiments are listed in ...")
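
To illustrate the "up to 14 heavy atoms" criterion on the Figure 1 molecule, here is a minimal sketch that counts non-hydrogen atoms in a SMILES string. It is a simplified tokenizer written for this note, not the authors' pipeline (the paper reports using OpenBabel for SMILES parsing). The function name `count_heavy_atoms` and the constant `MAX_HEAVY_ATOMS` are hypothetical, and the regular expression only covers bracket atoms, Cl, Br, and the single-letter organic subset.

```python
import re

# Simplified SMILES atom pattern (an assumption of this sketch):
#   1. bracket atoms such as [NH4+] or [13C], capturing the element symbol
#   2. two-letter halogens Cl and Br
#   3. single-letter organic-subset atoms, aliphatic or aromatic
ATOM_PATTERN = re.compile(
    r"\[\d*(?P<bracket>[A-Z][a-z]?|[a-z]{1,2})[^\]]*\]"
    r"|Cl|Br|[BCNOPSFI]|[bcnops]"
)

MAX_HEAVY_ATOMS = 14  # size limit quoted in the abstract


def count_heavy_atoms(smiles):
    """Count atoms other than hydrogen in a SMILES string."""
    count = 0
    for match in ATOM_PATTERN.finditer(smiles):
        element = match.group("bracket") or match.group(0)
        if element != "H":
            count += 1
    return count


example = "CC(O)C(C)C(=O)NC=O"  # molecule shown in Figure 1
n_heavy = count_heavy_atoms(example)
print(n_heavy, n_heavy <= MAX_HEAVY_ATOMS)
```

For real work a cheminformatics toolkit is the safer choice, since this pattern ignores many SMILES features by design.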
Original abstract

We introduce a new molecular dataset, named Alchemy, for developing machine learning models useful in chemistry and material science. As of June 20th 2019, the dataset comprises of 12 quantum mechanical properties of 119,487 organic molecules with up to 14 heavy atoms, sampled from the GDB MedChem database. The Alchemy dataset expands the volume and diversity of existing molecular datasets. Our extensive benchmarks of the state-of-the-art graph neural network models on Alchemy clearly manifest the usefulness of new data in validating and developing machine learning models for chemistry and material science. We further launch a contest to attract attentions from researchers in the related fields. More details can be found on the contest website (https://alchemy.tencent.com). At the time of benchamrking experiment, we have generated 119,487 molecules in our Alchemy dataset. More molecular samples are generated since then. Hence, we provide a list of molecules used in the reported benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the Alchemy dataset of 119,487 organic molecules (up to 14 heavy atoms) sampled from GDB MedChem, each annotated with 12 quantum mechanical properties. It reports benchmarks of state-of-the-art graph neural network models on this dataset, asserts that the results demonstrate the dataset's usefulness for validating and developing ML models in chemistry, and launches an associated contest.

Significance. If the expanded scale and diversity from GDB MedChem introduce new modeling challenges absent from smaller prior sets, the dataset and benchmarks could support incremental progress in graph-based models for quantum chemistry. The explicit release of the exact molecule list used for benchmarking is a positive reproducibility feature.

major comments (2)
  1. [Abstract] Abstract: the central claim that 'our extensive benchmarks of the state-of-the-art graph neural network models on Alchemy clearly manifest the usefulness of new data' is unsupported because no quantitative comparison (property distribution shifts, scaffold novelty, or cross-dataset transfer performance) to existing datasets such as QM9 is provided.
  2. [Abstract] Abstract: the benchmarking description states dataset size and property count but supplies no error bars, run-to-run variance, or exclusion criteria for the 119,487 molecules, which undermines assessment of the reliability of the GNN results.
minor comments (2)
  1. [Abstract] Typo: 'benchamrking' should be 'benchmarking'.
  2. [Abstract] Grammatical: 'attract attentions from researchers' should be 'attract attention from researchers'.
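
The first major comment asks for a quantitative measure of property distribution shift between Alchemy and an earlier dataset. One possible measure, sketched here under stated assumptions, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical cumulative distributions of a single property. The function name `ks_statistic` and the sample values are hypothetical; this is not an analysis the paper performs.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic (0 = identical ECDFs, 1 = disjoint)."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


# Hypothetical placeholder values for one property in two datasets.
property_in_dataset_1 = [0.21, 0.25, 0.27, 0.30, 0.33]
property_in_dataset_2 = [0.24, 0.29, 0.31, 0.36, 0.40]
print(ks_statistic(property_in_dataset_1, property_in_dataset_2))
```

Computing this per property would give one number per quantum mechanical property, which is easier to compare across datasets than a pairplot alone.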

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and will revise the manuscript accordingly where appropriate.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'our extensive benchmarks of the state-of-the-art graph neural network models on Alchemy clearly manifest the usefulness of new data' is unsupported because no quantitative comparison (property distribution shifts, scaffold novelty, or cross-dataset transfer performance) to existing datasets such as QM9 is provided.

    Authors: We acknowledge that the abstract asserts the usefulness of the new data without providing explicit quantitative comparisons to QM9 (such as property distribution shifts or scaffold novelty). The manuscript text notes that Alchemy expands volume and diversity by sampling from GDB MedChem and supplies the exact list of benchmarked molecules, but we agree this does not substitute for direct comparative metrics. We will revise the abstract to moderate the claim and add a new subsection with quantitative comparisons to QM9. revision: yes

  2. Referee: [Abstract] Abstract: the benchmarking description states dataset size and property count but supplies no error bars, run-to-run variance, or exclusion criteria for the 119,487 molecules, which undermines assessment of the reliability of the GNN results.

    Authors: The manuscript already states that the exact list of 119,487 molecules used for the reported benchmarks is released to support reproducibility, which addresses exclusion criteria. However, we agree that the absence of error bars and run-to-run variance in the benchmarking results limits evaluation of reliability. We will revise the benchmarking section to report these statistics (e.g., standard deviations across multiple runs) and clarify any additional filtering steps applied. revision: yes

Circularity Check

0 steps flagged

No circularity: dataset release and empirical benchmarks only

full rationale

The paper is a data release (Alchemy sampled from GDB MedChem, 119k molecules, 12 QM properties) plus standard GNN benchmarking. No derivation chain, fitted parameters renamed as predictions, self-citation load-bearing on a uniqueness theorem, or ansatz smuggling exists. The central claim that benchmarks 'manifest usefulness' is an empirical assertion, not a reduction of any output to its own inputs by construction. No equations or self-referential steps are present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Dataset release paper; no free parameters, mathematical axioms, or new postulated entities are introduced beyond standard quantum-chemistry property calculations.

pith-pipeline@v0.9.0 · 5734 in / 924 out tokens · 42948 ms · 2026-05-25T18:25:42.684950+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Weisfeiler-Leman Is Incomplete on Simple Spectrum Graphs, so Canonicalize Them

    cs.LG 2026-05 unverdicted novelty 7.0

    k-WL is incomplete on simple spectrum graphs; PRiSM is the first provably complete canonicalization for their eigendecompositions.

  2. Path-Based Gradient Boosting for Graph-Level Prediction

    cs.LG 2026-04 unverdicted novelty 6.0

    PathBoost extends path-based gradient boosting with logistic loss, prefix-based multi-attribute handling, and automatic anchor selection, achieving better or comparable results to GNNs and graph kernels on benchmark d...

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · cited by 2 Pith papers · 1 internal anchor

  1. [1]

    Cho, K., Van Merriënboer, B., Bahdanau, D., and Bengio, Y. (2014). On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259

  2. [2]

    Defferrard, M., Bresson, X., and Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. In NeurIPS

  3. [3]

    Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A Large-Scale Hierarchical Image Database. In CVPR '09

  4. [4]

    Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT '19

  5. [5]

    Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017). Neural message passing for quantum chemistry. In ICML

  6. [6]

    Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A. (2018). Automatic Chemical Design Using a Data-driven Continuous Representation of Molecules. ACS Central Science, 4(2):268–276

  7. [7]

    Gori, M., Monfardini, G., and Scarselli, F. (2005). A new model for learning in graph domains. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005, volume 2, pages 729–734. IEEE

  8. [8]

    He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In CVPR '16, pages 770–778

  9. [9]

    Jin, W., Barzilay, R., and Jaakkola, T. (2018). Junction Tree Variational Autoencoder for Molecular Graph Generation. ICML '18

  10. [10]

    Kipf, T. N. and Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In ICLR

  11. [11]

    Knizia, G. (2013). Intrinsic Atomic Orbitals: An Unbiased Bridge between Quantum Theory and Chemical Concepts. Journal of Chemical Theory and Computation, 9(11):4834–4843

  12. [12]

    Lanczos, C. (1950). An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. United States Governm. Press Office Los Angeles, CA

  13. [13]

    Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. (2016). Gated graph sequence neural networks. In ICLR

  14. [14]

    Liao, R., Zhao, Z., Urtasun, R., and Zemel, R. S. (2019). Lanczosnet: Multi-scale deep graph convolutional networks. In ICLR

  15. [15]

    Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E., and Svetnik, V. (2015). Deep neural nets as a method for quantitative structure–activity relationships. Journal of chemical information and modeling, 55(2):263–274

  16. [16]

    O'Boyle, N. M., Banck, M., James, C. A., Morley, C., Vandermeersch, T., and Hutchison, G. R. (2011). Open babel: An open chemical toolbox. Journal of Cheminformatics, 3(1):33

  17. [17]

    Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. (2016). SQuAD: 100,000+ Questions for Machine Comprehension of Text. In EMNLP '16

  18. [18]

    Ramakrishnan, R., Dral, P. O., Rupp, M., and Von Lilienfeld, O. A. (2014). Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Scientific Data, 1:140022

  19. [19]

    Ramsundar, B., Liu, B., Wu, Z., Verras, A., Tudor, M., Sheridan, R. P., and Pande, V. (2017). Is multitask deep learning practical for pharma? Journal of chemical information and modeling, 57(8):2068–2076

  20. [20]

    Ruddigkeit, L., Van Deursen, R., Blum, L. C., and Reymond, J.-L. (2012). Enumeration of 166 billion organic small molecules in the chemical universe database gdb-17. Journal of Chemical Information and Modeling, 52(11):2864–2875

  21. [21]

    Sanchez-Lengeling, B. and Aspuru-Guzik, A. (2018). Inverse Molecular Design Using Machine Learning: Generative Models for Matter Engineering. Science, 361(6400):360–365

  22. [22]

    Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. (2009). The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80

  23. [23]

    Schlichtkrull, M., Kipf, T. N., Bloem, P., Van Den Berg, R., Titov, I., and Welling, M. (2018). Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pages 593–607. Springer

  24. [24]

    Segler, M., Preuß, M., and Waller, M. P. (2017). Towards "Alphachem": Chemical Synthesis Planning with Tree Search and Deep Neural Network Policies. In ICLR '17 Workshop

  25. [25]

    Segler, M. H., Preuss, M., and Waller, M. P. (2018). Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI. Nature, 555(7698):604

  26. [26]

    Shen, Y., Huang, P.-S., Gao, J., and Chen, W. (2017). Reasonet: Learning to Stop Reading in Machine Comprehension. In KDD '17, pages 1047–1055. ACM

  27. [27]

    Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 529(7587):484

  28. [28]

    Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., et al. (2017). Mastering the Game of Go without Human Knowledge. Nature, 550(7676):354

  29. [29]

    Sun, Q., Berkelbach, T. C., Blunt, N. S., Booth, G. H., Guo, S., Li, Z., Liu, J., McClain, J. D., Sayfutyarova, E. R., Sharma, S., et al. (2018). Pyscf: the python-based simulations of chemistry framework. Wiley Interdisciplinary Reviews: Computational Molecular Science, 8(1):e1340

  30. [30]

    Sun, Q. and Chan, G. K.-L. (2014). Exact and optimal quantum mechanics/molecular mechanics boundaries. Journal of Chemical Theory and Computation, 10(9):3784–3790

  31. [31]

    Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. (2018). Graph attention networks. In ICLR

  32. [32]

    Vinyals, O., Bengio, S., and Kudlur, M. (2015). Order matters: Sequence to sequence for sets. ICLR '15

  33. [33]

    Weigend, F. (2002). A Fully Direct RI-HF Algorithm: Implementation, Optimised Auxiliary Basis Sets, Demonstration of Accuracy and Efficiency. Phys. Chem. Chem. Phys., 4:4285–4291

  34. [34]

    Weininger, D. (1988). Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci., 28(1):31–36

  35. [35]

    Wu, Z., Ramsundar, B., Feinberg, E. N., Gomes, J., Geniesse, C., Pappu, A. S., Leswing, K., and Pande, V. (2018). MoleculeNet: a Benchmark for Molecular Machine Learning. Chemical Science, 9(2):513–530

  36. [36]

    Xu, K., Hu, W., Leskovec, J., and Jegelka, S. (2019). How powerful are graph neural networks? In ICLR

  37. [37]

    Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K.-i., and Jegelka, S. (2018). Representation learning on graphs with jumping knowledge networks. In ICML

  38. [38]

    Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W., and Leskovec, J. (2018). Hierarchical graph representation learning with differentiable pooling. In NeurIPS, pages 4800–4810