pith. machine review for the scientific record.

arxiv: 2605.14287 · v1 · submitted 2026-05-14 · ⚛️ physics.chem-ph

Recognition: 2 theorem links · Lean Theorem

A quantum chemistry dataset containing ground-state and conical-intersection structures of 260k molecules

Authors on Pith: no claims yet
Pith Number: pith:HRIKZRLT · state: computed
4 claims · 80 references · 2 theorem links. This is the computed registry record for this paper; it is not author-attested yet.

Pith reviewed 2026-05-15 02:31 UTC · model grok-4.3

classification ⚛️ physics.chem-ph
keywords: conical intersections · quantum chemistry dataset · photochemistry · machine learning · semi-empirical methods · molecular geometries · excited states · OM2/MRCI

The pith

A dataset supplies ground-state and conical-intersection structures, with their energies, for 260,000 small molecules, computed with the semi-empirical OM2 and OM2/MRCI methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a quantum chemistry dataset of 260,000 small molecules that pairs optimized ground-state geometries and energies with conical-intersection geometries and energies. Ground-state geometries are optimized with the semi-empirical OM2 method, ground-state single-point energies and the conical intersections are computed at the OM2/MRCI level, and the molecules contain up to ten heavy atoms drawn from C, N, O, and F. The stated purpose is to supply training data that connects conventional photochemical calculations with machine-learning models of excited-state processes. A sympathetic reader would care because conical intersections govern many light-driven reactions, yet large, consistent conical-intersection datasets remain scarce.

Core claim

We constructed a quantum chemistry dataset containing ground-state and conical-intersection structures of small molecules (up to ten heavy atoms: C, N, O, F). Ground-state geometries were optimized at the semi-empirical OM2 level, with single-point energies calculated at the OM2/MRCI level. Conical-intersection geometries and energies were also computed at the OM2/MRCI level. This dataset is designed to enable a deep integration of photochemistry with machine learning, bridging the gap between photochemical insight and data-driven approaches.

What carries the argument

The semi-empirical OM2 and OM2/MRCI calculations that produce the optimized ground-state structures, the conical-intersection structures, and their energies for each of the 260,000 molecules.

If this is right

  • Machine learning models can be trained directly on the conical-intersection examples to predict where conical intersections occur in photoinduced reactions and at what energies.
  • Large-scale screening of photochemical pathways becomes feasible for thousands of small organic molecules without repeated quantum chemistry runs.
  • Data-driven methods can now incorporate both ground-state and excited-state features from a single consistent source.
  • The dataset supports development of surrogate models that accelerate excited-state dynamics simulations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The data could be combined with existing ground-state property datasets to train models that jointly predict ground- and excited-state behavior.
  • A validation campaign that recomputes a few thousand entries at CASSCF or higher levels would quantify the accuracy limits of the semi-empirical approximation.
  • Models trained on this set might generalize to slightly larger molecules if the conical-intersection features prove transferable beyond ten heavy atoms.
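A validation campaign like the one suggested above needs a subset that spans the dataset's chemistry, not a uniform random draw dominated by the most common compound classes. The following is a minimal sketch, assuming each entry carries a molecule identifier and one compound-class label (for example the Aromatics, Heterocycles, and Fused carbocycles classes named in the Figure 5 caption). The `(mol_id, compound_class)` record layout and the function name `stratified_subset` are hypothetical and do not reflect the dataset's published schema.

```python
import random
from collections import defaultdict


def stratified_subset(records, per_class, seed=0):
    """Pick up to `per_class` molecule IDs from each compound class.

    `records` is an iterable of (mol_id, compound_class) pairs. This layout
    is a hypothetical stand-in for however the dataset labels its entries.
    """
    by_class = defaultdict(list)
    for mol_id, compound_class in records:
        by_class[compound_class].append(mol_id)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = {}
    for compound_class in sorted(by_class):  # sorted for a deterministic order
        ids = by_class[compound_class]
        subset[compound_class] = rng.sample(ids, min(per_class, len(ids)))
    return subset


if __name__ == "__main__":
    # Toy records; the IDs and class sizes are invented for illustration.
    toy = [(f"mol{i:05d}", "Aromatics") for i in range(500)]
    toy += [(f"mol{i:05d}", "Heterocycles") for i in range(500, 520)]
    picked = stratified_subset(toy, per_class=50)
    for cls, ids in picked.items():
        print(cls, len(ids))
```

The sketch caps each class instead of sampling in proportion to class size, so rare classes stay represented. Proportional allocation is the alternative if the goal is an unbiased, dataset-wide error estimate.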

Load-bearing premise

The OM2/MRCI semi-empirical level supplies sufficiently accurate geometries and relative energies for conical intersections across the full chemical space of the 260,000 molecules.

What would settle it

Comparing conical-intersection geometries and energies for a representative subset of the dataset against reference values from higher-level multireference methods, such as CASPT2 or ab initio MRCI with larger basis sets, would reveal large systematic deviations if the assumption fails.
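The comparison above reduces to two quantities per molecule: a geometry deviation after optimal superposition and an energy deviation. The following is a minimal sketch that assumes matched atom ordering between the OM2/MRCI structure and the reference structure, and energies in eV. The function names are hypothetical. The geometry deviation uses the Kabsch algorithm (centroid alignment plus SVD-based optimal rotation) with unit atomic weights. A nonzero mean signed error is what would expose a systematic shift.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition.

    Assumes row i of P and row i of Q describe the same atom.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)  # move both centroids to the origin
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # -1 would signal a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # proper rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def energy_deviation_stats(dataset_ev, reference_ev):
    """Signed and absolute deviations of dataset energies from reference energies."""
    if len(dataset_ev) != len(reference_ev) or len(dataset_ev) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [d - r for d, r in zip(dataset_ev, reference_ev)]
    n = len(diffs)
    return {
        "mean_signed": sum(diffs) / n,  # far from zero => systematic shift
        "mean_abs": sum(abs(x) for x in diffs) / n,
        "max_abs": max(abs(x) for x in diffs),
    }
```

A mass-weighted Kabsch superposition is a common alternative when heavy-atom displacements should dominate the geometry score.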

Figures

Figures reproduced from arXiv: 2605.14287 by Chao Xu, Chuqiao Feng, Jiahui Zhang, Yifei Zhu, Yingjin Ma, Zhenggang Lan.

Figure 1. Workflow for the construction of the QCDGE-CI dataset, consisting of five steps: …
Figure 2. Analysis of compound type and molecular ring distribution in the QCDGE-CI …
Figure 3. Analysis of principal moments of inertia in the QCDGE-CI dataset. Here …
Figure 4. Bond length distributions of (a) C-C bonds, (b) C-O bonds, (c) C-N bonds, (d) …
Figure 5. Distribution of S1 VEE and S0-S1 CI energies in the QCDGE-CI dataset. (a) All molecules. (b) Aromatics. (c) Carboacyclic. (d) Carbocycles. (e) Fused carbocycles. (f) Fused heterocycles. (g) Heteroacyclic. (h) Heteroaromatics. (i) Heterocycles.

Code Availability. All research was implemented in the Python programming language (3.11.10), and several important libraries used in this study are: RDKit (version 2022.…
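The Figure 3 caption is truncated, so the exact quantity plotted is not shown here. One common way to analyze principal moments of inertia across a molecular dataset, introduced by Sauer and Schwarz, is to plot the normalized ratios NPR1 = I1/I3 and NPR2 = I2/I3 (with I1 ≤ I2 ≤ I3). On such a plot, rod-like, disc-like, and sphere-like shapes sit at the corners (0, 1), (0.5, 0.5), and (1, 1) of a triangle. The following is a minimal sketch of that calculation. It assumes Cartesian coordinates in any consistent length unit and standard atomic masses for H, C, N, O, and F. Whether the paper uses exactly these ratios is an assumption, and the function name is hypothetical.

```python
import numpy as np

# Standard atomic masses (u) for the dataset's heavy elements plus hydrogen.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998}


def normalized_pmi_ratios(symbols, coords):
    """Return (NPR1, NPR2) = (I1/I3, I2/I3) with I1 <= I2 <= I3.

    Undefined for a single atom, where every principal moment is zero.
    """
    masses = np.array([ATOMIC_MASS[s] for s in symbols])
    xyz = np.asarray(coords, dtype=float)
    com = (masses[:, None] * xyz).sum(axis=0) / masses.sum()
    r = xyz - com  # positions relative to the center of mass

    # Inertia tensor: I = sum_i m_i (|r_i|^2 * Identity - r_i r_i^T)
    inertia = np.zeros((3, 3))
    for m, v in zip(masses, r):
        inertia += m * (np.dot(v, v) * np.eye(3) - np.outer(v, v))

    i1, i2, i3 = np.linalg.eigvalsh(inertia)  # eigenvalues in ascending order
    return i1 / i3, i2 / i3
```

For any planar structure the perpendicular-axis theorem gives I3 = I1 + I2, so planar molecules fall on the line NPR1 + NPR2 = 1.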
Original abstract

Conical intersections play central roles in photoinduced reactions. However, comprehensive conical-intersection datasets that could advance our understanding of excited-state reaction processes remain scarce. To address this gap, we constructed a quantum chemistry dataset containing ground-state and conical-intersection structures of small molecules (up to ten heavy atoms: C, N, O, F). Ground-state geometries were optimized at the semi-empirical OM2 level, with single-point energies calculated at the OM2/MRCI level. Conical-intersection geometries and energies were also computed at the OM2/MRCI level. This dataset is designed to enable a deep integration of photochemistry with machine learning, bridging the gap between photochemical insight and data-driven approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript presents a quantum chemistry dataset containing ground-state and conical-intersection structures for 260k small molecules (up to ten heavy atoms: C, N, O, F). Ground-state geometries were optimized at the semi-empirical OM2 level with single-point energies at OM2/MRCI; conical-intersection geometries and energies were computed at the OM2/MRCI level. The work is framed as enabling machine-learning integration with photochemistry.

Significance. If the dataset is released with complete metadata and access instructions, it would address a genuine scarcity of large-scale conical-intersection data. This resource could support data-driven studies of photoinduced processes, provided users understand the semi-empirical accuracy limits.

minor comments (3)
  1. Add an explicit data-availability statement with repository DOI, file formats, and any usage license.
  2. Specify the exact sampling procedure used to generate the 260k molecules so that the chemical-space coverage can be assessed.
  3. Report the software version, convergence thresholds, and any filtering criteria applied during the OM2/MRCI conical-intersection searches.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript and for recommending minor revision. We appreciate the recognition that the dataset addresses a genuine scarcity of large-scale conical-intersection data and will ensure it is released with complete metadata and access instructions.

Circularity Check

0 steps flagged

No significant circularity; dataset generation is direct computation

Full rationale

The manuscript describes the direct application of established semi-empirical methods (OM2 for ground-state geometry optimization and OM2/MRCI for single-point energies and conical-intersection searches) to generate structures and energies for 260k molecules. No equations, fitted parameters, predictions, or self-citations are invoked that reduce any claim to the inputs by construction. The central claim is simply that the dataset was produced and released using the stated protocol, which holds by explicit computation without internal reduction or circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The dataset rests on the domain assumption that the chosen semi-empirical method is adequate for the intended machine-learning use case; no free parameters or new entities are introduced.

axioms (1)
  • Domain assumption: the semi-empirical OM2 and OM2/MRCI methods yield usable ground-state geometries and conical-intersection locations for small organic molecules containing C, N, O, and F.
    These levels of theory are applied uniformly to all 260k molecules and are the only computational methods described in the abstract.

pith-pipeline@v0.9.0 · 5434 in / 1322 out tokens · 39613 ms · 2026-05-15T02:31:29.742854+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What the tags mean:
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
