Data-driven complete basis set limit estimates from a minimal auxiliary basis
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Quantum chemistry calculations are often performed using atom-centered basis sets, which are chosen to balance accuracy and cost. Although these basis sets are systematically improvable, the total energy converges slowly with basis set size towards the complete basis set (CBS) limit. Common extrapolation methods require several intermediate-quality calculations to afford an estimate of the CBS energy. We propose combining a pairwise interaction model with a minimal complementary auxiliary basis set (CABS) baseline to estimate the CBS energy from a single quantum chemistry calculation in a minimal basis set via kernel ridge regression (KRR), which is more efficient than both direct and $\Delta$-machine learning. We show that KRR on standard molecular representations can be improved by approximating atom-wise local kernels using Chebyshev polynomials, which allows us to train KRR models efficiently on moderate compute resources. This further enables a data-driven approach towards CBS that combines physical baselines capturing leading-order effects with data-efficient machine learning models.
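As a rough illustration of the idea in the abstract, the sketch below replaces a Gaussian kernel with a Chebyshev interpolant and then solves the usual KRR linear system. It is not the authors' implementation: the descriptors, targets, kernel width `sigma`, regularizer `lam`, and polynomial degree are all hypothetical, and the kernel here is a plain global Gaussian in squared distance rather than the atom-wise local kernels the abstract refers to.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Synthetic stand-ins (assumptions): 40 "molecules" with 3-dimensional
# descriptors and a smooth toy target. No real chemistry data is used.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1]

# Pairwise squared Euclidean distances between descriptors.
diff = X[:, None, :] - X[None, :, :]
d2 = np.sum(diff**2, axis=-1)

sigma = 1.5   # hypothetical kernel width
lam = 1e-6    # hypothetical ridge regularizer


def gaussian(t):
    """Gaussian kernel written as a function of squared distance t."""
    return np.exp(-t / (2.0 * sigma**2))


# Chebyshev interpolant of the kernel on the range of observed distances.
cheb = C.Chebyshev.interpolate(gaussian, deg=16, domain=[0.0, d2.max()])

K_exact = gaussian(d2)   # reference kernel matrix
K_cheb = cheb(d2)        # polynomial approximation of the same matrix

# Standard KRR training step: solve (K + lam * I) alpha = y.
alpha = np.linalg.solve(K_cheb + lam * np.eye(len(y)), y)

max_kernel_error = np.max(np.abs(K_exact - K_cheb))
print(f"max |K_exact - K_cheb| = {max_kernel_error:.2e}")
```

This sketch only shows that a low-degree polynomial can reproduce a smooth kernel closely on a bounded domain. Any efficiency gain would come from how the polynomial form is exploited when evaluating the kernels, which is not reproduced here.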
fields
physics.chem-ph 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
Property-Specific Molecular Representations via Feature-Space Transfer Compression
A transfer compression technique using semi-empirical data reduces molecular representation dimensions by a median 72% (range 36-98%) while retaining accuracy for energy, heat capacity, dipole moment and polarizability on QM9 and VQM24, and improves data efficiency for dipoles to 19% of training dat