Data-driven complete basis set limit estimates from a minimal auxiliary basis

1 Pith paper cites this work. Polarity classification is still indexing.
abstract

Quantum chemistry calculations are often performed using atom-centered basis sets, which are chosen to balance accuracy and cost. While these basis sets are systematically improvable, the total energy converges slowly with basis set size towards the complete basis set (CBS) limit. Common extrapolation methods require several intermediate-quality calculations to obtain an estimate of the CBS energy. We propose combining a pairwise interaction model with a minimal complementary auxiliary basis set (CABS) baseline to estimate the CBS energy from a single quantum chemistry calculation in a minimal basis set via kernel ridge regression (KRR), which is more efficient than both direct and $\Delta$-machine learning. We show that KRR on standard molecular representations can be improved by approximating atom-wise local kernels using Chebyshev polynomials, which allows us to train KRR models efficiently on moderate compute resources. This further enables a data-driven approach towards the CBS limit that combines physical baselines capturing leading-order effects with data-efficient machine learning models.
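The idea of replacing a kernel with a Chebyshev polynomial inside KRR can be illustrated with a toy sketch. It is not the authors' implementation: the Gaussian kernel, the random 3-dimensional features, the synthetic target, the polynomial degree, and the regularization strength are all assumptions chosen for illustration, and the helper names (`gaussian`, `pairwise_dist`, `krr_fit`) are hypothetical. The sketch fits KRR once with the exact kernel and once with a Chebyshev interpolant of the same kernel, then compares the two.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def gaussian(d, sigma=1.0):
    """Gaussian kernel as a function of distance d (assumed kernel choice)."""
    return np.exp(-d**2 / (2.0 * sigma**2))


def pairwise_dist(A, B):
    """Euclidean distances between all rows of A and all rows of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)


def krr_fit(K, y, lam=1e-3):
    """Solve (K + lam * I) alpha = y for the KRR regression weights."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)


# Synthetic data standing in for molecular features and energies (assumption).
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(50, 3))
X_test = rng.uniform(-1.0, 1.0, size=(10, 3))
y_train = np.sin(X_train).sum(axis=1)

D_train = pairwise_dist(X_train, X_train)
D_test = pairwise_dist(X_test, X_train)

# Chebyshev interpolant of the kernel on [0, 4], which covers every
# distance that can occur inside the cube [-1, 1]^3 (at most 2 * sqrt(3)).
cheb_kernel = Chebyshev.interpolate(gaussian, deg=30, domain=[0.0, 4.0])

# KRR with the exact kernel.
alpha_exact = krr_fit(gaussian(D_train), y_train)
pred_exact = gaussian(D_test) @ alpha_exact

# KRR with the polynomial approximation of the kernel.
alpha_cheb = krr_fit(cheb_kernel(D_train), y_train)
pred_cheb = cheb_kernel(D_test) @ alpha_cheb

max_kernel_err = np.max(np.abs(cheb_kernel(D_train) - gaussian(D_train)))
max_pred_diff = np.max(np.abs(pred_cheb - pred_exact))
print(f"max kernel error: {max_kernel_err:.2e}")
print(f"max prediction difference: {max_pred_diff:.2e}")
```

In this toy setting the polynomial only reproduces the kernel values; any efficiency gain described in the abstract would come from how the expansion is exploited for atom-wise local kernels, which this sketch does not attempt to reproduce.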

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Property-Specific Molecular Representations via Feature-Space Transfer Compression

physics.chem-ph · 2026-06-19 · unverdicted · novelty 6.0

A transfer compression technique using semi-empirical data reduces molecular representation dimensions by a median 72% (range 36-98%) while retaining accuracy for energy, heat capacity, dipole moment and polarizability on QM9 and VQM24, and improves data efficiency for dipoles to 19% of training dat

citing papers explorer

Showing 1 of 1 citing paper.

  • Property-Specific Molecular Representations via Feature-Space Transfer Compression physics.chem-ph · 2026-06-19 · unverdicted · none · ref 50 · internal anchor