pith. machine review for the scientific record.

arxiv: 2605.03964 · v1 · submitted 2026-05-05 · 💻 cs.LG · physics.chem-ph

Recognition: unknown

Pretrained Model Representations as Acquisition Signals for Active Learning of MLIPs

Authors on Pith · no claims yet

Pith reviewed 2026-05-07 16:05 UTC · model grok-4.3

classification 💻 cs.LG physics.chem-ph
keywords active learning · machine learning interatomic potentials · pretrained representations · neural tangent kernel · acquisition functions · reactive chemistry · latent space kernels · data efficiency

The pith

Pretrained latent spaces from machine learning interatomic potentials supply effective acquisition signals for active learning in reactive chemistry.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether the internal representations of an already-trained MLIP can be turned directly into rules for choosing which new configurations to label next. It constructs two kernels from the model's latent space and measures how well they guide data selection on reactive chemistry tasks. These signals reduce the number of expensive quantum labels needed to hit target accuracy levels compared with random selection, fixed descriptors, or model committees. If the claim holds, training accurate potentials for reactions becomes cheaper because no separate uncertainty modules or ensemble training are required.
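To make the selection mechanics concrete, the following is a minimal sketch (not the authors' code) of one active-learning round driven by a latent-space kernel. The helpers latent_features, quantum_label, and fine_tune are hypothetical placeholders for the pretrained MLIP's per-structure representation, the quantum-chemistry labeling step, and the retraining step; the greedy max-min batch rule stands in for whichever batch criterion the paper actually uses.

    import numpy as np

    def similarity(A, B):
        # Cosine similarity between latent feature vectors (one row per structure).
        A = A / np.linalg.norm(A, axis=1, keepdims=True)
        B = B / np.linalg.norm(B, axis=1, keepdims=True)
        return A @ B.T

    def select_batch(F_pool, F_train, batch_size=5):
        # Greedily pick pool structures least similar (in the latent kernel)
        # to anything already labeled or already chosen this round.
        max_sim = similarity(F_pool, F_train).max(axis=1)
        chosen = []
        for _ in range(batch_size):
            i = int(np.argmin(max_sim))
            chosen.append(i)
            max_sim = np.maximum(max_sim, similarity(F_pool, F_pool[i:i + 1]).ravel())
            max_sim[i] = np.inf  # never pick the same structure twice
        return chosen

    # One round, written around the hypothetical helpers named above:
    # F_train = latent_features(model, labeled_set)
    # F_pool  = latent_features(model, candidate_pool)
    # batch   = [quantum_label(candidate_pool[i]) for i in select_batch(F_pool, F_train)]
    # fine_tune(model, labeled_set + batch)

The batch size of five matches the acquisition step size reported in Figure 1; everything else in the sketch is an assumption.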

Core claim

Acquisition signals extracted directly from the latent space of a pretrained MLIP, using a finite-width neural tangent kernel and an activation kernel, outperform fixed-descriptor baselines, committee disagreement, and random acquisition. This approach reduces the data required to reach target performance levels by an average of 38% for energy errors and 28% for force errors on reactive-chemistry benchmarks. The pretrained representations also yield similarity spaces that better match model errors and preserve chemically meaningful structure.

What carries the argument

Finite-width neural tangent kernel and activation kernel built from hidden latent space features of a pretrained MLIP, used to measure representational similarity for selecting informative unlabeled points.
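A minimal sketch of how the two kernels could be formed from a frozen pretrained model, assuming the scalar output is a per-structure energy and that pooled hidden activations are available as one feature vector per structure; the specific layers, pooling, and normalisation used in the paper are not reproduced here.

    import torch

    def parameter_gradient(model, x):
        # Flattened gradient of the scalar prediction w.r.t. all (frozen) weights.
        y = model(x).sum()
        grads = torch.autograd.grad(y, list(model.parameters()))
        return torch.cat([g.reshape(-1) for g in grads])

    def finite_width_ntk(model, inputs_a, inputs_b):
        # Empirical NTK at the pretrained weights: K[i, j] = <grad f(x_i), grad f(x_j)>.
        Ga = torch.stack([parameter_gradient(model, x) for x in inputs_a])
        Gb = torch.stack([parameter_gradient(model, x) for x in inputs_b])
        return Ga @ Gb.T

    def activation_kernel(feats_a, feats_b):
        # Kernel from pooled hidden-layer features, cosine-normalised per structure.
        fa = feats_a / feats_a.norm(dim=1, keepdim=True)
        fb = feats_b / feats_b.norm(dim=1, keepdim=True)
        return fa @ fb.T

For a MACE-style potential the scalar would be the total energy and feats_a/feats_b would come from pooled node features at a chosen interaction layer; both choices are stated here as assumptions rather than the paper's exact construction.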

If this is right

  • Active learning loops for MLIPs can run without training extra uncertainty estimators or maintaining model committees.
  • Fewer quantum chemistry calculations are needed to reach given accuracy targets for both energies and forces.
  • The geometry induced by pretraining aligns more closely with actual model residuals than fixed chemical descriptors do.
  • Pretrained representations already encode chemically relevant similarities that support reliable data selection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same latent-space kernels could be tested for active learning in related tasks such as molecular dynamics sampling or property prediction.
  • If the alignment between latent geometry and error persists across different pretraining datasets, the method might generalize beyond the specific benchmarks used here.
  • One could examine whether mixing these kernels with other acquisition heuristics produces additive gains in data efficiency.
  • The results hint that pretraining serves a dual role: improving base predictions and providing a ready-made uncertainty proxy for later fine-tuning stages.

Load-bearing premise

The latent space of a pretrained MLIP already contains the information necessary for effective acquisition without auxiliary uncertainty heads, Bayesian training, fine-tuning, or committee ensembles.

What would settle it

A new reactive chemistry benchmark in which the kernel acquisition methods require at least as much labeled data as random selection to reach the same energy and force error targets would falsify the central claim.
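One way to operationalise that test, sketched under the assumption that each acquisition method yields a learning curve of error versus number of labeled structures: record the smallest label budget at which each curve first reaches the target, then compare against random selection. The 38% and 28% reductions quoted in the abstract are averages of this kind of quantity, so a benchmark where the reduction is zero or negative would contradict the claim.

    import numpy as np

    def labels_to_target(budgets, errors, target):
        # Smallest label budget at which the learning curve first meets the target error.
        for n, err in zip(budgets, errors):
            if err <= target:
                return n
        return np.inf  # target never reached within the evaluated budgets

    def data_reduction(budgets, err_method, err_random, target):
        # Fractional label saving relative to random acquisition (positive = method wins).
        n_method = labels_to_target(budgets, err_method, target)
        n_random = labels_to_target(budgets, err_random, target)
        return 1.0 - n_method / n_random

    # Hypothetical learning curves: the kernel method hits the target with 60 labels
    # where random needs 100, giving a 40% reduction.
    budgets = [20, 40, 60, 80, 100]
    print(data_reduction(budgets, [90, 62, 47, 38, 31], [120, 95, 72, 58, 48], target=50))

The numbers in the example are invented for illustration; only the comparison logic mirrors the paper's reported metric.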

Figures

Figures reproduced from arXiv: 2605.03964 by Eszter Varga-Umbrich, Jules Tilly, Olivier Peltre, Paul Duckworth, Shikha Surana, Zachary Weller-Davies.

Figure 1. Force RMSE (meV Å⁻¹) under the natural T1x setting. Each acquisition step adds five structures. The table reports metrics averaged across the natural T1x pools; lower is better for all columns. Among the methods compared here, model-dependent kernels are the strongest acquisition signals: NTK-PV gives the best force AUC and final force error, while Activation-PV gives the best energy AUC. resolution does … view at source ↗
Figure 2. Global kernel matrices on a five-reaction T1x subset. Structures are sorted by reaction family and frame index. The pretrained NTK preserves coarse reaction-family structure while retaining finer variation along reaction paths. Appendix C gives the full global and within-reaction kernel diagnostics for NTK, activations, SOAP, and Tanimoto kernels. … view at source ↗
Figure 3. Nominal versus empirical coverage for the selected kernels. Pretrained activation and pretrained NTK show the best calibration among the compared kernels, while random neural kernels and descriptor kernels are less accurate and less well calibrated. Pretrained NTK gives the lowest expected calibration error (ECE). Randomly initialised neural kernels are weaker than their pretrained counterparts, and des… view at source ↗
Figure 4. Energy and force learning curves for these committee variants on the three T1x candidate pools. The main failure mode is an unstable energy–force trade-off. Energy committees can improve final energy error, but their acquisition scores are not well aligned with force improvement and they give weaker force learning curves than random acquisition. Force committees select structures that are more us… view at source ↗
Figure 5. Spearman correlation between committee standard deviation and absolute prediction error across active-learning rounds. The weak correlations show that committee disagreement is not a consistently calibrated ranking signal. view at source ↗
Figure 6. Force and energy learning curves across the T1x sets when training from scratch, showing similar results to the pretrained case. view at source ↗
Figure 7. Global NTK kernel matrices on the T1x subset, with structures ordered by reaction family. The panels compare the randomly initialised MACE NTK with NTK kernels computed after selected active-learning iterations. view at source ↗
Figure 8. Within-reaction NTK frame-block kernels for the T1x subset after selected active-learning iterations. Each block orders structures by frame index along a reaction pathway. view at source ↗
Figure 9. Global NTK kernel matrices on the T1x subset, with structures ordered by reaction family. The panels show how an untrained NTK evolves with AL iteration. view at source ↗
Figure 10. Within-reaction scratch NTK frame-block kernels for the T1x subset after selected active-learning iterations. Each block orders structures by frame index along a reaction pathway. view at source ↗
Figure 11. Global activation-kernel matrices on the T1x subset, with structures ordered by reaction family. The panels compare randomly initialised and active-learning iteration checkpoints using pooled scalar MACE activations. view at source ↗
Figure 12. Within-reaction activation frame-block kernels for the T1x subset after selected active-learning iterations. view at source ↗
Figure 13. Global scratch activation-kernel matrices on the T1x subset, with structures ordered by reaction family. The panels show how an untrained activation kernel evolves with AL iteration. view at source ↗
Figure 14. Within-reaction scratch activation frame-block kernels for the T1x subset after selected active-learning iterations. view at source ↗
Figure 15. SOAP kernel diagnostics on the T1x subset. SOAP captures coarse reaction-family structure using fixed local-geometry descriptors, but many within-reaction similarities remain high. … view at source ↗
Figure 16. Tanimoto kernel diagnostics on the T1x subset. Morgan fingerprints mainly reflect molecular graph identity, are less sensitive to continuous geometry changes along a fixed reaction path, and do not capture inter-reaction similarities. view at source ↗
Figure 17. Additional transferability results on PMechDB, RGD, and T1x Mixed. The curves show force RMSE across active-learning rounds. view at source ↗
Figure 18. Additional transferability results on PMechDB, RGD, and T1x Mixed. The curves show force MAE across active-learning rounds. view at source ↗
Figure 19. Additional transferability results on PMechDB, RGD, and T1x Mixed. The curves show energy RMSE across active-learning rounds. view at source ↗
Figure 20. Additional transferability results on PMechDB, RGD, and T1x Mixed. The curves show energy MAE across active-learning rounds. view at source ↗
read the original abstract

Training machine learning interatomic potentials (MLIPs) for reactive chemistry is often bottlenecked by the high cost of quantum chemical labels and the scarcity of transition state configurations in candidate pools. Active learning (AL) can mitigate these costs, but its effectiveness hinges on the acquisition rule. We investigate whether the latent space of a pretrained MLIP already contains the information necessary for effective acquisition, eliminating the need for auxiliary uncertainty heads, Bayesian training and fine-tuning, or committee ensembles. We introduce two acquisition signals derived directly from a pretrained MACE potential: a finite-width neural tangent kernel (NTK) and an activation kernel built from hidden latent space features. On reactive-chemistry benchmarks, both kernels consistently outperform fixed-descriptor baselines, committee disagreement, and random acquisition, reducing the data required to reach performance targets by an average of 38% for energy error and 28% for force error. We further show that the pretrained model induces similarity spaces that preserve chemically meaningful structure and provide more reliable residual uncertainty estimates than randomly initialised or fixed-descriptor-based kernels. Our results suggest that pretraining aligns latent-space geometry with model error, yielding a practical and sufficient acquisition signal for reactive MLIP fine-tuning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript claims that acquisition signals derived from the latent space of a pretrained MACE MLIP—a finite-width neural tangent kernel (NTK) and an activation kernel from hidden features—provide effective guidance for active learning when fine-tuning MLIPs on reactive chemistry tasks. These pretrained kernels are reported to outperform fixed-descriptor baselines, committee disagreement, and random acquisition, reducing the data required to reach target performance by an average of 38% for energy error and 28% for force error. The work further argues that pretraining aligns latent-space geometry with model error, yielding reliable residual uncertainty estimates without auxiliary uncertainty heads, Bayesian training, or ensembles.

Significance. If the empirical results hold under detailed scrutiny, the approach offers a practical simplification for active learning pipelines in MLIP development, especially in data-scarce reactive chemistry settings. By showing that representations from an existing pretrained model suffice for acquisition, it reduces the need for additional modeling overhead and provides evidence that pretraining captures chemically meaningful structure in the similarity space. This could accelerate iterative refinement of potentials while leveraging prior computational investment in large-scale pretraining.

major comments (1)
  1. Abstract: the central claim of consistent outperformance and specific average data reductions (38% energy, 28% force) is presented without naming the reactive-chemistry benchmarks, reporting the number of independent trials, error bars, or any statistical tests; this information is load-bearing for evaluating the reliability of the empirical demonstration.
minor comments (2)
  1. Methods section: provide explicit pseudocode or equations for computing the finite-width NTK and activation kernel from the fixed pretrained MACE features to ensure reproducibility and to clarify any normalization steps.
  2. Results section: include a table or figure caption that lists the exact benchmark datasets, target error thresholds, and baseline implementations so that the reported improvements can be directly compared and verified.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive summary, recognition of the work's potential significance, and recommendation for minor revision. We address the single major comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: Abstract: the central claim of consistent outperformance and specific average data reductions (38% energy, 28% force) is presented without naming the reactive-chemistry benchmarks, reporting the number of independent trials, error bars, or any statistical tests; this information is load-bearing for evaluating the reliability of the empirical demonstration.

    Authors: We agree that the abstract would benefit from these supporting details to allow immediate evaluation of the empirical claims. The specific reactive-chemistry benchmarks, number of independent trials, error bars, and statistical comparisons are already reported in the main text (Sections 4 and 5) and supplementary material. In the revised version we will condense this information into the abstract—naming the benchmarks, stating the trial count, adding error bars to the reported average reductions, and noting the statistical tests—while preserving the current word count and overall claims. This change strengthens presentation without altering results or conclusions. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical comparison of pretrained kernels vs. baselines

full rationale

The paper presents an empirical study introducing NTK and activation kernels from a fixed pretrained MACE model as acquisition signals for active learning of MLIPs. It directly measures performance against fixed-descriptor baselines, committee disagreement, and random acquisition on reactive-chemistry benchmarks, reporting average data reductions without any mathematical derivation chain, self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations. The central claim that pretrained latent spaces supply sufficient acquisition information is tested via explicit contrasts (including randomly initialized kernels) and remains falsifiable through the reported experiments, with no reduction of results to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the domain assumption that pretrained latent geometry aligns with model error; the two kernels are introduced as new signals without additional fitted parameters or invented physical entities.

axioms (1)
  • domain assumption The latent space of a pretrained MLIP contains sufficient information for effective acquisition without auxiliary uncertainty mechanisms.
    This premise is required for the claim that the two kernels eliminate the need for extra heads, Bayesian training, or ensembles.
invented entities (2)
  • Finite-width neural tangent kernel acquisition signal no independent evidence
    purpose: To serve as an acquisition function measuring informativeness from the pretrained model
    Newly proposed signal derived from the model; no independent evidence outside the empirical results is provided.
  • Activation kernel from hidden latent space features no independent evidence
    purpose: To serve as an acquisition function built from model activations
    Newly proposed signal; no independent evidence outside the empirical results is provided.

pith-pipeline@v0.9.0 · 5530 in / 1421 out tokens · 64206 ms · 2026-05-07T16:05:18.539886+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Force-Aware Neural Tangent Kernels for Scalable and Robust Active Learning of MLIPs

    cs.LG 2026-05 unverdicted novelty 7.0

    Force-aware NTKs and chunked acquisition enable scalable, robust active learning for MLIPs, achieving lowest energy and force errors on OC20 and remaining competitive on other benchmarks.

Reference graph

Works this paper leans on

113 extracted references · 60 canonical work pages · cited by 1 Pith paper
