
arxiv: 2606.23725 · v1 · pith:ZFX6RGLQnew · submitted 2026-06-19 · ❄️ cond-mat.mtrl-sci · cs.LG

Computational references are not experiments: pre-registered validation of machine-learned sodium-cathode voltages

Pith reviewed 2026-06-26 13:56 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci cs.LG
keywords machine learning · battery materials · sodium-ion cathodes · DFT validation · pre-registered study · voltage prediction · computational references

The pith

Machine-learned sodium-cathode voltage predictions fail pre-registered validation against experiment because the DFT references are the dominant error source.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a machine-learning voltage screen for Na-ion cathodes, trained and judged on computed references, has a mean absolute error of 0.67 V against an operator-audited set of experimental literature values. The pre-registered upper 95% confidence bound on the cross-validated bias-corrected error reaches 1.09 V, and the residuals correlate strongly with voltage (r = -0.94), so no additive correction is valid. On the two compounds where prediction, database reference, and experiment can all be compared, the Materials Project PBE+U reference sits about 0.54 V below the measured voltages, so the computational reference, not the learned model, accounts for most of the discrepancy. The authors therefore retire the screen and pre-register a calibration audit of their DFT ledger against four benchmark Li couples.

Core claim

On an operator-audited set of six known Na-ion cathodes, the held-out mean absolute error is 0.67 V, and the upper 95% confidence bound on the cross-validated bias-corrected error is 1.09 V. The residual is voltage-dependent (r = -0.94), so additive calibration is invalid. On the two compounds that allow a three-way comparison, the PBE+U reference lies about 0.54 V below experiment, while the model prediction is closer to the measurement.
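
As a rough illustration of how the headline numbers relate, the sketch below computes a raw MAE, a leave-one-out bias-corrected absolute error, and a one-sided percentile-bootstrap upper bound. The voltages are hypothetical placeholders, not the paper's data, and the leave-one-out and bootstrap choices are assumptions: this page does not spell out the exact pre-registered procedure.

```python
import random
import statistics


def loo_bias_corrected_abs_errors(predicted, measured):
    """Leave-one-out: shift each held-out signed error by the mean signed
    error of the remaining compounds, then take the absolute value."""
    signed = [p - m for p, m in zip(predicted, measured)]
    corrected = []
    for i, err in enumerate(signed):
        others = signed[:i] + signed[i + 1:]
        corrected.append(abs(err - statistics.fmean(others)))
    return corrected


def bootstrap_upper_bound(values, level=0.95, n_boot=5000, seed=0):
    """One-sided percentile-bootstrap upper bound on the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    idx = min(n_boot - 1, int(level * n_boot))
    return means[idx]


# Hypothetical voltages in volts (illustration only, NOT the paper's data).
measured = [2.4, 2.8, 3.1, 3.4, 3.8, 4.1]
predicted = [3.1, 3.2, 3.2, 3.1, 3.2, 3.0]

raw_mae = statistics.fmean([abs(p - m) for p, m in zip(predicted, measured)])
corrected = loo_bias_corrected_abs_errors(predicted, measured)
cv_mae = statistics.fmean(corrected)
upper = bootstrap_upper_bound(corrected)

print(f"raw MAE                     = {raw_mae:.2f} V")
print(f"LOO bias-corrected MAE      = {cv_mae:.2f} V")
print(f"bootstrap upper bound (95%) = {upper:.2f} V")
```

In this made-up example the predictions are nearly flat while the measured voltages rise, so the constant bias correction does not reduce the error. That is the kind of behaviour to expect when the residual depends on voltage.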

What carries the argument

The pre-registered validation against an operator-audited experimental test set that separates model error from reference error.

If this is right

  • The screen is retired because it cannot be treated as verified against experiment.
  • No additive calibration of the model is valid because residuals vary strongly with voltage.
  • At least 70% of the targeted Na substitution space has already been published according to a prior-art screen.
  • A calibration audit of the DFT ledger against four benchmark Li couples is now pre-registered.
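
One way to see why a voltage-dependent residual rules out an additive correction is the short derivation below. It is a sketch under an assumed linear residual model; the symbols e_i, a, b, and ε_i are introduced here for illustration and are not taken from the paper.

```latex
% Assumed (illustrative) residual model: prediction error linear in measured voltage
e_i = \hat{V}_i - V_i^{\mathrm{exp}} = a + b\,V_i^{\mathrm{exp}} + \varepsilon_i .
% An additive calibration subtracts a single constant, the mean residual:
c = \bar{e} = a + b\,\bar{V}^{\mathrm{exp}} + \bar{\varepsilon} .
% What remains after the correction:
e_i - c = b\,\bigl(V_i^{\mathrm{exp}} - \bar{V}^{\mathrm{exp}}\bigr) + \bigl(\varepsilon_i - \bar{\varepsilon}\bigr) .
```

The constant removes the offset but leaves the slope term untouched. A strong residual-voltage correlation such as r = -0.94 implies that b is not zero, so the corrected error still grows with distance from the mean voltage.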

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Many other machine-learning battery screens that rely on the same class of DFT references may be limited by reference accuracy rather than model capacity.
  • Direct experimental anchoring or improved DFT functionals for voltage prediction would be needed before such screens can be considered reliable.
  • The observation that most of the targeted Na substitution space is already published suggests limited additional yield from further unanchored computational enumeration in this chemical space.

Load-bearing premise

The small operator-audited collection of literature experimental voltages forms an unbiased and representative ground-truth set.

What would settle it

New experimental measurements or additional audited literature values showing that the Materials Project PBE+U voltages lie within 0.2 V of experiment on a larger set of Na cathodes.
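
The criterion above can be phrased as a simple check. The sketch below is only illustrative: the function name `within_tolerance` and the sample voltages are hypothetical, and the 0.2 V tolerance is the figure quoted in the text.

```python
def within_tolerance(reference, measured, tol=0.2):
    """Return (all_within, worst_abs_deviation) for paired voltages in volts."""
    deviations = [abs(r - m) for r, m in zip(reference, measured)]
    worst = max(deviations)
    return worst <= tol, worst


# Hypothetical computed-reference and measured voltages (NOT the paper's data).
reference = [2.9, 3.3, 3.6]
measured = [3.0, 3.4, 4.1]

ok, worst = within_tolerance(reference, measured)
print(f"all within 0.2 V: {ok}; worst deviation: {worst:.2f} V")
```

A single compound outside the tolerance is enough to fail the check, which is why a larger audited set matters more than a low average deviation.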

Figures

Figures reproduced from arXiv: 2606.23725 by Krishna Teja Vepa.

Figure 2. Signed prediction error against the experimental …
Figure 3. Three-way comparison for the two polymorph-resolved …
Original abstract

Machine-learning screens for battery materials are trained and judged almost entirely against computed reference voltages, and those references carry their own systematic errors. We report a case in which this matters quantitatively: our own screening stack (a graph-network voltage screen, a prior-art triage layer, and a local PBE+U bench) fails pre-registered validation against experiment-anchored literature values. Verdict thresholds, failure modes, and the primary metric were committed before analysis. On an operator-audited set of known Na-ion cathodes (n = 6 after one documented exclusion; verdict unchanged at n = 7), the raw held-out mean absolute error was 0.67 V, the pre-registered conservative metric, the upper 95% confidence bound of the cross-validated bias-corrected error, was 1.09 V, and the residual was strongly voltage-dependent (r = -0.94), so no additive calibration is valid. On the two compounds where prediction, database reference, and experiment could all be compared, the Materials Project PBE+U reference sat about 0.54 V below measurement: the reference, not the model, dominated the error. A prior-art screen found at least 70% of the targeted Na substitution space already published. We retire the screen, bound what "verified" means for our DFT ledger, and pre-register a calibration audit of it against four benchmark Li couples.
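
The three-way comparison described in the abstract amounts to splitting the model-versus-experiment error into a model-versus-reference part and a reference-versus-experiment part. The sketch below shows that bookkeeping; the class name `ThreeWay`, the compound labels, and all voltages are hypothetical and are not the paper's values.

```python
from dataclasses import dataclass
import math


@dataclass
class ThreeWay:
    name: str
    predicted: float  # model output, V
    reference: float  # computed database reference (e.g. PBE+U), V
    measured: float   # experimental voltage, V

    @property
    def model_vs_reference(self) -> float:
        return self.predicted - self.reference

    @property
    def reference_vs_experiment(self) -> float:
        return self.reference - self.measured

    @property
    def model_vs_experiment(self) -> float:
        return self.predicted - self.measured


# Hypothetical entries (illustration only, NOT the paper's data).
rows = [
    ThreeWay("compound A", predicted=3.10, reference=2.80, measured=3.35),
    ThreeWay("compound B", predicted=3.40, reference=2.95, measured=3.50),
]

for row in rows:
    # Identity: (pred - exp) = (pred - ref) + (ref - exp)
    total = row.model_vs_reference + row.reference_vs_experiment
    assert math.isclose(total, row.model_vs_experiment)
    print(
        f"{row.name}: model-ref {row.model_vs_reference:+.2f} V, "
        f"ref-exp {row.reference_vs_experiment:+.2f} V, "
        f"model-exp {row.model_vs_experiment:+.2f} V"
    )
```

When the reference-versus-experiment term is larger in magnitude than the model-versus-experiment term, as in these invented rows, the reference is the larger contributor to the disagreement with measurement.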

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript reports a pre-registered validation of a machine-learned graph-network voltage screen for Na-ion cathode materials against operator-audited experimental literature values. On a set of n=6 known Na-ion cathodes (after one documented exclusion), the raw held-out MAE is 0.67 V, the pre-registered conservative metric (upper 95% CI of cross-validated bias-corrected error) is 1.09 V, and the residual shows strong voltage dependence (r = -0.94), indicating no valid additive calibration. Analysis of two compounds where prediction, database reference, and experiment can be compared shows the Materials Project PBE+U reference underestimates by ~0.54 V, suggesting the computational reference, not the ML model, dominates the error. The authors retire the screen and pre-register a calibration audit of their DFT ledger against Li couples.

Significance. If the findings hold, this work highlights a critical limitation in ML materials screening: reliance on computed references can lead to misleading performance assessments. The explicit pre-registration of metrics, thresholds, and failure modes, along with transparent documentation of the test set curation, strengthens the credibility of the validation process and provides a model for rigorous benchmarking in the field.

major comments (1)
  1. [Abstract / Validation results] The operator-audited selection of the n=6 (or n=7) test set with one documented exclusion is load-bearing for the central claim that the ML model fails validation and that the DFT reference dominates the error. With such a small sample, it is not demonstrated that the curation process avoids selection bias that could produce the observed voltage-dependent residual pattern (r=-0.94) and the attribution of error to the references even if the model itself is unbiased.
minor comments (1)
  1. [Abstract] The notation for the residual correlation (r = -0.94) could benefit from specifying the exact number of points used in the correlation calculation, given the small n.
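
To make the small-sample concern concrete, the sketch below computes a Fisher z confidence interval and a t-statistic for a correlation coefficient. It assumes the correlation was computed over n = 6 points, which the page does not confirm, and the helper names are hypothetical.

```python
import math


def fisher_interval(r, n, z_crit=1.959964):
    """Approximate two-sided 95% interval for a Pearson r via Fisher's z."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def t_statistic(r, n):
    """t-statistic for testing r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# r is the value quoted in the text; n = 6 is an assumption.
r, n = -0.94, 6
low, high = fisher_interval(r, n)
t = t_statistic(r, n)
print(f"r = {r}, n = {n}: 95% interval [{low:.2f}, {high:.2f}], t = {t:.2f}")
```

Under these assumptions the interval runs from roughly -0.99 to -0.54: it excludes zero but is wide, which is why stating the number of points alongside r is useful.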

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and for emphasizing the importance of test-set curation in a small-sample validation. We address the single major comment below.

  1. Referee: [Abstract / Validation results] The operator-audited selection of the n=6 (or n=7) test set with one documented exclusion is load-bearing for the central claim that the ML model fails validation and that the DFT reference dominates the error. With such a small sample, it is not demonstrated that the curation process avoids selection bias that could produce the observed voltage-dependent residual pattern (r=-0.94) and the attribution of error to the references even if the model itself is unbiased.

    Authors: The test set was defined in the pre-registration as every documented Na-ion cathode in the experimental literature that satisfied the stated inclusion criteria; the single exclusion and its rationale were recorded before any residual analysis. Operator auditing verified compliance with those criteria without reference to model outputs or voltage values. We agree that n=6 inherently limits statistical power to exclude every conceivable selection bias. However, the pre-registration of the full protocol (test-set definition, metric, failure threshold, and analysis plan) before model evaluation removes the most common source of post-hoc bias. The observed r=-0.94 voltage dependence is independently corroborated by the direct DFT-vs-experiment comparison on the two compounds where all three quantities exist, showing a consistent 0.54 V underestimation by the PBE+U reference irrespective of the ML model. This external anchor supports attribution of the dominant error to the computational ledger rather than to curation artifacts. No revision is required. revision: no

Circularity Check

0 steps flagged

No circularity: central result is direct comparison to external experimental literature values


The paper reports a pre-registered held-out MAE, a bias-corrected upper confidence bound, and a residual correlation, all computed directly from operator-audited experimental literature voltages (n=6/7) for known Na-ion cathodes. These quantities are not derived from any internal equations, fitted parameters, or self-citations; they are straightforward statistical summaries against independent external ground truth. The description of the screening stack references prior work, but neither the validation metrics nor the failure conclusion depends on it. No self-definitional, fitted-input-as-prediction, or ansatz-smuggling steps appear in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of the chosen experimental literature values as ground truth and on standard statistical procedures for error estimation; no new physical entities or ad-hoc fitted parameters are introduced to support the failure verdict.

axioms (1)
  • [standard math] Standard statistical procedures for mean absolute error, cross-validated bias correction, and 95% confidence bounds apply without modification to the n=6 sample.
    Invoked for the reported MAE of 0.67 V and conservative metric of 1.09 V.


