pith.

arxiv: 2605.16611 · v1 · pith:SJPWZCVT · submitted 2026-05-15 · ❄️ cond-mat.mtrl-sci

Machine Learning Approaches to Point Defects in Non-Metallic Materials: A Review of Methods

Pith reviewed 2026-05-20 15:50 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci
keywords machine learning · point defects · non-metallic materials · defect formation energies · machine-learning potentials · charged defects · Fermi level · finite-size corrections

The pith

Machine-learning studies of point-defect energies in non-metals divide into direct models that predict energies from local structures and machine-learning potentials that fit the full potential energy surface, with dataset quality often the main limiter.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This review organizes recent machine-learning work on point defects in non-metallic materials around defect formation energies. It groups existing approaches into two families. Direct models map local atomic representations straight to energies. Machine-learning potentials fit the entire potential energy surface of the defect-containing system. The analysis argues that the quality of the training data frequently limits accuracy and transferability more than the choice of algorithm does. Charged-defect calculations stand out as an unresolved frontier because they require consistent treatment of the Fermi-level position, finite-size corrections, and long-range electrostatics. The two-category framing helps clarify where current methods already deliver useful results and where further development is still needed.
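The direct-model route can be pictured as an ordinary regression from a per-site descriptor to an energy. The sketch below is a hypothetical illustration, not a method taken from the reviewed paper. It assumes each defect site has already been encoded as a fixed-length local descriptor; the two made-up features here stand for, e.g., a coordination number and a mean neighbour distance in Å. It uses inverse-distance-weighted k-nearest-neighbour regression as a simple stand-in for the random forests, gradient boosting, and kernel methods cited in the review. All training values are invented.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical local descriptor of a defect site, e.g.
# (coordination number, mean neighbour distance in angstrom).
Descriptor = Sequence[float]


def knn_predict(train: List[Tuple[Descriptor, float]],
                query: Descriptor, k: int = 3) -> float:
    """Predict a defect formation energy (eV) as the inverse-distance-weighted
    mean of the energies of the k training descriptors closest to `query`."""
    nearest = sorted((math.dist(x, query), y) for x, y in train)[:k]
    if nearest[0][0] == 0.0:
        # The query coincides with a training point: return its energy.
        return nearest[0][1]
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def mean_absolute_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """MAE between predicted and reference (e.g., DFT) formation energies."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


if __name__ == "__main__":
    # Invented toy data: (descriptor, formation energy in eV).
    train = [((4.0, 1.95), 3.1), ((6.0, 2.10), 4.0),
             ((4.0, 2.00), 3.3), ((8.0, 2.30), 5.2)]
    print(knn_predict(train, (5.0, 2.05)))
```

The model never evaluates a supercell total energy, so it is cheap to run. However, it can only be as good as its descriptor and the reference energies it was trained on, which is one way to read the review's point about dataset quality.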

Core claim

Existing studies largely fall into two categories: direct ML models that predict defect energetics from local structural representations, and machine-learning potentials (MLPs) that approximate the defect-containing potential energy surface. Key achievements are summarized alongside persistent bottlenecks, with dataset quality often dominating practical model performance. Charged-defect formation energies emerge as a central frontier. Fermi-level alignment, finite-size corrections, and long-range electrostatics must be handled carefully and consistently before results can be compared meaningfully or transferred across materials.
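For orientation, the quantity at the centre of this claim is usually written in the supercell form below. The notation follows the common convention of the charged-defect literature (e.g., the Freysoldt-Neugebauer-Van de Walle scheme the review cites). It is not necessarily the paper's own notation.

```latex
E_f[X^q] = E_{\mathrm{tot}}[X^q] - E_{\mathrm{tot}}[\mathrm{host}]
         - \sum_i n_i \mu_i + q\left(\varepsilon_{\mathrm{VBM}} + E_F\right) + E_{\mathrm{corr}}
```

Here n_i is the number of atoms of species i added (n_i > 0) or removed (n_i < 0), and μ_i are their chemical potentials. ε_VBM is the host valence-band maximum, E_F is the Fermi level measured from it, and E_corr is the finite-size correction, which in many schemes also contains a potential-alignment term. Each issue named in the claim maps onto a term. E_F enters linearly through q, while finite-size and long-range electrostatic effects are collected in E_corr.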

What carries the argument

The two-category classification of direct ML models versus machine-learning potentials, combined with the emphasis on dataset quality and consistent treatment of charged-defect electrostatics.
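By contrast, an MLP predicts the total energy of a whole configuration, and defect energetics follow from differences of total energies, just as in DFT. The toy sketch below makes several simplifying assumptions:

- It uses a Behler-Parrinello-style decomposition of the total energy into atomic energies.
- The descriptor is a single radial Gaussian symmetry function with a cosine cutoff.
- The fitted network is replaced by an invented quadratic (`atomic_energy`, hypothetical parameters).
- It works on a finite cluster without periodic boundaries.
- It skips the structural relaxation that a real workflow would perform with the MLP forces.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

CUTOFF = 3.0  # angstrom; hypothetical cutoff radius


def radial_descriptor(positions: List[Vec3], i: int, eta: float = 1.0) -> float:
    """Radial symmetry function for atom i: sum of exp(-eta * r^2) over
    neighbours, damped by a cosine cutoff that vanishes at CUTOFF."""
    g = 0.0
    for j, r_j in enumerate(positions):
        if j == i:
            continue
        r = math.dist(positions[i], r_j)
        if r < CUTOFF:
            fc = 0.5 * (math.cos(math.pi * r / CUTOFF) + 1.0)
            g += math.exp(-eta * r * r) * fc
    return g


def atomic_energy(g: float) -> float:
    """Hypothetical per-atom energy model (eV). A real MLP would use a fitted
    neural network or kernel model here; these parameters are invented."""
    return -1.0 - 4.0 * g + 2.0 * g * g


def total_energy(positions: List[Vec3]) -> float:
    """MLP-style total energy: a sum of atomic contributions."""
    return sum(atomic_energy(radial_descriptor(positions, i))
               for i in range(len(positions)))


def vacancy_formation_energy(host: List[Vec3], site: int, mu: float) -> float:
    """Neutral-vacancy formation energy from two total-energy evaluations:
    E_f = E(host without atom `site`) - E(host) + mu, where mu is the chemical
    potential of the reservoir that receives the removed atom."""
    defective = host[:site] + host[site + 1:]
    return total_energy(defective) - total_energy(host) + mu
```

The design point is that the same energy function serves the pristine host, the defect, and any intermediate structure. This is what lets MLPs relax defect geometries, which a direct model mapping one site to one number cannot do.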

If this is right

  • Higher-quality curated datasets would raise the accuracy and reliability of both direct models and machine-learning potentials for defect predictions.
  • Standardized protocols for Fermi-level alignment and finite-size corrections would allow direct numerical comparisons of charged-defect energies across different materials and studies.
  • Improved handling of long-range electrostatics in ML approaches would enable more transferable predictions when moving from one non-metallic material to another.
  • Recognition that data quality often outweighs algorithmic choice would shift research effort toward systematic dataset construction rather than solely toward new model architectures.
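The Fermi-level dependence that makes charged defects delicate is linear in the charge. Once E_f(q) is known at E_F = 0 (the VBM), E_f(q; E_F) = E_f(q; 0) + q·E_F. The sketch below uses invented energies for a hypothetical donor-like defect to find the most stable charge state and the thermodynamic transition levels. It also makes the alignment issue concrete. If two studies place the VBM differently by δ, every charged-state energy at E_F = 0 shifts by q·δ, so the raw numbers cannot be compared directly.

```python
from typing import Dict, Tuple


def formation_energy(ef_at_vbm: float, q: int, e_fermi: float) -> float:
    """E_f(q; E_F) = E_f(q; E_F = 0) + q * E_F, with E_F measured from the VBM (eV)."""
    return ef_at_vbm + q * e_fermi


def most_stable(ef_by_q: Dict[int, float], e_fermi: float) -> Tuple[int, float]:
    """Charge state with the lowest formation energy at Fermi level e_fermi."""
    q = min(ef_by_q, key=lambda c: formation_energy(ef_by_q[c], c, e_fermi))
    return q, formation_energy(ef_by_q[q], q, e_fermi)


def transition_level(ef_by_q: Dict[int, float], q1: int, q2: int) -> float:
    """Thermodynamic transition level eps(q1/q2) above the VBM: the Fermi level
    at which charge states q1 and q2 have equal formation energy."""
    return (ef_by_q[q1] - ef_by_q[q2]) / (q2 - q1)


if __name__ == "__main__":
    # Invented formation energies (eV) at E_F = 0 for charge states 0, +1, +2.
    ef = {0: 4.0, 1: 2.0, 2: 0.5}
    print(transition_level(ef, 2, 1), transition_level(ef, 1, 0))
    print(most_stable(ef, 1.8))
```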

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The review implies that shared public repositories of high-quality defect structures and energies could accelerate progress across the field by reducing redundant data generation.
  • Consistent charged-defect methods developed for point defects may later support machine-learning studies of interfaces or extended defects that also involve electrostatic boundary conditions.
  • If dataset quality remains the dominant factor, experimental validation campaigns focused on a few well-characterized materials could serve as benchmarks to test model transferability more stringently.
  • The classification suggests that hybrid workflows combining direct ML predictions for quick screening with MLP-based relaxation for structural accuracy may become common practice.
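The hybrid workflow speculated on above can be expressed as a two-stage filter. In this hypothetical sketch, `cheap_model` stands for a direct ML predictor and `refine` for a costlier step such as MLP-based relaxation. Both are passed in as plain functions. The defect labels and energies in the usage are made up.

```python
from typing import Callable, Iterable, List, Tuple


def two_stage_screen(candidates: Iterable[str],
                     cheap_model: Callable[[str], float],
                     refine: Callable[[str], float],
                     n_keep: int) -> List[Tuple[str, float]]:
    """Rank every candidate with a fast estimate, re-evaluate only the n_keep
    lowest-energy ones with the expensive step, and return the refined
    (candidate, energy) pairs sorted by refined energy."""
    shortlist = sorted(candidates, key=cheap_model)[:n_keep]
    refined = [(c, refine(c)) for c in shortlist]
    return sorted(refined, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up energies (eV) for hypothetical defect labels.
    quick = {"V_O": 1.0, "O_i": 2.0, "V_Mg": 3.0, "Mg_i": 4.0}
    relaxed = {"V_O": 1.2, "O_i": 0.9, "V_Mg": 2.5, "Mg_i": 3.8}
    print(two_stage_screen(quick, quick.__getitem__, relaxed.__getitem__, n_keep=2))
```

The expensive step is paid only n_keep times. Whether the cheap ranking preserves the true ordering well enough is the dataset-quality question the review raises.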

Load-bearing premise

The reviewed literature is representative of the field. The two-category division, together with the emphasis on dataset quality and charged-defect issues, captures the main practical limitations without major omissions or selection bias in the papers chosen for discussion.

What would settle it

A broad survey that identifies a third distinct class of machine-learning methods for defect energetics not captured by the direct-model or MLP categories would undermine the completeness of the proposed division.

Figures

Figures reproduced from arXiv: 2605.16611 by Shin Kiyohara, Yu Kumagai.

Figure 1. (a) Schematic illustration of representative point defects in non…
Figure 1. Intrinsic Point Defects (Top: Vacancy, Intrinsic Interstitialcy, Antisite) and Extrinsic Point Defects (Bottom: Substitution, Extrinsic Interstitialcy).
Original abstract

We review recent machine-learning (ML) approaches for point defects in non-metallic materials, with an emphasis on defect formation energies. Existing studies largely fall into two categories: direct ML models that predict defect energetics from local structural representations, and machine-learning potentials (MLPs) that approximate the defect-containing potential energy surface. We summarize key achievements as well as persistent bottlenecks, emphasizing that dataset quality often dominates practical model performance. We further identify charged-defect formation energies as a central frontier, where Fermi-level alignment, finite-size corrections, and long-range electrostatics must be handled carefully and consistently to enable meaningful comparisons and transferable predictions across different materials.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript reviews recent machine-learning approaches to point defects in non-metallic materials, with emphasis on defect formation energies. It partitions existing studies into two categories—direct ML models that map local structural representations to defect energetics, and machine-learning potentials (MLPs) that approximate the full defect-containing potential-energy surface—then summarizes key achievements, persistent bottlenecks, and the dominant role of dataset quality. It further flags charged-defect formation energies as a central open frontier, stressing the need for consistent treatment of Fermi-level alignment, finite-size corrections, and long-range electrostatics.

Significance. If the taxonomy and literature synthesis hold, the review supplies a timely organizing framework for a rapidly growing subfield. By foregrounding dataset quality and the special difficulties of charged defects, it can help practitioners avoid common pitfalls and direct attention toward transferable, materials-agnostic predictions. The absence of internal derivations or fitted parameters is appropriate for a review; its value lies in the clarity of the two-category division and the explicit identification of charged-defect handling as a load-bearing practical limitation.

minor comments (3)
  1. §2 (or the section introducing the taxonomy): the boundary between “direct ML models” and “MLPs” is stated clearly in the abstract but would benefit from an explicit decision tree or table that assigns each cited work to one category, so readers can verify the partition without ambiguity.
  2. The discussion of charged-defect formation energies correctly identifies Fermi-level alignment and finite-size corrections as critical, yet the manuscript does not provide even a brief worked example (e.g., a small table comparing uncorrected vs. corrected formation energies for a canonical system such as the oxygen vacancy in MgO). Adding one such concrete illustration would strengthen the claim that these issues remain a “central frontier.”
  3. Several citations appear only in the text and not in the reference list (or vice versa); a final pass to ensure every in-text citation has a corresponding entry is needed.
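Comment 2 asks for an uncorrected-versus-corrected comparison. As a minimal illustration, the sketch below evaluates only the leading monopole term of the Makov-Payne correction, q²α/(2εL), for a cubic supercell. This is a deliberate simplification. Schemes the review cites, such as those of Freysoldt, Neugebauer, and Van de Walle and of Kumagai and Oba, also handle potential alignment and anisotropic screening. The inputs in the usage lines are placeholders, not values for MgO.

```python
# e^2 / (4 * pi * eps0) expressed in eV * angstrom
COULOMB_EV_ANGSTROM = 14.399645

# Madelung constant of a simple cubic array of point charges in a
# neutralising background (Makov-Payne); other cell shapes need other values.
ALPHA_SC = 2.8373


def point_charge_correction(q: int, eps: float, length: float,
                            alpha: float = ALPHA_SC) -> float:
    """Leading-order Makov-Payne correction (eV) for a defect of charge q in a
    cubic supercell of edge `length` (angstrom) screened by dielectric
    constant `eps`: E_corr = q^2 * alpha / (2 * eps * L). It is added to the
    uncorrected supercell formation energy."""
    return q * q * alpha * COULOMB_EV_ANGSTROM / (2.0 * eps * length)


if __name__ == "__main__":
    # Placeholder inputs: q = +2, eps = 10, L = 12 angstrom.
    raw = 3.0  # placeholder uncorrected formation energy, eV
    corr = point_charge_correction(2, 10.0, 12.0)
    print(f"uncorrected {raw:.3f} eV, corrected {raw + corr:.3f} eV")
```

The term scales as 1/L, which gives a quick consistency check. Corrected energies from supercells of different sizes should agree more closely than uncorrected ones.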

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our review and for recommending minor revision. The referee summary accurately reflects the manuscript's scope, taxonomy of direct ML models versus MLPs, emphasis on dataset quality as the dominant factor, and identification of charged-defect formation energies as a key open challenge. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity in this review paper

full rationale

This paper is a literature review that categorizes existing ML studies on point defects into direct models and MLPs, summarizes achievements and bottlenecks such as dataset quality and charged-defect handling, and identifies frontiers without any original derivation chain, equations, predictions, or fitted parameters internal to the work. All claims rest on external cited literature rather than reducing to self-definitions, self-citations as load-bearing premises, or renamings within the paper itself. The two-category taxonomy is descriptive and does not invoke uniqueness theorems or ansatzes from prior author work in a circular manner.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on standard assumptions from the machine-learning and materials-science literature it cites; no new free parameters, axioms, or invented entities are introduced by the review itself.

axioms (1)
  • Domain assumption: machine-learning models for materials can be meaningfully evaluated by comparing predicted defect formation energies to reference calculations or experiments.
    Implicit in the discussion of model performance and bottlenecks.

pith-pipeline@v0.9.0 · 5636 in / 1183 out tokens · 35692 ms · 2026-05-20T15:50:24.404763+00:00 · methodology

