pith.

arxiv: 2606.09419 · v1 · pith:TTUPLCXN · submitted 2026-06-08 · ❄️ cond-mat.mtrl-sci · cs.AI

Context-Aware Deep Learning for Defect Classification in Atomic-Resolution STEM

Pith reviewed 2026-06-27 15:40 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci cs.AI
keywords: context-aware learning · defect classification · STEM imaging · transition-metal dichalcogenides · deep learning · materials characterization · metadata conditioning · atomic-resolution microscopy

The pith

Conditioning on experimental metadata transforms ambiguous image-only defect classification into a well-posed physical problem.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that defect identification in atomic-resolution STEM images is inherently ambiguous when it relies on contrast alone, because similar patterns can arise from different materials or under different imaging conditions. Using a dataset of roughly 55 million simulated patches spanning 576 cases across 96 doped monolayer transition-metal dichalcogenides, the authors show that supplying the model with metadata on composition, beam energy, and detector geometry resolves that ambiguity. The resulting context-aware classifier exceeds 98 percent accuracy on simulated data, reaches near-human agreement on experimental images, and reduces posterior entropy by 94 percent. The work therefore treats context, rather than model architecture, as the primary fix.
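The ambiguity can be made concrete with the incoherent approximation often used for annular dark-field (ADF) contrast, in which a single atomic column scatters with intensity roughly proportional to Z^n, where the exponent n is commonly taken to be near 1.7 and shifts with the detector's collection angles. The toy model below is an assumption for illustration, not the paper's multislice simulation: it generates the dopant-to-host intensity ratio of a Mo atom on a W site at n = 1.7, then inverts that ratio for the dopant's atomic number under three candidate exponents. The host, dopant, and exponent values are hypothetical choices.

```python
"""Toy Z^n model of ADF contrast (an illustrative assumption, not the paper's simulator)."""


def column_intensity(z: int, n: float) -> float:
    """Relative ADF intensity of a single-atom column under the Z^n approximation."""
    return z ** n


def infer_dopant_z(ratio: float, host_z: int, n: float) -> float:
    """Invert an observed dopant/host intensity ratio for the dopant's atomic number."""
    # ratio = (Z_d / Z_h) ** n  =>  Z_d = Z_h * ratio ** (1 / n)
    return host_z * ratio ** (1.0 / n)


if __name__ == "__main__":
    host_z = 74  # W host site (hypothetical choice)
    # Contrast produced by a Mo dopant (Z = 42) when the true exponent is 1.7.
    observed = column_intensity(42, 1.7) / column_intensity(host_z, 1.7)
    for n in (1.5, 1.7, 2.0):  # candidate detector-dependent exponents (assumed)
        print(f"n = {n}: inferred Z = {infer_dopant_z(observed, host_z, n):.1f}")
```

Under these assumptions the inferred atomic number moves by several units across the exponent range, enough to blur the distinction between candidate dopants unless the detector geometry is supplied as context.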

Core claim

The central claim is that conditioning defect classification on contextual variables transforms the task from an ill-posed image-only problem into a well-posed, physically grounded one. Using a systematically generated dataset of approximately 55 million simulated patches spanning 576 cases, the framework achieves over 98 percent accuracy on simulations, near-human agreement on experimental images, and a 94 percent reduction in posterior entropy. The approach thereby links observed contrast directly to the underlying chemical and imaging conditions.

What carries the argument

A context-aware learning framework that concatenates image-derived contrast features with metadata on composition, beam energy, and detector geometry before classification.
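The concatenation step can be sketched as late fusion: an image encoder produces a contrast feature vector, the metadata are encoded as a separate vector, and a classifier head acts on the two stacked together. The sketch below uses NumPy with random, untrained weights purely to show the data flow and array shapes. The feature dimension, the metadata encoding (a one-hot dopant identity, beam energy scaled by 300 kV, and inner/outer detector angles scaled by 200 mrad), the dopant list, and the number of defect classes are all hypothetical choices, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_IMG_FEATURES = 64               # hypothetical size of the image-encoder output
DOPANTS = ["Mo", "W", "Nb", "V"]  # hypothetical dopant vocabulary
N_CLASSES = 5                     # hypothetical number of defect classes
N_META = len(DOPANTS) + 3         # one-hot dopant + three scaled scalars


def encode_metadata(dopant: str, beam_kv: float, inner_mrad: float, outer_mrad: float) -> np.ndarray:
    """Encode context as a fixed-length vector: one-hot dopant plus scaled scalars."""
    one_hot = np.zeros(len(DOPANTS))
    one_hot[DOPANTS.index(dopant)] = 1.0
    scalars = np.array([beam_kv / 300.0, inner_mrad / 200.0, outer_mrad / 200.0])
    return np.concatenate([one_hot, scalars])


def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert logits to probabilities, shifting by the max for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Untrained linear head over the concatenated [image | metadata] vector.
W = rng.normal(scale=0.1, size=(N_IMG_FEATURES + N_META, N_CLASSES))
b = np.zeros(N_CLASSES)


def classify(image_features: np.ndarray, metadata: np.ndarray) -> np.ndarray:
    """Return class probabilities conditioned on both contrast and context."""
    fused = np.concatenate([image_features, metadata], axis=-1)
    return softmax(fused @ W + b)


image_features = rng.normal(size=N_IMG_FEATURES)  # stand-in for a CNN embedding
meta = encode_metadata("Nb", beam_kv=60.0, inner_mrad=80.0, outer_mrad=200.0)
probs = classify(image_features, meta)
print(probs.round(3), probs.sum())
```

In a trained model, the metadata columns of the head would let it shift probability between classes whose contrast overlaps; with random weights the output serves only as a shape check.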

If this is right

  • Defect assignments become physically interpretable because contrast is now tied to specific chemical and imaging conditions.
  • The same conditioning strategy supplies a general route to multimodal models for autonomous materials characterization.
  • Posterior entropy drops by 94 percent, directly lowering uncertainty in downstream automated decisions.
  • Near-human performance on experimental data indicates that the simulation-to-real transfer is already sufficient for practical use.
  • Emphasis shifts from architectural scaling to systematic inclusion of available experimental metadata.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same metadata-conditioning step could be tested on other electron-microscopy tasks such as phase identification or strain mapping where context is routinely recorded.
  • If the simulation-to-experiment gap proves larger than assumed, targeted fine-tuning on a small set of labeled experimental patches would be a direct next test.
  • Real-time incorporation of live metadata during acquisition could enable on-the-fly defect flagging inside the microscope control loop.
  • The entropy reduction metric offers a quantitative way to compare context-aware models against purely image-based ones across different material systems.
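The entropy metric can be computed with the definition given in the simulated authors' rebuttal: the Shannon entropy of each patch's posterior over defect classes, averaged over patches, and the relative drop (H_image-only − H_context) / H_image-only. A minimal sketch, assuming posteriors are stored as NumPy arrays of shape (patches, classes); the posterior values below are made-up toy numbers that only exercise the formula and are not the paper's data.

```python
import numpy as np


def mean_posterior_entropy(posteriors: np.ndarray) -> float:
    """Shannon entropy (nats) of each row of class probabilities, averaged over patches."""
    p = np.clip(posteriors, 1e-12, 1.0)  # avoid log(0); clipped zeros contribute ~0
    return float(np.mean(-np.sum(p * np.log(p), axis=1)))


def relative_entropy_reduction(p_image_only: np.ndarray, p_context: np.ndarray) -> float:
    """(H_image-only - H_context) / H_image-only, averaged-entropy version."""
    h_img = mean_posterior_entropy(p_image_only)
    h_ctx = mean_posterior_entropy(p_context)
    return (h_img - h_ctx) / h_img


# Toy posteriors over three defect classes for two patches (illustrative values only).
image_only = np.array([[0.40, 0.35, 0.25],
                       [0.50, 0.30, 0.20]])
with_context = np.array([[0.97, 0.02, 0.01],
                         [0.95, 0.04, 0.01]])
print(f"relative reduction: {relative_entropy_reduction(image_only, with_context):.1%}")
```

Because the metric is a ratio of entropies, the choice of logarithm base cancels; reporting the baseline H_image-only alongside the percentage, as the referee requests, is what makes the number interpretable.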

Load-bearing premise

The simulated dataset of 55 million patches across 576 cases faithfully captures the joint distribution of image contrast and metadata that occurs in real experiments.
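This premise can be probed along the lines the referee report requests (contrast histograms and noise spectra). A minimal sketch, assuming square single-channel patches stored as NumPy arrays: it compares normalized intensity distributions with an approximate 1-D Wasserstein distance and compares radially averaged power spectra as a crude noise-spectrum check. The "simulated" and "experimental" patches here are placeholders (a cosine lattice with low versus high Gaussian noise), not STEM data, and the bin counts are arbitrary choices.

```python
import numpy as np


def normalize(patch: np.ndarray) -> np.ndarray:
    """Scale a patch to zero mean and unit variance so only contrast shape is compared."""
    return (patch - patch.mean()) / (patch.std() + 1e-12)


def wasserstein_1d(a: np.ndarray, b: np.ndarray, n_quantiles: int = 1000) -> float:
    """Approximate 1-D Wasserstein-1 distance by comparing quantile functions on a grid."""
    q = np.linspace(0.0, 1.0, n_quantiles)
    return float(np.mean(np.abs(np.quantile(a, q) - np.quantile(b, q))))


def radial_power_spectrum(patch: np.ndarray, n_bins: int = 16) -> np.ndarray:
    """Radially averaged power spectrum of a normalized square patch."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(normalize(patch)))) ** 2
    ny, nx = power.shape
    y, x = np.indices(power.shape)
    r = np.hypot(y - ny // 2, x - nx // 2)
    bins = np.linspace(0.0, r.max(), n_bins + 1)
    idx = np.clip(np.digitize(r.ravel(), bins) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return sums / np.maximum(counts, 1)


rng = np.random.default_rng(1)
yy, xx = np.mgrid[0:64, 0:64]
lattice = np.cos(2 * np.pi * xx / 8) * np.cos(2 * np.pi * yy / 8)  # stand-in for atomic contrast
simulated = [normalize(lattice + 0.05 * rng.normal(size=lattice.shape)) for _ in range(20)]
experimental = [normalize(lattice + 0.5 * rng.normal(size=lattice.shape)) for _ in range(20)]

d_hist = wasserstein_1d(np.concatenate([p.ravel() for p in simulated]),
                        np.concatenate([p.ravel() for p in experimental]))
spec_sim = np.mean([radial_power_spectrum(p) for p in simulated], axis=0)
spec_exp = np.mean([radial_power_spectrum(p) for p in experimental], axis=0)
print(f"intensity W1 distance: {d_hist:.3f}")
print("high-frequency power ratio (exp/sim):", spec_exp[-4:] / spec_sim[-4:])
```

A large high-frequency power ratio flags a noise mismatch that a model trained only on cleaner simulations might not tolerate; a feature-space comparison through a fixed encoder, as the rebuttal proposes, would complement these pixel-level checks.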

What would settle it

Apply the trained model to a fresh collection of experimental STEM images whose defect identities have been independently verified by multiple human experts and measure whether accuracy remains near the reported human-agreement level.
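A minimal sketch of that measurement, assuming each experimental patch carries labels from several experts and one model prediction: it forms the expert majority label, then reports raw accuracy, Cohen's kappa (agreement corrected for chance), and a confusion matrix. The class names, labels, and tie-breaking rule (the label encountered first wins, which is how `Counter.most_common` orders equal counts) are hypothetical; in practice tied patches might instead be excluded.

```python
from collections import Counter

import numpy as np

CLASSES = ["pristine", "vacancy", "substitution"]  # hypothetical defect classes


def majority_label(votes: list[str]) -> str:
    """Most common expert label; ties go to the label encountered first (an assumption)."""
    return Counter(votes).most_common(1)[0][0]


def confusion_matrix(truth: list[str], pred: list[str]) -> np.ndarray:
    """Rows = expert majority label, columns = model label."""
    index = {c: i for i, c in enumerate(CLASSES)}
    cm = np.zeros((len(CLASSES), len(CLASSES)), dtype=int)
    for t, p in zip(truth, pred):
        cm[index[t], index[p]] += 1
    return cm


def cohens_kappa(cm: np.ndarray) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), computed from a confusion matrix."""
    n = cm.sum()
    p_o = np.trace(cm) / n  # observed agreement
    p_e = float(np.sum(cm.sum(axis=0) * cm.sum(axis=1))) / n**2  # chance agreement
    return (p_o - p_e) / (1.0 - p_e)


expert_votes = [
    ["vacancy", "vacancy", "substitution"],
    ["pristine", "pristine", "pristine"],
    ["substitution", "substitution", "vacancy"],
    ["vacancy", "pristine", "vacancy"],
    ["substitution", "substitution", "substitution"],
    ["pristine", "vacancy", "pristine"],
]
model_pred = ["vacancy", "pristine", "substitution", "pristine", "substitution", "pristine"]

truth = [majority_label(v) for v in expert_votes]
cm = confusion_matrix(truth, model_pred)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(f"accuracy = {accuracy:.2f}, kappa = {cohens_kappa(cm):.2f}")
```

Computing the same kappa between pairs of experts would give the human reference level against which the model's score is compared.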

Figures

Figures reproduced from arXiv: 2606.09419 by Cheng Zhang, Goki Eda, Ivan Verzhbitskiy, Jiadong Dan, Leyi Loh, Michel Bosman, N. Duane Loh, Yuan Chen.

Figure 1. From implicit to explicit context: a general framework for context-aware defect classification.
Figure 2. The architecture of the overall learning framework comprising …
Figure 3. Encoding of contextual and contrast information. (a) …
Figure 4. Workflow for generating simulation training datasets.
Figure 5. Architecture and performance of defect attention model.
Figure 6. Workflow and evaluation of human and model classifications on a multi-dopant WSe2 …
Original abstract

Artificial intelligence is rapidly advancing materials characterization, yet most applications in electron microscopy rely solely on image contrast, overlooking the chemical and experimental context that shapes image formation. This limitation makes defect classification inherently ambiguous, as similar contrasts can arise from different materials or imaging conditions. Here we develop a context-aware learning framework that integrates image-derived contrast with metadata describing composition, beam energy, and detector geometry. Using a systematically constructed dataset of ~55 million simulated patches spanning 576 cases across 96 doped monolayer transition-metal dichalcogenides, we show that conditioning on contextual variables transforms defect classification from an ill-posed image-only task into a well-posed, physically grounded problem. The framework achieves over 98% accuracy on simulations and near-human agreement on experimental data, with a 94% reduction in posterior entropy. By emphasizing contextual grounding over architectural complexity, this approach links experimental image contrast to the underlying chemical and imaging conditions, supporting physically grounded defect assignments and a general pathway toward multimodal AI models for autonomous materials characterization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces a context-aware deep learning framework for classifying defects in atomic-resolution STEM images of doped monolayer transition-metal dichalcogenides. It integrates image-derived contrast with metadata on composition, beam energy, and detector geometry. Using a dataset of ~55 million simulated patches spanning 576 cases across 96 materials, the authors claim that conditioning on context transforms defect classification from an ill-posed image-only task into a well-posed, physically grounded problem, achieving >98% accuracy on simulations, near-human agreement on experimental images, and a 94% reduction in posterior entropy.

Significance. If the central claims hold after addressing validation gaps, the work usefully demonstrates that experimental metadata can resolve ambiguities in image-based defect classification, supporting more reliable materials characterization. The systematic construction of a large simulated dataset across many cases and the use of posterior entropy reduction as a quantitative metric are strengths that could inform multimodal models in microscopy. The emphasis on contextual grounding rather than architectural novelty is a constructive contribution to the field.

major comments (3)
  1. [Abstract] Abstract: the 94% posterior entropy reduction is stated without reporting the image-only baseline entropy value, the exact formula used (e.g., average over patches or classes), or the number of contextual variables included, making it impossible to assess whether the reported collapse is driven by context or by other factors.
  2. [Results (experimental)] Results section on experimental validation: only qualitative 'near-human agreement' is reported for real STEM images, with no accuracy, confusion matrix, or agreement metric provided that is comparable to the >98% simulation figure; this is load-bearing for the claim that the framework produces physically grounded assignments under real imaging conditions.
  3. [Methods (dataset)] Methods (dataset construction): no quantitative distributional comparison (contrast histograms, noise spectra, or feature-space distances between simulated and experimental patches) is supplied to test whether the joint distribution of image contrast and metadata in the 55M synthetic patches matches real data; without this, the transfer of accuracy and entropy reduction to experiments remains unverified.
minor comments (2)
  1. [Abstract] Abstract: the final sentence is truncated in the provided text ('cont'); ensure the full claim about multimodal AI models is clearly stated.
  2. [Methods] Notation for metadata variables (composition, beam energy, detector geometry) should be defined consistently when first introduced to aid reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the 94% posterior entropy reduction is stated without reporting the image-only baseline entropy value, the exact formula used (e.g., average over patches or classes), or the number of contextual variables included, making it impossible to assess whether the reported collapse is driven by context or by other factors.

    Authors: We agree that the abstract should include these details for clarity. The reduction is the relative decrease ((H_image-only - H_context)/H_image-only) in average posterior entropy (Shannon entropy averaged over patches), using the three contextual variables listed in the abstract. The baseline value appears in the Methods and supplementary material. We will revise the abstract to report the baseline entropy and clarify the formula. revision: yes

  2. Referee: [Results (experimental)] Results section on experimental validation: only qualitative 'near-human agreement' is reported for real STEM images, with no accuracy, confusion matrix, or agreement metric provided that is comparable to the >98% simulation figure; this is load-bearing for the claim that the framework produces physically grounded assignments under real imaging conditions.

    Authors: We agree that quantitative metrics would make the experimental claim more robust and comparable. The current text uses 'near-human agreement' because experimental patches lack definitive ground truth. We will revise the Results section to report a quantitative inter-rater agreement percentage between model outputs and majority expert labels on the experimental set, along with a corresponding confusion matrix. revision: yes

  3. Referee: [Methods (dataset)] Methods (dataset construction): no quantitative distributional comparison (contrast histograms, noise spectra, or feature-space distances between simulated and experimental patches) is supplied to test whether the joint distribution of image contrast and metadata in the 55M synthetic patches matches real data; without this, the transfer of accuracy and entropy reduction to experiments remains unverified.

    Authors: We agree that explicit distributional comparisons would strengthen the sim-to-real transfer argument. We will add contrast histograms, noise spectra, and feature-space distances (computed via a fixed encoder) between the simulated and experimental patches to the Methods section and supplementary material in the revision. revision: yes

Circularity Check

0 steps flagged

No significant circularity; performance metrics are independent of the training procedure.

Full rationale

The paper constructs a simulated training set from known physical models of 96 doped TMD monolayers and reports test accuracy (>98%) on held-out simulated patches plus qualitative agreement on separate experimental images. No derivation, equation, or central claim reduces by construction to a fitted parameter or a self-citation chain; the entropy reduction and accuracy figures are measured outcomes on data generated independently of the classifier. The evaluation is therefore self-contained, with no load-bearing step that is self-definitional or depends on fitted inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the simulated image formation model accurately reproduces real contrast under the supplied metadata; no free parameters are named in the abstract, but the neural network itself contains many learned weights whose values are not reported. No new physical entities are postulated.

axioms (1)
  • domain assumption: The image formation physics encoded in the simulation engine is a faithful forward model for the experimental conditions described by the metadata.
    Invoked when the authors state that conditioning on metadata makes the task well-posed; without this assumption, the simulated training data would not transfer to experiments.

pith-pipeline@v0.9.1-grok · 5729 in / 1422 out tokens · 18954 ms · 2026-06-27T15:40:30.362482+00:00

