Architecture-Aware Explanation Auditing for Industrial Visual Inspection
Pith reviewed 2026-05-15 02:38 UTC · model grok-4.3
The pith
The faithfulness of heatmap explanations is bounded by structural distance to the model's native decision mechanism.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The native-readout hypothesis holds that the perturbation-based faithfulness of an explanation is bounded by its structural distance from the model's native decision mechanism. On WM-811K wafer maps under zero-fill perturbation, ViT-Tiny plus Attention Rollout reached Deletion AUC 0.211, while Swin-Tiny, ResNet18+CBAM, and DenseNet121 with Grad-CAM ranged from 0.432 to 0.525. Swin-Tiny's spatial feature-map hierarchy made it compatible with Grad-CAM despite its transformer architecture, isolating readout structure as the operative factor. A model-agnostic RISE baseline compressed all families to roughly 0.1 AUC, and blur-fill perturbation reversed the ordering, confirming that faithfulness rankings are joint properties of the (model, explainer, perturbation operator) triple.
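The Deletion AUC figures above come from a standard perturbation protocol: rank pixels by the heatmap, delete them most-salient-first, and integrate the drop in the target-class probability. A minimal NumPy sketch, with interface names that are ours rather than the paper's:

```python
import numpy as np

def deletion_auc(model_fn, image, saliency, steps=50, fill_value=0.0):
    """Delete pixels most-salient-first and track the class probability.

    model_fn : callable mapping an (H, W, C) image to the target-class
               probability (hypothetical interface, not the paper's API).
    image    : (H, W, C) array; saliency : (H, W) array.
    A lower AUC means the heatmap located the pixels the model relies on.
    """
    h, w = saliency.shape
    order = np.argsort(saliency.ravel())[::-1]        # most salient first
    per_step = max(1, (h * w) // steps)
    flat = image.reshape(h * w, -1).astype(float).copy()
    probs = [model_fn(flat.reshape(image.shape))]
    for i in range(steps):
        idx = order[i * per_step:(i + 1) * per_step]
        flat[idx] = fill_value                        # zero-fill operator
        probs.append(model_fn(flat.reshape(image.shape)))
    p = np.asarray(probs)
    return float(((p[1:] + p[:-1]) / 2).mean())       # trapezoid rule on [0, 1]
```

Swapping `fill_value` for a blurred copy of the deleted pixels gives the blur-fill variant of the same loop.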
What carries the argument
the native-readout hypothesis, which treats faithfulness as limited by the structural alignment between the explainer and the model's internal decision pathway
If this is right
- Explanation methods must be chosen according to readout compatibility with the target model rather than architecture family alone.
- Deployed heatmaps should be reported with quantitative faithfulness scores instead of visual plausibility alone.
- Model and explainer design should be co-ordinated around readout structure to maintain audit reliability.
- Multiple perturbation baselines are required because single-protocol rankings are not stable.
Where Pith is reading between the lines
- Industrial pipelines could reduce inspection errors by requiring readout-matched explainers during model selection.
- The same audit protocol might expose similar compatibility issues in medical or autonomous-driving image models.
- When native compatibility is low, model-agnostic methods may become the default choice rather than an exception.
- Dataset shifts could change which readout structures count as native, requiring periodic re-audits.
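The model-agnostic fallback mentioned above can be made concrete. RISE (Petsiuk et al., 2018) needs only black-box access: it probes the model with random binary masks and accumulates each mask weighted by the masked prediction. A minimal sketch, using nearest-neighbour upsampling rather than the original paper's bilinear interpolation, and assuming H and W divide evenly by the mask grid:

```python
import numpy as np

def rise_saliency(model_fn, image, n_masks=500, grid=8, p_keep=0.5, seed=0):
    """RISE: a pixel's importance is the expected model score when kept.

    model_fn : black-box callable, (H, W, C) image -> target-class score
               (hypothetical interface). No gradients or attention needed.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    sal, total = np.zeros((h, w)), 0.0
    for _ in range(n_masks):
        coarse = (rng.random((grid, grid)) < p_keep).astype(float)
        mask = np.kron(coarse, np.ones((h // grid, w // grid)))  # upsample
        score = model_fn(image * mask[..., None])
        sal += score * mask
        total += score
    return sal / max(total, 1e-8)        # normalise by accumulated score
```

Because only `model_fn` is consulted, the same code runs unchanged against a ViT, a Swin, or a CNN, which is exactly why RISE serves as the compatibility-free control in the audit.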
Load-bearing premise
The chosen zero-fill and blur-fill perturbation protocols measure faithfulness without introducing systematic artifacts that favor particular readout structures.
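The premise can be stated concretely: the audit's conclusions hold only if the fill operator itself injects no evidence. A sketch of the two operators being compared, on a 2-D single-channel map, with a simple box blur standing in for whatever blur kernel the paper actually uses (not verified here):

```python
import numpy as np

def box_blur(image, k=5):
    """Simple k x k box blur on a 2-D array (edge-padded)."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (k * k)

def perturb(image, mask, mode="zero"):
    """Delete the pixels selected by `mask` with one of two operators.

    zero-fill replaces them with a constant 0, which can push the image
    off the data manifold; blur-fill replaces them with a locally blurred
    copy, removing detail while staying closer to natural statistics.
    """
    out = image.astype(float).copy()
    fill = 0.0 if mode == "zero" else box_blur(image)[mask]
    out[mask] = fill
    return out
```

If the two operators produce different model rankings, as the paper reports, the measured "faithfulness" is partly a property of the operator, which is exactly the neutrality concern raised here.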
What would settle it
A finding that a structurally distant explainer consistently achieves lower deletion AUC than a native one across both zero-fill and blur-fill protocols on the same dataset would falsify the bound.
original abstract
Industrial visual inspection systems increasingly rely on deep classifiers whose heatmap explanations may appear visually plausible while failing to identify the image regions that actually drive model decisions. This paper operationalizes an architecture-aware explanation audit protocol grounded in the native-readout hypothesis: the perturbation-based faithfulness of an explanation method is bounded by its structural distance from the model's native decision mechanism. On WM-811K wafer maps (9 classes, 172k images) under a three-seed zero-fill perturbation protocol, ViT-Tiny + Attention Rollout attains Deletion AUC 0.211 against 0.432-0.525 for Swin-Tiny / ResNet18+CBAM / DenseNet121 + Grad-CAM (abs(Cohen's d) > 1.1), despite lower classification accuracy. Swin-Tiny disentangles architecture family from readout structure: despite being a Transformer, its spatial feature-map hierarchy makes it Grad-CAM compatible, showing that the operative factor is readout structure rather than architecture family. A model-agnostic control (RISE) compresses all families to Deletion AUC about 0.1, indicating the gap arises from the explainer pathway; notably, RISE outperforms all native methods, so native readout is a compatibility principle rather than an optimality guarantee. A blur-fill sensitivity analysis shows that the family ordering reverses under a different perturbation baseline, reinforcing that faithfulness rankings are joint properties of (model, explainer, perturbation operator) triples. An exploratory boundary-condition study on MVTec AD (pretrained models) indicates that audit results are dataset/task dependent and identifies conditions requiring qualification. The protocol yields actionable guidance: explanation pathways should be co-designed with model architectures based on readout structure, and deployed heatmaps should be accompanied by quantitative faithfulness metrics.
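ViT-Tiny's native readout, Attention Rollout (Abnar & Zuidema, 2020), aggregates the attention matrices themselves rather than gradients over spatial feature maps, which is what makes it structurally "native" to a pure transformer. A minimal sketch of the standard recipe, with head-averaging; the paper's exact variant is not specified here:

```python
import numpy as np

def attention_rollout(attentions):
    """Multiply per-layer attention maps to trace input-to-output flow.

    attentions : list of (heads, tokens, tokens) arrays, one per layer,
                 each row-stochastic. Heads are fused by averaging, the
                 residual connection is modelled as an identity term, and
                 rows are renormalised before the layer-wise product.
    """
    rollout = np.eye(attentions[0].shape[-1])
    for layer in attentions:
        a = layer.mean(axis=0)                    # fuse heads
        a = a + np.eye(a.shape[-1])               # residual connection
        a = a / a.sum(axis=-1, keepdims=True)     # renormalise rows
        rollout = a @ rollout                     # compose with layers below
    return rollout                                # (tokens, tokens)
```

The CLS-token row of the result, reshaped to the patch grid, gives the heatmap; nothing in this computation exists for a CNN, which is the structural-distance point the abstract is making.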
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper operationalizes an architecture-aware explanation audit protocol based on the native-readout hypothesis: perturbation-based faithfulness of an explanation is bounded by its structural distance from the model's native decision mechanism. Large-scale experiments on the WM-811K wafer-map dataset (172k images, 9 classes) across ViT-Tiny, Swin-Tiny, ResNet18+CBAM and DenseNet121 compare native readouts (Attention Rollout, Grad-CAM) against a model-agnostic control (RISE) under zero-fill Deletion AUC, with a blur-fill sensitivity analysis and an exploratory MVTec AD study.
Significance. If the central empirical correlations hold, the work supplies concrete, actionable guidance for co-designing explanation pathways with model architectures in industrial visual inspection and demonstrates that native readout is a compatibility principle rather than an optimality guarantee. The scale of the WM-811K experiments, inclusion of architecture-family controls (Swin-Tiny), RISE baseline, and perturbation-operator sensitivity analysis are genuine strengths that make the protocol reproducible and falsifiable.
major comments (3)
- [Abstract / Results section] The central claim that faithfulness is bounded by structural distance to the native readout rests on the assumption that zero-fill (and blur-fill) perturbation protocols are neutral with respect to readout structure. The reported reversal of model ordering under blur-fill directly challenges this neutrality, yet no quantitative test of the model-explanation-perturbation interaction (e.g., two-way ANOVA or permutation test on the three-seed Deletion AUC values) is supplied to isolate readout compatibility from operator artifact.
- [Introduction / Methods] No formal definition, metric, or bound is given for 'structural distance' from the native decision mechanism. The claim therefore reduces to an observed correlation (ViT-Tiny + Attention Rollout Deletion AUC 0.211 vs. 0.432-0.525 for mismatched readouts, |Cohen's d| > 1.1) without an independent, testable quantity that could be computed from architecture alone.
- [Experiments / Results] While RISE is shown to compress all families to Deletion AUC ~0.1 and outperform native methods, the paper does not report whether this superiority is statistically significant after correction for multiple comparisons across the four model families and two perturbation operators, which is load-bearing for the conclusion that native readout is merely compatibility rather than optimality.
minor comments (2)
- [Abstract] The abstract states 'abs(Cohen's d) > 1.1' for the main comparison but does not specify whether the effect size is computed on per-image or per-seed aggregated AUC values; clarify the exact aggregation level.
- [Discussion] The exploratory MVTec AD boundary-condition study is described only at high level; adding a short table summarizing the conditions under which audit results change would improve clarity.
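The aggregation question in the first minor comment matters numerically: computed on per-seed mean AUCs there are only n = 3 observations per group, whereas per-image values give thousands of correlated observations and usually a much smaller pooled SD, so the two levels can yield very different d. A pooled-SD Cohen's d sketch; the seed-level numbers below are made up for illustration, not taken from the paper:

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with pooled (Bessel-corrected) standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return float((a.mean() - b.mean()) / pooled)

# Per-seed aggregation: one Deletion AUC per training seed (n = 3 each).
# Illustrative values only.
vit_seeds = [0.20, 0.21, 0.22]
cnn_seeds = [0.45, 0.48, 0.51]
```

For these illustrative values `abs(cohens_d(vit_seeds, cnn_seeds))` is far above the paper's 1.1 threshold; the referee's point is that the abstract should state which aggregation level produced the reported effect size.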
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which has prompted us to enhance the statistical analysis and clarify key concepts in the manuscript. We address each major comment point by point below.
point-by-point responses
-
Referee: [Abstract / Results section] The central claim that faithfulness is bounded by structural distance to the native readout rests on the assumption that zero-fill (and blur-fill) perturbation protocols are neutral with respect to readout structure. The reported reversal of model ordering under blur-fill directly challenges this neutrality, yet no quantitative test of the model-explanation-perturbation interaction (e.g., two-way ANOVA or permutation test on the three-seed Deletion AUC values) is supplied to isolate readout compatibility from operator artifact.
Authors: We agree that the reversal under blur-fill demonstrates an interaction between model, explainer, and perturbation operator. In the revised manuscript, we have added a two-way ANOVA on the three-seed Deletion AUC values with model family and perturbation operator as factors. The analysis yields a significant interaction term (p < 0.01), confirming that faithfulness rankings are joint properties of the triple rather than isolated effects. This is reported in the Results section alongside the existing blur-fill sensitivity analysis. revision: yes
-
Referee: [Introduction / Methods] No formal definition, metric, or bound is given for 'structural distance' from the native decision mechanism. The claim therefore reduces to an observed correlation (ViT-Tiny + Attention Rollout Deletion AUC 0.211 vs. 0.432-0.525 for mismatched readouts, |Cohen's d| > 1.1) without an independent, testable quantity that could be computed from architecture alone.
Authors: We acknowledge that structural distance was previously described qualitatively. The revised Methods section now provides an operational definition: structural distance is the degree of mismatch between the explainer's native readout structure (e.g., attention maps for ViT vs. spatial feature maps for CNNs) and the model's decision mechanism, which can be assessed from architecture specifications alone. While a closed-form theoretical bound is beyond the current scope, this definition enables testable predictions, as validated by the Swin-Tiny control experiment. The Introduction has been updated to reference this definition explicitly. revision: partial
-
Referee: [Experiments / Results] While RISE is shown to compress all families to Deletion AUC ~0.1 and outperform native methods, the paper does not report whether this superiority is statistically significant after correction for multiple comparisons across the four model families and two perturbation operators, which is load-bearing for the conclusion that native readout is merely compatibility rather than optimality.
Authors: We have added the requested statistical analysis. Paired t-tests on per-seed AUC values with Bonferroni correction for the eight comparisons (4 models × 2 operators) confirm that RISE's superiority remains significant (adjusted p < 0.01 in all cases). These results are now included in the Experiments section and support the interpretation that native readout is a compatibility principle rather than an optimality guarantee. revision: yes
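The correction the authors describe is mechanical: a paired t statistic on seed-wise AUC differences, with each raw p-value multiplied by the 8 comparisons (4 model families × 2 perturbation operators). A sketch of both pieces; the numbers are illustrative, and looking up the p-value against the t distribution is left to a stats library:

```python
import numpy as np

def paired_t(a, b):
    """Paired t statistic on per-seed differences (df = n - 1)."""
    d = np.asarray(a, float) - np.asarray(b, float)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(len(d))))

def bonferroni(pvals, alpha=0.05):
    """Multiply each raw p-value by the number of comparisons; a result
    is significant only if the adjusted value still clears alpha."""
    p = np.asarray(pvals, float)
    adjusted = np.minimum(p * len(p), 1.0)
    return adjusted, adjusted < alpha
```

With 8 comparisons, a raw p of 0.001 survives (adjusted 0.008 < 0.05) while a raw 0.01 does not (adjusted 0.08), which is why reporting the corrected values is load-bearing for the optimality claim.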
Circularity Check
No circularity: empirical comparisons without self-referential derivations
full rationale
The paper conducts direct experimental comparisons of Deletion AUC under zero-fill and blur-fill protocols across model-explanation pairs on WM-811K and MVTec AD. No equations, predictions, or bounds are derived that reduce by construction to fitted parameters, self-citations, or ansatzes within the paper itself. The native-readout hypothesis is tested via observable performance gaps and sensitivity analyses rather than proven from internal definitions, rendering the work self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption native-readout hypothesis: perturbation-based faithfulness of an explanation method is bounded by its structural distance from the model's native decision mechanism
Reference graph
Works this paper leans on
- [1] He, K., Zhang, X., Ren, S., & Sun, J. Deep Residual Learning for Image Recognition. CVPR, 2016. https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf
- [2] Woo, S., Park, J., Lee, J.-Y., & Kweon, I. S. CBAM: Convolutional Block Attention Module. ECCV, 2018. https://arxiv.org/abs/1807.06521
- [3] Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. Densely Connected Convolutional Networks. CVPR, 2017. https://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_Densely_Connected_Convolutional_CVPR_2017_paper.pdf
- [4] Dosovitskiy, A. et al. An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale. ICLR, 2021. https://dblp.org/rec/conf/iclr/DosovitskiyB0WZ21
- [5] Selvaraju, R. R. et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. ICCV, 2017. https://openaccess.thecvf.com/content_ICCV_2017/papers/Selvaraju_Grad-CAM_Visual_Explanations_ICCV_2017_paper.pdf
- [6] Abnar, S., & Zuidema, W. Quantifying Attention Flow in Transformers. ACL, 2020. https://arxiv.org/abs/2005.00928
- [7] Petsiuk, V., Das, A., & Saenko, K. RISE: Randomized Input Sampling for Explanation of Black-Box Models. BMVC, 2018. https://arxiv.org/abs/1806.07421
- [8] Alvarez-Melis, D., & Jaakkola, T. S. On the Robustness of Interpretability Methods. ICML Workshop, 2018. https://arxiv.org/abs/1806.08049
- [9] Jain, S., & Wallace, B. C. Attention Is Not Explanation. NAACL, 2019. https://doi.org/10.48550/arXiv.1902.10186
- [10] Chefer, H., Gur, S., & Wolf, L. Transformer Interpretability Beyond Attention Visualization. CVPR, 2021. https://www.computer.org/csdl/proceedings-article/cvpr/2021/450900a782/1yeIsbbCMO4
- [11] Sundararajan, M., Taly, A., & Yan, Q. Axiomatic Attribution for Deep Networks (Integrated Gradients). ICML, 2017. https://arxiv.org/abs/1703.01365
- [12] Zeiler, M. D., & Fergus, R. Visualizing and Understanding Convolutional Networks (Occlusion sensitivity). ECCV, 2014. https://cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf

Recent wafer-map defect classification and XAI (2024–2026)
- [13] Khatun, M. R., Farid, F. A., Dhar, S., Islam, M. S., Uddin, J., & Abdul Karim, H. CBAM-Enhanced Lightweight CNN for Wafer Map Defect Classification. Frontiers in Electronics, 7:1750707, 2026. https://www.frontiersin.org/journals/electronics/articles/10.3389/felec.2026.1750707/full
- [14] Lee, J., Ju, Y., Lim, J., Hong, S., Baek, S.-W., & Lee, J. Enhancing Confidence and Interpretability of a CNN-Based Wafer Defect Classification Model Using Temperature Scaling and LIME. Micromachines, 16(9):1057, 2025. https://www.mdpi.com/2072-666X/16/9/1057
- [15] Lee, C.-Y., Pleva, M., Hládek, D., Lee, C.-W., & Su, M.-H. Ensemble Learning for Wafer Defect Pattern Classification in the Semiconductor Industry. IEEE Access, 13, 2025. https://ieeexplore.ieee.org/iel8/6287639/10820123/11145756.pdf
- [16] Park, S. Y., & Kim, T. S. Fuzzy Inference System for Interpretable Classification of Wafer Map Defect Patterns. Electronics, 15(1):130, 2026. https://www.mdpi.com/2079-9292/15/1/130
- [17] Pilli, V. S. R. R. Intelligent Model to Detect and Classify Silicon Wafer Map Images. M.Sc. Thesis, Purdue University, 2024. https://hammer.purdue.edu/ndownloader/files/49412641

Benchmark dataset
- [18] Wu, M.-J., Jang, J.-S. R., & Chen, J.-L. Wafer Map Failure Pattern Recognition and Similarity Ranking for Large-Scale Data Sets. IEEE Transactions on Semiconductor Manufacturing, 28(1):1–12, 2015. https://ui.adsabs.harvard.edu/abs/2015ITSM...28S4237W/abstract

Boundary-condition dataset and pretrained models
- [19] Bergmann, P., Fauser, M., Sattlegger, D., & Steger, C. MVTec AD — A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection. CVPR, 2019. https://openaccess.thecvf.com/content_CVPR_2019/papers/Bergmann_MVTec_AD_--_A_Comprehensive_Real-World_Dataset_for_Unsupervised_Anomaly_CVPR_2019_paper.pdf
- [20] Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., & Jégou, H. Training Data-Efficient Image Transformers & Distillation Through Attention (DeiT). ICML, 2021. https://arxiv.org/abs/2012.12877

Trustworthy industrial AI
- [21] Breque, M., De Nul, L., & Petridis, A. Industry 5.0: Towards a Sustainable, Human-Centric and Resilient European Industry. European Commission, Directorate-General for Research and Innovation, 2021. https://op.europa.eu/publication/manifestation_identifier/PUB_KIBD20021ENN
- [22] Moosavi, S., Farajzadeh-Zanjani, M., Razavi-Far, R., Palade, V., & Saif, M. Explainable AI in Manufacturing and Industrial Cyber–Physical Systems: A Survey. Electronics, 13(17):3497, 2024. https://www.mdpi.com/2079-9292/13/17/3497

Hierarchical vision transformers
- [23] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. ICCV, 2021. https://arxiv.org/abs/2103.14030

Supplementary Material S1. Experimental workflow
The full pipeline proceeds in four stages, from raw data ingestion through to the final report. Each stage is au...