Evaluating Explainability in Safety-Critical ATR Systems: Limitations of Post-Hoc Methods and Paths Toward Robust XAI
Pith reviewed 2026-05-08 11:25 UTC · model grok-4.3
The pith
Post-hoc XAI methods produce spurious, unstable explanations that undermine safety-critical automatic target recognition.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Post-hoc explanation methods in ATR exhibit critical failure modes: they generate spurious correlations unrelated to actual target features, vary unpredictably under input perturbations, and foster overtrust through visually persuasive but unverified outputs. These limitations render the methods unsuitable for the validation and verification requirements of safety-critical deployments, where model decisions must be interpretable in a manner that supports system-level assurance rather than relying on visual plausibility alone.
What carries the argument
An assurance-oriented evaluation framework that applies a four-dimensional taxonomy to XAI methods: interpretability of outputs, robustness to perturbations, vulnerability to manipulation, and fitness for validation and verification tasks.
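The robustness dimension in particular lends itself to quantitative checks. As a minimal sketch (not from the paper — the toy model, the finite-difference saliency method, and the stability score are all illustrative assumptions), explanation stability can be probed by perturbing the input and measuring how often the top-attributed feature survives:

```python
import random

def model(x):
    # Toy stand-in for an ATR classifier score (illustrative weights).
    w = [0.9, -0.2, 0.05, 0.7]
    return sum(wi * xi for wi, xi in zip(w, x))

def saliency(x, eps=1e-4):
    # Finite-difference gradient magnitudes: a minimal saliency map.
    base = model(x)
    grads = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += eps
        grads.append(abs((model(xp) - base) / eps))
    return grads

def top_feature(x):
    s = saliency(x)
    return max(range(len(s)), key=lambda i: s[i])

def rank_stability(x, noise=0.01, trials=20, seed=0):
    # Fraction of perturbed inputs whose saliency map keeps the same
    # top-ranked feature; 1.0 means fully rank-stable at this noise level.
    rng = random.Random(seed)
    ref = top_feature(x)
    hits = sum(top_feature([xi + rng.gauss(0.0, noise) for xi in x]) == ref
               for _ in range(trials))
    return hits / trials

print(rank_stability([1.0, 0.5, -0.3, 2.0]))  # 1.0: a linear toy model is trivially stable
```

For the linear toy model the score is trivially 1.0; the point is the protocol, which applies unchanged to saliency maps from a real ATR classifier, where instability under perturbations would show up as a score well below 1.0.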
If this is right
- ATR systems cannot rely on current post-hoc methods to confirm that model decisions align with physically meaningful target characteristics.
- Unstable explanations prevent repeatable validation steps required for certification or operational approval.
- Visually convincing but inaccurate outputs increase the chance of deploying models without detecting underlying errors.
- Development must shift toward explanation techniques grounded in causal or physical models of the sensing process.
Where Pith is reading between the lines
- ATR pipelines may need to embed physical sensor models directly into the explanation generation step to reduce spurious outputs.
- Evaluation protocols for safety-critical AI could adopt quantitative metrics for explanation fidelity instead of relying on human visual inspection.
- Similar shortcomings are likely to appear in other high-stakes image or signal classification domains that use post-hoc XAI.
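One concrete way to replace human visual inspection, sketched here under stated assumptions (the toy model, the hand-picked attribution vectors, and the simplified deletion protocol are illustrative, not the paper's), is a deletion-style fidelity metric: features are removed in order of attributed importance, and a faithful attribution should drive the model score down faster than a misranked one.

```python
def model(x):
    # Toy stand-in for an ATR classifier score (illustrative weights).
    w = [0.9, -0.2, 0.05, 0.7]
    return sum(wi * xi for wi, xi in zip(w, x))

def deletion_fidelity(x, attribution):
    # Zero out features most-important-first (per the attribution) and
    # average the model score along the way; lower means the attribution
    # identified the truly influential inputs.
    order = sorted(range(len(x)), key=lambda i: -attribution[i])
    xs = list(x)
    scores = [model(xs)]
    for i in order:
        xs[i] = 0.0
        scores.append(model(xs))
    return sum(scores) / len(scores)

x = [1.0, 1.0, 1.0, 1.0]
good = [0.9, 0.2, 0.05, 0.7]   # matches the magnitudes of the true weights
bad = [0.05, 0.2, 0.7, 0.9]    # misranked attribution
print(deletion_fidelity(x, good) < deletion_fidelity(x, bad))  # True
```

Practical variants operate on image patches or pixels rather than scalar features and integrate over the full deletion curve (an AUC), but the same comparison yields a quantitative fidelity score with no visual judgment involved.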
Load-bearing premise
The failure modes of spurious explanations, instability, and overtrust are assumed to appear across ATR systems rather than being limited to the specific models or datasets examined.
What would settle it
A controlled test showing a post-hoc XAI method that consistently produces non-spurious, stable explanations that measurably improve human verification accuracy on real ATR sensor data would refute the claim of insufficiency.
Original abstract
Explainable Artificial Intelligence (XAI) is increasingly recognized as essential for deploying machine learning systems in safety-critical environments. In Automatic Target Recognition (ATR), where models operate on image, video, radar, and multisensor data, high predictive performance alone is insufficient. Model decisions must also be interpretable, reliable, and suitable for validation. This paper presents a structured evaluation of explainability methods in the context of safety-critical ATR systems: We identify major XAI paradigms, including saliency-based, attention-based, and surrogate approaches, as well as recent detection-aware extensions. Based on this, we formalize explainability as an assurance-oriented assessment problem, introduce a taxonomy, and assess these methods with respect to four key dimensions: interpretability, robustness, vulnerability to manipulation, and suitability for validation and verification. The analysis identifies systematic limitations of current post-hoc explanation methods. In particular, we derive critical failure modes such as spurious explanations, instability under perturbations, and overtrust induced by visually convincing outputs. These findings indicate that widely used XAI techniques may be insufficient for safety-critical deployment. Finally, we discuss implications for ATR systems and outline directions toward more robust, causally grounded, and physically informed explainability methods. Our results emphasize the need to move beyond visually plausible explanations toward approaches that support reliable decision making and system-level assurance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper presents a structured evaluation of explainability methods for safety-critical Automatic Target Recognition (ATR) systems. It identifies major XAI paradigms (saliency-based, attention-based, surrogate, detection-aware), formalizes explainability as an assurance-oriented assessment problem, introduces a taxonomy, and assesses methods across four dimensions: interpretability, robustness, vulnerability to manipulation, and suitability for validation and verification. The analysis derives critical failure modes of post-hoc methods such as spurious explanations, instability under perturbations, and overtrust, concluding that these techniques may be insufficient for safety-critical deployment and outlining paths toward more robust, causally grounded approaches.
Significance. If the synthesized limitations prove generalizable, the work provides a valuable framework for assessing XAI in ATR contexts and highlights the gap between visually plausible explanations and those supporting reliable decision-making and system assurance. The structured taxonomy and multi-dimensional assessment are explicit strengths that organize prior literature into an assurance-oriented lens.
Major comments (1)
- [Abstract and main analysis] Abstract and assessment of four dimensions: the central claim that post-hoc methods exhibit spurious explanations, instability, and overtrust (rendering them insufficient for safety-critical ATR) is derived entirely from external XAI literature without ATR-specific quantitative metrics, stability scores, manipulation success rates, or controlled experiments on image/radar/multisensor benchmarks. This leaves the generalization from literature synthesis to deployment insufficiency unanchored and load-bearing for the conclusion.
Minor comments (1)
- [Abstract] Abstract contains typographical errors: 'rec ognized' (should be 'recognized'), 'pre dictive' (should be 'predictive'), and 'con vincing' (should be 'convincing').
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the opportunity to clarify the scope of our work. Below we address the major comment directly.
Point-by-point responses
Referee: Abstract and assessment of four dimensions: the central claim that post-hoc methods exhibit spurious explanations, instability, and overtrust (rendering them insufficient for safety-critical ATR) is derived entirely from external XAI literature without ATR-specific quantitative metrics, stability scores, manipulation success rates, or controlled experiments on image/radar/multisensor benchmarks. This leaves the generalization from literature synthesis to deployment insufficiency unanchored and load-bearing for the conclusion.
Authors: We agree that the manuscript is a structured literature synthesis and taxonomic analysis rather than a primary empirical study presenting new ATR-specific experiments. The central claims are derived from the cited XAI literature on vision-based models (including detection and classification tasks directly relevant to ATR), which we map onto the four assurance-oriented dimensions and ATR safety requirements. The paper's contribution is the formalization of explainability as an assurance problem, the taxonomy, and the identification of systematic failure modes that recur across the reviewed methods. We acknowledge that this leaves the generalization to ATR deployment somewhat dependent on the transferability of those prior findings. In revision we will (1) update the abstract and Section 1 to explicitly characterize the work as a literature-based assessment, (2) add a dedicated paragraph in the discussion that states the absence of new quantitative ATR benchmarks and calls for future controlled studies on radar/multisensor data, and (3) strengthen citations to any existing ATR-specific XAI evaluations. This constitutes a partial revision; we maintain that a synthesis paper can legitimately highlight risks without generating new metrics.
Revision: partial
Circularity Check
No significant circularity; conceptual synthesis grounded in external XAI literature
Full rationale
The paper performs a structured literature-based evaluation of XAI methods for ATR, formalizing a taxonomy and assessing four dimensions (interpretability, robustness, vulnerability, validation suitability) by drawing failure modes such as spurious explanations and instability from prior external work rather than new derivations or fits. No equations, self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided abstract and structure; the central claim of insufficiency for safety-critical use is presented as a synthesis outcome, not a reduction to the paper's own inputs. This is a standard non-circular review-style analysis.
Reference graph
Works this paper leans on
- [1] Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B.: Sanity checks for saliency maps. In: NeurIPS (2018)
- [2] Agarwal, R., Singh, A., Gupta, P., Torr, P.H.S.: Gaussian-class activation mapping explainer (G-CAME): Improved visual explanations for deep neural networks. In: CVPR (2024)
- [3] Buhrmester, V., Münch, D., Arens, M.: Analysis of explainers of black box deep neural networks for computer vision: A survey. Machine Learning and Knowledge Extraction 3(4), 966–989 (2021)
- [4] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: ECCV (2020)
- [5] Cheng, Z., Wu, Y., Li, Y., Cai, L., Ihnaini, B.: A comprehensive review of explainable artificial intelligence in computer vision. Sensors (2025)
- [6] Ghorbani, A., Abid, A., Zou, J.: Interpretation of neural networks is fragile. In: AAAI (2019)
- [7] Jain, S., Wallace, B.C.: Attention is not explanation. In: NAACL (2019)
- [8] Javaid, S., et al.: Explainable AI and monocular vision for UAV navigation. Frontiers in Sustainable Cities (2025)
- [9] Kadir, A., et al.: On the evaluation of explainable artificial intelligence methods (2023)
- [10] Karniadakis, G.E., Kevrekidis, I.G., Lu, L., Perdikaris, P., Wang, S., Yang, L.: Physics-informed machine learning. Nature Reviews Physics (2021)
- [11] Lakkaraju, H., Bastani, O.: Evaluating explainable AI: Which algorithm should I choose? Foundations and Trends in Machine Learning (2023)
- [12] Laugel, T., Lesot, M.J., Marsala, C., Renard, X., Detyniecki, M.: Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In: AIES (2019)
- [13] Lundberg, S., Lee, S.I.: A unified approach to interpreting model predictions. In: NeurIPS (2017)
- [14] Mi, J.X., Jiang, X., Luo, L., Gao, Y.: Toward explainable artificial intelligence: A survey and overview. Neurocomputing (2024)
- [15] Nguyen, L.P.T., Nguyen, H.T.T., Cao, H.: ODExAI: A comprehensive object detection explainable AI evaluation. arXiv preprint arXiv:2504.19249 (2025)
- [16] Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics (2019)
- [17] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: CVPR (2016)
- [18] Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: NeurIPS (2015)
- [19] Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?" Explaining the predictions of any classifier. In: KDD (2016)
- [20] Roy, A., et al.: Explainable AI for object detection in satellite imagery. IEEE Access (2025)
- [21] Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1, 206–215 (2019)
- [22] Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: ICCV (2017)
- [23] Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2014)
- [24] Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: ICML (2017)
- [25] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: NeurIPS (2017)