Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models

Wojciech Samek, Thomas Wiegand, Klaus-Robert Müller


classification cs.AI cs.CY cs.NE stat.ML

keywords learning deep models artificial intelligence classification development explaining

Abstract

With the availability of large databases and recent improvements in deep learning methodology, the performance of AI systems is reaching or even exceeding the human level on an increasing number of complex tasks. Impressive examples of this development can be found in domains such as image classification, sentiment analysis, speech understanding or strategic game playing. However, because of their nested non-linear structure, these highly successful machine learning and artificial intelligence models are usually applied in a black box manner, i.e., no information is provided about what exactly makes them arrive at their predictions. Since this lack of transparency can be a major drawback, e.g., in medical applications, the development of methods for visualizing, explaining and interpreting deep learning models has recently attracted increasing attention. This paper summarizes recent developments in this field and makes a plea for more interpretability in artificial intelligence. Furthermore, it presents two approaches to explaining predictions of deep learning models, one method which computes the sensitivity of the prediction with respect to changes in the input and one approach which meaningfully decomposes the decision in terms of the input variables. These methods are evaluated on three classification tasks.
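
The two kinds of explanation named in the abstract can be contrasted on a toy model. The sketch below is illustrative only and is not taken from the paper: the model `f(x) = tanh(w . x)`, the weights `W`, and all function names are assumptions chosen for brevity. Sensitivity is computed here as the squared partial derivative of the prediction with respect to each input, and the decomposition is the simple linear case in which each input contributes `w_i * x_i` to the pre-activation score. The abstract does not give the formulas the authors use, so both choices should be read as stand-ins.

```python
import math

# Hypothetical toy model (an assumption, not from the paper): f(x) = tanh(w . x)
W = [0.5, -1.0, 2.0]


def pre_activation(x):
    """Linear score z = sum_i w_i * x_i."""
    return sum(w_i * x_i for w_i, x_i in zip(W, x))


def predict(x):
    """Model output f(x) = tanh(z)."""
    return math.tanh(pre_activation(x))


def sensitivity(x):
    """Squared partial derivative of f with respect to each input."""
    slope = 1.0 - predict(x) ** 2  # derivative of tanh evaluated at z
    return [(slope * w_i) ** 2 for w_i in W]


def decompose(x):
    """Per-input contributions w_i * x_i; they sum to the linear score z."""
    return [w_i * x_i for w_i, x_i in zip(W, x)]


def numerical_gradient(x, eps=1e-6):
    """Central finite differences, used only to sanity-check sensitivity()."""
    grads = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((predict(hi) - predict(lo)) / (2 * eps))
    return grads


x = [1.0, 0.0, 0.25]
print("sensitivity:  ", sensitivity(x))
print("decomposition:", decompose(x))
```

In this example the second input is zero, so it receives zero contribution in the decomposition, yet its sensitivity is non-zero because changing it would change the prediction. The two quantities therefore answer different questions: which inputs the output would react to, versus which inputs account for the output actually produced.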


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SeisDiff-intp: a unified prompt-guided flow matching framework for multi-tasks seismic interpretation
physics.geo-ph 2026-04 unverdicted novelty 6.0

SeisDiff-intp is a prompt-conditioned flow matching model that unifies multiple seismic interpretation tasks and generates realistic synthetic training data for complex subsurface features.

Explainability of Recurrent Neural Networks for Enhancing P300-based Brain-Computer Interfaces
cs.LG 2026-05 unverdicted novelty 4.0

A Post-Recurrent Module added to RNNs yields 9% better P300 classification while identifying key spatio-temporal EEG patterns that match established neuroscience descriptions of the P300 wave.