Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
35 Pith papers cite this work.
abstract
This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets). We consider two visualisation techniques, based on computing the gradient of the class score with respect to the input image. The first one generates an image, which maximises the class score [Erhan et al., 2009], thus visualising the notion of the class, captured by a ConvNet. The second technique computes a class saliency map, specific to a given image and class. We show that such maps can be employed for weakly supervised object segmentation using classification ConvNets. Finally, we establish the connection between the gradient-based ConvNet visualisation methods and deconvolutional networks [Zeiler et al., 2013].
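The saliency technique the abstract describes needs only a single backward pass: the map is the per-pixel, channel-wise maximum of the absolute gradient of the unnormalised class score with respect to the input. A minimal PyTorch sketch, using a torchvision VGG as a stand-in for the paper's ConvNet (the input and class index are placeholders):

```python
# Minimal sketch of an image-specific class saliency map: the gradient of
# the class score w.r.t. the input, reduced by a channel-wise max of its
# absolute value. The torchvision model is a stand-in, not the paper's net.
import torch
import torchvision.models as models

model = models.vgg16(weights="IMAGENET1K_V1").eval()

def class_saliency(image, class_idx):
    # image: (1, 3, H, W) normalised tensor
    image = image.clone().requires_grad_(True)
    score = model(image)[0, class_idx]      # unnormalised (pre-softmax) score
    score.backward()
    # per-pixel max over colour channels of |dS_c / dI|
    return image.grad.abs().max(dim=1)[0].squeeze(0)

saliency = class_saliency(torch.randn(1, 3, 224, 224), class_idx=281)
```

The class-model visualisation uses the same machinery in reverse: fix the class and run gradient ascent on the input image to maximise its score.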
citing papers explorer
-
From Mechanistic to Compositional Interpretability
Compositional interpretability defines explanations as commuting syntactic-semantic mapping pairs grounded in compositionality and minimum description length, with compressive refinement and a parsimony theorem guaranteeing concise human-aligned decompositions.
-
SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data
SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.
-
Mapping data sensitivities in global QCD analysis with linear response and influence functions
A framework based on linear response and influence functions maps data sensitivities in global QCD analyses to show how experiments determine central values, uncertainties, and correlations of non-perturbative functions.
-
Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning
A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.
-
Benchmarking bandgap prediction in semiconductors under experimental and realistic evaluation settings
Introduces the RealMat-BaG benchmark, showing fundamental generalization limits of ML models trained on DFT data when they predict experimental bandgaps.
-
Scaling and evaluating sparse autoencoders
K-sparse autoencoders with fixes for dead latents produce clean scaling laws and feature-quality metrics that improve with size, demonstrated by training a 16-million-latent model on GPT-4 activations.
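For orientation, a minimal sketch of the TopK mechanism such k-sparse autoencoders rely on: only the k largest pre-activations per example are kept, enforcing sparsity directly rather than through an L1 penalty. Dimensions are illustrative, and the dead-latent fixes the summary mentions are omitted:

```python
# Hedged sketch of a TopK sparse autoencoder; sizes are illustrative and
# the paper's dead-latent handling is not reproduced here.
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, d_model=768, n_latents=16384, k=32):
        super().__init__()
        self.enc = nn.Linear(d_model, n_latents)
        self.dec = nn.Linear(n_latents, d_model)
        self.k = k

    def forward(self, x):
        z = self.enc(x)
        # keep the k largest latents per example, zero out the rest
        top = torch.topk(z, self.k, dim=-1)
        z_sparse = torch.zeros_like(z).scatter(-1, top.indices, top.values)
        return self.dec(z_sparse), z_sparse

sae = TopKSAE()
x = torch.randn(4, 768)
recon, z = sae(x)
loss = (recon - x).pow(2).mean()   # plain reconstruction MSE
```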
-
ISAAC: Auditing Causal Reasoning in Deep Models for Drug-Target Interaction
ISAAC auditing applied to three DTI models on the Davis benchmark finds 25% relative differences in causal reasoning scores despite nearly identical AUROC values.
-
From Local to Global to Mechanistic: An iERF-Centered Unified Framework for Interpreting Vision Models
An iERF-centric framework unifies local, global, and mechanistic interpretability in vision models via SRD for saliency, CAFE for concept anchoring, and ICAT for interlayer attribution.
-
Mamba-SSM with LLM Reasoning for Feature Selection: Faithfulness-Aware Biomarker Discovery
LLM chain-of-thought filtering of Mamba saliency features on TCGA-BRCA data produces a 17-gene set with AUC 0.927 that beats both the raw 50-gene saliency list and a 5000-gene baseline while using far fewer features, though it misses many known BRCA genes.
-
Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision
Cross-Layer Transcoders decompose ViT activations into sparse, depth-aware layer contributions that maintain zero-shot accuracy and enable faithful attribution of the final representation.
-
APEX: Audio Prototype EXplanations for Classification Tasks
APEX generates four types of prototype-based explanations for pre-trained audio classifiers that preserve output invariance and target acoustic properties better than gradient methods applied to spectrograms.
-
Scaling Vision Models Does Not Consistently Improve Localisation-Based Explanation Quality
Scaling vision models by depth and parameter count does not consistently improve localisation-based explanation quality across architectures, datasets, and post-hoc methods; smaller models often perform comparably or better.
-
Evaluating the Alignment Between GeoAI Explanations and Domain Knowledge in Satellite-Based Flood Mapping
ADAGE uses Channel-Group SHAP to quantify alignment between GeoAI model explanations and domain knowledge references in satellite-based flood mapping.
-
Hierarchical, Interpretable, Label-Free Concept Bottleneck Model
HIL-CBM is a hierarchical label-free concept bottleneck model that improves classification accuracy and explanation quality over prior single-level CBMs using a visual consistency loss and dual heads.
-
GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation
GRALIS supplies a canonical representation (Q, w, Delta) for every additive linear continuous attribution functional on L^2 via the Riesz Representation Theorem, unifying SHAP, IG, LIME and linearized GradCAM while proving seven simultaneous guarantees including completeness and interaction values.
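For readers who want the theorem behind this capsule: the Riesz Representation Theorem identifies every continuous linear functional on L^2 with an inner product against a unique representer, which is the hook the (Q, w, Delta) decomposition refines; the decomposition itself is not detailed here.

```latex
% Riesz representation on L^2: every continuous linear functional phi has
% a unique representer g. How GRALIS factors g into (Q, w, Delta) is the
% paper's contribution and is not reproduced in this reminder.
\[
  \phi \in (L^2)^{*}
  \;\Longrightarrow\;
  \exists!\, g \in L^2 :\quad
  \phi(f) = \langle f, g \rangle = \int f(x)\, g(x)\, dx
  \quad \text{for all } f \in L^2 .
\]
```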
-
Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution
MA-GIG improves Integrated Gradients by performing path integration in the latent space of a pre-trained VAE, so that decoded points remain closer to the learned data manifold and off-manifold gradient noise is reduced.
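A hedged sketch of the general latent-path idea, not MA-GIG's exact algorithm: interpolate between the encoded baseline and input in the VAE latent space, decode each point, and accumulate model gradients along the decoded path. Here `vae.encode`, `vae.decode`, and `model` are hypothetical placeholders:

```python
# Sketch of integrated gradients along a decoded latent path. The vae and
# model objects are hypothetical placeholders, not MA-GIG's actual API.
import torch

def latent_path_ig(model, vae, x, baseline, class_idx, steps=50):
    z0, z1 = vae.encode(baseline), vae.encode(x)
    grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        # decode an interpolated latent back onto (near) the data manifold
        point = vae.decode(z0 + alpha * (z1 - z0)).detach().requires_grad_(True)
        model(point)[0, class_idx].backward()
        grads += point.grad
    # IG-style rescaling between the decoded endpoints
    with torch.no_grad():
        return (vae.decode(z1) - vae.decode(z0)) * grads / steps
```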
-
H-Sets: Hessian-Guided Discovery of Set-Level Feature Interactions in Image Classifiers
H-Sets detects higher-order feature interactions in image classifiers via Hessian-guided pair merging and attributes them with IDG-Vis to generate more interpretable saliency maps than existing marginal or coarse methods.
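The Hessian-guided part is easiest to see on a toy function: the mixed second derivative d2f/dx_i dx_j measures how strongly feature i modulates feature j's effect, which is the raw signal a method like this can merge pairs on. A small sketch of that signal only; H-Sets' merging and IDG-Vis attribution are not reproduced:

```python
# Scoring pairwise feature interactions by |mixed second derivatives|.
# This illustrates the Hessian signal only, not H-Sets' merging procedure.
import torch

def pairwise_interactions(f, x):
    # x: (n,) input; returns an (n, n) matrix of absolute Hessian entries
    return torch.autograd.functional.hessian(f, x).abs()

f = lambda v: v[0] * v[1] + v[2] ** 2     # toy model with one interaction
print(pairwise_interactions(f, torch.tensor([1.0, 2.0, 3.0])))
```

On this toy function the only large off-diagonal entry is the (0, 1) pair, matching the v[0] * v[1] term.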
-
On the Importance and Evaluation of Narrativity in Natural Language AI Explanations
XAI explanations should be narratives with continuous structure, cause-effect links, fluency, and diversity, and new metrics beyond standard NLP scores are needed to evaluate these qualities.
-
Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks
Token-level contrastive attribution yields informative signals for some LLM benchmark failures but is not universally applicable across datasets and models.
-
Potential of Gaia XP Spectra in Red Giant Star Asteroseismology: A Deep-Learning Approach
Hybrid deep learning models recover large frequency separation, frequency of maximum power, and dipole period spacing from low-resolution Gaia XP spectra with accuracy comparable to moderate-resolution spectroscopy.
-
Towards Reliable Testing of Machine Unlearning
Causal fuzzing with budgeted interventions can detect residual direct and indirect influence of unlearned data that standard attribution methods miss due to proxies, cancellations, and masking.
-
Cross-Modal Knowledge Distillation for PET-Free Amyloid-Beta Detection from MRI
A PET-guided knowledge distillation approach achieves AUCs of 0.74 and 0.68 for amyloid-beta detection from MRI alone across two datasets without requiring PET or clinical covariates at test time.
-
Learn to Rank: Visual Attribution by Learning Importance Ranking
A new end-to-end training scheme for visual attribution maps that optimizes deletion and insertion metrics directly via differentiable ranking relaxation instead of surrogate objectives.
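One standard way to make ranking differentiable, not necessarily the paper's relaxation, is to replace hard ranks with sums of sigmoid comparisons, so that deletion/insertion-style objectives admit gradients:

```python
# Soft ranking via sigmoid comparisons: a differentiable stand-in for hard
# ranks, shown as one common relaxation rather than the paper's method.
import torch

def soft_rank(scores, tau=0.01):
    # diff[i, j] = scores[i] - scores[j]
    diff = scores.unsqueeze(1) - scores.unsqueeze(0)
    # soft count of elements larger than scores[i] (self term contributes 0.5)
    return torch.sigmoid(-diff / tau).sum(dim=1)

scores = torch.tensor([0.3, 0.9, 0.1], requires_grad=True)
soft_rank(scores).sum().backward()    # gradients flow through the ranking
```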
-
Biological Plausibility and Representational Alignment of Feedback Alignment in Convolutional Networks
Modified feedback alignment in convolutional networks produces representations geometrically aligned with backpropagation on CIFAR-10.
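As background on the base technique: feedback alignment propagates the error through a fixed random matrix B instead of the transposed forward weights. A minimal sketch of the classic formulation; the paper's modification is not reproduced here:

```python
# Classic feedback alignment for one linear layer: the backward pass uses a
# fixed random matrix B in place of W, yet W can still learn. This is the
# standard formulation, not the paper's modified variant.
import torch

class FALinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, W, B):
        ctx.save_for_backward(x, W, B)
        return x @ W.t()

    @staticmethod
    def backward(ctx, grad_out):
        x, W, B = ctx.saved_tensors
        grad_x = grad_out @ B          # random B replaces W in the backward pass
        grad_W = grad_out.t() @ x
        return grad_x, grad_W, None    # B itself is never trained

x = torch.randn(8, 32, requires_grad=True)
W = torch.randn(16, 32, requires_grad=True)
B = torch.randn(16, 32)                # fixed random feedback weights
FALinear.apply(x, W, B).sum().backward()
```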
-
ZScribbleSeg: A comprehensive segmentation framework with modeling of efficient annotation and maximization of scribble supervision
ZScribbleSeg maximizes scribble supervision with efficient annotation forms, spatial regularization, and EM-estimated class ratios to deliver competitive performance on six medical segmentation tasks without full labels.
-
Seeing What Shouldn't Be There: Counterfactual GANs for Medical Image Attribution
A cycle-consistent GAN generates counterfactual medical images to attribute classification decisions more comprehensively than standard saliency methods.
-
Understanding the Prompt Sensitivity
LLMs disperse meaning-preserving prompt variants in their internal representations instead of clustering them, which yields a loose upper bound on output log-probability differences derived via Taylor expansion and Cauchy-Schwarz.
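A generic form of the bound this summary gestures at, with assumed notation (a first-order Taylor expansion of the log-probability in a hidden representation h, then Cauchy-Schwarz); dispersed paraphrase representations make the norm of h' - h large, which inflates the bound:

```latex
% Illustrative derivation: Taylor-expand the output log-probability in the
% hidden state, then bound with Cauchy-Schwarz. Notation is assumed, not
% taken from the paper.
\[
  \log p(y \mid h') - \log p(y \mid h)
  \approx \nabla_h \log p(y \mid h)^{\top} (h' - h)
  \;\le\; \bigl\| \nabla_h \log p(y \mid h) \bigr\| \, \bigl\| h' - h \bigr\| .
\]
```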
-
Path-Sampled Integrated Gradients
Path-sampled integrated gradients generalizes integrated gradients by averaging gradients over sampled baselines on the linear path, proving equivalence to a weighted version that improves convergence rate to O(m^{-1}) and reduces variance by a factor of 1/3 under uniform sampling.
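The sampled-path idea in its generic Monte Carlo form, a sketch under assumptions rather than the paper's weighted estimator: draw m points uniformly on the straight line from baseline to input and average gradients there, so completeness holds in expectation. `model` is a placeholder:

```python
# Monte Carlo integrated gradients with uniformly sampled path points.
# Generic sketch; the paper's weighted estimator is not reproduced here.
import torch

def mc_path_ig(model, x, baseline, class_idx, m=64):
    grads = torch.zeros_like(x)
    for a in torch.rand(m):            # uniform samples on [0, 1]
        point = (baseline + a * (x - baseline)).detach().requires_grad_(True)
        model(point)[0, class_idx].backward()
        grads += point.grad
    return (x - baseline) * grads / m  # completeness in expectation
```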
-
Interpretable and Explainable Surrogate Modeling for Simulations: A State-of-the-Art Survey and Perspectives on Explainable AI for Decision-Making
This survey synthesizes XAI methods with surrogate modeling workflows for simulations and outlines a research agenda to embed explainability into simulation-driven design and decision-making.
-
ConceptTracer: Interactive Analysis of Concept Saliency and Selectivity in Neural Representations
ConceptTracer supplies an interactive interface and saliency/selectivity metrics to locate concept-responsive neurons in neural representations, shown on TabPFN.
-
PECKER: A Precisely Efficient Critical Knowledge Erasure Recipe For Machine Unlearning in Diffusion Models
PECKER uses a saliency mask to prioritize parameter updates in distillation-based unlearning, achieving shorter training times for class and concept forgetting on CIFAR-10 and STL-10 while matching prior methods' efficacy.
-
Evaluating Explainability in Safety-Critical ATR Systems: Limitations of Post-Hoc Methods and Paths Toward Robust XAI
Post-hoc XAI methods in ATR systems produce spurious explanations, show instability under perturbations, and induce overtrust, rendering them insufficient for safety-critical deployment without causal grounding.
-
Learning Coarse-to-Fine Osteoarthritis Representations under Noisy Hierarchical Labels
Dual-head training on hierarchical OA labels yields backbone-dependent gains in KL metrics, more ordered latent severity axes, and better saliency alignment with cartilage for some 3D backbones.
-
TabSHAP
TabSHAP attributes feature impact in LLM tabular classifiers via sampled Shapley coalitions and JSD on output distributions, reporting higher deletion faithfulness than random or XGBoost-proxy baselines on Adult Income and Heart Disease data.
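The coalition-sampling step is ordinary permutation-sampled Shapley estimation; a minimal sketch with a toy payoff standing in for TabSHAP's JSD-on-output-distributions value function:

```python
# Permutation-sampled Shapley values: average each feature's marginal
# contribution over random orderings. value_fn is a toy stand-in for the
# JSD-based payoff the summary describes.
import random

def sampled_shapley(value_fn, n_features, n_samples=200):
    phi = [0.0] * n_features
    for _ in range(n_samples):
        order = random.sample(range(n_features), n_features)
        coalition, base = set(), value_fn(set())
        for f in order:
            coalition.add(f)
            gain = value_fn(coalition) - base
            phi[f] += gain / n_samples
            base += gain
    return phi

# Additive toy payoff: the exact Shapley values are [1.0, 0.5, 0.0]
print(sampled_shapley(lambda S: (0 in S) + 0.5 * (1 in S), n_features=3))
```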
-
Explainable Human Activity Recognition: A Unified Review of Concepts and Mechanisms
The paper delivers a mechanism-centric taxonomy and unified perspective on explainable human activity recognition methods across sensing modalities.