Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Karen Simonyan, Andrea Vedaldi, Andrew Zisserman · 2013 · cs.CV · arXiv 1312.6034

Canonical reference. 86 Pith papers cite this work; 82% of classified citations cite it as background.

abstract

This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets). We consider two visualisation techniques, based on computing the gradient of the class score with respect to the input image. The first one generates an image, which maximises the class score [Erhan et al., 2009], thus visualising the notion of the class, captured by a ConvNet. The second technique computes a class saliency map, specific to a given image and class. We show that such maps can be employed for weakly supervised object segmentation using classification ConvNets. Finally, we establish the connection between the gradient-based ConvNet visualisation methods and deconvolutional networks [Zeiler et al., 2013].
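The second technique in the abstract, the class saliency map, can be illustrated with a minimal sketch. It assumes the map is formed by taking the gradient of the class score with respect to the input image and keeping, per pixel, the largest absolute value across colour channels. The names `toy_class_score`, `TOY_WEIGHTS`, `class_score_gradient`, and `saliency_map` are hypothetical, the score function is a toy linear model rather than a ConvNet, and central finite differences stand in for the single back-propagation pass a real network would use.

```python
from typing import Callable, List

Image = List[List[List[float]]]  # height x width x channels

# Hypothetical per-pixel, per-channel weights of a toy linear "class score".
TOY_WEIGHTS: Image = [
    [[0.5, -2.0, 0.1], [0.0, 0.0, 0.0]],
    [[1.0, 0.2, -0.3], [-0.1, 0.05, 3.0]],
]


def toy_class_score(image: Image) -> float:
    """Toy stand-in for a ConvNet class score: a weighted sum of pixel values."""
    return sum(
        w * x
        for w_row, i_row in zip(TOY_WEIGHTS, image)
        for w_px, i_px in zip(w_row, i_row)
        for w, x in zip(w_px, i_px)
    )


def class_score_gradient(
    score: Callable[[Image], float], image: Image, eps: float = 1e-4
) -> Image:
    """Approximate d(score)/d(image) with central finite differences.

    A real implementation would obtain this gradient by back-propagation;
    finite differences are used here only to keep the sketch dependency-free.
    """
    grad = [[[0.0 for _ in px] for px in row] for row in image]
    for i, row in enumerate(image):
        for j, px in enumerate(row):
            for c in range(len(px)):
                original = px[c]
                px[c] = original + eps
                plus = score(image)
                px[c] = original - eps
                minus = score(image)
                px[c] = original  # restore the pixel before moving on
                grad[i][j][c] = (plus - minus) / (2 * eps)
    return grad


def saliency_map(
    score: Callable[[Image], float], image: Image, eps: float = 1e-4
) -> List[List[float]]:
    """Per-pixel saliency: maximum absolute gradient across channels."""
    grad = class_score_gradient(score, image, eps)
    return [[max(abs(g) for g in px) for px in row] for row in grad]


if __name__ == "__main__":
    example = [
        [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
        [[0.7, 0.8, 0.9], [1.0, 0.0, 0.5]],
    ]
    for row in saliency_map(toy_class_score, example):
        print([round(v, 3) for v in row])
```

Because the toy score is linear, its gradient equals the weights regardless of the input, so the map simply highlights the pixels with the largest-magnitude weights. With a real ConvNet the gradient depends on the image, which is what makes the map specific to a given image and class.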

citation-role summary

background: 10 · method: 1

citation-polarity summary

background: 9 · unclear: 1 · use method: 1

representative citing papers

Measuring Cross-Modal Synergy: A Benchmark for VLM Explainability

cs.AI · 2026-05-21 · unverdicted · novelty 7.0

Introduces Synergistic Faithfulness metric based on Shapley Interaction Index to evaluate cross-modal synergy in VLM explainers, revealing over-reliance on visual salience in existing methods.

Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

Spectral Integrated Gradients constructs SVD-based integration paths that activate singular components from largest to smallest, producing cleaner attribution maps and better quantitative scores than standard Integrated Gradients on image classification tasks.

Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

In a combinatorial toy setting, winning lottery tickets preserve families of compatible feature locations in early feature space that balance proximity to final codes with low interference, rather than specific weight subnetworks.

AIM: Adversarial Information Masking for Faithfulness Evaluation of Saliency Maps

cs.LG · 2026-05-16 · unverdicted · novelty 7.0

AIM is a new saliency-guided adversarial feature replacement method to evaluate faithfulness of saliency maps and reliability of masking operators on image, audio, and EEG tasks.

$\alpha$-TCAV: A Unified Framework for Testing with Concept Activation Vectors

stat.ML · 2026-05-15 · unverdicted · novelty 7.0

α-TCAV replaces TCAV's hard indicator with a tunable smooth function to create a unified probabilistic framework with lower variance and guidance for parameter choice or Bayes-optimal scoring.

How to Evaluate and Refine your CAM

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

Introduces synthetic ground-truth dataset for CAM evaluation, proposes ARCC composite metric, and RefineCAM method that aggregates layers for higher-resolution maps outperforming baselines.

SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.

GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation

cs.LG · 2026-05-06 · unverdicted · novelty 7.0 · 2 refs

GRALIS unifies linear XAI attribution methods via a Riesz Representation Theorem-derived canonical form (Q, w, Delta), delivering seven theorems on completeness, convergence, interactions, and multi-scale extensions.

Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution

cs.LG · 2026-05-04 · unverdicted · novelty 7.0 · 2 refs

MA-GIG uses VAE latent space to align Integrated Gradients paths with the data manifold for more faithful feature attributions in deep neural networks.

Mapping data sensitivities in global QCD analysis with linear response and influence functions

hep-ph · 2026-04-30 · unverdicted · novelty 7.0

A framework based on linear response and influence functions maps data sensitivities in global QCD analyses to show how experiments determine central values, uncertainties, and correlations of non-perturbative functions.

Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning

astro-ph.GA · 2026-04-28 · unverdicted · novelty 7.0

A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.

Benchmarking bandgap prediction in semiconductors under experimental and realistic evaluation settings

cond-mat.mtrl-sci · 2026-04-28 · unverdicted · novelty 7.0

Introduces the RealMat-BaG benchmark showing fundamental generalization limits of ML models when predicting experimental bandgaps from DFT-trained data.

TRANSPORTER: Transferring Visual Semantics from VLM Manifolds

cs.CV · 2025-11-23 · unverdicted · novelty 7.0

TRANSPORTER generates videos from VLM logits using optimal transport to interpret model predictions on object attributes, actions, and scenes.

Human-Centered Supervision for Sentiment Analysis in Telugu: A Systematic Inquiry Beyond Accuracy

cs.CL · 2025-08-02 · unverdicted · novelty 7.0

Human rationales in supervision for Telugu sentiment analysis improve model alignment with human reasoning and often produce gains in predictive performance.

Scaling and evaluating sparse autoencoders

cs.LG · 2024-06-06 · unverdicted · novelty 7.0

K-sparse autoencoders with dead-latent fixes produce clean scaling laws and better feature quality metrics that improve with size, shown by training a 16-million-latent model on GPT-4 activations.

Improving Dictionary Learning with Gated Sparse Autoencoders

cs.LG · 2024-04-24 · unverdicted · novelty 7.0

Gated SAEs decouple which features to use from how large their activations should be, applying the L1 penalty only to selection and thereby eliminating shrinkage while halving the number of firing features needed for good fidelity.

ISAAC: Auditing Causal Reasoning in Deep Models for Drug-Target Interaction

cs.LG · 2026-05-03 · unverdicted · novelty 7.0

ISAAC auditing applied to three DTI models on the Davis benchmark finds 25% relative differences in causal reasoning scores despite nearly identical AUROC values.

From Local to Global to Mechanistic: An iERF-Centered Unified Framework for Interpreting Vision Models

cs.CV · 2026-05-01 · unverdicted · novelty 7.0

An iERF-centric framework unifies local, global, and mechanistic interpretability in vision models via SRD for saliency, CAFE for concept anchoring, and ICAT for interlayer attribution.

Mamba-SSM with LLM Reasoning for Feature Selection: Faithfulness-Aware Biomarker Discovery

q-bio.QM · 2026-04-15 · unverdicted · novelty 7.0

LLM chain-of-thought filtering of Mamba saliency features on TCGA-BRCA data produces a 17-gene set with AUC 0.927 that beats both the raw 50-gene saliency list and a 5000-gene baseline while using far fewer features, though it misses many known BRCA genes.

Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

Cross-Layer Transcoders decompose ViT activations into sparse, depth-aware layer contributions that maintain zero-shot accuracy and enable faithful attribution of the final representation.

Transcoders Trace Visual Grounding and Hallucinations in Vision-Language Models

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Transcoders decompose MLP layers in Gemma 3-4B-IT to trace visual grounding more effectively than SAEs and predict hallucinations from circuit graph features at AUC 0.68.

Rethinking Visual Attribution for Chest X-ray Reasoning in Large Vision Language Models

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

Existing visual attribution methods often fail to identify the visual evidence used by LVLMs in chest X-ray reasoning, while MedFocus using unbalanced optimal transport and targeted interventions substantially outperforms them across multiple models and settings.

OCCAM: Open-set Causal Concept explAnation and Ontology induction for black-box vision Models

cs.AI · 2026-05-18 · unverdicted · novelty 6.0

OCCAM discovers open-set visual concepts, estimates causal contributions via object-level interventions on black-box vision models, and induces a global concept ontology from aggregated dataset evidence.

GCE-MIL: Faithful and Recoverable Evidence for Multiple Instance Learning in Whole-Slide Imaging

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

GCE-MIL is a backbone-agnostic wrapper that directly optimizes MIL evidence for sufficiency, necessity, and recoverability, yielding modest gains in Macro-F1 and C-index plus more faithful patch selection across many backbones and datasets.

citing papers explorer

Showing 36 of 86 citing papers.

Cross-Modal Knowledge Distillation for PET-Free Amyloid-Beta Detection from MRI

cs.CV · 2026-04-14 · unverdicted · none · ref 48

A PET-guided knowledge distillation approach achieves AUCs of 0.74 and 0.68 for amyloid-beta detection from MRI alone across two datasets without requiring PET or clinical covariates at test time.

Learn to Rank: Visual Attribution by Learning Importance Ranking

cs.CV · 2026-04-07 · unverdicted · none · ref 56

A new end-to-end training scheme for visual attribution maps that optimizes deletion and insertion metrics directly via differentiable ranking relaxation instead of surrogate objectives.

AttnGen: Attention-Guided Saliency Learning for Interpretable Genomic Sequence Classification

cs.LG · 2026-05-13 · unverdicted · none · ref 5 · internal anchor

AttnGen embeds attention-based saliency into training via progressive masking to improve both accuracy and interpretability in classifying 200-nucleotide genomic sequences.

Biological Plausibility and Representational Alignment of Feedback Alignment in Convolutional Networks

cs.AI · 2026-05-08 · unverdicted · none · ref 8 · internal anchor

Modified feedback alignment in convolutional networks produces representations geometrically aligned with backpropagation on CIFAR-10.

ZScribbleSeg: A comprehensive segmentation framework with modeling of efficient annotation and maximization of scribble supervision

cs.CV · 2026-05-07 · unverdicted · none · ref 107 · internal anchor

ZScribbleSeg maximizes scribble supervision with efficient annotation forms, spatial regularization, and EM-estimated class ratios to deliver competitive performance on six medical segmentation tasks without full labels.

Data-driven Sensor Placement for Predictive Applications: A Correlation-Assisted Attribution Framework (CAAF)

cs.CE · 2025-10-26 · unverdicted · none · ref 28 · internal anchor

CAAF clusters candidate sensor locations before applying feature attribution to reduce redundancy and improve optimal sensor placement for predictions in dynamical systems.

ReGA: Model-Based Safeguard for LLMs via Representation-Guided Abstraction

cs.CR · 2025-06-02 · unverdicted · none · ref 58 · internal anchor

ReGA uses safety-critical representations to guide abstraction in model-based analysis, enabling scalable detection of harmful LLM inputs with reported AUROC of 0.975 at prompt level.

xAI-Drop: Don't Use What You Cannot Explain

cs.LG · 2024-07-29 · unverdicted · none · ref 49 · internal anchor

xAI-Drop introduces an explainability-based topological dropping regularizer for GNNs that outperforms state-of-the-art dropping methods in accuracy and explanation quality on real-world datasets.

Explaining Graph Neural Networks for Node Similarity on Graphs

cs.LG · 2024-07-10 · unverdicted · none · ref 69 · internal anchor

Empirical comparison shows gradient-based explanations for GNN node similarities are actionable, consistent, and retain effects when sparsified, unlike mutual information explanations.

Enhancing Causal Reasoning in Large Language Models: A Causal Attribution Model for Precision Fine-Tuning

cs.AI · 2023-12-30 · unverdicted · none · ref 7 · internal anchor

A causal attribution model is proposed that applies do-operators to quantify component contributions in LLMs' causal reasoning, motivating a fine-tuned model for pairwise causal discovery that combines knowledge and numerical data.

Explaining the Explainers in Graph Neural Networks: a Comparative Study

cs.LG · 2022-10-27 · unverdicted · none · ref 100 · internal anchor

Benchmark study of ten GNN explainers on eight architectures and six datasets that isolates usable components and issues practical recommendations.

Explaining an increase in predicted risk for clinical alerts

cs.LG · 2019-07-10 · unverdicted · none · ref 8 · internal anchor

Methods are introduced to lift static attribution techniques to dynamical models for explaining risk increases in clinical alert systems.

Unsupervised Domain Alignment to Mitigate Low Level Dataset Biases

cs.CV · 2019-07-08 · unverdicted · none · ref 35 · internal anchor

The paper proposes an unsupervised domain alignment method using GANs with cycle consistency, adversarial, and SSIM losses to augment training data and reduce low-level dataset biases in computer vision.

ELF: Embedded Localisation of Features in pre-trained CNN

cs.CV · 2019-07-07 · unverdicted · none · ref 35 · internal anchor

ELF derives keypoint locations via gradients on pre-trained CNN feature maps and reaches repeatability and matchability scores comparable to specialized detectors on HPatches, Webcam, and photo-tourism data.

Generative Counterfactual Introspection for Explainable Deep Learning

cs.LG · 2019-07-06 · unverdicted · none · ref 4 · internal anchor

A generative-model-driven introspection method produces counterfactual image edits to explain deep neural network predictions on MNIST and CelebA.

DLIME: A Deterministic Local Interpretable Model-Agnostic Explanations Approach for Computer-Aided Diagnosis Systems

cs.LG · 2019-06-24 · unverdicted · none · ref 29 · internal anchor

DLIME uses agglomerative hierarchical clustering and KNN to generate stable local explanations for black-box ML predictions on medical data, outperforming LIME on Jaccard similarity of repeated explanations.

Seeing What Shouldn't Be There: Counterfactual GANs for Medical Image Attribution

cs.CV · 2026-05-06 · unverdicted · none · ref 49

A cycle-consistent GAN generates counterfactual medical images to attribute classification decisions more comprehensively than standard saliency methods.

Understanding the Prompt Sensitivity

cs.CL · 2026-04-20 · unverdicted · none · ref 5

LLMs disperse meaning-preserving prompts internally instead of clustering them, which produces an excessively high upper bound on output log-probability differences via Taylor expansion and Cauchy-Schwarz.

Path-Sampled Integrated Gradients

cs.LG · 2026-04-15 · unverdicted · none · ref 11

Path-sampled integrated gradients generalizes integrated gradients by averaging gradients over sampled baselines on the linear path, proving equivalence to a weighted version that improves convergence rate to O(m^{-1}) and reduces variance by a factor of 1/3 under uniform sampling.

Interpretable and Explainable Surrogate Modeling for Simulations: A State-of-the-Art Survey and Perspectives on Explainable AI for Decision-Making

cs.AI · 2026-04-15 · unverdicted · none · ref 217

This survey synthesizes XAI methods with surrogate modeling workflows for simulations and outlines a research agenda to embed explainability into simulation-driven design and decision-making.

ConceptTracer: Interactive Analysis of Concept Saliency and Selectivity in Neural Representations

cs.LG · 2026-04-08 · unverdicted · none · ref 9

ConceptTracer supplies an interactive interface and saliency/selectivity metrics to locate concept-responsive neurons in neural representations, shown on TabPFN.

Evaluating Explainability in Safety-Critical ATR Systems: Limitations of Post-Hoc Methods and Paths Toward Robust XAI

cs.AI · 2026-05-07 · unverdicted · none · ref 23 · internal anchor

Post-hoc XAI methods in ATR systems produce spurious explanations, show instability under perturbations, and induce overtrust, rendering them insufficient for safety-critical deployment without causal grounding.

Explainability Methods for Hardware Trojan Detection: A Systematic Comparison

cs.LG · 2026-01-26 · unverdicted · none · ref 31 · internal anchor

Compares domain-aware, case-based, and feature attribution explainability methods for gate-level hardware Trojan detection on the Trust-Hub benchmark dataset.

Automatically Learning Construction Injury Precursors from Text

cs.CL · 2019-07-26 · unverdicted · none · ref 61 · internal anchor

Standard NLP classifiers can surface valid injury precursors from raw construction safety reports.

Learning Coarse-to-Fine Osteoarthritis Representations under Noisy Hierarchical Labels

cs.CV · 2026-05-01 · unverdicted · none · ref 32

Dual-head training on hierarchical OA labels yields backbone-dependent gains in KL metrics, more ordered latent severity axes, and better saliency alignment with cartilage for some 3D backbones.

TabSHAP

cs.LG · 2026-04-22 · unverdicted · none · ref 3

TabSHAP attributes feature impact in LLM tabular classifiers via sampled Shapley coalitions and JSD on output distributions, reporting higher deletion faithfulness than random or XGBoost-proxy baselines on Adult Income and Heart Disease data.

Explainable Human Activity Recognition: A Unified Review of Concepts and Mechanisms

cs.LG · 2026-04-10 · unverdicted · none · ref 14

The paper delivers a mechanism-centric taxonomy and unified perspective on explainable human activity recognition methods across sensing modalities.

Can machine learning for quantum-gas experiments be explainable?

cond-mat.quant-gas · 2026-05-18 · unverdicted · none · ref 24 · internal anchor

Machine learning assists with image denoising and soliton detection in cold-atom quantum simulators while addressing the need for model interpretability.

Unsupervised Machine Learning to Teach Fluid Dynamicists to Think in 15 Dimensions

physics.flu-dyn · 2019-07-23 · unverdicted · none · ref 42 · internal anchor

An autoencoder on 10^12-point stratified turbulence data identifies vertical velocity as a key marker for turbulence features via bleed-over in reconstruction errors.

Machine Learning applications to Galaxy Clusters

astro-ph.CO · 2026-05-21 · unverdicted · none · ref 61 · internal anchor

A review summarizing AI applications to galaxy cluster mass estimation, dynamical state characterization, merger analysis, and simulation emulation from multiple observational tracers.

The Neglected Baseline in Model Interpretation

cs.CV · 2026-05-21 · unreviewed · ref 21 · internal anchor

ARC-STAR: Auditable Post-Hoc Correction for PDE Foundation Models

cs.LG · 2026-05-21 · unreviewed · ref 61 · 2 links · internal anchor

I-SAFE: Wasserstein Coherence Metrics for Structural Auditing of Scientific AI Models

cs.LG · 2026-05-20 · unreviewed · ref 29 · internal anchor

From Mechanistic to Compositional Interpretability

cs.LG · 2026-05-09 · unreviewed · ref 212 · internal anchor

Evaluation of AI Ethics Tools in Language Models: A Developers' Perspective Case Study

cs.CY · 2025-12-16 · unreviewed · ref 166 · internal anchor

PECKER: A Precisely Efficient Critical Knowledge Erasure Recipe For Machine Unlearning in Diffusion Models

cs.AI · 2026-04-07 · unreviewed · ref 26
