Introduces Synergistic Faithfulness metric based on Shapley Interaction Index to evaluate cross-modal synergy in VLM explainers, revealing over-reliance on visual salience in existing methods.
hub Canonical reference
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Canonical reference. 82% of citing Pith papers cite this work as background.
abstract
This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets). We consider two visualisation techniques, based on computing the gradient of the class score with respect to the input image. The first one generates an image, which maximises the class score [Erhan et al., 2009], thus visualising the notion of the class, captured by a ConvNet. The second technique computes a class saliency map, specific to a given image and class. We show that such maps can be employed for weakly supervised object segmentation using classification ConvNets. Finally, we establish the connection between the gradient-based ConvNet visualisation methods and deconvolutional networks [Zeiler et al., 2013].
hub tools
citation-role summary
citation-polarity summary
claims ledger
- abstract This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets). We consider two visualisation techniques, based on computing the gradient of the class score with respect to the input image. The first one generates an image, which maximises the class score [Erhan et al., 2009], thus visualising the notion of the class, captured by a ConvNet. The second technique computes a class saliency map, specific to a given image and class. We show that such maps can be employed for weakly supervised object segmentation using classification ConvNe
co-cited works
representative citing papers
Spectral Integrated Gradients constructs SVD-based integration paths that activate singular components from largest to smallest, producing cleaner attribution maps and better quantitative scores than standard Integrated Gradients on image classification tasks.
In a combinatorial toy setting, winning lottery tickets preserve families of compatible feature locations in early feature space that balance proximity to final codes with low interference, rather than specific weight subnetworks.
AIM is a new saliency-guided adversarial feature replacement method to evaluate faithfulness of saliency maps and reliability of masking operators on image, audio, and EEG tasks.
α-TCAV replaces TCAV's hard indicator with a tunable smooth function to create a unified probabilistic framework with lower variance and guidance for parameter choice or Bayes-optimal scoring.
Introduces synthetic ground-truth dataset for CAM evaluation, proposes ARCC composite metric, and RefineCAM method that aggregates layers for higher-resolution maps outperforming baselines.
SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.
GRALIS unifies linear XAI attribution methods via a Riesz Representation Theorem-derived canonical form (Q, w, Delta), delivering seven theorems on completeness, convergence, interactions, and multi-scale extensions.
MA-GIG uses VAE latent space to align Integrated Gradients paths with the data manifold for more faithful feature attributions in deep neural networks.
A framework based on linear response and influence functions maps data sensitivities in global QCD analyses to show how experiments determine central values, uncertainties, and correlations of non-perturbative functions.
A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.
Introduces the RealMat-BaG benchmark showing fundamental generalization limits of ML models when predicting experimental bandgaps from DFT-trained data.
TRANSPORTER generates videos from VLM logits using optimal transport to interpret model predictions on object attributes, actions, and scenes.
Human rationales in supervision for Telugu sentiment analysis improve model alignment with human reasoning and often produce gains in predictive performance.
K-sparse autoencoders with dead-latent fixes produce clean scaling laws and better feature quality metrics that improve with size, shown by training a 16-million-latent model on GPT-4 activations.
Gated SAEs decouple which features to use from how large their activations should be, applying the L1 penalty only to selection and thereby eliminating shrinkage while halving the number of firing features needed for good fidelity.
ISAAC auditing applied to three DTI models on the Davis benchmark finds 25% relative differences in causal reasoning scores despite nearly identical AUROC values.
An iERF-centric framework unifies local, global, and mechanistic interpretability in vision models via SRD for saliency, CAFE for concept anchoring, and ICAT for interlayer attribution.
LLM chain-of-thought filtering of Mamba saliency features on TCGA-BRCA data produces a 17-gene set with AUC 0.927 that beats both the raw 50-gene saliency list and a 5000-gene baseline while using far fewer features, though it misses many known BRCA genes.
Cross-Layer Transcoders decompose ViT activations into sparse, depth-aware layer contributions that maintain zero-shot accuracy and enable faithful attribution of the final representation.
Transcoders decompose MLP layers in Gemma 3-4B-IT to trace visual grounding more effectively than SAEs and predict hallucinations from circuit graph features at AUC 0.68.
Existing visual attribution methods often fail to identify the visual evidence used by LVLMs in chest X-ray reasoning, while MedFocus using unbalanced optimal transport and targeted interventions substantially outperforms them across multiple models and settings.
OCCAM discovers open-set visual concepts, estimates causal contributions via object-level interventions on black-box vision models, and induces a global concept ontology from aggregated dataset evidence.
GCE-MIL is a backbone-agnostic wrapper that directly optimizes MIL evidence for sufficiency, necessity, and recoverability, yielding modest gains in Macro-F1 and C-index plus more faithful patch selection across many backbones and datasets.
citing papers explorer
-
APEX: Audio Prototype EXplanations for Classification Tasks
APEX generates four types of prototype-based explanations for pre-trained audio classifiers that preserve output invariance and target acoustic properties better than gradient methods applied to spectrograms.