Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
35 Pith papers cite this work.
abstract
This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets). We consider two visualisation techniques, based on computing the gradient of the class score with respect to the input image. The first one generates an image, which maximises the class score [Erhan et al., 2009], thus visualising the notion of the class, captured by a ConvNet. The second technique computes a class saliency map, specific to a given image and class. We show that such maps can be employed for weakly supervised object segmentation using classification ConvNets. Finally, we establish the connection between the gradient-based ConvNet visualisation methods and deconvolutional networks [Zeiler et al., 2013].
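The saliency technique the abstract describes needs only a single backward pass: the map is the per-pixel, channel-wise maximum of the absolute gradient of the unnormalised class score with respect to the input. A minimal PyTorch sketch, using a torchvision VGG as a stand-in for the paper's ConvNet (the input and class index are placeholders):

```python
# Minimal sketch of an image-specific class saliency map: the gradient of
# the class score w.r.t. the input, reduced by a channel-wise max of its
# absolute value. The torchvision model is a stand-in, not the paper's net.
import torch
import torchvision.models as models

model = models.vgg16(weights="IMAGENET1K_V1").eval()

def class_saliency(image, class_idx):
    # image: (1, 3, H, W) normalised tensor
    image = image.clone().requires_grad_(True)
    score = model(image)[0, class_idx]      # unnormalised (pre-softmax) score
    score.backward()
    # per-pixel max over colour channels of |dS_c / dI|
    return image.grad.abs().max(dim=1)[0].squeeze(0)

saliency = class_saliency(torch.randn(1, 3, 224, 224), class_idx=281)
```

The class-model visualisation uses the same machinery in reverse: fix the class and run gradient ascent on the input image to maximise its score.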
citing papers explorer
-
From Mechanistic to Compositional Interpretability
Compositional interpretability defines explanations as commuting syntactic-semantic mapping pairs grounded in compositionality and minimum description length, with compressive refinement and a parsimony theorem guaranteeing concise human-aligned decompositions.
-
SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data
SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.
-
Mapping data sensitivities in global QCD analysis with linear response and influence functions
A framework based on linear response and influence functions maps data sensitivities in global QCD analyses to show how experiments determine central values, uncertainties, and correlations of non-perturbative functions.
-
Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning
A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.
-
Benchmarking bandgap prediction in semiconductors under experimental and realistic evaluation settings
Introduces the RealMat-BaG benchmark, showing fundamental generalization limits of ML models trained on DFT data when they predict experimental bandgaps.
-
Scaling and evaluating sparse autoencoders
K-sparse autoencoders with fixes for dead latents produce clean scaling laws and feature-quality metrics that improve with size, demonstrated by training a 16-million-latent model on GPT-4 activations.
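For orientation, a minimal sketch of the TopK mechanism such k-sparse autoencoders rely on: only the k largest pre-activations per example are kept, enforcing sparsity directly rather than through an L1 penalty. Dimensions are illustrative, and the dead-latent fixes the summary mentions are omitted:

```python
# Hedged sketch of a TopK sparse autoencoder; sizes are illustrative and
# the paper's dead-latent handling is not reproduced here.
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, d_model=768, n_latents=16384, k=32):
        super().__init__()
        self.enc = nn.Linear(d_model, n_latents)
        self.dec = nn.Linear(n_latents, d_model)
        self.k = k

    def forward(self, x):
        z = self.enc(x)
        # keep the k largest latents per example, zero out the rest
        top = torch.topk(z, self.k, dim=-1)
        z_sparse = torch.zeros_like(z).scatter(-1, top.indices, top.values)
        return self.dec(z_sparse), z_sparse

sae = TopKSAE()
x = torch.randn(4, 768)
recon, z = sae(x)
loss = (recon - x).pow(2).mean()   # plain reconstruction MSE
```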
-
ISAAC: Auditing Causal Reasoning in Deep Models for Drug-Target Interaction
ISAAC auditing applied to three DTI models on the Davis benchmark finds 25% relative differences in causal reasoning scores despite nearly identical AUROC values.
-
From Local to Global to Mechanistic: An iERF-Centered Unified Framework for Interpreting Vision Models
An iERF-centric framework unifies local, global, and mechanistic interpretability in vision models via SRD for saliency, CAFE for concept anchoring, and ICAT for interlayer attribution.
-
Mamba-SSM with LLM Reasoning for Feature Selection: Faithfulness-Aware Biomarker Discovery
LLM chain-of-thought filtering of Mamba saliency features on TCGA-BRCA data produces a 17-gene set with AUC 0.927 that beats both the raw 50-gene saliency list and a 5000-gene baseline while using far fewer features, though it misses many known BRCA genes.
-
Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision
Cross-Layer Transcoders decompose ViT activations into sparse, depth-aware layer contributions that maintain zero-shot accuracy and enable faithful attribution of the final representation.
-
APEX: Audio Prototype EXplanations for Classification Tasks
APEX generates four types of prototype-based explanations for pre-trained audio classifiers that preserve output invariance and target acoustic properties better than gradient methods applied to spectrograms.
-
Scaling Vision Models Does Not Consistently Improve Localisation-Based Explanation Quality
Scaling vision models by depth and parameter count does not consistently improve localisation-based explanation quality across architectures, datasets, and post-hoc methods; smaller models often perform comparably or better.
-
Evaluating the Alignment Between GeoAI Explanations and Domain Knowledge in Satellite-Based Flood Mapping
ADAGE uses Channel-Group SHAP to quantify alignment between GeoAI model explanations and domain knowledge references in satellite-based flood mapping.
-
Hierarchical, Interpretable, Label-Free Concept Bottleneck Model
HIL-CBM is a hierarchical label-free concept bottleneck model that improves classification accuracy and explanation quality over prior single-level CBMs using a visual consistency loss and dual heads.
-
GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation
GRALIS supplies a canonical representation (Q, w, Delta) for every additive linear continuous attribution functional on L^2 via the Riesz Representation Theorem, unifying SHAP, IG, LIME and linearized GradCAM while proving seven simultaneous guarantees including completeness and interaction values.
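For readers who want the theorem behind this capsule: the Riesz Representation Theorem identifies every continuous linear functional on L^2 with an inner product against a unique representer, which is the hook the (Q, w, Delta) decomposition refines; the decomposition itself is not detailed here.

```latex
% Riesz representation on L^2: every continuous linear functional phi has
% a unique representer g. How GRALIS factors g into (Q, w, Delta) is the
% paper's contribution and is not reproduced in this reminder.
\[
  \phi \in (L^2)^{*}
  \;\Longrightarrow\;
  \exists!\, g \in L^2 :\quad
  \phi(f) = \langle f, g \rangle = \int f(x)\, g(x)\, dx
  \quad \text{for all } f \in L^2 .
\]
```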
-
Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution
MA-GIG improves Integrated Gradients by performing path integration in the latent space of a pre-trained VAE, so that decoded points remain closer to the learned data manifold and off-manifold gradient noise is reduced.
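A hedged sketch of the general latent-path idea, not MA-GIG's exact algorithm: interpolate between the encoded baseline and input in the VAE latent space, decode each point, and accumulate model gradients along the decoded path. Here `vae.encode`, `vae.decode`, and `model` are hypothetical placeholders:

```python
# Sketch of integrated gradients along a decoded latent path. The vae and
# model objects are hypothetical placeholders, not MA-GIG's actual API.
import torch

def latent_path_ig(model, vae, x, baseline, class_idx, steps=50):
    z0, z1 = vae.encode(baseline), vae.encode(x)
    grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        # decode an interpolated latent back onto (near) the data manifold
        point = vae.decode(z0 + alpha * (z1 - z0)).detach().requires_grad_(True)
        model(point)[0, class_idx].backward()
        grads += point.grad
    # IG-style rescaling between the decoded endpoints
    with torch.no_grad():
        return (vae.decode(z1) - vae.decode(z0)) * grads / steps
```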
-
H-Sets: Hessian-Guided Discovery of Set-Level Feature Interactions in Image Classifiers
H-Sets detects higher-order feature interactions in image classifiers via Hessian-guided pair merging and attributes them with IDG-Vis to generate more interpretable saliency maps than existing marginal or coarse methods.
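The Hessian-guided part is easiest to see on a toy function: the mixed second derivative d2f/dx_i dx_j measures how strongly feature i modulates feature j's effect, which is the raw signal a method like this can merge pairs on. A small sketch of that signal only; H-Sets' merging and IDG-Vis attribution are not reproduced:

```python
# Scoring pairwise feature interactions by |mixed second derivatives|.
# This illustrates the Hessian signal only, not H-Sets' merging procedure.
import torch

def pairwise_interactions(f, x):
    # x: (n,) input; returns an (n, n) matrix of absolute Hessian entries
    return torch.autograd.functional.hessian(f, x).abs()

f = lambda v: v[0] * v[1] + v[2] ** 2     # toy model with one interaction
print(pairwise_interactions(f, torch.tensor([1.0, 2.0, 3.0])))
```

On this toy function the only large off-diagonal entry is the (0, 1) pair, matching the v[0] * v[1] term.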
-
On the Importance and Evaluation of Narrativity in Natural Language AI Explanations
XAI explanations should be narratives with continuous structure, cause-effect links, fluency, and diversity, and new metrics beyond standard NLP scores are needed to evaluate these qualities.
-
Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks
Token-level contrastive attribution yields informative signals for some LLM benchmark failures but is not universally applicable across datasets and models.
-
Potential of Gaia XP Spectra in Red Giant Star Asteroseismology: A Deep-Learning Approach
Hybrid deep learning models recover large frequency separation, frequency of maximum power, and dipole period spacing from low-resolution Gaia XP spectra with accuracy comparable to moderate-resolution spectroscopy.
-
Towards Reliable Testing of Machine Unlearning
Causal fuzzing with budgeted interventions can detect residual direct and indirect influence of unlearned data that standard attribution methods miss due to proxies, cancellations, and masking.
-
Cross-Modal Knowledge Distillation for PET-Free Amyloid-Beta Detection from MRI
A PET-guided knowledge distillation approach achieves AUCs of 0.74 and 0.68 for amyloid-beta detection from MRI alone across two datasets without requiring PET or clinical covariates at test time.
-
Learn to Rank: Visual Attribution by Learning Importance Ranking
A new end-to-end training scheme for visual attribution maps that optimizes deletion and insertion metrics directly via differentiable ranking relaxation instead of surrogate objectives.
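One standard way to make ranking differentiable, not necessarily the paper's relaxation, is to replace hard ranks with sums of sigmoid comparisons, so that deletion/insertion-style objectives admit gradients:

```python
# Soft ranking via sigmoid comparisons: a differentiable stand-in for hard
# ranks, shown as one common relaxation rather than the paper's method.
import torch

def soft_rank(scores, tau=0.01):
    # diff[i, j] = scores[i] - scores[j]
    diff = scores.unsqueeze(1) - scores.unsqueeze(0)
    # soft count of elements larger than scores[i] (self term contributes 0.5)
    return torch.sigmoid(-diff / tau).sum(dim=1)

scores = torch.tensor([0.3, 0.9, 0.1], requires_grad=True)
soft_rank(scores).sum().backward()    # gradients flow through the ranking
```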
-
Biological Plausibility and Representational Alignment of Feedback Alignment in Convolutional Networks
Modified feedback alignment in convolutional networks produces representations geometrically aligned with backpropagation on CIFAR-10.
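As background on the base technique: feedback alignment propagates the error through a fixed random matrix B instead of the transposed forward weights. A minimal sketch of the classic formulation; the paper's modification is not reproduced here:

```python
# Classic feedback alignment for one linear layer: the backward pass uses a
# fixed random matrix B in place of W, yet W can still learn. This is the
# standard formulation, not the paper's modified variant.
import torch

class FALinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, W, B):
        ctx.save_for_backward(x, W, B)
        return x @ W.t()

    @staticmethod
    def backward(ctx, grad_out):
        x, W, B = ctx.saved_tensors
        grad_x = grad_out @ B          # random B replaces W in the backward pass
        grad_W = grad_out.t() @ x
        return grad_x, grad_W, None    # B itself is never trained

x = torch.randn(8, 32, requires_grad=True)
W = torch.randn(16, 32, requires_grad=True)
B = torch.randn(16, 32)                # fixed random feedback weights
FALinear.apply(x, W, B).sum().backward()
```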
-
ZScribbleSeg: A comprehensive segmentation framework with modeling of efficient annotation and maximization of scribble supervision
ZScribbleSeg maximizes scribble supervision with efficient annotation forms, spatial regularization, and EM-estimated class ratios to deliver competitive performance on six medical segmentation tasks without full labels.
-
Seeing What Shouldn't Be There: Counterfactual GANs for Medical Image Attribution
A cycle-consistent GAN generates counterfactual medical images to attribute classification decisions more comprehensively than standard saliency methods.
-
Understanding the Prompt Sensitivity
LLMs disperse meaning-preserving prompt variants in their internal representations instead of clustering them, which yields a loose upper bound on output log-probability differences derived via Taylor expansion and Cauchy-Schwarz.
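A generic form of the bound this summary gestures at, with assumed notation (a first-order Taylor expansion of the log-probability in a hidden representation h, then Cauchy-Schwarz); dispersed paraphrase representations make the norm of h' - h large, which inflates the bound:

```latex
% Illustrative derivation: Taylor-expand the output log-probability in the
% hidden state, then bound with Cauchy-Schwarz. Notation is assumed, not
% taken from the paper.
\[
  \log p(y \mid h') - \log p(y \mid h)
  \approx \nabla_h \log p(y \mid h)^{\top} (h' - h)
  \;\le\; \bigl\| \nabla_h \log p(y \mid h) \bigr\| \, \bigl\| h' - h \bigr\| .
\]
```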
-
Path-Sampled Integrated Gradients
Path-sampled integrated gradients generalizes integrated gradients by averaging gradients over sampled baselines on the linear path, proving equivalence to a weighted version that improves convergence rate to O(m^{-1}) and reduces variance by a factor of 1/3 under uniform sampling.
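The sampled-path idea in its generic Monte Carlo form, a sketch under assumptions rather than the paper's weighted estimator: draw m points uniformly on the straight line from baseline to input and average gradients there, so completeness holds in expectation. `model` is a placeholder:

```python
# Monte Carlo integrated gradients with uniformly sampled path points.
# Generic sketch; the paper's weighted estimator is not reproduced here.
import torch

def mc_path_ig(model, x, baseline, class_idx, m=64):
    grads = torch.zeros_like(x)
    for a in torch.rand(m):            # uniform samples on [0, 1]
        point = (baseline + a * (x - baseline)).detach().requires_grad_(True)
        model(point)[0, class_idx].backward()
        grads += point.grad
    return (x - baseline) * grads / m  # completeness in expectation
```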
-
Interpretable and Explainable Surrogate Modeling for Simulations: A State-of-the-Art Survey and Perspectives on Explainable AI for Decision-Making
This survey synthesizes XAI methods with surrogate modeling workflows for simulations and outlines a research agenda to embed explainability into simulation-driven design and decision-making.
-
ConceptTracer: Interactive Analysis of Concept Saliency and Selectivity in Neural Representations
ConceptTracer supplies an interactive interface and saliency/selectivity metrics to locate concept-responsive neurons in neural representations, shown on TabPFN.
-
PECKER: A Precisely Efficient Critical Knowledge Erasure Recipe For Machine Unlearning in Diffusion Models
PECKER uses a saliency mask to prioritize parameter updates in distillation-based unlearning, achieving shorter training times for class and concept forgetting on CIFAR-10 and STL-10 while matching prior methods' efficacy.
-
Evaluating Explainability in Safety-Critical ATR Systems: Limitations of Post-Hoc Methods and Paths Toward Robust XAI
Post-hoc XAI methods in ATR systems produce spurious explanations, show instability under perturbations, and induce overtrust, rendering them insufficient for safety-critical deployment without causal grounding.
-
Learning Coarse-to-Fine Osteoarthritis Representations under Noisy Hierarchical Labels
Dual-head training on hierarchical OA labels yields backbone-dependent gains in KL metrics, more ordered latent severity axes, and better saliency alignment with cartilage for some 3D backbones.
-
TabSHAP
TabSHAP attributes feature impact in LLM tabular classifiers via sampled Shapley coalitions and JSD on output distributions, reporting higher deletion faithfulness than random or XGBoost-proxy baselines on Adult Income and Heart Disease data.
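The coalition-sampling step is ordinary permutation-sampled Shapley estimation; a minimal sketch with a toy payoff standing in for TabSHAP's JSD-on-output-distributions value function:

```python
# Permutation-sampled Shapley values: average each feature's marginal
# contribution over random orderings. value_fn is a toy stand-in for the
# JSD-based payoff the summary describes.
import random

def sampled_shapley(value_fn, n_features, n_samples=200):
    phi = [0.0] * n_features
    for _ in range(n_samples):
        order = random.sample(range(n_features), n_features)
        coalition, base = set(), value_fn(set())
        for f in order:
            coalition.add(f)
            gain = value_fn(coalition) - base
            phi[f] += gain / n_samples
            base += gain
    return phi

# Additive toy payoff: the exact Shapley values are [1.0, 0.5, 0.0]
print(sampled_shapley(lambda S: (0 in S) + 0.5 * (1 in S), n_features=3))
```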
-
Explainable Human Activity Recognition: A Unified Review of Concepts and Mechanisms
The paper delivers a mechanism-centric taxonomy and unified perspective on explainable human activity recognition methods across sensing modalities.