Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Karen Simonyan, Andrea Vedaldi, Andrew Zisserman · 2013 · cs.CV · arXiv 1312.6034

Canonical reference. 82% of citing Pith papers cite this work as background.

106 Pith papers citing it

Background 82% of classified citations

abstract

This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets). We consider two visualisation techniques, based on computing the gradient of the class score with respect to the input image. The first one generates an image, which maximises the class score [Erhan et al., 2009], thus visualising the notion of the class, captured by a ConvNet. The second technique computes a class saliency map, specific to a given image and class. We show that such maps can be employed for weakly supervised object segmentation using classification ConvNets. Finally, we establish the connection between the gradient-based ConvNet visualisation methods and deconvolutional networks [Zeiler et al., 2013].
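Both techniques described in the abstract depend only on the gradient of a class score with respect to the input image. The sketch below is a minimal illustration under stated assumptions. A hypothetical two-layer ReLU network (`ToyClassifier`, with random weights) stands in for a trained ConvNet, and its input gradient is derived by hand rather than by automatic differentiation. The rest follows our reading of the paper. The saliency map takes the maximum absolute gradient over colour channels. The class model is found by gradient ascent on the unnormalised score, S_c(I) - λ‖I‖²₂, starting from the zero image. The step size, λ and step count are illustrative values, not the paper's.

```python
import numpy as np

# Hypothetical toy classifier standing in for a trained ConvNet (assumption):
# S(I) = W2 @ relu(W1 @ vec(I) + b1) + b2 for an RGB image I of shape (3, H, W).
class ToyClassifier:
    def __init__(self, shape=(3, 8, 8), hidden=32, n_classes=5, seed=0):
        rng = np.random.default_rng(seed)
        d = int(np.prod(shape))
        self.shape = shape
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (hidden, d))
        # Nonzero biases so some units are active at the zero image.
        self.b1 = rng.normal(0.0, 0.1, hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (n_classes, hidden))
        self.b2 = np.zeros(n_classes)

    def scores(self, image):
        """Unnormalised class scores S_c(I): no softmax is applied."""
        h = self.W1 @ image.ravel() + self.b1
        return self.W2 @ np.maximum(h, 0.0) + self.b2

    def score_gradient(self, image, c):
        """dS_c/dI by manual backprop through the ReLU, reshaped like the image."""
        h = self.W1 @ image.ravel() + self.b1
        grad_h = self.W2[c] * (h > 0)  # ReLU passes gradient only where h > 0
        return (self.W1.T @ grad_h).reshape(self.shape)


def saliency_map(model, image, c):
    """Image-specific class saliency: max over colour channels of |dS_c/dI|, shape (H, W)."""
    w = model.score_gradient(image, c)
    return np.abs(w).max(axis=0)


def class_model(model, c, lam=0.01, lr=0.1, steps=200):
    """Gradient ascent on S_c(I) - lam * ||I||_2^2, starting from the zero image."""
    image = np.zeros(model.shape)
    for _ in range(steps):
        grad = model.score_gradient(image, c) - 2.0 * lam * image
        image += lr * grad
    return image


if __name__ == "__main__":
    model = ToyClassifier()
    img = np.random.default_rng(1).normal(size=model.shape)
    print(saliency_map(model, img, c=2).shape)  # (8, 8)
    print(model.scores(class_model(model, c=2))[2])
```

With a real framework, `score_gradient` would be replaced by automatic differentiation of the pre-softmax logit for class c. Optimising the unnormalised score rather than the softmax posterior keeps the ascent from raising a class's probability merely by suppressing the other classes' scores.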

citation-role summary

background 10 · method 1

citation-polarity summary

background 9 · unclear 1 · use method 1

authors

Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

representative citing papers

Diffusion Integrated Gradients: Controllable Path Generation for Flexible Feature Attribution

cs.LG · 2026-06-21 · unverdicted · novelty 7.0

DiffIG reformulates attribution path generation as conditional diffusion modeling trained on Stick-Breaking Process paths with guided sampling for user-controllable XAI.

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

cs.LG · 2026-06-08 · unverdicted · novelty 7.0

VFUSE applies sparse autoencoders to diffusion-transformer activations in RoseTTAFold3 and RFDiffusion3 to find monosemantic features that detect hazardous protein designs with AUROC up to 0.84.

Attribution via Distributional Paths for Information Revelation

cs.LG · 2026-06-02 · unverdicted · novelty 7.0

Reveal-IG performs path attribution by integrating model output changes along trajectories in a space of probe distributions rather than input-space paths, retaining completeness and handling multiscale or uncertain features.

Aligning Molecular Graph Explanations with Chemical Identity via InChIfied Invariants

cs.LG · 2026-05-23 · unverdicted · novelty 7.0

InChIfied Invariants based on InChI achieve 99.62% identical representations for chemically equivalent molecular graphs versus 0.35% for standard Daylight invariants on one million PubChem molecules, while preserving predictive performance and enforcing consistent attributions.

Measuring Cross-Modal Synergy: A Benchmark for VLM Explainability

cs.AI · 2026-05-21 · unverdicted · novelty 7.0

Introduces Synergistic Faithfulness metric based on Shapley Interaction Index to evaluate cross-modal synergy in VLM explainers, revealing over-reliance on visual salience in existing methods.

Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

Spectral Integrated Gradients constructs SVD-based integration paths that activate singular components from largest to smallest, producing cleaner attribution maps and better quantitative scores than standard Integrated Gradients on image classification tasks.

Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

In a combinatorial toy setting, winning lottery tickets preserve families of compatible feature locations in early feature space that balance proximity to final codes with low interference, rather than specific weight subnetworks.

AIM: Adversarial Information Masking for Faithfulness Evaluation of Saliency Maps

cs.LG · 2026-05-16 · unverdicted · novelty 7.0

AIM is a new saliency-guided adversarial feature replacement method to evaluate faithfulness of saliency maps and reliability of masking operators on image, audio, and EEG tasks.

α-TCAV: A Unified Framework for Testing with Concept Activation Vectors

stat.ML · 2026-05-15 · unverdicted · novelty 7.0

α-TCAV replaces TCAV's hard indicator with a tunable smooth function to create a unified probabilistic framework with lower variance and guidance for parameter choice or Bayes-optimal scoring.

How to Evaluate and Refine your CAM

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

Introduces synthetic ground-truth dataset for CAM evaluation, proposes ARCC composite metric, and RefineCAM method that aggregates layers for higher-resolution maps outperforming baselines.

From Mechanistic to Compositional Interpretability

cs.LG · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

The paper introduces compositional interpretability as a category-theoretic framework that casts mechanistic explanations as commuting syntactic-semantic mappings optimized under faithfulness and complexity constraints derived from minimum description length.

SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.

GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation

cs.LG · 2026-05-06 · unverdicted · novelty 7.0 · 2 refs

GRALIS unifies linear XAI attribution methods via a Riesz Representation Theorem-derived canonical form (Q, w, Delta), delivering seven theorems on completeness, convergence, interactions, and multi-scale extensions.

Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution

cs.LG · 2026-05-04 · unverdicted · novelty 7.0 · 2 refs

MA-GIG uses VAE latent space to align Integrated Gradients paths with the data manifold for more faithful feature attributions in deep neural networks.

Mapping data sensitivities in global QCD analysis with linear response and influence functions

hep-ph · 2026-04-30 · unverdicted · novelty 7.0

A framework based on linear response and influence functions maps data sensitivities in global QCD analyses to show how experiments determine central values, uncertainties, and correlations of non-perturbative functions.

Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning

astro-ph.GA · 2026-04-28 · unverdicted · novelty 7.0

A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.

Benchmarking bandgap prediction in semiconductors under experimental and realistic evaluation settings

cond-mat.mtrl-sci · 2026-04-28 · unverdicted · novelty 7.0

Introduces the RealMat-BaG benchmark showing fundamental generalization limits of ML models when predicting experimental bandgaps from DFT-trained data.

TRANSPORTER: Transferring Visual Semantics from VLM Manifolds

cs.CV · 2025-11-23 · unverdicted · novelty 7.0

TRANSPORTER generates videos from VLM logits using optimal transport to interpret model predictions on object attributes, actions, and scenes.

Human-Centered Supervision for Sentiment Analysis in Telugu: A Systematic Inquiry Beyond Accuracy

cs.CL · 2025-08-02 · unverdicted · novelty 7.0

Human rationales in supervision for Telugu sentiment analysis improve model alignment with human reasoning and often produce gains in predictive performance.

Scaling and evaluating sparse autoencoders

cs.LG · 2024-06-06 · unverdicted · novelty 7.0

K-sparse autoencoders with dead-latent fixes produce clean scaling laws and better feature quality metrics that improve with size, shown by training a 16-million-latent model on GPT-4 activations.

Improving Dictionary Learning with Gated Sparse Autoencoders

cs.LG · 2024-04-24 · unverdicted · novelty 7.0

Gated SAEs decouple which features to use from how large their activations should be, applying the L1 penalty only to selection and thereby eliminating shrinkage while halving the number of firing features needed for good fidelity.

ISAAC: Auditing Causal Reasoning in Deep Models for Drug-Target Interaction

cs.LG · 2026-05-03 · unverdicted · novelty 7.0

ISAAC auditing applied to three DTI models on the Davis benchmark finds 25% relative differences in causal reasoning scores despite nearly identical AUROC values.

From Local to Global to Mechanistic: An iERF-Centered Unified Framework for Interpreting Vision Models

cs.CV · 2026-05-01 · unverdicted · novelty 7.0

An iERF-centric framework unifies local, global, and mechanistic interpretability in vision models via SRD for saliency, CAFE for concept anchoring, and ICAT for interlayer attribution.

Mamba-SSM with LLM Reasoning for Feature Selection: Faithfulness-Aware Biomarker Discovery

q-bio.QM · 2026-04-15 · unverdicted · novelty 7.0

LLM chain-of-thought filtering of Mamba saliency features on TCGA-BRCA data produces a 17-gene set with AUC 0.927 that beats both the raw 50-gene saliency list and a 5000-gene baseline while using far fewer features, though it misses many known BRCA genes.

citing papers explorer

Showing 46 of 46 citing papers after filters.

Diffusion Integrated Gradients: Controllable Path Generation for Flexible Feature Attribution cs.LG · 2026-06-21 · unverdicted · none · ref 52 · internal anchor
DiffIG reformulates attribution path generation as conditional diffusion modeling trained on Stick-Breaking Process paths with guided sampling for user-controllable XAI.
VFUSE: Virulent Feature Understanding with Sparse autoEncoders cs.LG · 2026-06-08 · unverdicted · none · ref 4 · internal anchor
VFUSE applies sparse autoencoders to diffusion-transformer activations in RoseTTAFold3 and RFDiffusion3 to find monosemantic features that detect hazardous protein designs with AUROC up to 0.84.
Attribution via Distributional Paths for Information Revelation cs.LG · 2026-06-02 · unverdicted · none · ref 17 · internal anchor
Reveal-IG performs path attribution by integrating model output changes along trajectories in a space of probe distributions rather than input-space paths, retaining completeness and handling multiscale or uncertain features.
Aligning Molecular Graph Explanations with Chemical Identity via InChIfied Invariants cs.LG · 2026-05-23 · unverdicted · none · ref 38 · internal anchor
InChIfied Invariants based on InChI achieve 99.62% identical representations for chemically equivalent molecular graphs versus 0.35% for standard Daylight invariants on one million PubChem molecules, while preserving predictive performance and enforcing consistent attributions.
Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space cs.LG · 2026-05-18 · unverdicted · none · ref 24 · internal anchor
In a combinatorial toy setting, winning lottery tickets preserve families of compatible feature locations in early feature space that balance proximity to final codes with low interference, rather than specific weight subnetworks.
AIM: Adversarial Information Masking for Faithfulness Evaluation of Saliency Maps cs.LG · 2026-05-16 · unverdicted · none · ref 54 · internal anchor
AIM is a new saliency-guided adversarial feature replacement method to evaluate faithfulness of saliency maps and reliability of masking operators on image, audio, and EEG tasks.
From Mechanistic to Compositional Interpretability cs.LG · 2026-05-09 · unverdicted · none · ref 58 · 2 links · internal anchor
The paper introduces compositional interpretability as a category-theoretic framework that casts mechanistic explanations as commuting syntactic-semantic mappings optimized under faithfulness and complexity constraints derived from minimum description length.
SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data cs.LG · 2026-05-08 · unverdicted · none · ref 192 · internal anchor
SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.
GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation cs.LG · 2026-05-06 · unverdicted · none · ref 7 · 2 links · internal anchor
GRALIS unifies linear XAI attribution methods via a Riesz Representation Theorem-derived canonical form (Q, w, Delta), delivering seven theorems on completeness, convergence, interactions, and multi-scale extensions.
Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution cs.LG · 2026-05-04 · unverdicted · none · ref 8 · 2 links · internal anchor
MA-GIG uses VAE latent space to align Integrated Gradients paths with the data manifold for more faithful feature attributions in deep neural networks.
Scaling and evaluating sparse autoencoders cs.LG · 2024-06-06 · unverdicted · none · ref 58 · internal anchor
K-sparse autoencoders with dead-latent fixes produce clean scaling laws and better feature quality metrics that improve with size, shown by training a 16-million-latent model on GPT-4 activations.
Improving Dictionary Learning with Gated Sparse Autoencoders cs.LG · 2024-04-24 · unverdicted · none · ref 126 · internal anchor
Gated SAEs decouple which features to use from how large their activations should be, applying the L1 penalty only to selection and thereby eliminating shrinkage while halving the number of firing features needed for good fidelity.
ISAAC: Auditing Causal Reasoning in Deep Models for Drug-Target Interaction cs.LG · 2026-05-03 · unverdicted · none · ref 12
ISAAC auditing applied to three DTI models on the Davis benchmark finds 25% relative differences in causal reasoning scores despite nearly identical AUROC values.
What LLMs explain is not what they believe: Evaluating explanation sufficiency under models' own input beliefs cs.LG · 2026-06-26 · unverdicted · none · ref 124 · internal anchor
Proposes SCSuff metric for evaluating LLM explanation sufficiency via model-generated alternative inputs, showing explanations are typically insufficient and predictable from hidden states.
Grad Detect: Gradient-Based Hallucination Detection in LLMs cs.LG · 2026-06-23 · unverdicted · none · ref 14 · internal anchor
Grad Detect uses internal gradient patterns from one inference pass to predict LLM hallucinations and abstention, outperforming confidence and sampling baselines on Q&A benchmarks with most signal in the final five layers.
Decoding Naturalistic Emotion Dynamics from the Brain: An LLM-Enhanced Regression Framework cs.LG · 2026-06-05 · unverdicted · none · ref 120 · internal anchor
A multi-target regression framework uses LLM-derived continuous sentiment profiles from narratives and dynamic functional connectivity from fMRI to track naturalistic emotional trajectories, outperforming static ROI measures.
Transcoders Trace Visual Grounding and Hallucinations in Vision-Language Models cs.LG · 2026-05-21 · unverdicted · none · ref 16 · internal anchor
Transcoders decompose MLP layers in Gemma 3-4B-IT to trace visual grounding more effectively than SAEs and predict hallucinations from circuit graph features at AUC 0.68.
ARC-STAR: Auditable Post-Hoc Correction for PDE Foundation Models cs.LG · 2026-05-21 · unverdicted · none · ref 61 · 3 links · internal anchor
ARC-STAR reduces velocity rollout error by at least 36x over raw Poseidon across all tested regime cells via auditable global and local correction stages on five flow benchmarks.
I-SAFE: Wasserstein Coherence Metrics for Structural Auditing of Scientific AI Models cs.LG · 2026-05-20 · unverdicted · none · ref 29 · 2 links · internal anchor
I-SAFE is a post-hoc auditing framework that applies quantile-based and Wasserstein coherence metrics to evaluate distributional response of DTI prediction models under structural perturbations from external priors like KLIFS annotations.
B-cos GNNs: Faithful Explanations through Dynamic Linearity cs.LG · 2026-05-19 · unverdicted · none · ref 26 · internal anchor
B-cos GNNs replace non-linear message and update functions with B-cos transforms in GNNs to enable exact per-node per-feature explanations from a single forward-backward pass while retaining competitive accuracy.
From Weight Perturbation to Feature Attribution for Explaining Fully Connected Neural Networks cs.LG · 2026-05-14 · unverdicted · none · ref 24 · internal anchor
XWP and XWP_c are novel attribution methods for FCNNs that estimate feature importance by perturbing attached weights to avoid added bias and out-of-distribution issues in occlusion approaches.
Faster Verified Explanations for Neural Networks cs.LG · 2025-11-28 · unverdicted · none · ref 27 · internal anchor
FaVeX accelerates verified explanations for neural networks via dynamic batch-sequential processing and query reuse while introducing verifier-optimal robust explanations that incorporate verifier incompleteness.
Boosting Team Modeling through Tempo-Relational Representation Learning cs.LG · 2025-07-17 · unverdicted · none · ref 128 · internal anchor
A tempo-relational neural architecture jointly models temporal and relational aspects of team interactions to outperform prior approaches on team performance prediction and enable efficient multi-task prediction of team constructs.
Why Do Class-Dependent Evaluation Effects Occur with Time Series Feature Attributions? A Synthetic Data Investigation cs.LG · 2025-06-13 · unverdicted · none · ref 17 · internal anchor
Synthetic experiments reveal that class-dependent effects appear in both perturbation-based and ground-truth evaluations of time series feature attributions, often producing contradictory rankings of attribution quality due to differences in feature amplitude or temporal extent between classes.
ExPath: Targeted Pathway Inference for Biological Knowledge Bases via Graph Learning and Explanation cs.LG · 2025-02-25 · unverdicted · none · ref 42 · internal anchor
ExPath is a subgraph inference framework that classifies bio-networks with experimental data and uses explanations to identify targeted pathways, reporting up to 4.5x higher Fidelity+ and 14x lower Fidelity- than baselines on 301 networks.
SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation cs.LG · 2023-10-19 · conditional · none · ref 87 · internal anchor
SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.
Multi-task Self-Supervised Learning for Human Activity Detection cs.LG · 2019-07-27 · unverdicted · none · ref 63 · internal anchor
A multi-task self-supervised approach trains a temporal CNN to detect transformations on sensory data, yielding features that match or exceed fully supervised performance in semi-supervised and transfer settings for smartphone-based HAR.
Scalable Topological Data Analysis and Visualization for Evaluating Data-Driven Models in Scientific Applications cs.LG · 2019-07-19 · unverdicted · none · ref 32 · internal anchor
A scalable framework combining streaming graphs, topology computation, and topology-aware datacubes enables interactive analysis of high-dimensional functions in scientific ML applications.
Towards Reliable Testing of Machine Unlearning cs.LG · 2026-04-16 · unverdicted · none · ref 41
Causal fuzzing with budgeted interventions can detect residual direct and indirect influence of unlearned data that standard attribution methods miss due to proxies, cancellations, and masking.
CIExplainer++: Generating Causal and Interpretable Explanations for Graph Neural Networks cs.LG · 2026-06-17 · unverdicted · none · ref 6 · internal anchor
CIExplainer uses causal inference via the Potential Outcome Framework to find high-impact subgraphs for GNN explanations, with G2TeXplainer generating feature and relation-aware natural language descriptions.
Localization then Neutralization: Gradient-guided Token Suppression against Visual Prompt Injection Attack cs.LG · 2026-05-24 · unverdicted · none · ref 8 · internal anchor
Gradient Token Masking localizes critical adversarial image tokens via hidden-state gradient norms and masks them to neutralize prompt injection attacks in multimodal LLMs with one forward-backward pass.
AttnGen: Attention-Guided Saliency Learning for Interpretable Genomic Sequence Classification cs.LG · 2026-05-13 · unverdicted · none · ref 5 · internal anchor
AttnGen embeds attention-based saliency into training via progressive masking to improve both accuracy and interpretability in classifying 200-nucleotide genomic sequences.
xAI-Drop: Don't Use What You Cannot Explain cs.LG · 2024-07-29 · unverdicted · none · ref 49 · internal anchor
xAI-Drop introduces an explainability-based topological dropping regularizer for GNNs that outperforms state-of-the-art dropping methods in accuracy and explanation quality on real-world datasets.
Explaining Graph Neural Networks for Node Similarity on Graphs cs.LG · 2024-07-10 · unverdicted · none · ref 69 · internal anchor
Empirical comparison shows gradient-based explanations for GNN node similarities are actionable, consistent, and retain effects when sparsified, unlike mutual information explanations.
Explaining the Explainers in Graph Neural Networks: a Comparative Study cs.LG · 2022-10-27 · unverdicted · none · ref 100 · internal anchor
Benchmark study of ten GNN explainers on eight architectures and six datasets that isolates usable components and issues practical recommendations.
Explaining an increase in predicted risk for clinical alerts cs.LG · 2019-07-10 · unverdicted · none · ref 8 · internal anchor
Methods are introduced to lift static attribution techniques to dynamical models for explaining risk increases in clinical alert systems.
Generative Counterfactual Introspection for Explainable Deep Learning cs.LG · 2019-07-06 · unverdicted · none · ref 4 · internal anchor
A generative-model-driven introspection method produces counterfactual image edits to explain deep neural network predictions on MNIST and CelebA.
DLIME: A Deterministic Local Interpretable Model-Agnostic Explanations Approach for Computer-Aided Diagnosis Systems cs.LG · 2019-06-24 · unverdicted · none · ref 29 · internal anchor
DLIME uses agglomerative hierarchical clustering and KNN to generate stable local explanations for black-box ML predictions on medical data, outperforming LIME on Jaccard similarity of repeated explanations.
Path-Sampled Integrated Gradients cs.LG · 2026-04-15 · unverdicted · none · ref 11
Path-sampled integrated gradients generalizes integrated gradients by averaging gradients over sampled baselines on the linear path, proving equivalence to a weighted version that improves convergence rate to O(m^{-1}) and reduces variance by a factor of 1/3 under uniform sampling.
ConceptTracer: Interactive Analysis of Concept Saliency and Selectivity in Neural Representations cs.LG · 2026-04-08 · unverdicted · none · ref 9
ConceptTracer supplies an interactive interface and saliency/selectivity metrics to locate concept-responsive neurons in neural representations, shown on TabPFN.
Deep Reinforcement Learning for Spacecraft Attitude Control During Atmospheric Re-Entry cs.LG · 2026-06-30 · unverdicted · none · ref 115 · internal anchor
Hybrid RL-PID controllers track angle of attack better and show greater robustness than PID alone within a defined operational envelope for re-entry attitude control.
Learning to model pediatric asthma exacerbation from multiple risk factors: a case study in coastal Virginia cs.LG · 2026-06-04 · unverdicted · none · ref 61 · internal anchor
A case study develops a sparse dictionary learning approach to model pediatric asthma exacerbations from multiple risk factors and reports consensus on relative risks across statistical and machine learning models.
Explainability Methods for Hardware Trojan Detection: A Systematic Comparison cs.LG · 2026-01-26 · unverdicted · none · ref 31 · internal anchor
Compares domain-aware, case-based, and feature attribution explainability methods for gate-level hardware Trojan detection on the Trust-Hub benchmark dataset.
TabSHAP cs.LG · 2026-04-22 · unverdicted · none · ref 3
TabSHAP attributes feature impact in LLM tabular classifiers via sampled Shapley coalitions and JSD on output distributions, reporting higher deletion faithfulness than random or XGBoost-proxy baselines on Adult Income and Heart Disease data.
Explainable Human Activity Recognition: A Unified Review of Concepts and Mechanisms cs.LG · 2026-04-10 · unverdicted · none · ref 14
The paper delivers a mechanism-centric taxonomy and unified perspective on explainable human activity recognition methods across sensing modalities.
Explaining Unsupervised Disease Staging in Huntington's Disease: Insights into Model Representations and Clusters cs.LG · 2026-06-05 · unverdicted · none · ref 19 · internal anchor
Explainability analysis shows unsupervised HD staging embeddings align with motor and functional clinical scores, with SHAP revealing stage-specific feature drivers consistent with known progression.

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer