Recognition: unknown
Not Just a Black Box: Learning Important Features Through Propagating Activation Differences
read the original abstract
Note: This paper describes an older version of DeepLIFT. See https://arxiv.org/abs/1704.02685 for the newer version. Original abstract follows: The purported "black box" nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Learning Important FeaTures), an efficient and effective method for computing importance scores in a neural network. DeepLIFT compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference. We apply DeepLIFT to models trained on natural images and genomic data, and show significant advantages over gradient-based methods.
This paper has not been read by Pith yet.
Forward citations
Cited by 3 Pith papers
-
From Local to Global to Mechanistic: An iERF-Centered Unified Framework for Interpreting Vision Models
An iERF-centric framework unifies local, global, and mechanistic interpretability in vision models via SRD for saliency, CAFE for concept anchoring, and ICAT for interlayer attribution.
-
Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution
MA-GIG improves Integrated Gradients by performing path integration in the latent space of a pre-trained VAE so that decoded points remain closer to the learned data manifold and reduce off-manifold gradient noise.
-
Graph Neural Network based Hierarchy-Aware Embeddings of Knowledge Graphs: Applications to Yeast Phenotype Prediction
GNNs with ontology-derived semantic loss create hierarchy-aware KG embeddings that predict yeast double gene knockout phenotypes with mean R²=0.360 (improved to 0.377 with semantic loss), outperforming baselines, gene...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.