Spectral Integrated Gradients constructs SVD-based integration paths that activate singular components from largest to smallest, producing cleaner attribution maps and better quantitative scores than standard Integrated Gradients on image classification tasks.
hub
SmoothGrad: removing noise by adding noise
36 Pith papers cite this work. Polarity classification is still indexing.
abstract
Explaining the output of a deep network remains a challenge. In the case of an image classifier, one type of explanation is to identify pixels that strongly influence the final decision. A starting point for this strategy is the gradient of the class score function with respect to the input image. This gradient can be interpreted as a sensitivity map, and there are several techniques that elaborate on this basic idea. This paper makes two contributions: it introduces SmoothGrad, a simple method that can help visually sharpen gradient-based sensitivity maps, and it discusses lessons in the visualization of these maps. We publish the code for our experiments and a website with our results.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
AIM is a new saliency-guided adversarial feature replacement method to evaluate faithfulness of saliency maps and reliability of masking operators on image, audio, and EEG tasks.
AGOP-based attribution methods outperform Integrated Gradients and other baselines on pixel-level ground truth benchmarks for explaining image classifier decisions, with AGOP-Global offering zero inference cost.
Transport-geodesic attribution via optimal generative flows selects principled paths for feature attributions by minimizing kinetic action.
MobileMold provides 4941 smartphone microscopy images and shows deep learning models reach 99.5% accuracy on mold detection and food classification tasks.
Introduces learning-augmented robust algorithmic recourse that trades off consistency with accurate future-model predictions against robustness to inaccurate predictions via a novel algorithm.
Defines meta-attributions as directional second-order Shapley values on attribution methods, proves hierarchical decomposition of attributions, and demonstrates applications in language models, vision-language encoders, and diffusion transformers.
An iERF-centric framework unifies local, global, and mechanistic interpretability in vision models via SRD for saliency, CAFE for concept anchoring, and ICAT for interlayer attribution.
Adversarial perturbations possess an inherently low-rank structure that enables more efficient and effective black-box adversarial attacks via subspace projection.
A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.
The Linear Centroids Hypothesis reframes network features as directions in centroid spaces of local affine experts, unifying interpretability methods and yielding sparser, more faithful dictionaries, circuits, and saliency maps.
CascadeFormer tapers Transformer width with depth based on gradient fan-in asymmetry to match uniform baselines in perplexity while cutting latency 8.6%.
CPC-VAR adds Gradient-based Concept Neuron Selection for continual single-concept learning and a context-aware multi-branch composition strategy to reduce forgetting and entanglement in VAR-based personalized image generation.
Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.
CAAP produces patch attributions in ViTs by direct activation patching on intermediate layers to measure causal contribution to the target class score.
MaskDiME uses adaptive masked diffusion to produce 30x faster, localized, and semantically consistent visual counterfactual explanations without training, matching or exceeding prior performance on five datasets.
CA-LIG is a unified hierarchical attribution method that computes layer-wise Integrated Gradients fused with class-specific attention gradients to generate signed, context-sensitive explanations for transformer models.
LiMA reformulates attribution as submodular subset selection and uses bidirectional greedy search to identify minimal important regions, reporting 36.3% better insertion and 39.6% better deletion scores than prior methods on eight foundation models.
A Shapley-value method with interaction terms that explains object detector decisions by capturing collective pixel contributions for localization and classification.
SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.
Saliency-driven interpretation methods reveal that NMT models learn word alignments of better quality than fast-align under force decoding and consistent with automatic tools under free decoding.
Scaling vision models by depth and parameter count does not consistently improve localisation-based explanation quality across architectures, datasets, and post-hoc methods; smaller models often perform comparably or better.
H-Sets detects higher-order feature interactions in image classifiers via Hessian-guided pair merging and attributes them with IDG-Vis to generate more interpretable saliency maps than existing marginal or coarse methods.
A new reliability score computed from the IoU difference between class-specific and class-agnostic heatmaps, boosted by adversarial enhancement, detects false negatives in binary industrial defect detectors with up to 100% recall.
citing papers explorer
-
MONAI: An open-source framework for deep learning in healthcare
MONAI is a community-supported PyTorch framework that extends deep learning to medical data with domain-specific architectures, transforms, and deployment tools.