SmoothGrad: removing noise by adding noise

Smilkov, D · 2017 · cs.LG · arXiv 1706.03825

48 Pith papers cite this work. Polarity classification is still indexing.

abstract

Explaining the output of a deep network remains a challenge. In the case of an image classifier, one type of explanation is to identify pixels that strongly influence the final decision. A starting point for this strategy is the gradient of the class score function with respect to the input image. This gradient can be interpreted as a sensitivity map, and there are several techniques that elaborate on this basic idea. This paper makes two contributions: it introduces SmoothGrad, a simple method that can help visually sharpen gradient-based sensitivity maps, and it discusses lessons in the visualization of these maps. We publish the code for our experiments and a website with our results.
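The idea named in the title can be sketched in a few lines: instead of reading the sensitivity map from a single gradient, average the gradient over several copies of the input perturbed with Gaussian noise. The example below is a minimal illustration under stated assumptions, not the authors' published code. The names `smoothgrad`, `grad_fn`, `noise_std`, and `n_samples` are hypothetical, and the "class score" is a toy function with a deliberately wiggly gradient rather than a real image classifier.

```python
import numpy as np

def smoothgrad(grad_fn, x, noise_std=0.1, n_samples=50, seed=0):
    """Average grad_fn over Gaussian-perturbed copies of x.

    grad_fn: callable returning the gradient of the class score at an input
             (hypothetical stand-in for a network's backward pass).
    noise_std: absolute standard deviation of the added noise (an assumption
               of this sketch; other scalings are possible).
    """
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)
        total += grad_fn(noisy)
    return total / n_samples

# Toy score S(x) = sum(x + 0.05 * sin(50 * x)).
# Its gradient is 1 + 2.5 * cos(50 * x): a constant trend of 1 plus
# rapid local fluctuations that dominate any single-point gradient.
def toy_grad(x):
    return 1.0 + 2.5 * np.cos(50.0 * x)

x = np.array([0.30, -0.10, 0.80])
plain_map = toy_grad(x)                                   # fluctuates strongly
smooth_map = smoothgrad(toy_grad, x, noise_std=0.1, n_samples=500)

print("plain gradient: ", plain_map)
print("smoothed gradient:", smooth_map)
```

In this toy setting the averaged map sits close to the underlying trend of 1 for every coordinate, while the single-point gradient swings with the local oscillation. With `noise_std=0` the function reduces to the plain gradient.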

citation-role summary

background: 3 · method: 1

citation-polarity summary

background: 3 · use method: 1

representative citing papers

FedXDS: Leveraging Model Attribution Methods to counteract Data Heterogeneity in Federated Learning

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

FedXDS uses propagation-based attribution to identify task-relevant features for selective data sharing in federated learning, yielding higher accuracy and faster convergence under heterogeneity with formal privacy guarantees.

Global Explanations for Multivariate Time Series Forecasting Models via $K$-Order Markov Approximations

cs.LG · 2026-06-25 · unverdicted · novelty 7.0

KARMA constructs minimal-K Markov transition kernels as surrogates to deliver global explanations for multivariate time series forecasting models and recovers known causal structure on synthetic data.

Diffusion Integrated Gradients: Controllable Path Generation for Flexible Feature Attribution

cs.LG · 2026-06-21 · unverdicted · novelty 7.0

DiffIG reformulates attribution path generation as conditional diffusion modeling trained on Stick-Breaking Process paths with guided sampling for user-controllable XAI.

In Defense of Information Leakage in Concept-based Models

cs.LG · 2026-06-09 · conditional · novelty 7.0

Concept-based models can use controlled 'benign' information leakage to remain accurate and intervenable under real-world concept incompleteness by reframing their training objective.

Attribution via Distributional Paths for Information Revelation

cs.LG · 2026-06-02 · unverdicted · novelty 7.0

Reveal-IG performs path attribution by integrating model output changes along trajectories in a space of probe distributions rather than input-space paths, retaining completeness and handling multiscale or uncertain features.

Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

Spectral Integrated Gradients constructs SVD-based integration paths that activate singular components from largest to smallest, producing cleaner attribution maps and better quantitative scores than standard Integrated Gradients on image classification tasks.

AIM: Adversarial Information Masking for Faithfulness Evaluation of Saliency Maps

cs.LG · 2026-05-16 · unverdicted · novelty 7.0

AIM is a new saliency-guided adversarial feature replacement method to evaluate faithfulness of saliency maps and reliability of masking operators on image, audio, and EEG tasks.

AGOP as Explanation: From Feature Learning to Per-Sample Attribution in Image Classifiers

cs.LG · 2026-05-12 · conditional · novelty 7.0

AGOP-based attribution methods outperform Integrated Gradients and other baselines on pixel-level ground truth benchmarks for explaining image classifier decisions, with AGOP-Global offering zero inference cost.

From Baselines to Transport Geodesics: Axiomatic Attribution via Optimal Generative Flows

cs.LG · 2026-03-05 · unverdicted · novelty 7.0

Transport-geodesic attribution via optimal generative flows selects principled paths for feature attributions by minimizing kinetic action.

MobileMold: A Smartphone-Based Microscopy Dataset for Food Mold Detection

cs.CV · 2026-03-02 · unverdicted · novelty 7.0

MobileMold provides 4941 smartphone microscopy images and shows deep learning models reach 99.5% accuracy on mold detection and food classification tasks.

Learning-Augmented Robust Algorithmic Recourse

cs.LG · 2024-10-02 · unverdicted · novelty 7.0

Introduces learning-augmented robust algorithmic recourse that trades off consistency with accurate future-model predictions against robustness to inaccurate predictions via a novel algorithm.

Attributions All the Way Down? The Metagame of Interpretability

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

Defines meta-attributions as directional second-order Shapley values on attribution methods, proves hierarchical decomposition of attributions, and demonstrates applications in language models, vision-language encoders, and diffusion transformers.

From Local to Global to Mechanistic: An iERF-Centered Unified Framework for Interpreting Vision Models

cs.CV · 2026-05-01 · unverdicted · novelty 7.0

An iERF-centric framework unifies local, global, and mechanistic interpretability in vision models via SRD for saliency, CAFE for concept anchoring, and ICAT for interlayer attribution.

Low Rank Adaptation for Adversarial Perturbation

cs.LG · 2026-04-30 · unverdicted · novelty 7.0

Adversarial perturbations possess an inherently low-rank structure that enables more efficient and effective black-box adversarial attacks via subspace projection.

Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning

astro-ph.GA · 2026-04-28 · unverdicted · novelty 7.0

A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.

The Linear Centroids Hypothesis: Features as Directions Learned by Local Experts

cs.LG · 2026-04-13 · unverdicted · novelty 7.0 · 2 refs

The Linear Centroids Hypothesis reframes network features as directions in centroid spaces of local affine experts, unifying interpretability methods and yielding sparser, more faithful dictionaries, circuits, and saliency maps.

CascadeFormer: Depth-Tapered Transformers Motivated by Gradient Fan-in Asymmetry

cs.LG · 2026-06-25 · unverdicted · novelty 6.0

CascadeFormer tapers Transformer width with depth based on gradient fan-in asymmetry to match uniform baselines in perplexity while cutting latency 8.6%.

Listening makes Vision Clear for VLMs

cs.CV · 2026-06-22 · unverdicted · novelty 6.0

PV-TAM uses prompt-side semantics and a bias filter to improve attention-based and IoU localization metrics for vision-language models over answer-side baselines.

One Lens, Many Worlds : A Capability-Typed Interface for World-Model Interpretability

cs.LG · 2026-06-07 · unverdicted · novelty 6.0

WorldModelLens defines a typed adapter with four core methods and a capability descriptor to unify interpretability tooling across diverse world model architectures.

Physics-Guided Dual Decoding and Spectral Supervision for Global 3D Hydrometeor Prediction

cs.LG · 2026-06-07 · unverdicted · novelty 6.0

PredHydro-Net is a new dual-decoding architecture with wavelet spectral matching and adversarial training that outperforms Earthformer, PredRNNv2, and GFS on extreme-event detection and spectral fidelity in 72-hour global hydrometeor forecasts.

TEVI: Text-Conditioned Editing of Visual Representations via Sparse Autoencoders for Improved Vision-Language Alignment

cs.CV · 2026-06-05 · unverdicted · novelty 6.0

TEVI applies sparse autoencoders and caption-conditioned masking to edit image embeddings, yielding better retrieval on MS COCO, Flickr, IIW, DOCCI, and RoCOCO benchmarks with larger gains on richer captions.

CPC-VAR: Continual Personalized and Compositional Generation in Visual Autoregressive Models

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

CPC-VAR adds Gradient-based Concept Neuron Selection for continual single-concept learning and a context-aware multi-branch composition strategy to reduce forgetting and entanglement in VAR-based personalized image generation.

Instructions Shape Production of Language, not Processing

cs.CL · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.

Causal Attribution via Activation Patching

cs.CV · 2026-03-13 · unverdicted · novelty 6.0

CAAP produces patch attributions in ViTs by direct activation patching on intermediate layers to measure causal contribution to the target class score.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks

cs.AI · 2026-04-20 · conditional · none · ref 57

Token-level contrastive attribution yields informative signals for some LLM benchmark failures but is not universally applicable across datasets and models.

Interpretable and Explainable Surrogate Modeling for Simulations: A State-of-the-Art Survey and Perspectives on Explainable AI for Decision-Making

cs.AI · 2026-04-15 · unverdicted · none · ref 221

This survey synthesizes XAI methods with surrogate modeling workflows for simulations and outlines a research agenda to embed explainability into simulation-driven design and decision-making.

PECKER: A Precisely Efficient Critical Knowledge Erasure Recipe For Machine Unlearning in Diffusion Models

cs.AI · 2026-04-07 · unreviewed · ref 25
