Pith · machine review for the scientific record

arXiv: 1902.10186 · v3 · submitted 2019-02-26 · 💻 cs.CL · cs.AI

Recognition: unknown

Attention is not Explanation

Authors on Pith: no claims yet
Classification: 💻 cs.CL · cs.AI
Keywords: attention, weights, experiments, explanations, importance, meaningful, models, often
Original abstract

Attention mechanisms have seen wide adoption in neural NLP models. In addition to improving predictive performance, these are often touted as affording transparency: models equipped with attention provide a distribution over attended-to input units, and this is often presented (at least implicitly) as communicating the relative importance of inputs. However, it is unclear what relationship exists between attention weights and model outputs. In this work, we perform extensive experiments across a variety of NLP tasks that aim to assess the degree to which attention weights provide meaningful 'explanations' for predictions. We find that they largely do not. For example, learned attention weights are frequently uncorrelated with gradient-based measures of feature importance, and one can identify very different attention distributions that nonetheless yield equivalent predictions. Our findings show that standard attention modules do not provide meaningful explanations and should not be treated as though they do. Code for all experiments is available at https://github.com/successar/AttentionExplanation.
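The two diagnostics the abstract describes are straightforward to prototype. The sketch below is a minimal illustration in plain numpy, not the authors' implementation (their code lives in the linked repository): it builds a toy single-query attention classifier with made-up dimensions, rank-correlates the attention weights against a leave-one-out importance estimate (a crude stand-in for the paper's gradient-based measures) using Kendall's tau, and then shuffles the attention distribution to see how much the prediction moves.

```python
# Toy illustration of the two diagnostics described in the abstract:
#  (1) rank-correlate attention weights with a feature-importance measure,
#  (2) permute the attention distribution and check how much the prediction moves.
# A hedged sketch, NOT the authors' implementation
# (see https://github.com/successar/AttentionExplanation for that).

import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)

T, D = 12, 8                      # sequence length, hidden size (made up)
H = rng.normal(size=(T, D))       # toy "encoder states" for one example
v = rng.normal(size=D)            # attention scoring vector
w = rng.normal(size=D)            # output (classification) weights

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def predict(H, alpha):
    """Scalar score from an attention-weighted sum of hidden states."""
    context = alpha @ H           # (D,)
    return float(context @ w)

alpha = softmax(H @ v)            # attention over the T positions

# (1) Leave-one-out importance: how much does the score change
# if position t is removed and attention is renormalized?
base = predict(H, alpha)
importance = np.array([
    abs(base - predict(np.delete(H, t, axis=0),
                       softmax(np.delete(H @ v, t))))
    for t in range(T)
])
tau, _ = kendalltau(alpha, importance)
print(f"Kendall tau(attention, importance) = {tau:.3f}")

# (2) Counterfactual attention: shuffle the distribution and compare outputs.
# If very different alphas give nearly the same score, attention explains little.
shuffled = rng.permutation(alpha)
print(f"score with learned attention:  {base:.3f}")
print(f"score with shuffled attention: {predict(H, shuffled):.3f}")
```

A low tau, or a near-unchanged score under shuffling, is the kind of evidence the paper reports at scale across real NLP datasets and trained models.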

This paper has not been read by Pith yet.

Discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Architecture-Aware Explanation Auditing for Industrial Visual Inspection

    cs.LG · 2026-05 · conditional · novelty 7.0

    Explanation faithfulness for deep classifiers on wafer maps is highest when the explainer matches the model's native readout structure, with ViT-Tiny plus Attention Rollout achieving lower Deletion AUC than mismatched...

  2. Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters

    cs.LG · 2026-05 · accept · novelty 7.0

    Synthetic data augmentation helps channel-mixing time series models but degrades channel-independent ones, with reliable gains only from seasonal-trend generators and gradual schedules in low-resource settings.

  3. Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

    cs.LG · 2026-05 · unverdicted · novelty 6.0

    Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and ...

  4. Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces

    cs.LG · 2026-05 · unverdicted · novelty 6.0

    A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.

  5. Large Vision-Language Models Get Lost in Attention

    cs.AI · 2026-05 · unverdicted · novelty 6.0

    In LVLMs, attention can be replaced by random Gaussian weights with little or no performance loss, indicating that current models get lost in attention rather than efficiently using visual context; a toy version of this substitution is sketched after this list.

  6. Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety

    cs.CL · 2026-04 · unverdicted · novelty 6.0

    Stylistic rewrites of harmful prompts raise attack success rates from 3.84% to 36.8-65% across 31 frontier models, indicating weak generalization in safety refusals.

  7. Super Agents and Confounders: Influence of surrounding agents on vehicle trajectory prediction

    cs.LG · 2026-04 · unverdicted · novelty 6.0

    Surrounding agents frequently degrade trajectory prediction accuracy in interactive driving scenes, and integrating a Conditional Information Bottleneck improves results by ignoring non-beneficial contextual signals.

  8. SAIL: Structure-Aware Interpretable Learning for Anatomy-Aligned Post-hoc Explanations in OCT

    cs.CV · 2026-05 · unverdicted · novelty 5.0

    SAIL integrates anatomical priors at the representation level with semantic features via fusion to produce more anatomically aligned attribution maps in OCT without altering existing explainability techniques.

  9. Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs

    cs.CL · 2026-04 · unverdicted · novelty 5.0

    HETA is a new attribution framework for decoder-only LLMs that combines semantic transition vectors, Hessian-based sensitivity scores, and KL divergence to produce more faithful and human-aligned token attributions th...

  10. Uncertainty-Aware Transformers: Conformal Prediction for Language Models

    cs.LG · 2026-04 · unverdicted · novelty 5.0

    CONFIDE applies conformal prediction to transformer embeddings to produce valid prediction sets, improving accuracy by up to 4.09% and efficiency over baselines on models like BERT-tiny.
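Entry 5's random-attention observation can be instrumented in much the same way as the permutation check above. The sketch below is a hedged toy, not code from that paper: it assembles a single-head self-attention layer in numpy with made-up dimensions, swaps the learned softmax attention matrix for one built from random Gaussian logits, and reports the relative change in the layer's outputs. In the cited work the analogous comparison is made on downstream task performance of trained LVLMs, not on a single untrained layer.

```python
# Toy version of the "random attention" substitution from entry 5: replace the
# softmax attention matrix of a single-head self-attention layer with one built
# from random Gaussian logits and measure how much the outputs move.
# A hedged sketch with made-up dimensions, not code from the cited paper.

import numpy as np

rng = np.random.default_rng(1)

T, D = 16, 32                              # tokens, model width (made up)
X = rng.normal(size=(T, D))                # toy token representations
Wq, Wk, Wv = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))

def row_softmax(S):
    S = S - S.max(axis=-1, keepdims=True)
    E = np.exp(S)
    return E / E.sum(axis=-1, keepdims=True)

def self_attention(X, attn=None):
    """Single-head self-attention; optionally override the attention matrix."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    if attn is None:
        attn = row_softmax(Q @ K.T / np.sqrt(D))
    return attn @ V

learned_out = self_attention(X)
# Gaussian logits passed through softmax keep the matrix row-stochastic.
random_attn = row_softmax(rng.normal(size=(T, T)))
random_out = self_attention(X, attn=random_attn)

drift = np.linalg.norm(learned_out - random_out) / np.linalg.norm(learned_out)
print(f"relative output change with random attention: {drift:.3f}")
```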