Attention is not not Explanation
6 Pith papers cite this work.
6 representative citing papers
-
MASPrism: Lightweight Failure Attribution for Multi-Agent Systems Using Prefill-Stage Signals
MASPrism attributes failures in LLM multi-agent executions by extracting token-level negative log-likelihood and attention weights from a small model's prefill pass, then ranking candidates with a second prefill, achieving top accuracy on most benchmarks and 6.69x speedup over baselines.
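As a minimal sketch of the prefill-stage signal (not MASPrism's actual pipeline), per-token negative log-likelihood can be read off a single forward pass over the transcript; the function and toy logits below are purely illustrative.

```python
import numpy as np

def token_nll(logits, token_ids):
    """Per-token negative log-likelihood from one prefill pass.

    logits: (T, V) next-token logits for a length-T prefix.
    token_ids: (T,) observed tokens, where token_ids[t] is scored
    by the logits at position t.
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(token_ids)), token_ids]

# Toy example: 3 positions, vocabulary of 4.
logits = np.array([[2.0, 0.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 2.0]])
observed = np.array([0, 1, 3])
nll = token_nll(logits, observed)
# Spans with unusually high NLL are candidate failure locations.
```

A real deployment would take `logits` from a small causal LM's prefill over the multi-agent transcript; the ranking step would then score candidates with a second prefill.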
-
Hidden in the Multiplicative Interaction: Uncovering Fragility in Multimodal Contrastive Learning
Multimodal contrastive learning using multilinear products is fragile to single bad modalities, and a gated version improves top-1 retrieval accuracy on synthetic and real trimodal data.
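The fragility is easy to see in a toy trimodal score: an elementwise multilinear product lets one corrupted modality zero out the whole similarity. The gating below is a hypothetical illustration of the general remedy, not the paper's mechanism.

```python
import numpy as np

def multilinear_score(x, y, z):
    # Trimodal similarity via an elementwise (Hadamard) multilinear product:
    # a single near-zero modality collapses the entire score.
    return float(np.sum(x * y * z))

def gated_score(x, y, z):
    # Hypothetical gate (illustrative): blend each modality with a neutral
    # all-ones vector, weighted by a sigmoid of its norm, so a degraded
    # (near-zero) modality falls back to "no opinion" instead of vetoing
    # the product.
    def gate(v):
        g = 1.0 / (1.0 + np.exp(-np.linalg.norm(v)))
        return g * v + (1.0 - g) * np.ones_like(v)
    return float(np.sum(gate(x) * gate(y) * gate(z)))

a = np.ones(8)
bad = np.zeros(8)  # one corrupted modality
# multilinear_score(a, a, bad) is 0 regardless of the healthy modalities;
# gated_score degrades gracefully instead.
```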
-
ORBIT: Learning Gene Program Co-Activation Structure for Cell-Type-Stratified Pathway Rewiring Analysis in Single-Cell Transcriptomics
ORBIT uses an intervention-consistent self-supervised objective in a transformer to infer asymmetric gene program influences from observational scRNA-seq data, recovering Alzheimer's vulnerability patterns and achieving 0.984 macro F1 cell-type classification from 220 pathway scores.
-
Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge
Both humans and LLMs trust content more when labeled human-authored than AI-generated, with LLMs showing denser attention to labels and higher uncertainty under AI labels, mirroring human heuristic patterns.
-
Rigorous Interpretation Is a Form of Evaluation
Rigorous interpretability can function as a principled form of model evaluation if its claims are falsifiable, reproducible, and predictive.
-
Matched-Learning-Rate Analysis of Attention Drift and Transfer Retention in Fine-Tuned CLIP
Matched learning-rate experiments show LoRA retains substantially higher zero-shot transfer (45% vs. 11% on EuroSAT, 58% vs. 9% on Pets) than full fine-tuning in CLIP adaptation.
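A minimal sketch of the structural reason LoRA can preserve zero-shot behavior (illustrative, not the paper's experimental setup): the pretrained weight stays frozen and only a low-rank residual `B @ A` is trained, with `B` zero-initialized so training starts exactly at the pretrained model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 64, 4  # illustrative dims; r << d is the low-rank bottleneck

W = rng.normal(size=(d, k)) / np.sqrt(d)  # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01        # trainable down-projection
B = np.zeros((d, r))                      # trainable up-projection, zero-init

def lora_forward(x):
    # Effective weight is W + B @ A; W itself never receives gradient
    # updates, which constrains how far the adapted model can drift
    # from its zero-shot behavior.
    return x @ (W + B @ A).T

x = rng.normal(size=(2, k))
# With B zero-initialized, LoRA's initial output equals the frozen model's.
assert np.allclose(lora_forward(x), x @ W.T)
```

Full fine-tuning, by contrast, updates all of `W`, so at a matched learning rate every direction of the pretrained representation is free to drift.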