Is Attention Interpretable?

2025 · DOI 10.18653/v1/p19-1282

6 Pith papers cite this work. Polarity classification is still in progress.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges

cs.CL · 2026-06-18 · unverdicted · novelty 8.0

Presents a new expert-curated dataset of multi-turn counterspeech dialogues in five languages targeting hate against seven groups, with span annotations linking to verified external knowledge for RAG applications.

Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads

cs.CL · 2026-07-01 · unverdicted · novelty 7.0

LOCOS scores attention heads via OV-circuit output projection onto answer-token unembedding directions and identifies non-literal retrieval heads whose ablation collapses performance on non-literal benchmarks more than prior literal-copy detectors.

Don't Go Breaking My LLM: The Impact of Pruning Attention Layers on Explanation Faithfulness and Confidence Calibration

cs.LG · 2026-06-23 · unverdicted · novelty 6.0

Pruning attention layers in five LLMs across eight datasets maintains accuracy but degrades faithfulness and calibration.

Beyond Topical Similarity: Contrastive Evidence Retrieval with Interpretable Attention Alignment in RAG

cs.CL · 2026-05-31 · unverdicted · novelty 6.0

CERA fine-tunes a dense retriever with triplet contrastive learning plus attention alignment to human rationales, claiming better retrieval effectiveness and faithfulness on clinical trial reports than Contriever and standard hard-negative baselines.

Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation

cs.CL · 2026-05-21 · conditional · novelty 6.0

LLMs generate adequate counterspeech for co-occurring hate and misinformation in 40% of cases, with a mixed knowledge strategy from fact-checkers and NGOs proving most effective after expert revision.

Mitigating Hallucinations in Large Vision-Language Models via Causal Route Gating

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

A causal route gating intervention decomposes attention heads and suppresses text-dominant routes using one-forward/one-gradient estimates to reduce unsupported content generation in LVLMs.

citing papers explorer

CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges
cs.CL · 2026-06-18 · unverdicted · none · ref 37
Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads
cs.CL · 2026-07-01 · unverdicted · none · ref 9
Don't Go Breaking My LLM: The Impact of Pruning Attention Layers on Explanation Faithfulness and Confidence Calibration
cs.LG · 2026-06-23 · unverdicted · none · ref 47
Beyond Topical Similarity: Contrastive Evidence Retrieval with Interpretable Attention Alignment in RAG
cs.CL · 2026-05-31 · unverdicted · none · ref 7
Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation
cs.CL · 2026-05-21 · conditional · none · ref 25
Mitigating Hallucinations in Large Vision-Language Models via Causal Route Gating
cs.CV · 2026-05-20 · unverdicted · none · ref 8
