Attention is not not explanation
10 Pith papers cite this work, all from 2026.
citing papers explorer
- Vision-language models for chest radiography do not always need the image
  A causal audit with image interventions shows text-only models reach within 5.7 accuracy points of top multimodal VLMs on chest radiography, with some large multimodal models statistically indistinguishable from small text-only baselines.
- MASPrism: Lightweight Failure Attribution for Multi-Agent Systems Using Prefill-Stage Signals
  MASPrism attributes failures in multi-agent systems by ranking candidates from prefill-stage NLL and attention signals of a 0.6B SLM, beating baselines by up to 33.41% Top-1 accuracy and proprietary LLMs by up to 89.5% relative improvement while processing traces in 2.66 seconds.
- Hidden in the Multiplicative Interaction: Uncovering Fragility in Multimodal Contrastive Learning
  Multimodal contrastive learning using multilinear products is fragile to single bad modalities, and a gated version improves top-1 retrieval accuracy on synthetic and real trimodal data.
- Listening makes Vision Clear for VLMs
  PV-TAM uses prompt-side semantics and a bias filter to improve attention-based and IoU localization metrics for vision-language models over answer-side baselines.
- Beyond Topical Similarity: Contrastive Evidence Retrieval with Interpretable Attention Alignment in RAG
  CERA fine-tunes a dense retriever with triplet contrastive learning plus attention alignment to human rationales, claiming better retrieval effectiveness and faithfulness on clinical trial reports than Contriever and standard hard-negative baselines.
- ORBIT: Learning Gene Program Co-Activation Structure for Cell-Type-Stratified Pathway Rewiring Analysis in Single-Cell Transcriptomics
  ORBIT uses an intervention-consistent self-supervised objective in a transformer to infer asymmetric gene program influences from observational scRNA-seq data, recovering Alzheimer's vulnerability patterns and achieving 0.984 macro F1 cell-type classification from 220 pathway scores.
- Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge
  Both humans and LLMs trust content more when labeled human-authored than AI-generated, with LLMs showing denser attention to labels and higher uncertainty under AI labels, mirroring human heuristic patterns.
- Profy: Interpretable Visualization of Expertise-Dependent Motor Skills Toward Supporting Piano Practice
  Profy uses take-level expert-amateur labels on 1083 piano recordings to produce time-aligned highlight scores that correlate with expert review points (r=0.61) on held-out amateur clips.
- Rigorous Interpretation Is a Form of Evaluation
  Rigorous interpretability can function as a principled form of model evaluation if its claims are falsifiable, reproducible, and predictive.
- Matched-Learning-Rate Analysis of Attention Drift and Transfer Retention in Fine-Tuned CLIP
  Matched learning-rate experiments show LoRA retains substantially higher zero-shot transfer (45% vs 11% on EuroSAT, 58% vs 9% on Pets) than Full FT in CLIP adaptation.