Pretrained vision transformers exhibit strong intra-object leakage where each part representation encodes information from the entire object, undermining the faithfulness of attention-based part-centric interpretability methods.
IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9), 1627–1645 (2009)
1 Pith paper cites this work. Polarity classification is still indexing.
Pith papers citing it: 1
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Metonymy in vision models undermines attention-based interpretability