Metonymy in vision models undermines attention-based interpretability
Pith reviewed 2026-05-08 13:57 UTC · model grok-4.3
The pith
Vision transformers encode information from the whole object into each part representation, violating the locality needed for attention-based explanations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Modern pretrained vision transformers violate the locality assumption and exhibit a strong intra-object leakage, in which each part encodes information from the whole object, a visual metonymy that compromises the faithfulness of attention-based interpretable-by-design methods for part-based reasoning. A two-stage approach that prevents leakage by design improves attribute-driven part discovery on a variety of tasks.
What carries the argument
Intra-object leakage, the unintended encoding of whole-object information into individual part representations, quantified through part-based attribute annotations.
If this is right
- Attention-based interpretable-by-design methods lose faithfulness for part-based reasoning because of the leakage.
- Two-stage approaches that enforce disentangled feature extraction raise accuracy on attribute-driven part discovery.
- Interpretability in concept bottleneck models and other part-centric systems is undermined when they rely on these representations.
- Preventing leakage offers a direct route to more reliable part-based interpretability.
Where Pith is reading between the lines
- End-to-end training of transformers may systematically favor holistic encodings over strictly local ones.
- Interpretability tools for existing models could add explicit leakage correction steps before applying attention.
- The same metonymy pattern might appear in transformer models for text or multimodal data, affecting part-like explanations there.
Load-bearing premise
Part-based attribute annotations provide a clean probe of the specific information encoded in each latent part representation.
What would settle it
An experiment in which a part representation loses its ability to predict its target part attributes once non-local object regions are masked or removed from the input image would falsify the leakage claim.
Figures
read the original abstract
Part-based reasoning is a classical strategy to make a computer vision model directly focus on the object parts that are relevant to the downstream task. In the context of deep learning, this also serves to improve by-design interpretability, often by using part-centric attention mechanisms on top of a latent image representation provided by a standard, black-box model. This approach is based on a locality assumption: that the latent representation of an object part encodes primarily information about the corresponding image region. In this work, we test this basic assumption, measuring intra-object leakage in vision models using part-based attribute annotations. Through a comprehensive experimental evaluation, we show that modern pretrained vision transformers violate the locality assumption and exhibit a strong intra-object leakage, in which each part encodes information from the whole object, a visual metonymy that compromises the faithfulness of attention-based interpretable-by-design methods for part-based reasoning, ultimately rendering them uninterpretable. In addition, we establish an upper bound using a two-stage approach that prevents leakage by design. We then show that this inherently disentangled feature extraction improves attribute-driven part discovery on a variety of tasks, confirming the practical impact of intra-object leakage. Our results uncover a neglected issue affecting the interpretability of part-based representations, such as those in CBMs relying on part-centric concepts, highlighting that two-stage approaches offer a promising way to mitigate it.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that modern pretrained vision transformers violate the locality assumption underlying part-based reasoning and attention-based interpretability methods. Using part-based attribute annotations as probes, the authors empirically demonstrate strong intra-object leakage (termed visual metonymy), in which each part's latent representation encodes information from the entire object rather than being localized to the corresponding image region. This leakage is argued to compromise the faithfulness of attention mechanisms in interpretable-by-design models such as concept bottleneck models. The work also introduces a two-stage feature extraction approach presented as an upper bound that prevents leakage by design and reports improved performance on attribute-driven part discovery tasks across multiple benchmarks.
Significance. If the leakage measurement can be shown to isolate model encoding from dataset correlations, the findings would meaningfully impact the design of part-centric interpretable models in computer vision. The paper receives credit for its comprehensive experimental evaluation supporting the leakage claim and for the practical two-stage upper bound that demonstrates tangible gains in part discovery. These elements provide a concrete mitigation path and highlight an under-examined issue in current attention-based interpretability techniques.
major comments (2)
- [Experimental Evaluation] Experimental Evaluation: The central claim that pretrained ViTs exhibit model-induced intra-object leakage rests on part-based attribute annotations serving as a clean probe. No controls for inter-attribute correlations (e.g., via synthetic decorrelated attributes, conditional mutual information, or correlation-structure ablations) are described, raising the possibility that cross-part prediction accuracy reflects data semantics rather than non-local encoding in the representations. This is load-bearing for the violation-of-locality conclusion.
- [Two-stage approach] Two-stage upper bound: While the two-stage method is positioned as an independent disentangled baseline, the manuscript does not provide a direct quantitative comparison of leakage metrics between the single-stage pretrained ViT and the two-stage variant on the same attribute probes, making it difficult to attribute performance gains specifically to leakage reduction versus other architectural differences.
minor comments (2)
- [Abstract] Abstract: The metaphorical use of 'metonymy' is evocative but would benefit from a one-sentence clarification or linguistic reference to aid readers outside NLP.
- [Methods] Notation: The leakage metric (cross-part attribute prediction accuracy) is introduced without an explicit equation or pseudocode in the main text, which would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive review. The comments identify important gaps in our experimental design that we will address in the revision. Below we respond point-by-point to the major comments.
read point-by-point responses
-
Referee: The central claim that pretrained ViTs exhibit model-induced intra-object leakage rests on part-based attribute annotations serving as a clean probe. No controls for inter-attribute correlations (e.g., via synthetic decorrelated attributes, conditional mutual information, or correlation-structure ablations) are described, raising the possibility that cross-part prediction accuracy reflects data semantics rather than non-local encoding in the representations. This is load-bearing for the violation-of-locality conclusion.
Authors: We agree that the absence of explicit controls for inter-attribute correlations leaves open the possibility that some of the observed cross-part prediction is driven by dataset semantics rather than model encoding. To isolate the model-induced component, we will add (i) correlation-structure ablations that shuffle or decorrelate attribute labels while keeping the same image regions, (ii) conditional mutual information measurements between part representations and non-local attributes, and (iii) a synthetic dataset experiment in which attributes are generated independently of spatial location. These additions will be reported in a new subsection of the experimental evaluation and will allow readers to quantify how much leakage exceeds what dataset correlations alone would predict. revision: yes
-
Referee: While the two-stage method is positioned as an independent disentangled baseline, the manuscript does not provide a direct quantitative comparison of leakage metrics between the single-stage pretrained ViT and the two-stage variant on the same attribute probes, making it difficult to attribute performance gains specifically to leakage reduction versus other architectural differences.
Authors: We concur that a head-to-head leakage comparison on identical probes is required to attribute gains to leakage reduction. In the revised manuscript we will compute and tabulate the same leakage metrics (cross-part attribute prediction accuracy and mutual information) for both the single-stage pretrained ViT and the two-stage feature extractor on every benchmark and attribute set used in the paper. This will be presented alongside the existing performance tables so that readers can directly observe the reduction in leakage and its correlation with the reported improvements in part discovery. revision: yes
Circularity Check
No significant circularity in empirical measurements
full rationale
The paper conducts an empirical study measuring intra-object leakage in pretrained vision transformers using part-based attribute annotations, without presenting any mathematical derivation, first-principles prediction, or fitted parameter that reduces to its own inputs by construction. The two-stage baseline is introduced as an independent upper bound achieved by design to prevent leakage, rather than a circular fit or self-referential prediction. No self-definitional steps, load-bearing self-citations, or ansatz smuggling are present; the work remains self-contained as a set of experimental comparisons against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Aniraj, A., Dantas, C.F., Ienco, D., Marcos, D.: Masking strategies for background bias removal in computer vision models. In: ICCVW (2023)
work page 2023
-
[2]
Aniraj, A., Dantas, C.F., Ienco, D., Marcos, D.: PDiscoFormer: Relaxing part discovery constraints with vision transformers. In: ECCV (2024)
work page 2024
-
[3]
Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint arXiv:1607.06450 (2016)
work page internal anchor Pith review arXiv 2016
-
[4]
Bader, J., Girrbach, L., Alaniz, S., Akata, Z.: Sub: Benchmarking cbm generaliza- tion via synthetic attribute substitutions. In: ICCV (2025)
work page 2025
-
[5]
Beery, S., Van Horn, G., Perona, P.: Recognition in terra incognita. In: ECCV (2018)
work page 2018
-
[6]
Caron, M., Touvron, H., Misra, I., Jegou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging Properties in Self-Supervised Vision Transformers. In: ICCV (2021)
work page 2021
-
[7]
Chen, C., Li, O., Tao, D., Barnett, A., Rudin, C., Su, J.K.: This looks like that: deep learning for interpretable image recognition. In: NeurIPS (2019)
work page 2019
-
[8]
Nature Machine Intelligence2(12), 772–782 (2020)
Chen, Z., Bei, Y., Rudin, C.: Concept whitening for interpretable image recogni- tion. Nature Machine Intelligence2(12), 772–782 (2020)
work page 2020
-
[9]
Cultrera, L., Seidenari, L., Del Bimbo, A.: Leveraging visual attention for out-of- distribution detection. In: ICCV (2023)
work page 2023
-
[10]
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
work page 2005
-
[11]
Scientific Reports 13(1), 23088 (2023)
Davoodi, O., Mohammadizadehsamakosh, S., Komeili, M.: On the interpretability of part-prototype based classifiers: a human centric analysis. Scientific Reports 13(1), 23088 (2023)
work page 2023
-
[12]
Dittadi, A., Papa, S., De Vita, M., Schölkopf, B., Winther, O., Locatello, F.: Gen- eralization and robustness implications in object-centric learning. In: ICML (2022)
work page 2022
-
[13]
IEEE transactions on pattern analysis and machine intelligence32(9), 1627–1645 (2009)
Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE transactions on pattern analysis and machine intelligence32(9), 1627–1645 (2009)
work page 2009
-
[14]
International journal of computer vision61(1), 55–79 (2005)
Felzenszwalb, P.F., Huttenlocher, D.P.: Pictorial structures for object recognition. International journal of computer vision61(1), 55–79 (2005)
work page 2005
-
[15]
Nature Machine In- telligence2(11), 665–673 (2020)
Geirhos, R., Jacobsen, J.H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., Wichmann, F.A.: Shortcut learning in deep neural networks. Nature Machine In- telligence2(11), 665–673 (2020)
work page 2020
-
[16]
Havasi, M., Parbhoo, S., Doshi-Velez, F.: Addressing leakage in concept bottleneck models. NeurIPS (2022)
work page 2022
-
[17]
He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: CVPR (2022)
work page 2022
-
[18]
does it? shortcomings of latent space prototype interpretability in deep networks
Hoffmann, A., Fanconi, C., Rade, R., Kohler, J.: This looks like that... does it? shortcomings of latent space prototype interpretability in deep networks. arXiv preprint arXiv:2105.02968 (2021)
-
[19]
Huang, Q., Song, J., Hu, J., Zhang, H., Wang, Y., Song, M.: On the concept trustworthiness in concept bottleneck models. In: AAAI (2024)
work page 2024
-
[20]
Huang,Q.,Xue,M.,Huang,W.,Zhang,H.,Song,J.,Jing,Y.,Song,M.:Evaluation and improvement of interpretability for self-explainable part-prototype networks. In: ICCV (2023)
work page 2023
-
[21]
Huang, Z., Li, Y.: Interpretable and accurate fine-grained recognition via region grouping. In: CVPR (2020) 16 A. Aniraj et al
work page 2020
-
[22]
Hung, W.C., Jampani, V., Liu, S., Molchanov, P., Yang, M.H., Kautz, J.: Scops: Self-supervised co-part segmentation. In: CVPR (2019)
work page 2019
-
[23]
Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., Marklund, H., Haghgoo, B., Ball, R., Shpanskaya, K., et al.: Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In: AAAI (2019)
work page 2019
-
[24]
Izmailov, P., Kirichenko, P., Gruver, N., Wilson, A.G.: On feature learning in the presence of spurious correlations. NeurIPS (2022)
work page 2022
-
[25]
Kapl, F., Mamaghan, A.M.K., Horn, M., Marr, C., Bauer, S., Dittadi, A.: Object- centric representations generalize better compositionally with less compute. In: ICLRW (2025)
work page 2025
-
[26]
Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., et al.: Segment anything. In: ICCV (2023)
work page 2023
-
[27]
van der Klis, R., Alaniz, S., Mancini, M., Dantas, C.F., Ienco, D., Akata, Z., Mar- cos, D.: PDiscoNet: Semantically consistent part discovery for fine-grained recog- nition. In: ICCV (2023)
work page 2023
-
[28]
Koh, P.W., Nguyen, T., Tang, Y.S., Mussmann, S., Pierson, E., Kim, B., Liang, P.: Concept bottleneck models. In: ICML (2020)
work page 2020
-
[29]
One weird trick for parallelizing convolutional neural networks
Krizhevsky, A.: One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997 (2014)
work page Pith review arXiv 2014
-
[30]
Li, Y., Zhang, K., Cao, J., Timofte, R., Magno, M., Benini, L., Van Goo, L.: Localvit: Analyzing locality in vision transformers. In: IROS (2023)
work page 2023
-
[31]
Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: ICCV (2015)
work page 2015
-
[32]
Locatello, F., Weissenborn, D., Unterthiner, T., Mahendran, A., Heigold, G., Uszkoreit, J., Dosovitskiy, A., Kipf, T.: Object-centric learning with slot atten- tion. NeurIPS33, 11525–11538 (2020)
work page 2020
-
[33]
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
work page 2019
-
[34]
Loshchilov, I., Hutter, F.: Sgdr: Stochastic gradient descent with warm restarts. In: ICLR (2022)
work page 2022
-
[35]
Mahinpei, A., Clark, J., Lage, I., Doshi-Velez, F., Pan, W.: Promises and pitfalls of black-box concept learning models. In: ICML (2021)
work page 2021
-
[36]
Transactions on Machine Learning Research (2024)
Oquab, M., Darcet, T., Moutakanni, T., Vo, H.V., Szafraniec, M., Khalidov, V., Fernandez, P., HAZIZA, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: DINOv2: Learning robust visual feat...
work page 2024
-
[37]
Parisini, E., Chakraborti, T., Harbron, C., MacArthur, B.D., Banerji, C.R.: Leak- age and interpretability in concept-based models. arXiv preprint arXiv:2504.14094 (2025)
-
[38]
In: International conference on machine learning
Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: International conference on machine learning. pp. 1310–1318. Pmlr (2013)
work page 2013
-
[39]
IEEE Transactions on Image Processing27(3), 1487–1500 (2017)
Peng, Y., He, X., Zhao, J.: Object-part attention model for fine-grained image classification. IEEE Transactions on Image Processing27(3), 1487–1500 (2017)
work page 2017
-
[40]
Nature Machine Intelligence pp
Pérez-García, F., Sharma, H., Bond-Taylor, S., Bouzid, K., Salvatelli, V., Ilse, M., Bannur, S., Castro, D.C., Schwaighofer, A., Lungren, M.P., et al.: Exploring scal- able medical image encoders beyond text supervision. Nature Machine Intelligence pp. 1–12 (2025)
work page 2025
-
[41]
In: ICLR (2024) Metonymy in vision models 17
Qiu, C., Zhang, T., Wu, Y., Ke, W., Salzmann, M., Süsstrunk, S.: Mind your aug- mentation: The key to decoupling dense self-supervised learning. In: ICLR (2024) Metonymy in vision models 17
work page 2024
-
[42]
Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: ICML (2021)
work page 2021
-
[43]
Raman, N., Zarlenga, M.E., Jamnik, M.: Understanding inter-concept relationships in concept-based models. In: ICML (2024)
work page 2024
-
[44]
Raman, N.J., Zarlenga, M.E., Heo, J., Jamnik, M.: Do concept bottleneck models respect localities? Transactions on Machine Learning Research (2025)
work page 2025
-
[45]
Ranasinghe,K.,McKinzie,B.,Ravi,S.,Yang,Y.,Toshev,A.,Shlens,J.:Perceptual grouping in contrastive vision-language models. In: ICCV (2023)
work page 2023
-
[46]
Rao, S., Mahajan, S., Böhle, M., Schiele, B.: Discover-then-name: Task-agnostic concept bottlenecks via automated concept discovery. In: ECCV (2024)
work page 2024
-
[47]
Ruiz Luyten, M., van der Schaar, M.: A theoretical design of concept sets: improv- ing the predictability of concept bottleneck models. NeurIPS (2024)
work page 2024
-
[48]
Saha, O., Maji, S.: Particle: Part discovery and contrastive learning for fine-grained recognition. In: ICCV (2023)
work page 2023
-
[49]
Nature Machine Intelligence4(10), 867– 878 (2022)
Saporta, A., Gui, X., Agrawal, A., Pareek, A., Truong, S.Q., Nguyen, C.D., Ngo, V.D., Seekins, J., Blankenberg, F.G., Ng, A.Y., et al.: Benchmarking saliency methods for chest x-ray interpretation. Nature Machine Intelligence4(10), 867– 878 (2022)
work page 2022
-
[50]
Seitzer, M., Horn, M., Zadaianchuk, A., Zietlow, D., Xiao, T., Simon-Gabriel, C.J., He, T., Zhang, Z., Schölkopf, B., Brox, T., et al.: Bridging the gap to real-world object-centric learning. In: ICLR (2023)
work page 2023
-
[51]
Siméoni, O., Vo, H.V., Seitzer, M., Baldassarre, F., Oquab, M., Jose, C., Khali- dov, V., Szafraniec, M., Yi, S., Ramamonjisoa, M., et al.: Dinov3. arXiv preprint arXiv:2508.10104 (2025)
work page internal anchor Pith review arXiv 2025
-
[52]
Analyzing local representations of self-supervised vision transformers, 2024
Vanyan, A., Barseghyan, A., Tamazyan, H., Huroyan, V., Khachatrian, H., Danell- jan, M.: Analyzing local representations of self-supervised vision transformers. arXiv preprint arXiv:2401.00463 (2023)
-
[53]
NeurIPS37, 122484–122523 (2024)
Wang, Q., Lin, Y., Chen, Y., Schmidt, L., Han, B., Zhang, T.: A sober look at the robustness of clips to spurious features. NeurIPS37, 122484–122523 (2024)
work page 2024
-
[54]
Wang, T., Zhou, C., Sun, Q., Zhang, H.: Causal attention for unbiased visual recognition. In: ICCV (2021)
work page 2021
-
[55]
Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. Tech. Rep. CNS-TR-2010-001, California Institute of Technology (2010)
work page 2010
-
[56]
IEEE Transactions on Pattern Analysis and Machine Intelligence46(12), 10597–10613 (2024)
Xia, J., Huang, W., Xu, M., Zhang, J., Zhang, H., Sheng, Z., Xu, D.: Unsupervised part discovery via dual representation alignment. IEEE Transactions on Pattern Analysis and Machine Intelligence46(12), 10597–10613 (2024)
work page 2024
-
[57]
Xia, J., Wu, Y., Huang, W., Zhang, J., Zhang, J.: Unsupervised part discovery via descriptor-based masked image restoration with optimized constraints. In: ICCV (2025)
work page 2025
-
[58]
Xiao, K., Engstrom, L., Ilyas, A., Madry, A.: Noise or signal: The role of image backgrounds in object recognition. In: ICLR (2021)
work page 2021
-
[59]
Yun, S., Lee, H., Kim, J., Shin, J.: Patch-level representation learning for self- supervised vision transformers. In: CVPR (2022)
work page 2022
-
[60]
Zheng, H., Fu, J., Mei, T., Luo, J.: Learning multi-attention convolutional neural network for fine-grained image recognition. In: ICCV (2017)
work page 2017
-
[61]
Zhu, Z., Xie, L., Yuille, A.: Object recognition with and without objects. In: IJCAI (2017) 18 A. Aniraj et al. A Implementation details All models are implemented in PyTorch, using a ViT-B backbone initialized with DINOv3 weights [51] (or RAD-DINO [40] for CheXpert). Training was conducted using 8 NVIDIA H100 GPUs. To ensure statistical robustness, all r...
work page 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.