Capability ≠ Interpretability: Human Interpretability of Vision Foundation Models
Pith reviewed 2026-05-21 07:32 UTC · model grok-4.3
The pith
Foundation models produce less human-interpretable features than supervised vision transformers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Foundation models are consistently less interpretable than their supervised counterparts, and the gap is not a capability tradeoff: interpretability does not correlate with downstream task performance on any benchmark we examine. What does correlate is the locality of a feature's activations and coarse-grained semantic alignment with humans -- models with focal activations and representations that reflect the world's broad categorical structure produce more interpretable features, whereas fine-grained perceptual alignment does not. The two protocols yield strongly correlated rankings and share the same predictors, establishing interpretability as an independent, measurable dimension of model
What carries the argument
Two psychophysics protocols (localizability and nameability) applied to sparse autoencoder features, combined with chance-anchored scoring to rank models on one scale.
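The page does not give the paper's scoring function, so the following is only a minimal sketch of one common way to anchor a score to chance: accuracy is rescaled so that chance-level performance maps to 0 and perfect performance maps to 1. The function name `chance_anchored_score` and the example numbers are hypothetical, and the paper's actual formula may differ.

```python
def chance_anchored_score(accuracy: float, chance: float) -> float:
    """Rescale accuracy so chance maps to 0 and perfect performance to 1.

    Assumption: a linear chance correction; this is not taken from the paper.
    """
    if not 0.0 <= chance < 1.0:
        raise ValueError("chance must lie in [0, 1)")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    return (accuracy - chance) / (1.0 - chance)


# Hypothetical example: two tasks with different chance levels land on one scale.
two_way = chance_anchored_score(0.75, chance=0.5)     # 2-alternative task
four_way = chance_anchored_score(0.625, chance=0.25)  # 4-alternative task
print(two_way, four_way)  # both 0.5
```

The point of the rescaling is that a raw accuracy of 0.75 on a two-way task and 0.625 on a four-way task represent the same distance between chance and ceiling, which is what allows models measured under different task designs to share a scale.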
Load-bearing premise
The localizability and nameability protocols together provide a valid and generalizable measure of human interpretability for the recovered features.
What would settle it
Demonstrating a positive correlation between the measured interpretability scores and performance on a new downstream benchmark not tested in the study would undermine the claim that interpretability is independent of capability.
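With only six models, any such correlation test has limited resolution. The sketch below illustrates this with an exact one-sided permutation test; the function name and the toy inputs are hypothetical and are not the paper's data or analysis.

```python
from itertools import permutations


def exact_permutation_pvalue(x, y):
    """One-sided exact p-value for a positive association between x and y.

    The statistic sum(x_i * y_i) is a monotone function of Pearson's r when
    only the pairing of x and y changes, so it orders permutations identically.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    observed = sum(a * b for a, b in zip(x, y))
    at_least_as_large = 0
    total = 0
    for perm in permutations(y):
        total += 1
        # Small tolerance guards against floating-point ties.
        if sum(a * b for a, b in zip(x, perm)) >= observed - 1e-12:
            at_least_as_large += 1
    return at_least_as_large / total


# Toy input: six models whose two scores are perfectly rank-aligned.
scores_a = [1, 2, 3, 4, 5, 6]
scores_b = [1, 2, 3, 4, 5, 6]
print(exact_permutation_pvalue(scores_a, scores_b))  # 1/720
```

Six models admit 6! = 720 pairings, so the smallest achievable one-sided p-value is 1/720. A new benchmark would therefore need a near-perfect alignment with the interpretability ranking to produce strong evidence on its own.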
Original abstract
How interpretable are the features of leading vision models? The question is increasingly pressing as these models move from research benchmarks into high-stakes deployments, yet existing methods cannot answer it reliably. We close this gap with a framework for measuring and comparing the human interpretability of vision models, built around two complementary psychophysics protocols: (1) localizability -- can an observer predict where a feature fires on a novel image? -- and (2) nameability -- can an observer accurately describe what the feature represents? Features are recovered via sparse autoencoders, and a chance-anchored scoring function places every model on a common scale. Applying the framework to six vision transformers -- two supervised ViTs and four foundation models (DINOv2, DINOv3, CLIP, SigLIP) -- we collected more than 15,000 behavioral responses, analyzing the 13,400 responses from the 377 participants who passed our pre-specified quality checks. Foundation models are consistently less interpretable than their supervised counterparts, and the gap is not a capability tradeoff: interpretability does not correlate with downstream task performance on any benchmark we examine. What does correlate is the locality of a feature's activations and coarse-grained semantic alignment with humans -- models with focal activations and representations that reflect the world's broad categorical structure produce more interpretable features, whereas fine-grained perceptual alignment does not. The two protocols yield strongly correlated rankings and share the same predictors, establishing interpretability as an independent, measurable dimension of representation quality -- and, surprisingly, one on which every foundation model we tested falls below the supervised baselines that came before. Capability alone cannot close that gap; locality and coarse-grained alignment can.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a framework to measure human interpretability of features in vision transformers using sparse autoencoders combined with two psychophysics protocols: localizability (predicting where a feature activates on novel images) and nameability (describing the feature). Based on 13,400 valid responses from 377 participants who passed pre-specified quality checks, it reports that foundation models (DINOv2, DINOv3, CLIP, SigLIP) are consistently less interpretable than supervised ViTs, that this gap does not reflect a capability tradeoff (no correlation with downstream benchmarks), and that interpretability instead correlates with activation locality and coarse-grained semantic alignment with humans.
Significance. If the protocols prove robust to image selection and participant criteria, the work would establish interpretability as a measurable, independent dimension of representation quality separate from capability. The large behavioral dataset provides solid empirical grounding for the model-type comparisons and identifies actionable predictors (locality, coarse alignment). This has direct relevance for high-stakes vision deployments where human-understandable features matter.
major comments (3)
- [Methods (psychophysics protocols and participant screening)] The central claim that foundation models are less interpretable than supervised ViTs, and that the gap is not a capability tradeoff, depends on the psychophysics protocols capturing intrinsic feature properties rather than artifacts. The image selection process and the quality filters that retained 377 participants (from the initial pool yielding 15,000 responses) could systematically favor focal activations more common in supervised models; without a sensitivity analysis varying image distributions or screening criteria, the observed gap and lack of benchmark correlation may be setup-dependent.
- [Results (correlation with downstream performance)] The assertion that interpretability does not correlate with downstream task performance is load-bearing for the no-tradeoff conclusion. The manuscript should report the exact benchmarks examined, the correlation coefficients (with confidence intervals), and any multiple-comparison corrections, as the null result could be sensitive to benchmark choice or statistical power.
- [Results (protocol agreement)] The statement that the two protocols yield strongly correlated rankings and share the same predictors underpins the claim that interpretability is a coherent dimension. The specific Pearson or Spearman correlation value, sample size, and p-value for the protocol agreement should be provided explicitly.
minor comments (2)
- [Abstract] The abstract states 'more than 15,000 behavioral responses' but analyzes 13,400 from 377 participants; state the exact initial participant count and response total in the main text for transparency.
- [Throughout] Ensure consistent terminology when referring to the four foundation models versus the two supervised ViTs across figures and tables.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive review of our manuscript. We address each major comment below and commit to revisions that strengthen the presentation of our methods and results without altering the core findings.
Point-by-point responses
-
Referee: [Methods (psychophysics protocols and participant screening)] The central claim that foundation models are less interpretable than supervised ViTs, and that the gap is not a capability tradeoff, depends on the psychophysics protocols capturing intrinsic feature properties rather than artifacts. The image selection process and the quality filters that retained 377 participants (from the initial pool yielding 15,000 responses) could systematically favor focal activations more common in supervised models; without a sensitivity analysis varying image distributions or screening criteria, the observed gap and lack of benchmark correlation may be setup-dependent.
Authors: We agree that robustness to these design choices is important to establish. In the revised manuscript we will add a dedicated sensitivity analysis section that re-runs the key comparisons under alternative image sampling distributions and under more relaxed or stricter participant quality thresholds. This will directly test whether the interpretability gap and its predictors remain stable.
Revision: yes
-
Referee: [Results (correlation with downstream performance)] The assertion that interpretability does not correlate with downstream task performance is load-bearing for the no-tradeoff conclusion. The manuscript should report the exact benchmarks examined, the correlation coefficients (with confidence intervals), and any multiple-comparison corrections, as the null result could be sensitive to benchmark choice or statistical power.
Authors: We will expand the relevant results section to list every benchmark examined, report the Pearson correlation coefficients together with 95% confidence intervals, and state the multiple-comparison correction applied. These additions will make the statistical basis for the null result fully transparent.
Revision: yes
-
Referee: [Results (protocol agreement)] The statement that the two protocols yield strongly correlated rankings and share the same predictors underpins the claim that interpretability is a coherent dimension. The specific Pearson or Spearman correlation value, sample size, and p-value for the protocol agreement should be provided explicitly.
Authors: We will insert the requested quantitative details into the results section, reporting the Spearman rank correlation between the two protocol scores, the number of features on which it is computed, and the associated p-value. This will give explicit support to the claim that the protocols measure a coherent dimension.
Revision: yes
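The quantities named in these responses (Spearman rank correlation, Pearson confidence intervals, and a multiple-comparison correction) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' analysis code: all function names are hypothetical, the interval uses the Fisher z-transform approximation, and Holm's step-down procedure stands in for whichever correction the authors actually apply.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fisher_ci(r, n, z_crit=1.959963984540054):
    """Approximate 95% CI for Pearson's r via the Fisher z-transform."""
    if n <= 3:
        raise ValueError("the Fisher interval needs n > 3")
    if abs(r) >= 1.0:
        raise ValueError("the Fisher interval is undefined for |r| = 1")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for position, idx in enumerate(order):
        running_max = max(running_max, (m - position) * p_values[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted
```

Under this approximation, a correlation computed over six models has a very wide interval: for r = 0 and n = 6 it spans roughly (-0.81, 0.81). That is why the referee's request for confidence intervals bears directly on how much weight the null result can carry.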
Circularity Check
No significant circularity: empirical psychophysics measurements are self-contained
Full rationale
The paper derives its claims from direct human behavioral data collected via two psychophysics protocols (localizability and nameability) applied to features recovered by sparse autoencoders from six vision transformers. Over 15,000 responses were gathered and filtered to 13,400 from 377 participants using pre-specified quality checks, with rankings and correlations computed from these participant judgments rather than from any fitted parameters, self-definitions, or load-bearing self-citations. The reported gap in interpretability between foundation models and supervised ViTs, along with the lack of correlation to downstream benchmarks and the role of locality and coarse-grained alignment, emerges from the external human responses and does not reduce to the inputs by construction. This is a standard empirical study whose central results are falsifiable against new participant cohorts or image sets and therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Human observers can reliably perform localization and naming tasks on model features when quality controls are applied.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, reality_from_one_distinction [unclear]
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: We close this gap with a framework for measuring and comparing the human interpretability of vision models, built around two complementary psychophysics protocols: (1) localizability... and (2) nameability...
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel [unclear]
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: Locality of the representation... Hoyer metric... correlates strongly with localizability (ρ=0.91) and nameability (ρ=0.99)
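The Hoyer metric named in the quoted passage is a sparseness measure that is 0 for a uniform vector and 1 for a vector with a single nonzero entry. The sketch below shows its standard form, sparseness(x) = (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1). How the paper applies it to feature activation maps is not stated on this page, so the flattened toy "activation maps" here are an assumption for illustration only.

```python
import math


def hoyer_sparseness(values):
    """Hoyer sparseness in [0, 1]: 0 for uniform, 1 for a single nonzero entry."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two entries")
    l1 = sum(abs(v) for v in values)
    l2 = math.sqrt(sum(v * v for v in values))
    if l2 == 0.0:
        raise ValueError("undefined for an all-zero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)


# Toy flattened activation maps (hypothetical): focal versus diffuse.
focal = [0.0, 0.0, 1.0, 0.0]
diffuse = [1.0, 1.0, 1.0, 1.0]
print(hoyer_sparseness(focal), hoyer_sparseness(diffuse))  # 1.0 0.0
```

Read this way, a higher value corresponds to a feature whose activation is concentrated on few spatial positions, which matches the page's description of "focal activations".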
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
A. Limitations and broader impact
Limitations. We acknowledge several limitations. First, our model-level analyses span only six architectures, which limits statistical power and makes it difficult to disentangle...