Recognition: no theorem link
Neuroscience-Inspired Analyses of Visual Interestingness in Multimodal Transformers
Pith reviewed 2026-05-12 01:29 UTC · model grok-4.3
The pith
Multimodal transformers encode human-derived visual interestingness in structured form across their layers without explicit training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Analyses of Qwen3-VL-8B reveal that Common Interestingness (CI) information is linearly decodable from final-layer embeddings, indicating alignment with human-derived measures of visual interestingness. Dimensionality reduction and Generalized Discrimination Value analyses show CI-related hidden representations emerge in intermediate vision transformer layers and become progressively more distinguishable across language model layers. Concept vectors obtained through geometric, probe, and Sparse Auto-Encoder methods converge in higher layers according to representational similarity analysis, demonstrating a robust and structured encoding of visual interestingness without explicit supervision.
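The convergence claim can be pictured as rising pairwise cosine similarity between concept vectors extracted by different methods at the same layer. The toy sketch below uses synthetic vectors whose method-specific noise shrinks with depth; the dimensions, layer counts, and noise model are illustrative assumptions, not the paper's actual extraction pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_layers, n_methods = 32, 8, 3  # illustrative sizes, not the paper's

shared = rng.normal(size=dim)        # hypothetical "true" CI direction
shared /= np.linalg.norm(shared)

def concept_vector(layer):
    # one method's estimate: shared direction plus method noise that
    # shrinks with depth (toy model of convergence, not real data)
    v = shared + (1.0 / (layer + 1)) * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def mean_pairwise_cosine(vectors):
    # average cosine similarity over all pairs of unit vectors
    V = np.array(vectors)
    iu = np.triu_indices(len(V), k=1)
    return (V @ V.T)[iu].mean()

convergence = [
    mean_pairwise_cosine([concept_vector(layer) for _ in range(n_methods)])
    for layer in range(n_layers)
]
```

Under this toy model, `convergence` rises with layer index, mirroring the reported RSA trend.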
What carries the argument
Layer-wise application of linear decodability probes, Generalized Discrimination Value (GDV), and representational similarity analysis to track the emergence and convergence of Common Interestingness (CI) representations in vision and language transformer components.
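A minimal sketch of the layer-wise linear-probe idea, on synthetic data: a planted CI direction whose strength grows with depth stands in for real Qwen3-VL activations, and a closed-form ridge decoder measures held-out R^2 per layer. All names and shapes are illustrative assumptions, not the paper's data or code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_layers, dim = 200, 6, 64   # stand-in sizes

ci_scores = rng.normal(size=n_images)  # stand-in CI score per image
w = rng.normal(size=dim)               # planted CI direction
layer_embeddings = [
    # signal strength grows with depth, mimicking progressive emergence
    0.2 * (layer + 1) * np.outer(ci_scores, w)
    + rng.normal(size=(n_images, dim))
    for layer in range(n_layers)
]

def probe_r2(X, y, n_train=150, lam=1.0):
    # closed-form ridge regression on a train split, R^2 on held-out images
    Xtr, ytr, Xte, yte = X[:n_train], y[:n_train], X[n_train:], y[n_train:]
    beta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(X.shape[1]), Xtr.T @ ytr)
    resid = yte - Xte @ beta
    return 1.0 - resid.var() / yte.var()

decodability = [probe_r2(X, ci_scores) for X in layer_embeddings]
```

The layer-wise `decodability` profile is the kind of curve the paper tracks; on real embeddings the probe would be cross-validated and regularization tuned.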
Load-bearing premise
The pre-defined Common Interestingness score from Flickr engagement data accurately captures intrinsic visual interestingness rather than popularity, image quality, or platform biases.
What would settle it
Linear decoders trained on the embeddings would fail to predict held-out CI scores above chance, or GDV values would show no systematic increase in distinguishability from intermediate vision layers onward.
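For readers unfamiliar with the GDV metric, here is a sketch of one published formulation (mean intra-class minus mean inter-class Euclidean distance on z-scored data, normalized by the square root of the dimensionality; more negative means more separable). The scaling conventions are assumptions from the GDV literature, so consult the original paper for exact details.

```python
import numpy as np

def gdv(points, labels):
    # Generalized Discrimination Value: mean intra-class distance minus
    # mean inter-class distance on per-dimension z-scored (and halved)
    # data, divided by sqrt(D). More negative = better separation.
    X = np.asarray(points, float)
    labels = np.asarray(labels)
    X = 0.5 * (X - X.mean(0)) / (X.std(0) + 1e-12)
    classes = np.unique(labels)
    D = X.shape[1]

    def mean_dist(A, B):
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        if A is B:  # intra-class: average over distinct pairs only
            iu = np.triu_indices(len(A), k=1)
            return d[iu].mean()
        return d.mean()

    groups = {c: X[labels == c] for c in classes}
    intra = np.mean([mean_dist(groups[c], groups[c]) for c in classes])
    inter = np.mean([mean_dist(groups[a], groups[b])
                     for i, a in enumerate(classes) for b in classes[i + 1:]])
    return (intra - inter) / np.sqrt(D)

# toy check: two well-separated Gaussian clusters vs shuffled labels
rng = np.random.default_rng(1)
separated = np.vstack([rng.normal(size=(30, 5)) + 4,
                       rng.normal(size=(30, 5)) - 4])
labels = np.array([0] * 30 + [1] * 30)
g_sep = gdv(separated, labels)
g_shuffled = gdv(separated, rng.permutation(labels))
```

Separable classes drive the value strongly negative, while shuffled labels sit near zero, which is exactly the contrast a "systematic increase in distinguishability" would show layer by layer.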
read the original abstract
Human attention is the gateway to conscious perception, memory and decision-making. However, its role in modern transformer models remains largely unexplored. As these systems increasingly influence what people see, prefer and buy, the question arises as to whether they encode principles of human interest or merely exploit large-scale correlations. Addressing this issue is crucial for understanding cognition and ensuring the responsible use of AI in communication and marketing. In order to address this issue, the concept of visual interest was examined within the multimodal vision-language-model Qwen3-VL-8B, using a pre-defined Common Interestingness (CI) score derived from large-scale human engagement data on the photo-sharing platform Flickr. Here, we analyzed internal representations across vision and language components using methods from the neurosciences. Our analyses revealed that CI information is linearly decodable from final-layer embeddings, indicating that it is aligned with human-derived measures of visual interestingness. Dimensionality reduction and Generalized Discrimination Value (GDV) analyses demonstrate that CI-related hidden representations emerge in intermediate vision transformer layers and becomes progressively more distinguishable across language model layers. Concept vectors derived using geometric, probe, and Sparse Auto-Encoder based methods converge in higher layers, as confirmed by representational similarity analysis. This indicates a robust and structured encoding of visual interestingness without explicit supervision. Future work will seek to identify shared computational principles linking human brain dynamics and transformer architectures, with the ultimate goal of uncovering the organizing mechanisms that give rise to attention and interest in both biological and artificial systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript examines visual interestingness in the multimodal transformer Qwen3-VL-8B by analyzing its internal representations with neuroscience-inspired tools. It uses a pre-defined Common Interestingness (CI) score computed from large-scale Flickr engagement data as the target variable. Linear decoding, dimensionality reduction, Generalized Discrimination Value (GDV) trajectories, and representational similarity analysis of concept vectors (obtained via geometric, probing, and Sparse Auto-Encoder routes) are applied across vision and language layers. The central claims are that CI information is linearly decodable from final-layer embeddings, that CI-related structure emerges in intermediate vision-transformer layers and grows progressively distinguishable through the language-model stack, and that the three families of concept vectors converge in higher layers, indicating unsupervised, structured encoding of human-derived visual interest.
Significance. If the quantitative results and controls hold, the work would supply evidence that multimodal transformers spontaneously align internal representations with human visual interestingness without any supervision on that variable. Such a finding would strengthen the case for using representational-similarity and decoding methods from neuroscience to interpret emergent properties in large vision-language models and could inform both cognitive modeling and the design of more transparent multimodal systems.
major comments (3)
- [Abstract / Methods] Abstract and Methods: the manuscript relies on a Flickr-derived CI score as the sole human-interest proxy yet reports no controls, partial correlations, or matched-subset analyses for known confounds (image aesthetics, resolution, upload timing, social-network effects, or platform promotion). Without such checks, the reported linear decodability and layer-wise GDV increases could reflect low-level visual statistics rather than genuine alignment with intrinsic interestingness.
- [Abstract] Abstract: all claims of linear decodability, progressive distinguishability, and concept-vector convergence are stated without any numerical values, error bars, statistical tests, layer indices, or sample sizes. This absence prevents assessment of effect magnitude or robustness and is load-bearing for the central emergence claim.
- [Results] Results (GDV and RSA sections): the progressive increase in distinguishability across language-model layers and the convergence of geometric/probe/SAE concept vectors are presented without baseline comparisons (e.g., shuffled labels, random embeddings, or control tasks) or explicit layer-by-layer statistics, leaving open whether the observed trajectories exceed what would be expected from generic depth-dependent specialization.
minor comments (2)
- [Abstract] Abstract: subject-verb agreement error in 'CI-related hidden representations emerge ... and becomes progressively more distinguishable'; 'representations' is plural, so 'become' is required.
- [Abstract / Methods] Notation: the acronym 'GDV' is introduced without an explicit expansion or reference on first use; a brief parenthetical definition would aid readability.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which help clarify how to strengthen the presentation and robustness of our findings on the alignment between multimodal transformer representations and human visual interestingness. We address each major comment below and commit to the corresponding revisions.
read point-by-point responses
- Referee: [Abstract / Methods] Abstract and Methods: the manuscript relies on a Flickr-derived CI score as the sole human-interest proxy yet reports no controls, partial correlations, or matched-subset analyses for known confounds (image aesthetics, resolution, upload timing, social-network effects, or platform promotion). Without such checks, the reported linear decodability and layer-wise GDV increases could reflect low-level visual statistics rather than genuine alignment with intrinsic interestingness.
Authors: We agree that explicit controls for potential confounds are necessary to support the interpretation that the decoded structure reflects human-derived interestingness rather than low-level image properties. In the revised manuscript we will add partial-correlation analyses that control for image aesthetics (using established computational metrics), resolution, and available metadata on upload timing. We will also report results on matched subsets where images are equated on these variables, and include these controls in both the Methods and Results sections with appropriate statistical reporting. revision: yes
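The promised confound control can be sketched as a partial correlation: regress both variables on the nuisance covariates and correlate the residuals. The toy demo below plants a shared "aesthetics" confound; variable names and the residualization route are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np

def partial_corr(x, y, confounds):
    # Pearson correlation of x and y after linearly regressing out the
    # confound columns (plus an intercept) from both variables.
    Z = np.column_stack([np.ones(len(x)), confounds])

    def resid(v):
        beta, *_ = np.linalg.lstsq(Z, v, rcond=None)
        return v - Z @ beta

    rx, ry = resid(x), resid(y)
    return float(rx @ ry / (np.linalg.norm(rx) * np.linalg.norm(ry)))

# toy demo: x and y correlate only through a shared "aesthetics" confound
rng = np.random.default_rng(2)
aesthetics = rng.normal(size=500)
x = aesthetics + 0.1 * rng.normal(size=500)
y = aesthetics + 0.1 * rng.normal(size=500)
raw_corr = float(np.corrcoef(x, y)[0, 1])
controlled = partial_corr(x, y, aesthetics[:, None])
```

A raw correlation near 1 that collapses toward 0 after controlling would indicate the decoded structure tracks the confound, not CI; a correlation that survives supports the authors' interpretation.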
- Referee: [Abstract] Abstract: all claims of linear decodability, progressive distinguishability, and concept-vector convergence are stated without any numerical values, error bars, statistical tests, layer indices, or sample sizes. This absence prevents assessment of effect magnitude or robustness and is load-bearing for the central emergence claim.
Authors: We accept that the abstract must convey quantitative information to allow readers to evaluate the strength of the claims. The revised abstract will include the key numerical results: linear decoding accuracy (with standard error), GDV values and their layer-wise increases (with statistical tests), specific layer indices where structure emerges, and the number of images and layers analyzed. Error bars and p-values will be stated where they support the reported effects. revision: yes
- Referee: [Results] Results (GDV and RSA sections): the progressive increase in distinguishability across language-model layers and the convergence of geometric/probe/SAE concept vectors are presented without baseline comparisons (e.g., shuffled labels, random embeddings, or control tasks) or explicit layer-by-layer statistics, leaving open whether the observed trajectories exceed what would be expected from generic depth-dependent specialization.
Authors: We acknowledge that baseline controls and layer-wise statistics are required to demonstrate that the reported trajectories are specific to the CI variable rather than generic consequences of depth. In the revision we will add shuffled-label and random-embedding baselines for both GDV and RSA analyses, together with layer-by-layer statistical tests (e.g., repeated-measures ANOVA or paired t-tests with appropriate multiple-comparison correction). These controls and statistics will be presented in the Results text and in updated figures. revision: yes
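A shuffled-label baseline of the kind committed to here reduces to a permutation test: recompute the statistic under many random relabelings and ask where the observed value falls in that null distribution. The sketch below is generic and runs on synthetic data; the statistic and sizes are illustrative assumptions.

```python
import numpy as np

def permutation_pvalue(stat_fn, X, y, n_perm=200, seed=0):
    # One-sided permutation p-value: fraction of shuffled-label runs
    # whose statistic matches or beats the observed one (with the
    # usual +1 smoothing to avoid p = 0).
    rng = np.random.default_rng(seed)
    observed = stat_fn(X, y)
    null = [stat_fn(X, rng.permutation(y)) for _ in range(n_perm)]
    p = (1 + sum(s >= observed for s in null)) / (1 + n_perm)
    return observed, p

def abs_corr0(X, y):
    # test statistic: |corr| between embedding dimension 0 and the labels
    return abs(np.corrcoef(X[:, 0], y)[0, 1])

# toy demo: dimension 0 carries a planted signal, dimension 1 is noise
rng = np.random.default_rng(3)
y = rng.normal(size=120)
X = np.column_stack([y + 0.3 * rng.normal(size=120),
                     rng.normal(size=120)])
obs, p = permutation_pvalue(abs_corr0, X, y)
```

Applied per layer with GDV or RSA as `stat_fn`, this yields the layer-by-layer null comparisons the referee asks for (with multiple-comparison correction across layers).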
Circularity Check
No circularity: post-hoc decoding of external CI score
full rationale
The paper applies standard post-hoc neuroscience methods (linear probes, GDV, dimensionality reduction, RSA, concept vectors via probes/SAE) to the frozen embeddings of a pre-trained Qwen3-VL model. The CI target is a pre-defined external score computed from Flickr engagement data and is never used for model training, fine-tuning, or parameter optimization. No equations, predictions, or uniqueness claims reduce to fitted inputs or self-citations by construction. The derivation chain consists entirely of observational measurements on independent data, so the analysis is evaluated against an external benchmark rather than against quantities it helped produce.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Linear decodability from embeddings indicates meaningful alignment with the target concept