Vision models converge on universal object dimensions that are semantically interpretable and align more closely with biological vision than model-specific ones.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Representative citing papers
VLMs exhibit size, center, and saliency biases in scene understanding, relying less on people than humans do, with size bias as a key driver of divergence.
citing papers explorer
- Characterizing Universal Object Representations Across Vision Models
  Vision models converge on universal object dimensions that are semantically interpretable and align more closely with biological vision than model-specific ones.
- Revealing the Gap in Human and VLM Scene Perception through Counterfactual Semantic Saliency
  VLMs exhibit size, center, and saliency biases in scene understanding, relying less on people than humans do, with size bias as a key driver of divergence.