3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Do vision-language models search like humans? Reasoning tokens as a reaction-time analog in classic visual-search paradigms
  Vision-language models show some human-like patterns in visual search effort (flat for features, rising for conjunctions) but diverge on target-present vs absent slopes and enumeration accuracy when reasoning tokens proxy reaction time.
- The Gordian Knot for VLMs: Diagrammatic Knot Reasoning as a Hard Benchmark
  KnotBench benchmark shows state-of-the-art VLMs perform near random on diagrammatic knot reasoning tasks and lack ability to simulate structural moves.
- Formalizing the Binding Problem
  Introduces an information-theoretic formalization of the binding problem and a probing method to quantify binding information in deep learning model representations, tested on ViTs across challenging datasets.