arXiv preprint arXiv:2507.21917 (2025)

Fanelli, N · 2025 · arXiv 2507.21917

3 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

Appear2Meaning: A Cross-Cultural Benchmark for Structured Cultural Metadata Inference from Images

cs.CV · 2026-04-08 · unverdicted · novelty 7.0

A new cross-cultural benchmark shows vision-language models infer structured cultural metadata from images inconsistently, with fragmented signals and large performance gaps across regions and metadata types.

Understanding How MLLMs Describe Artworks Using Token Activation Maps

cs.CV · 2026-06-26 · unverdicted · novelty 6.0

Token Activation Maps applied to MLLM art descriptions reveal that visual grounding strength varies by token category, with better artist identification than title prediction.

Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding

cs.AI · 2026-03-19 · unverdicted · novelty 6.0

MLLMs exhibit a consistent recognition-reasoning inversion on discrete visual symbols across domains, underperforming on elementary perception while appearing competent on higher-level reasoning via linguistic compensation.

citing papers explorer

Appear2Meaning: A Cross-Cultural Benchmark for Structured Cultural Metadata Inference from Images

cs.CV · 2026-04-08 · unverdicted · none · ref 19

A new cross-cultural benchmark shows vision-language models infer structured cultural metadata from images inconsistently, with fragmented signals and large performance gaps across regions and metadata types.

Understanding How MLLMs Describe Artworks Using Token Activation Maps

cs.CV · 2026-06-26 · unverdicted · none · ref 17

Token Activation Maps applied to MLLM art descriptions reveal that visual grounding strength varies by token category, with better artist identification than title prediction.

Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding

cs.AI · 2026-03-19 · unverdicted · none · ref 31

MLLMs exhibit a consistent recognition-reasoning inversion on discrete visual symbols across domains, underperforming on elementary perception while appearing competent on higher-level reasoning via linguistic compensation.
