Current VLMs excel at individual manga panel interpretation but systematically fail at temporal causality and cross-panel cohesion in long-form narratives.
Do vision and language models share concepts? A vector space alignment study. Transactions of the Association for Computational Linguistics, 12:1232–1249.
Cited by 1 paper (field: cs.CV; year: 2025):
Re:Verse -- Can Your VLM Read a Manga?