Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See

4 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.CV (4)

years: 2026 (3), 2025 (1)

representative citing papers

Counting to Four is still a Chore for VLMs

cs.CV · 2026-04-11 · unverdicted · novelty 6.0

VLMs fail at counting because visual evidence degrades in later language layers, and a lightweight Modality Attention Share intervention can encourage better use of image information during answer generation.
