COAST prunes 77.8% of visual tokens in LVLMs with a 2.15x speedup while keeping 98.64% of original performance by adaptively routing semantic and spatial context via contrastive scores.
An image is worth 1/2 tokens after layer 2: Plug-and-play inference acceleration for large vision-language models
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
baseline 1
citation-polarity summary
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2roles
baseline 1polarities
baseline 1representative citing papers
Empirical study finds background semantics, random pruning, and recency-based allocation improve token efficiency for GUI visual agents.
citing papers explorer
-
Evading Visual Aphasia: Contrastive Adaptive Semantic Token Pruning for Vision-Language Models
COAST prunes 77.8% of visual tokens in LVLMs with a 2.15x speedup while keeping 98.64% of original performance by adaptively routing semantic and spatial context via contrastive scores.
-
Rethinking Token Pruning for Historical Screenshots in GUI Visual Agents: Semantic, Spatial, and Temporal Perspectives
Empirical study finds background semantics, random pruning, and recency-based allocation improve token efficiency for GUI visual agents.