Variation-aware vision token dropping for faster large vision-language models. CoRR, abs/2509.01552

Junjie Chen, Xuyang Liu, Zichen Wen, Yiyu Wang, Siteng Huang, Honggang Chen · 2025 · arXiv 2509.01552

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Evading Visual Aphasia: Contrastive Adaptive Semantic Token Pruning for Vision-Language Models

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

COAST prunes 77.8% of visual tokens in LVLMs with a 2.15x speedup while keeping 98.64% of original performance by adaptively routing semantic and spatial context via contrastive scores.

HeadRouter: Dynamic Head-Weight Routing for Task-Adaptive Audio Token Pruning in Large Audio Language Models

cs.SD · 2026-04-26 · unverdicted · novelty 6.0

HeadRouter prunes audio tokens more effectively by dynamically routing based on per-head importance for semantic versus acoustic tasks, exceeding baseline performance at 70% token retention on Qwen2.5-Omni models.

Rethinking Token Pruning for Historical Screenshots in GUI Visual Agents: Semantic, Spatial, and Temporal Perspectives

cs.CV · 2026-03-27 · unverdicted · novelty 5.0

Empirical study finds background semantics, random pruning, and recency-based allocation improve token efficiency for GUI visual agents.

citing papers explorer

Evading Visual Aphasia: Contrastive Adaptive Semantic Token Pruning for Vision-Language Models

cs.CV · 2026-05-10 · unverdicted · none · ref 43

HeadRouter: Dynamic Head-Weight Routing for Task-Adaptive Audio Token Pruning in Large Audio Language Models

cs.SD · 2026-04-26 · unverdicted · none · ref 3

Rethinking Token Pruning for Historical Screenshots in GUI Visual Agents: Semantic, Spatial, and Temporal Perspectives

cs.CV · 2026-03-27 · unverdicted · none · ref 5
