Variation-aware vision token dropping for faster large vision-language models

Junjie Chen, Xuyang Liu, Zichen Wen, Yiyu Wang, Siteng Huang, Honggang Chen · 2025 · arXiv 2509.01552

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

EarlyTom: Early Token Compression Completes Fast Video Understanding

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

EarlyTom is a training-free early token compression method inside the vision encoder with decoupled spatial selection that reduces TTFT up to 2.65x and FLOPs 61% on LLaVA-OneVision-7B while keeping accuracy comparable to full tokens.

A More Word-like Image Tokenization for MLLMs

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

DiVT clusters patch embeddings into coherent semantic units and adapts token count to image complexity, matching or exceeding baselines with fewer visual tokens on multimodal benchmarks.

LRCP: Low-Rank Compressibility Guided Visual Token Pruning for Efficient LVLMs

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

LRCP prunes visual tokens in LVLMs by scoring projection residuals onto a PCA-estimated low-rank subspace, achieving 88.9% image token reduction with 94.7% performance retention and 87.5% video reduction with 97.8% accuracy retention.

Evading Visual Aphasia: Contrastive Adaptive Semantic Token Pruning for Vision-Language Models

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

COAST prunes 77.8% of visual tokens in LVLMs with a 2.15x speedup while keeping 98.64% of original performance by adaptively routing semantic and spatial context via contrastive scores.

HeadRouter: Dynamic Head-Weight Routing for Task-Adaptive Audio Token Pruning in Large Audio Language Models

cs.SD · 2026-04-26 · unverdicted · novelty 6.0

HeadRouter prunes audio tokens more effectively by dynamically routing based on per-head importance for semantic versus acoustic tasks, exceeding baseline performance at 70% token retention on Qwen2.5-Omni models.

Rethinking Token Pruning for Historical Screenshots in GUI Visual Agents: Semantic, Spatial, and Temporal Perspectives

cs.CV · 2026-03-27 · unverdicted · novelty 5.0

Empirical study finds background semantics, random pruning, and recency-based allocation improve token efficiency for GUI visual agents.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Evading Visual Aphasia: Contrastive Adaptive Semantic Token Pruning for Vision-Language Models cs.CV · 2026-05-10 · unverdicted · none · ref 43
COAST prunes 77.8% of visual tokens in LVLMs with a 2.15x speedup while keeping 98.64% of original performance by adaptively routing semantic and spatial context via contrastive scores.

Variation-aware vision token dropping for faster large vision-language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer