Beyond Attention Scores: SVD-Based Vision Token Pruning for Efficient Vision-Language Models

2026 · cs.CV · arXiv 2604.11530

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Vision-Language Models (VLMs) have revolutionized multi-modal learning by jointly processing visual and textual information. Yet, they face significant challenges due to the high computational and memory demands of processing long sequences of vision tokens. Many existing methods rely on local heuristics, such as attention scores or token norms. However, these criteria suffer from positional bias and information dispersion, limiting their ability to preserve essential content at high pruning ratios and leading to performance degradation on visually detailed images. To address these issues, we propose SVD-Prune, a training-free, plug-and-play token pruning method based on Singular Value Decomposition. It decomposes the vision token feature matrix and selects the top-k tokens using statistical leverage scores, ensuring only tokens contributing most to the dominant global variance are preserved. Experiments show that SVD-Prune consistently outperforms prior pruning methods under extreme vision token budgets, maintaining strong performance even with 32 and 16 vision tokens.
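The selection step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `leverage_score_prune`, the `rank` parameter, and the choice to skip centering or any other preprocessing are assumptions made for illustration. The sketch uses the standard definition of statistical leverage scores, the squared row norms of the top-`rank` left singular vectors, and keeps the `k` highest-scoring tokens.

```python
import numpy as np

def leverage_score_prune(tokens, k, rank):
    """Keep the k tokens with the largest rank-`rank` leverage scores.

    tokens: (n_tokens, dim) vision token feature matrix.
    Hypothetical helper; the paper's exact scoring may differ.
    """
    # Thin SVD: tokens = U @ diag(S) @ Vt, U has shape (n_tokens, min(n_tokens, dim)).
    U, S, Vt = np.linalg.svd(tokens, full_matrices=False)

    # Assumption: truncate to the dominant subspace. With the full rank and
    # n_tokens <= dim, every score would equal 1 and carry no information.
    r = min(rank, U.shape[1])

    # Leverage score of token i: squared norm of row i of the top-r
    # left singular vectors. The scores sum to r.
    scores = np.sum(U[:, :r] ** 2, axis=1)

    # Indices of the k largest scores, returned in original token order
    # so positional structure is preserved for the language model.
    keep = np.sort(np.argsort(scores)[-k:])
    return keep, scores
```

Calling `keep, _ = leverage_score_prune(tokens, k=32, rank=16)` followed by `tokens[keep]` would yield a 32-token budget like the one mentioned in the abstract; the value `rank=16` here is an arbitrary illustrative choice, not one taken from the paper.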

representative citing papers

Beyond Attention Scores: SVD-Based Vision Token Pruning for Efficient Vision-Language Models

cs.CV · 2026-04-13 · unverdicted · novelty 6.0 · 2 refs

SVD-Prune selects vision tokens via SVD leverage scores to outperform attention-based pruning at extreme budgets of 32 or 16 tokens.
