Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding

Visual token pruning in MLLMs fails on complex reasoning because of a Relevant Visual Information Shift during decoding; the training-free DSTP framework mitigates this across models.
Three papers cite this work; citation polarity classification is still in progress.
All three citations are from 2026 and are not yet verdicted. Representative citing papers:
- VisMMoE exploits visual-expert affinity via token pruning to achieve up to 2.68x faster VL-MoE inference on memory-constrained hardware while keeping accuracy competitive.
- LightKV compresses the vision-token KV cache in LVLMs to 55% of its original size via prompt-guided cross-modality aggregation, halving cache memory, cutting compute by 40%, and maintaining performance on benchmarks.
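Both summaries above center on dropping or compressing visual tokens to cut inference cost. A minimal, hypothetical sketch of the basic idea, keeping only the top-k visual tokens by an attention-style relevance score; the function name, scoring input, and keep ratio are illustrative assumptions, not the method of any paper listed here:

```python
import numpy as np

def prune_visual_tokens(tokens, relevance_scores, keep_ratio=0.5):
    """Keep the top-k visual tokens ranked by a relevance score.

    Illustrative sketch only: real MLLM pruning methods differ in how
    scores are computed and when pruning happens during decoding.
    """
    k = max(1, int(len(tokens) * keep_ratio))
    # Indices of the k highest-scoring tokens.
    top_idx = np.argsort(relevance_scores)[::-1][:k]
    # Restore original sequence order before returning.
    top_idx = np.sort(top_idx)
    return [tokens[i] for i in top_idx]

# Example: keep half of 8 tokens, ranked by score.
kept = prune_visual_tokens(
    list("abcdefgh"),
    [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6],
    keep_ratio=0.5,
)
# kept == ['b', 'd', 'f', 'h']
```

The focal paper's point is that such static, score-once pruning can discard tokens that only become relevant later in decoding, which is why it degrades on complex reasoning.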
Citing papers
- Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding
- VisMMOE: Exploiting Visual-Expert Affinity for Efficient Visual-Language MoE Offloading
- Make Your LVLM KV Cache More Lightweight