In AAAI-25, sponsored by the Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA, USA, pages 22128–22136.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- Look Less, Reason More: Block-wise Attention Skipping for Efficient Multimodal LLMs
  V-Skip applies block-wise structured sparsity to skip saturated visual self-attention in deeper MLLM layers while retaining FFNs, using few-shot calibration for task-specific paths and achieving 94.16-100.31% performance retention.
- From Inheritance to Saturation: Disentangling the Evolution of Visual Redundancy for Architecture-Aware MLLM Inference Acceleration
  HalfV disentangles MLLM visual redundancy into universal IVR and architecture-dependent SSR via a three-stage lifecycle, delivering 4.1x FLOPs speedup with 96.8% performance retention on Qwen2.5-VL.