Llava-prumerge: Adaptive token reduction for efficient large multimodal models

Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan · 2025

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

citation-role summary

background 1 baseline 1

citation-polarity summary

background 1 baseline 1

representative citing papers

Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

RotateK uses online PCA-based rotation to align token-dependent key channel importance into a shared subspace, enabling accurate head-wise structured pruning and faster decoding in VLMs compared to prior token or channel methods.

CoRDS: Coreset-based Representative and Diverse Selection for Streaming Video Understanding

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

CoRDS selects a compact KV-cache subset via joint-space coreset coverage and log-det diversity to outperform token-wise heuristics on long-video VLM benchmarks.

VideoRouter: Query-Adaptive Dual Routing for Efficient Long-Video Understanding

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

VideoRouter uses dual semantic and image routers for query-adaptive token compression in long-video models, delivering up to 67.9% reduction while outperforming the InternVL baseline on VideoMME, MLVU, and LongVideoBench.

OmniRefine: Alignment-Aware Cooperative Compression for Efficient Omnimodal Large Language Models

cs.AI · 2026-05-12 · unverdicted · novelty 5.0

OmniRefine introduces alignment-aware chunk refinement via similarity and dynamic programming followed by modality-cooperative token compression, achieving near-baseline accuracy at 44% token retention on WorldSense.

citing papers explorer

Showing 4 of 4 citing papers.

Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference cs.CV · 2026-05-19 · unverdicted · none · ref 32
RotateK uses online PCA-based rotation to align token-dependent key channel importance into a shared subspace, enabling accurate head-wise structured pruning and faster decoding in VLMs compared to prior token or channel methods.
CoRDS: Coreset-based Representative and Diverse Selection for Streaming Video Understanding cs.CV · 2026-05-14 · unverdicted · none · ref 27
CoRDS selects a compact KV-cache subset via joint-space coreset coverage and log-det diversity to outperform token-wise heuristics on long-video VLM benchmarks.
VideoRouter: Query-Adaptive Dual Routing for Efficient Long-Video Understanding cs.CV · 2026-05-07 · unverdicted · none · ref 7
VideoRouter uses dual semantic and image routers for query-adaptive token compression in long-video models, delivering up to 67.9% reduction while outperforming the InternVL baseline on VideoMME, MLVU, and LongVideoBench.
OmniRefine: Alignment-Aware Cooperative Compression for Efficient Omnimodal Large Language Models cs.AI · 2026-05-12 · unverdicted · none · ref 36
OmniRefine introduces alignment-aware chunk refinement via similarity and dynamic programming followed by modality-cooperative token compression, achieving near-baseline accuracy at 44% token retention on WorldSense.

Llava-prumerge: Adaptive token reduction for efficient large multimodal models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer