Investigating redundancy in multimodal large language models with multiple vision encoders

Yizhou Wang, Song Mao, Yang Chen, Yufan Shen, Pinlong Cai, Ding Wang, Guohang Yan, Zhi Yu, Yinqiao Yan, Xuming Hu, Botian Shi · 2026 · arXiv 2507.03262

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Beyond Encoder Accumulation: Measuring Encoder Roles in Multi-Encoder VLMs

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

Retraining all 31 subsets of five vision encoders shows Capacity and Necessity are distinct, pre-projector effective rank predicts residual performance at fixed parameter count, and high-Capacity plus adaptive complement pairs match the full five-encoder model.

citing papers explorer

Showing 1 of 1 citing paper.

Beyond Encoder Accumulation: Measuring Encoder Roles in Multi-Encoder VLMs

cs.CV · 2026-06-02 · unverdicted · none · ref 36
