Retraining all 31 subsets of five vision encoders shows Capacity and Necessity are distinct, pre-projector effective rank predicts residual performance at fixed parameter count, and high-Capacity plus adaptive complement pairs match the full five-encoder model.
Investigating redundancy in multimodal large language models with multiple vision encoders
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Beyond Encoder Accumulation: Measuring Encoder Roles in Multi-Encoder VLMs
Retraining all 31 subsets of five vision encoders shows Capacity and Necessity are distinct, pre-projector effective rank predicts residual performance at fixed parameter count, and high-Capacity plus adaptive complement pairs match the full five-encoder model.