γ-MoD: Exploring mixture-of-depth adaptation for multimodal large language models. arXiv preprint arXiv:2410.13859, 2024
2 Pith papers cite this work.
Citing papers by year: 2026 (2)
citing papers explorer
- Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models
  Reroute turns irreversible visual-token pruning into recoverable routing that reuses existing attention scores, improving grounding performance under aggressive reduction on LLaVA-1.5 and Qwen while preserving TFLOPs and KV-cache budgets.
- Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey
  A literature survey that categorizes how Mixture-of-Experts architectures address multimodal learning challenges and identifies open research gaps.