Grove MoE: Towards efficient and superior MoE LLMs with adjugate experts
2 Pith papers cite this work. Polarity classification is still indexing.
Citing years: 2026 · Verdicts: 2 (UNVERDICTED) · Representative citing papers: 2
Citing papers
- Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
  Expert upcycling duplicates the experts in an existing MoE checkpoint and continues pre-training, matching fixed-size baseline performance with 32% less compute. A minimal sketch of the duplication step appears after this list.
- SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs
  SMoES improves MoE-VLM performance and efficiency via soft modality-guided expert routing and inter-bin mutual information regularization, yielding 0.9-4.2% task gains and a 56% reduction in communication. A routing sketch also follows below.
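The upcycling summary above only states that experts are duplicated and pre-training continues. A minimal sketch of that duplication step, assuming a plain top-k MoE layer in PyTorch, could look like the following; `MoELayer` and `upcycle` are hypothetical names for illustration, not the paper's code.

```python
import copy

import torch
import torch.nn as nn


class MoELayer(nn.Module):
    """Toy MoE layer: a linear router over a list of feed-forward experts."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )


def upcycle(layer: MoELayer) -> MoELayer:
    """Duplicate every expert (and its router row) into a 2x-wider layer,
    so continued pre-training starts from the old checkpoint's function."""
    d_model = layer.router.in_features
    wide = MoELayer(d_model, 2 * len(layer.experts))
    with torch.no_grad():
        for i, expert in enumerate(layer.experts):
            for j in (2 * i, 2 * i + 1):
                # Each source expert is copied into two slots of the wider layer.
                wide.experts[j] = copy.deepcopy(expert)
                # Copy the matching router row so both clones start with the
                # same routing logit as their parent expert.
                wide.router.weight[j] = layer.router.weight[i]
                wide.router.bias[j] = layer.router.bias[i]
    return wide


layer = MoELayer(d_model=64, num_experts=4)
wide = upcycle(layer)
assert len(wide.experts) == 8
```

Exact copies leave each clone pair symmetric, so in practice a small perturbation is often added to the duplicated router rows so the clones can diverge; the one-line summary above does not say how the paper breaks this tie.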
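The SMoES summary gives even less detail; one plausible reading of "soft modality-guided expert routing" is a router whose logits are biased by a learned per-modality embedding rather than hard-partitioned by modality. The sketch below is an assumption along those lines: all names are hypothetical, and the inter-bin mutual information regularizer is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityGuidedRouter(nn.Module):
    """Hypothetical top-k router whose logits are softly biased by modality."""

    def __init__(self, d_model: int, num_experts: int, num_modalities: int = 2):
        super().__init__()
        self.token_proj = nn.Linear(d_model, num_experts)
        # One learned bias over experts per modality (e.g. 0=text, 1=image).
        self.modality_bias = nn.Embedding(num_modalities, num_experts)

    def forward(self, x: torch.Tensor, modality: torch.Tensor, top_k: int = 2):
        # x: (tokens, d_model); modality: (tokens,) integer modality ids.
        # "Soft" guidance: modality shifts the logits instead of hard-assigning
        # disjoint expert pools to each modality.
        logits = self.token_proj(x) + self.modality_bias(modality)
        weights = F.softmax(logits, dim=-1)
        top_w, top_idx = weights.topk(top_k, dim=-1)
        return top_w / top_w.sum(dim=-1, keepdim=True), top_idx


router = ModalityGuidedRouter(d_model=64, num_experts=8)
tokens = torch.randn(10, 64)
modality = torch.tensor([0] * 6 + [1] * 4)  # 6 text tokens, 4 image tokens
gate_weights, expert_ids = router(tokens, modality)
assert gate_weights.shape == (10, 2) and expert_ids.shape == (10, 2)
```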