Towards an Empirical Understanding of MoE Design Choices
2 Pith papers cite this work, both from 2026 and both currently unverdicted.
Citing papers
- Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
  Expert upcycling duplicates the experts in an existing MoE checkpoint and continues pre-training, matching the performance of a fixed-size baseline with 32% less compute.
- Routing Sensitivity Without Controllability: A Diagnostic Study of Fairness in MoE Language Models
  Routing sensitivity in MoE models is necessary but insufficient for stereotype control, because bias and knowledge remain entangled within expert groups and preference shifts do not transfer to generated text.
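The expert-duplication step in the Expert Upcycling summary can be illustrated with a minimal, hypothetical sketch. It assumes each expert is a dict of weight lists and that the router holds one weight row per expert, so doubling the experts also means doubling the router rows. The function names, the interleaved "original then copy" layout, and the optional `noise_std` perturbation are illustrative assumptions, not the paper's method. The continued pre-training that follows is not shown.

```python
import copy
import random


def upcycle_experts(experts, router_rows, noise_std=0.0, seed=0):
    """Hypothetical sketch: double the expert count by duplicating experts.

    experts:     list of per-expert parameter dicts (assumed format: name -> list of floats)
    router_rows: router weight rows, one per expert (each a list of floats)
    noise_std:   optional Gaussian noise added to each copy's router row so the
                 copy does not get exactly the same logit as its original
                 (an assumption, not taken from the paper)
    Returns (new_experts, new_router_rows), each original followed by its copy.
    """
    if len(experts) != len(router_rows):
        raise ValueError("need exactly one router row per expert")
    rng = random.Random(seed)
    new_experts, new_rows = [], []
    for expert, row in zip(experts, router_rows):
        # Keep the original expert and add an independent deep copy of its weights.
        new_experts.append(copy.deepcopy(expert))
        new_experts.append(copy.deepcopy(expert))
        # The router needs one row per expert, so the row is duplicated too.
        new_rows.append(list(row))
        if noise_std > 0.0:
            new_rows.append([w + rng.gauss(0.0, noise_std) for w in row])
        else:
            new_rows.append(list(row))
    return new_experts, new_rows


def router_logits(router_rows, x):
    """Dot product of each router row with a token vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in router_rows]


# Usage: two toy experts become four; without noise, each copy ties with its original.
experts = [{"w1": [1.0, 2.0]}, {"w1": [3.0, 4.0]}]
router = [[0.5, -1.0], [2.0, 0.0]]
big_experts, big_router = upcycle_experts(experts, router)
print(len(big_experts), router_logits(big_router, [1.0, 1.0]))  # prints: 4 [-0.5, -0.5, 2.0, 2.0]
```

Without noise, each copy starts with exactly its original's weights and router logit, so continued pre-training is what has to make the pair diverge. The noise option is only one illustrative way to break that tie at initialization.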