Xmoe: Sparse models with fine-grained and adaptive expert selection
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers
- Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis
  An analytical post-training method restructures FFNs into MoE by partitioning neurons based on activation patterns and building a router from statistics, achieving 1.17x speedup with minimal resources.
- Beyond Uniform Experts: Cost-Aware Expert Execution for Efficient Multi-Device MoE Inference
  CAEE reduces MoE inference latency 8-18% on 671B DeepSeek-R1 by cost-aware expert pruning and low-overhead compensation while keeping accuracy drop under 1%.