EvoESAP uses evolutionary search guided by a speculative-decoding-inspired ESAP metric to discover non-uniform layer-wise sparsity allocations for MoE expert pruning, improving generation accuracy up to 19.6% at 50% sparsity.
arXiv preprint arXiv:2502.07780 , year=
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.LG 4years
2026 4roles
background 1polarities
background 1representative citing papers
Different calibration objectives produce distinct layer pruning patterns in LLMs, while search algorithms converge to similar solutions under a fixed objective.
Pruning pretrained MoE models outperforms training from scratch under fixed budget, different expert compression methods converge after continued training, and progressive pruning plus multi-token KD improves the final 23A2B model.
citing papers explorer
-
EvoESAP: Non-Uniform Expert Pruning for Sparse MoE
EvoESAP uses evolutionary search guided by a speculative-decoding-inspired ESAP metric to discover non-uniform layer-wise sparsity allocations for MoE expert pruning, improving generation accuracy up to 19.6% at 50% sparsity.
-
Rethinking Layer Redundancy in Large Language Models: Calibration Objectives and Search for Depth Pruning
Different calibration objectives produce distinct layer pruning patterns in LLMs, while search algorithms converge to similar solutions under a fixed objective.
-
SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training
Pruning pretrained MoE models outperforms training from scratch under fixed budget, different expert compression methods converge after continued training, and progressive pruning plus multi-token KD improves the final 23A2B model.
- TAPIOCA: Why Task- Aware Pruning Improves OOD model Capability