PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model

Bailan He; Shuo Chen; Volker Tresp; Yilun Liu; Yunpu Ma; Zhen Han; Zifeng Ding

arxiv: 2411.08212 · v1 · pith:Z2VAQMGEnew · submitted 2024-11-12 · 💻 cs.LG · cs.AI

keywords fine-tuning, peft, design, framework, parameter-efficient, perft, models, routed

Abstract

The Mixture-of-Experts (MoE) paradigm has emerged as a powerful approach for scaling transformers with improved resource utilization. However, efficiently fine-tuning MoE models remains largely underexplored. Inspired by recent works on Parameter-Efficient Fine-Tuning (PEFT), we present a unified framework for integrating PEFT modules directly into the MoE mechanism. Aligning with the core principles and architecture of MoE, our framework encompasses a set of design dimensions including various functional and composition strategies. By combining design choices within our framework, we introduce Parameter-Efficient Routed Fine-Tuning (PERFT) as a flexible and scalable family of PEFT strategies tailored for MoE models. Extensive experiments on adapting OLMoE-1B-7B and Mixtral-8$\times$7B for commonsense and arithmetic reasoning tasks demonstrate the effectiveness, scalability, and intriguing dynamics of PERFT. Additionally, we provide empirical findings for each specific design choice to facilitate better application of MoE and PEFT.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

EPnG: Adaptive Expert Prune-and-Grow for Parameter-Efficient MoE Fine-tuning
cs.LG 2026-07 unverdicted novelty 6.0

EPnG reallocates LoRA capacity in MoE models by pruning experts with low router gate probabilities and expanding high-importance ones via rank growth, outperforming standard LoRA and nearing full fine-tuning performan...