Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

2025 · cs.AI · arXiv 2505.24037

1 Pith paper cites this work. Polarity classification is still being indexed.


abstract

Sparse large language models (LLMs) offer an attractive direction toward efficient deployment, but adapting them to downstream tasks remains challenging. The central difficulty is to enable effective task adaptation without sacrificing the efficiency advantages of sparsity. Existing fine-tuning methods are not well-suited to this setting, as they either introduce additional dense parameters or assume a fixed sparse topology, limiting their compatibility with sparse LLMs. In this paper, we propose Sparsity Evolution Fine-Tuning (SEFT), a fine-tuning framework designed specifically for sparse LLMs. SEFT allows sparse structure to evolve during fine-tuning by periodically reallocating sparse task-specific updates and reactivating previously pruned weights when beneficial. At the same time, SEFT preserves the efficiency advantages of sparsity through topology adaptation based on parameter importance. Experiments on LLaMA, DeepSeek, and Mistral models across multiple benchmarks show that SEFT delivers stronger performance while offering superior memory and time efficiency compared to existing baselines. Our code is publicly available at: https://github.com/QiaoXiao7282/SEFT.
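The idea of letting a sparse topology evolve while keeping the overall sparsity fixed can be illustrated with a generic drop-and-grow step. The sketch below is not the SEFT algorithm: the function name `evolve_mask`, the use of weight magnitude as the importance score for dropping, and the use of gradient magnitude for reactivating pruned positions are all assumptions made for illustration. The abstract does not specify these criteria.

```python
def evolve_mask(weights, grads, mask, k):
    """One hypothetical drop-and-grow step on a flat list of weights.

    Deactivates the k active weights with the smallest magnitude and
    reactivates the k pruned positions with the largest gradient magnitude,
    so the number of active weights (and hence the sparsity) is unchanged.
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    # Never swap more entries than either side can supply.
    k = min(k, len(active), len(inactive))

    # Drop: assumed importance score is |weight|.
    drop = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Grow: assumed regrowth score is |gradient| at pruned positions.
    grow = sorted(inactive, key=lambda i: abs(grads[i]), reverse=True)[:k]

    new_mask = list(mask)
    for i in drop:
        new_mask[i] = False
    for i in grow:
        new_mask[i] = True
    return new_mask


# Toy example: three of six weights are active.
weights = [0.9, 0.0, -0.05, 0.0, 0.4, 0.0]
grads = [0.1, 0.8, 0.2, 0.05, 0.1, 0.3]
mask = [True, False, True, False, True, False]

new_mask = evolve_mask(weights, grads, mask, k=1)
print(new_mask)
```

In this toy run the smallest active weight (index 2) is dropped and the pruned position with the largest gradient (index 1) is reactivated, so the count of active weights stays at three. The authors' actual procedure is in the repository linked in the abstract.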

representative citing papers

When Data Is Scarce: Scaling Sparse Language Models with Repeated Training

cs.LG · 2026-05-31 · unverdicted · novelty 6.0

Sparse LLMs in data-scarce multi-epoch regimes follow a scaling law based on active parameters, unique tokens, repetition count, and sparsity level that predicts performance and delays data saturation.

citing papers explorer

Showing 1 of 1 citing paper.

When Data Is Scarce: Scaling Sparse Language Models with Repeated Training

cs.LG · 2026-05-31 · unverdicted · none · ref 19 · internal anchor
