A single dominant layer in LLMs, found by activation outliers, accounts for most ZO fine-tuning gains and can replace full-model updates across models and tasks.
Sparse mezo: Less parameters for better performance in zeroth-order llm fine-tuning.arXiv preprint arXiv:2402.15751
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.LG 3years
2026 3roles
background 1polarities
background 1representative citing papers
ZO-MOPI accelerates zeroth-order LLM fine-tuning by applying partial spectral orthogonalization from power iteration inside a momentum-projected subspace to reduce variance and exploit dominant directions.
AdaMeZO adapts Adam moment estimates to zeroth-order LLM fine-tuning without extra memory storage, outperforming MeZO with up to 70% fewer forward passes.
citing papers explorer
-
Accelerating Zeroth-Order Spectral Optimization with Partial Orthogonalization from Power Iteration
ZO-MOPI accelerates zeroth-order LLM fine-tuning by applying partial spectral orthogonalization from power iteration inside a momentum-projected subspace to reduce variance and exploit dominant directions.