SubTrack++: Gradient subspace tracking for scalable LLM training

Sahar Rajabi, Nayeema Nonta, Sirisha Rambhatla · arXiv 2502.01586

2 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Pro-KLShampoo projects KL-Shampoo preconditioners to a spike-and-flat parametric form on an r-dimensional subspace and recovers the full algebraic preconditioner via orthogonalization, outperforming KL-Shampoo on GPT-2 and LLaMA pre-training scales.

MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation

cs.LG · 2025-06-02 · conditional · novelty 6.0

MLorc compresses optimizer momentum with low-rank methods to enable memory-efficient full fine-tuning of LLMs, outperforming LoRA and GaLore while matching full-parameter performance at small ranks.

citing papers explorer

Showing 2 of 2 citing papers.

Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization
cs.LG · 2026-05-07 · unverdicted · none · ref 14

MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation
cs.LG · 2025-06-02 · conditional · none · ref 9
