Bernoulli-LoRA: A Theoretical Framework for Randomized Low-Rank Adaptation

Abdurakhmon Sadiev; Fawaz S Al-Qahtani; Igor Sokolov; Peter Richtárik; Yury Demidovich

arXiv: 2508.03820 · v1 · submitted 2025-08-05 · cs.LG · math.OC

Abstract

Parameter-efficient fine-tuning (PEFT) has emerged as a crucial approach for adapting large foundation models to specific tasks, particularly as model sizes continue to grow exponentially. Among PEFT methods, Low-Rank Adaptation (LoRA) (arXiv:2106.09685) stands out for its effectiveness and simplicity, expressing adaptations as a product of two low-rank matrices. While extensive empirical studies demonstrate LoRA's practical utility, theoretical understanding of such methods remains limited. Recent work on RAC-LoRA (arXiv:2410.08305) took initial steps toward rigorous analysis. In this work, we introduce Bernoulli-LoRA, a novel theoretical framework that unifies and extends existing LoRA approaches. Our method introduces a probabilistic Bernoulli mechanism for selecting which matrix to update. This approach encompasses and generalizes various existing update strategies while maintaining theoretical tractability. Under standard assumptions from the non-convex optimization literature, we analyze several variants of our framework: Bernoulli-LoRA-GD, Bernoulli-LoRA-SGD, Bernoulli-LoRA-PAGE, Bernoulli-LoRA-MVR, Bernoulli-LoRA-QGD, Bernoulli-LoRA-MARINA, and Bernoulli-LoRA-EF21, establishing convergence guarantees for each variant. Additionally, we extend our analysis to convex non-smooth functions, providing convergence rates for both constant and adaptive (Polyak-type) stepsizes. Through extensive experiments on various tasks, we validate our theoretical findings and demonstrate the practical efficacy of our approach. This work is a step toward developing theoretically grounded yet practically effective PEFT methods.
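To make the Bernoulli selection mechanism concrete, here is a minimal sketch in the spirit of the Bernoulli-LoRA-GD variant named above. It is not the authors' implementation, and several details are our own assumptions. At each step a coin with probability p decides which LoRA factor is sampled at random and frozen; the other factor receives one gradient step starting from zero, and the weights are updated by W <- W + B A. We assume the frozen factor is drawn with orthonormal columns (or rows) via Gram-Schmidt on a Gaussian draw. Under that assumption the update reduces to -gamma * B B^T grad f(W) or -gamma * grad f(W) A^T A. The objective is a toy quadratic. The helper names (`bernoulli_lora_gd_step`, `orthonormal_columns`, `frob_dist`) are hypothetical.

```python
import random


def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def orthonormal_columns(m, r, rng):
    """Sample an m x r matrix with orthonormal columns (Gram-Schmidt on Gaussian draws)."""
    cols = []
    while len(cols) < r:
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for u in cols:  # remove components along the columns accepted so far
            dot = sum(a * b for a, b in zip(u, v))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-10:  # reject (near-)dependent draws
            cols.append([a / norm for a in v])
    return transpose(cols)


def bernoulli_lora_gd_step(W, grad, gamma, rank, p, rng):
    """Hypothetical Bernoulli-LoRA-GD step: W <- W + B A with one factor random and frozen."""
    m, n = len(W), len(W[0])
    G = grad(W)
    if rng.random() < p:
        # Coin = 1 (our convention): freeze a random B (m x r); one GD step on A
        # from A = 0 gives A = -gamma * B^T G, so the update is -gamma * B B^T G.
        B = orthonormal_columns(m, rank, rng)
        A = [[-gamma * x for x in row] for row in matmul(transpose(B), G)]
    else:
        # Coin = 0: freeze a random A (r x n); one GD step on B from B = 0
        # gives B = -gamma * G A^T, so the update is -gamma * G A^T A.
        A = transpose(orthonormal_columns(n, rank, rng))
        B = [[-gamma * x for x in row] for row in matmul(G, transpose(A))]
    delta = matmul(B, A)  # the update has rank at most `rank`
    return [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, delta)]


def frob_dist(X, Y):
    """Frobenius-norm distance between two matrices."""
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry)) ** 0.5


if __name__ == "__main__":
    # Toy objective f(W) = 0.5 * ||W - W_star||_F^2, whose gradient is W - W_star.
    rng = random.Random(0)
    m, n, rank = 6, 5, 2
    W_star = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
    grad = lambda X: [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, W_star)]
    W = [[0.0] * n for _ in range(m)]
    for t in range(200):
        W = bernoulli_lora_gd_step(W, grad, gamma=1.0, rank=rank, p=0.5, rng=rng)
    print(f"distance to W_star after 200 steps: {frob_dist(W, W_star):.2e}")
```

With gamma = 1 on this quadratic, each step multiplies the error W - W_star by an orthogonal projector that removes a random rank-r subspace. As a result, the distance to W_star never increases. A larger p simply makes steps with B frozen more frequent than steps with A frozen.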

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SILAGE: Memory-Efficient, Full-Gradient-Free Nonconvex Optimization for Nested Finite Sums
cs.LG · 2026-06 · unverdicted · novelty 7.0

SILAGE is a variance-reduced algorithm for nested finite-sum nonconvex optimization that uses O(n) memory, evaluates at most one local group gradient per iteration, and adapts convergence to data heterogeneity paramet...