MARINA-P: Superior Performance in Non-smooth Federated Optimization with Adaptive Stepsizes

Igor Sokolov; Peter Richt\'arik

arxiv: 2412.17082 · v1 · pith:WA2D4IKCnew · submitted 2024-12-22 · 💻 cs.LG · math.OC· stat.ML

MARINA-P: Superior Performance in Non-smooth Federated Optimization with Adaptive Stepsizes

Igor Sokolov , Peter Richt\'arik This is my paper

classification 💻 cs.LG math.OCstat.ML

keywords non-smoothconvexcommunicationmarina-psettingoptimizationadaptivearxiv

0 comments

read the original abstract

Non-smooth communication-efficient federated optimization is crucial for many machine learning applications, yet remains largely unexplored theoretically. Recent advancements have primarily focused on smooth convex and non-convex regimes, leaving a significant gap in understanding the non-smooth convex setting. Additionally, existing literature often overlooks efficient server-to-worker communication (downlink), focusing primarily on worker-to-server communication (uplink). We consider a setup where uplink costs are negligible and focus on optimizing downlink communication by improving state-of-the-art schemes like EF21-P (arXiv:2209.15218) and MARINA-P (arXiv:2402.06412) in the non-smooth convex setting. We extend the non-smooth convex theory of EF21-P [Anonymous, 2024], originally developed for single-node scenarios, to the distributed setting, and extend MARINA-P to the non-smooth convex setting. For both algorithms, we prove an optimal $O(1/\sqrt{T})$ convergence rate and establish communication complexity bounds matching classical subgradient methods. We provide theoretical guarantees under constant, decreasing, and adaptive (Polyak-type) stepsizes. Our experiments demonstrate that MARINA-P with correlated compressors outperforms other methods in both smooth non-convex and non-smooth convex settings. This work presents the first theoretical results for distributed non-smooth optimization with server-to-worker compression, along with comprehensive analysis for various stepsize schemes.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SILAGE: Memory-Efficient, Full-Gradient-Free Nonconvex Optimization for Nested Finite Sums
cs.LG 2026-06 unverdicted novelty 7.0

SILAGE is a variance-reduced algorithm for nested finite-sum nonconvex optimization that uses O(n) memory, evaluates at most one local group gradient per iteration, and adapts convergence to data heterogeneity paramet...