Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity

Aleksandr Beznosikov; Alexander Gasnikov; Dmitry Kovalev; Ekaterina Borodich; Gesualdo Scutari

arxiv: 2205.15136 · v1 · pith:HUITCSJYnew · submitted 2022-05-30 · 🧮 math.OC · cs.DC· cs.LG

Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity

Dmitry Kovalev , Aleksandr Beznosikov , Ekaterina Borodich , Alexander Gasnikov , Gesualdo Scutari This is my paper

classification 🧮 math.OC cs.DCcs.LG

keywords gradientcomplexitydistributedsimilarityconvexmathcalmethodoptimization

0 comments

read the original abstract

We study structured convex optimization problems, with additive objective $r:=p + q$, where $r$ is ($\mu$-strongly) convex, $q$ is $L_q$-smooth and convex, and $p$ is $L_p$-smooth, possibly nonconvex. For such a class of problems, we proposed an inexact accelerated gradient sliding method that can skip the gradient computation for one of these components while still achieving optimal complexity of gradient calls of $p$ and $q$, that is, $\mathcal{O}(\sqrt{L_p/\mu})$ and $\mathcal{O}(\sqrt{L_q/\mu})$, respectively. This result is much sharper than the classic black-box complexity $\mathcal{O}(\sqrt{(L_p+L_q)/\mu})$, especially when the difference between $L_q$ and $L_q$ is large. We then apply the proposed method to solve distributed optimization problems over master-worker architectures, under agents' function similarity, due to statistical data similarity or otherwise. The distributed algorithm achieves for the first time lower complexity bounds on {\it both} communication and local gradient calls, with the former having being a long-standing open problem. Finally the method is extended to distributed saddle-problems (under function similarity) by means of solving a class of variational inequalities, achieving lower communication and computation complexity bounds.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SILAGE: Memory-Efficient, Full-Gradient-Free Nonconvex Optimization for Nested Finite Sums
cs.LG 2026-06 unverdicted novelty 7.0

SILAGE is a variance-reduced algorithm for nested finite-sum nonconvex optimization that uses O(n) memory, evaluates at most one local group gradient per iteration, and adapts convergence to data heterogeneity paramet...
Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity
cs.LG 2026-05 unverdicted novelty 6.0

Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and ...
Stochastic Optimization and Data Science
math.OC 2026-05 unverdicted novelty 2.0

The paper motivates stochastic optimization problems from statistical perspectives and describes offline and online approaches to solve expectation minimization problems.