A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

Anastasia Koloskova; Martin Jaggi; Nicolas Loizou; Sadra Boreiri; Sebastian U. Stich

arxiv: 2003.10422 · v3 · pith:LMZCB3FInew · submitted 2020-03-23 · cs.LG · cs.DC · math.OC · stat.ML


keywords: convergence, decentralized, local, rates, updates, covers, data, different


Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per-iteration cost, data locality, and communication efficiency. In this paper we introduce a unified convergence analysis that covers a large variety of decentralized SGD methods which so far have required different intuitions, have different applications, and have been developed separately in various communities. Our algorithmic framework covers local SGD updates as well as synchronous and pairwise gossip updates on adaptive network topologies. We derive universal convergence rates for smooth (convex and non-convex) problems. The rates interpolate between the heterogeneous (non-identically distributed data) and iid-data settings, recovering linear convergence rates in many special cases, for instance for over-parametrized models. Our proofs rely on weak assumptions (typically improving over prior work in several aspects) and recover (and improve) the best known complexity results for a host of important scenarios, such as cooperative SGD and federated averaging (local SGD).
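
As a rough illustration of the kind of method the abstract describes, the following Python sketch alternates local SGD steps with synchronous gossip averaging on a fixed ring. It is not the paper's algorithm or experimental setup: the toy quadratic objectives, the fixed ring mixing matrix, and the names `ring_mixing_matrix` and `decentralized_sgd` are all assumptions made for illustration.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 nodes.

    Each node averages itself and its two ring neighbours with weight 1/3.
    (Hypothetical choice of topology and weights, for illustration only.)
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W


def decentralized_sgd(targets, steps=200, local_steps=4, lr=0.1, noise=0.0, seed=0):
    """Toy decentralized SGD with local updates and periodic gossip.

    Node i holds the assumed objective f_i(x) = 0.5 * ||x - targets[i]||^2,
    so distinct targets model heterogeneous (non-iid) data.
    Returns an (n, d) array whose row i is the final iterate of node i.
    """
    rng = np.random.default_rng(seed)
    n, d = targets.shape
    X = np.zeros((n, d))          # one parameter vector per node
    W = ring_mixing_matrix(n)
    for t in range(steps):
        # Local stochastic gradient step on every node.
        grads = (X - targets) + noise * rng.standard_normal((n, d))
        X = X - lr * grads
        # Communicate only every `local_steps` iterations (local SGD).
        if (t + 1) % local_steps == 0:
            X = W @ X             # synchronous gossip averaging
    return X


if __name__ == "__main__":
    targets = np.arange(10, dtype=float).reshape(5, 2)
    X = decentralized_sgd(targets)
    print("network average:", X.mean(axis=0))
    print("average target: ", targets.mean(axis=0))
```

Because the mixing matrix in this sketch is doubly stochastic, gossip leaves the network average unchanged, so with `noise=0.0` the average iterate follows plain gradient descent on the average objective. Individual nodes still differ from that average when their targets differ, which is the heterogeneity effect the abstract refers to.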


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Decentralized Inexact Cubic Newton Method with Consensus Procedure
math.OC · 2026-05 · unverdicted · novelty 7.0

The paper develops decentralized inexact Cubic Newton methods for convex and strongly convex optimization that match centralized iteration complexities with only polylogarithmic extra communication rounds via consensu...

Decentralized Inexact Cubic Newton Method with Consensus Procedure
math.OC · 2026-05 · unverdicted · novelty 6.0

Decentralized Cubic Newton method for convex optimization that matches exact centralized iteration complexity with polylogarithmic extra communication rounds under gradient L1-smoothness and Hessian L2-Lipschitz continuity.

Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity
cs.LG · 2026-05 · unverdicted · novelty 6.0

Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and ...