pith. machine review for the scientific record. sign in

arxiv: 1608.06879 · v1 · submitted 2016-08-24 · 🧮 math.OC · cs.LG· stat.ML

Recognition: unknown

AIDE: Fast and Communication Efficient Distributed Optimization

Alex Smola, Barnab\'as P\'ocz\'os, Jakub Kone\v{c}n\'y, Peter Richt\'arik, Sashank J. Reddi

Authors on Pith no claims yet
classification 🧮 math.OC cs.LGstat.ML
keywords algorithmcommunicationdaneaideboundsdistributedefficientfirst
0
0 comments X
read the original abstract

In this paper, we present two new communication-efficient methods for distributed minimization of an average of functions. The first algorithm is an inexact variant of the DANE algorithm that allows any local algorithm to return an approximate solution to a local subproblem. We show that such a strategy does not affect the theoretical guarantees of DANE significantly. In fact, our approach can be viewed as a robustification strategy since the method is substantially better behaved than DANE on data partition arising in practice. It is well known that DANE algorithm does not match the communication complexity lower bounds. To bridge this gap, we propose an accelerated variant of the first method, called AIDE, that not only matches the communication lower bounds but can also be implemented using a purely first-order oracle. Our empirical results show that AIDE is superior to other communication efficient algorithms in settings that naturally arise in machine learning applications.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Federated Learning: Strategies for Improving Communication Efficiency

    cs.LG 2016-10 conditional novelty 8.0

    Structured updates (low-rank or masked) and sketched updates (quantized, rotated, subsampled) reduce uplink communication in federated learning by up to two orders of magnitude on convolutional and recurrent networks.