Geometric Convergence for Distributed Optimization with Barzilai-Borwein Step Sizes

Juan Gao; Peng Yang; Xinwei Liu; Yakui Huang; Yu-Hong Dai

arxiv: 1907.07852 · v1 · pith:EPM7FH4Anew · submitted 2019-07-18 · 🧮 math.OC

Geometric Convergence for Distributed Optimization with Barzilai-Borwein Step Sizes

Juan Gao , Xinwei Liu , Yu-Hong Dai , Yakui Huang , Peng Yang This is my paper

Pith reviewed 2026-05-24 20:04 UTC · model grok-4.3

classification 🧮 math.OC

keywords distributed optimizationgeometric convergenceBarzilai-Borwein step sizedynamic average consensusmulti-consensusundirected graphexact convergencemulti-agent optimization

0 comments

The pith

A distributed gradient method with local Barzilai-Borwein step sizes reaches the exact optimum at a geometric rate using only a fixed number of consensus steps per iteration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a distributed gradient method, called DGM-BB-C, that lets each agent pick its own step size from local information via the Barzilai-Borwein rule while still guaranteeing both exact convergence to the global optimum and a geometric rate. The construction combines an adapt-then-combine dynamic-average-consensus step with a constant number of inner consensus loops inside each outer iteration. Because the step-size rule needs no global tuning and permits larger values than conservative fixed-step analyses, the method reduces the usual trade-off between communication cost and convergence speed. Simulations on a distributed sensing task show fewer iterations, gradient evaluations, and communications than several existing distributed schemes.

Core claim

Over a time-invariant undirected graph, the DGM-BB-C algorithm, which updates each agent's local estimate by an adapt-then-combine dynamic-average-consensus step followed by a Barzilai-Borwein gradient step and a fixed number of multi-consensus inner iterations, converges geometrically to the exact minimizer of the average objective; the step size at each agent depends only on its own past iterates and gradients.

What carries the argument

DGM-BB-C, the combination of adapt-then-combine dynamic average consensus, Barzilai-Borwein step-size selection from local history, and a fixed number of multi-consensus inner loops per outer iteration.

If this is right

Each agent selects its step size independently from its own two most recent iterates and gradients, eliminating the need for a shared step-size parameter.
The exact global optimum is recovered even though the number of consensus communications per outer iteration remains constant.
Larger step sizes than those permitted by uniform conservative bounds become admissible without losing the geometric guarantee.
The method requires only neighbor-to-neighbor exchanges and local storage of two vectors per agent.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same local BB rule might be combined with other consensus protocols that achieve average tracking with fewer messages when the graph diameter is known.
If the inner-loop count is allowed to grow slowly with the outer iteration index, the same proof structure could cover time-varying graphs whose connectivity satisfies a joint connectivity condition.
Because the step-size choice is memory-two and gradient-based, the scheme may be directly applicable to distributed stochastic-gradient settings once a suitable variance-reduction step replaces the exact gradient.

Load-bearing premise

The communication graph is fixed and undirected, and the chosen fixed number of inner consensus steps keeps the dynamic average sufficiently accurate for the local Barzilai-Borwein updates to retain their geometric-rate property.

What would settle it

Run DGM-BB-C on a cycle graph of four nodes with strongly convex quadratics whose minimizers differ; if the outer-iteration error sequence fails to decrease geometrically once the number of inner consensus steps is held constant, the claimed rate does not hold.

Figures

Figures reproduced from arXiv: 1907.07852 by Juan Gao, Peng Yang, Xinwei Liu, Yakui Huang, Yu-Hong Dai.

**Figure 2.** Figure 2: The performance of DGM-BB-C with different [PITH_FULL_IMAGE:figures/full_fig_p019_2.png] view at source ↗

**Figure 3.** Figure 3: Convergence rates comparison of different distribu [PITH_FULL_IMAGE:figures/full_fig_p020_3.png] view at source ↗

**Figure 4.** Figure 4: Convergence rates comparison of different distribu [PITH_FULL_IMAGE:figures/full_fig_p021_4.png] view at source ↗

read the original abstract

We consider a distributed multi-agent optimization problem over a time-invariant undirected graph, where each agent possesses a local objective function and all agents collaboratively minimize the average of all objective functions through local computations and communications among neighbors. Recently, a class of distributed gradient methods has been proposed that achieves both exact and geometric convergence when a constant step size is used. The geometric convergence of these methods is ensured for conservatively selected step sizes, but how to choose an appropriate step size while running the algorithms has not been fully addressed. The Barzilai-Borwein (BB) method is a simple and effective technique for step sizes and requires few storage and inexpensive computations. It has been widely applied in various areas. In this paper, we introduce the BB method to distributed optimization. Based on an adapt-then-combine variation of the dynamic average consensus approach and using multi-consensus inner loops, we propose a distributed gradient method with BB step sizes (DGM-BB-C). Our method computes the step size for each agent automatically which only depends on its local information and is independent of that for other agents, and the larger step sizes are always permissible. Our method can seek the exact optimum when the number of consensus steps stays constant. We prove that DGM-BB-C has geometric convergence to the optimal solution. Simulation results on a distributed sensing problem show that our method is superior to some advanced methods in terms of iterations, gradient evaluations, communications and the related cost framework. These results validate our theoretical discoveries.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DGM-BB-C puts local Barzilai-Borwein steps inside an adapt-then-combine dynamic-average consensus scheme with fixed inner loops and claims geometric convergence to the exact optimum on time-invariant undirected graphs.

read the letter

The new piece here is the concrete DGM-BB-C algorithm: each agent runs a fixed number of inner consensus steps, then takes a locally computed BB step size based on its own gradient and iterate differences. The abstract states that this keeps both exactness and a geometric rate even though the inner consensus count stays constant. That combination looks like the main technical move relative to the constant-step distributed gradient papers they cite. Simulations on a distributed sensing problem report fewer iterations, gradient calls, and communications than some existing methods, which is the practical payoff they emphasize. The work is aimed at people who need distributed first-order methods on networks where a single global step size is inconvenient or unavailable. The central claim rests on the inner consensus error being absorbed into a uniform contraction factor. The paper restricts to time-invariant undirected graphs so the consensus operator has a fixed spectral gap; that is necessary for the argument to close, but it also limits how far the result travels. Without seeing the actual derivation it is hard to tell whether the BB curvature estimate stays reliable once the tracking error is present, or whether the descent inequality holds uniformly across iterations. The abstract gives no convexity or strong-convexity assumptions and no explicit rate expression, so the support for the geometric claim is thin on first reading. Still, the construction is specific enough and the simulations are concrete, so the paper is worth sending out for refereeing. A serious reviewer can check whether the error analysis actually closes and whether the constant-inner-loop assumption is stated cleanly.

Referee Report

3 major / 2 minor

Summary. The paper proposes DGM-BB-C, a distributed gradient method for multi-agent optimization over time-invariant undirected graphs. It combines locally computed Barzilai-Borwein step sizes (from local gradient and iterate differences) with an adapt-then-combine dynamic-average consensus scheme that uses a fixed number of inner multi-consensus loops per outer iteration. The central claims are that the method achieves exact convergence to the global optimum (despite constant consensus steps) and that the iterates converge geometrically to the optimum, with the BB steps chosen automatically from local information only.

Significance. If the geometric-rate proof closes, the result would be significant: it supplies an adaptive, parameter-light step-size rule for distributed first-order methods that still guarantees exact linear convergence on fixed undirected graphs, avoiding the conservative fixed-step tuning required by prior exact distributed gradient methods. The local-only BB computation and fixed inner-loop count are practical strengths.

major comments (3)

[convergence analysis (likely §4 or §5)] The convergence analysis must absorb the fixed-inner-loop consensus error into a uniform contraction factor that is independent of the outer iteration index. Because the BB curvature estimate is formed from local gradient/iterate differences that are themselves corrupted by the dynamic-average tracking error, it is not immediate that the descent inequality remains valid with a rate strictly less than one; the proof must exhibit an explicit bound on this perturbation that contracts uniformly.
[Theorem on geometric convergence] The geometric-rate claim is stated for any fixed number of inner consensus steps (provided the graph is time-invariant and undirected). The analysis therefore requires that the inner consensus operator possesses a spectral gap bounded away from zero independently of the outer iteration; if the number of inner steps is chosen too small relative to the graph diameter, the tracking error may prevent the overall map from being contractive. The paper should state the minimal inner-loop count (or a sufficient condition on it) that guarantees the rate.
[exactness argument] The exactness claim (convergence to the true optimum rather than a neighborhood) with constant inner loops relies on the dynamic-average consensus exactly tracking the average gradient in the limit. The proof must show that the residual tracking error vanishes asymptotically even though the number of inner loops is fixed; otherwise the BB step sizes may converge to an incorrect value and the fixed point may shift.

minor comments (2)

[§2 or §3] Notation for the multi-consensus operator and the distinction between outer and inner iterations should be introduced earlier and used consistently.
[numerical experiments] The simulation section compares iteration counts, gradient evaluations and communications, but does not report the total number of inner consensus steps used; this makes it hard to assess the communication cost relative to methods that use a single consensus step per iteration.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and insightful comments on our manuscript. We will revise the paper to address the concerns regarding the convergence analysis, making the bounds explicit and adding necessary conditions.

read point-by-point responses

Referee: The convergence analysis must absorb the fixed-inner-loop consensus error into a uniform contraction factor that is independent of the outer iteration index. Because the BB curvature estimate is formed from local gradient/iterate differences that are themselves corrupted by the dynamic-average tracking error, it is not immediate that the descent inequality remains valid with a rate strictly less than one; the proof must exhibit an explicit bound on this perturbation that contracts uniformly.

Authors: We agree with this observation. In the revised manuscript, we will explicitly derive a uniform bound on the consensus tracking error that is independent of the outer iteration. This bound will be incorporated into the descent inequality for the BB step sizes, ensuring that the overall contraction factor is strictly less than one and uniform across iterations. We will add the necessary lemmas to bound the perturbation due to the dynamic-average consensus error. revision: yes
Referee: The geometric-rate claim is stated for any fixed number of inner consensus steps (provided the graph is time-invariant and undirected). The analysis therefore requires that the inner consensus operator possesses a spectral gap bounded away from zero independently of the outer iteration; if the number of inner steps is chosen too small relative to the graph diameter, the tracking error may prevent the overall map from being contractive. The paper should state the minimal inner-loop count (or a sufficient condition on it) that guarantees the rate.

Authors: We will revise the theorem statement and the analysis to include a sufficient condition on the number of inner consensus steps. Specifically, we will require that the number of inner loops is large enough so that the spectral radius of the consensus error operator is sufficiently small to ensure the overall map is contractive, based on the second largest eigenvalue of the graph's mixing matrix. This condition will be stated explicitly in the revised version. revision: yes
Referee: The exactness claim (convergence to the true optimum rather than a neighborhood) with constant inner loops relies on the dynamic-average consensus exactly tracking the average gradient in the limit. The proof must show that the residual tracking error vanishes asymptotically even though the number of inner loops is fixed; otherwise the BB step sizes may converge to an incorrect value and the fixed point may shift.

Authors: We will strengthen the exactness proof by showing that as the outer iterates converge, the local gradients converge to the same value, which implies that the dynamic average consensus error goes to zero even with a fixed number of inner loops. This is because the input to the consensus (the gradient differences) becomes constant in the limit, allowing exact tracking asymptotically. We will add this argument to ensure the BB steps converge to the correct values and the fixed point is the true optimum. revision: yes

Circularity Check

0 steps flagged

No circularity: geometric rate derived from local BB updates plus fixed multi-consensus contraction on time-invariant graphs

full rationale

The derivation proceeds from the standard BB curvature estimate computed on local gradient/iterate differences, combined with an adapt-then-combine dynamic-average consensus operator whose contraction is controlled by a fixed number of inner loops and the spectral gap of a time-invariant undirected graph. Neither the step-size formula nor the contraction factor is obtained by fitting to the target convergence rate or by renaming a prior result of the same authors; the proof therefore closes from first-principles assumptions that remain external to the claimed rate. No self-definitional, fitted-input, or self-citation-load-bearing reductions appear in the central argument.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The abstract invokes standard domain assumptions of distributed optimization (undirected time-invariant graph, local objective functions whose average is minimized) but supplies no explicit list of mathematical axioms or free parameters; the central claim rests on the unstated details of the convergence proof.

axioms (2)

domain assumption Communication graph is time-invariant and undirected
Stated in the opening problem formulation.
domain assumption A fixed number of inner consensus steps suffices to maintain the dynamic average consensus property
Required for the exactness claim when the outer iteration count grows.

pith-pipeline@v0.9.0 · 5808 in / 1417 out tokens · 22456 ms · 2026-05-24T20:04:06.814910+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages

[1]

Convex optimizati on for big data: Scalable, randomized, and parallel algorit hms for big data analytics,

V . Cevher, S. Becker, and M. Schmidt, “Convex optimizati on for big data: Scalable, randomized, and parallel algorit hms for big data analytics,” IEEE Signal Process. Mag. , vol. 31, no. 5, pp. 32–43, Sep. 2014

work page 2014
[2]

D istributed optimization and statistical learning via the a lternating direction method of multipliers,

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “D istributed optimization and statistical learning via the a lternating direction method of multipliers,” F ound. Trends Mach. Learn., vol. 3, no. 1, pp. 11–22, Jan. 2011

work page 2011
[3]

Application of a smoothi ng technique to decomposition in convex optimization,

I. Necoara and J. A. K. Suykens, “Application of a smoothi ng technique to decomposition in convex optimization,” IEEE Trans. Autom. Control , vol. 53, no. 11, pp. 2674–2679, Dec. 2008. July 19, 2019 DRAFT 23

work page 2008
[4]

Distributed spectru m sensing for cognitive radio networks by exploiting sparsi ty,

J. A. Bazerque and G. B. Giannakis, “Distributed spectru m sensing for cognitive radio networks by exploiting sparsi ty,” IEEE Trans. Signal Process. , vol. 58, no. 3, pp. 1847–1862, Mar. 2010

work page 2010
[5]

Efﬁcient information aggregation strat egies for distributed control and signal processing,

A. Olshevsky, “Efﬁcient information aggregation strat egies for distributed control and signal processing,” PhD t hesis, Massachusetts Inst. Tech., 2010

work page 2010
[6]

Consensus based formation control strategies f or multi-vehicle systems,

W. Ren, “Consensus based formation control strategies f or multi-vehicle systems,” in 2006 Amer . Control Conf., Jun. 2006, pp. 4237–4242

work page 2006
[7]

Noise reduction by swarming in social foraging,

S. Pu, A. Garcia, and Z. Lin, “Noise reduction by swarming in social foraging,” IEEE Trans. Autom. Control , vol. 61, no. 12, pp. 4007–4013, Dec. 2016

work page 2016
[8]

Distributed learni ng algorithms for spectrum sharing in spatial random access wireless networks,

K. Cohen, A. Nedi´ c, and R. Srikant, “Distributed learni ng algorithms for spectrum sharing in spatial random access wireless networks,” IEEE Trans. Autom. Control , vol. 62, no. 6, pp. 2854–2869, Jun. 2017

work page 2017
[9]

Decentralized sparse signal recove ry for compressive sleeping wireless sensor networks,

Q. Ling and Z. Tian, “Decentralized sparse signal recove ry for compressive sleeping wireless sensor networks,” IEEE Trans. Signal Process. , vol. 58, no. 7, pp. 3816–3827, Jul. 2010

work page 2010
[10]

Optimal decentralized p rotocol for electric vehicle charging,

L. Gan, U. Topcu, and S. H. Low, “Optimal decentralized p rotocol for electric vehicle charging,” IEEE Trans. Power Syst. , vol. 28, no. 2, pp. 940–951, May 2013

work page 2013
[11]

Distributed optimization in se nsor networks,

M. Rabbat and R. Nowak, “Distributed optimization in se nsor networks,” in Proc. 3rd int. symp. on Inform. process. sensor netw., Apr. 2004, pp. 20–27

work page 2004
[12]

Distributed asynchronous computation of ﬁxed points,

D. Bertsekas, “Distributed asynchronous computation of ﬁxed points,” Math. Program., vol. 27, no. 1, pp. 107–120, 1983

work page 1983
[13]

Distribut ed asynchronous deterministic and stochastic gradient opt imization algorithms,

J. Tsitsiklis, D. Bertsekas, and M. Athans, “Distribut ed asynchronous deterministic and stochastic gradient opt imization algorithms,” IEEE Trans. Autom. Control , vol. 31, no. 9, pp. 803–812, Sep. 1986

work page 1986
[14]

Distributed subgradient me thods for multi-agent optimization,

A. Nedi´ c and A. Ozdaglar, “Distributed subgradient me thods for multi-agent optimization,” IEEE Trans. Autom. Control , vol. 54, no. 1, pp. 48–61, Jan. 2009

work page 2009
[15]

Fast distr ibuted gradient methods,

D. Jakoveti´ c, J. Xavier, and J. M. F. Moura, “Fast distr ibuted gradient methods,” IEEE Trans. Autom. Control , vol. 59, no. 5, pp. 1131–1146, May 2014

work page 2014
[16]

On the convergence of decen tralized gradient descent,

K. Y uan, Q. Ling, and W. Yin, “On the convergence of decen tralized gradient descent,” SIAM J. Optim. , vol. 26, no. 3, pp. 1835–1854, 2016

work page 2016
[17]

Decentralized mu lti-agent optimization via dual decomposition,

H. Terelius, U. Topcu, and R. Murray, “Decentralized mu lti-agent optimization via dual decomposition,” in 18th IF AC Proceedings V olumes, vol. 44, no. 1, Aug. 2011, pp. 11 245–11 251

work page 2011
[18]

D-admm: A communication-efﬁcient distributed algor ithm for separable optimization,

J. F. C. Mota, J. M. F. Xavier, P . M. Q. Aguiar, and M. P¨ usc hel, “D-admm: A communication-efﬁcient distributed algor ithm for separable optimization,” IEEE Trans. Signal Process. , vol. 61, no. 10, pp. 2718–2723, May 2013

work page 2013
[19]

On the linear c onvergence of the admm in decentralized consensus optimization,

W. Shi, Q. Ling, K. Y uan, G. Wu, and W. Yin, “On the linear c onvergence of the admm in decentralized consensus optimization,” IEEE Trans. Signal Process. , vol. 62, no. 7, pp. 1750–1761, Apr. 2014

work page 2014
[20]

A decentra lized second-order method with exact linear convergence ra te for consensus optimization,

A. Mokhtari, W. Shi, Q. Ling, and A. Ribeiro, “A decentra lized second-order method with exact linear convergence ra te for consensus optimization,” IEEE Trans. Signal Inform. Process. Netw. , vol. 2, no. 4, pp. 507–522, Dec. 2016

work page 2016
[21]

Decentralized q uasi-newton methods,

M. Eisen, A. Mokhtari, and A. Ribeiro, “Decentralized q uasi-newton methods,” IEEE Trans. Signal Process. , vol. 65, no. 10, pp. 2613–2628, May 2017

work page 2017
[22]

B alancing communication and computation in distributed optimization,

A. Berahas, R. Bollapragada, N. S. Keskar, and E. Wei, “B alancing communication and computation in distributed optimization,” IEEE Trans. Autom. Control , pp. 1–1, 2018

work page 2018
[23]

Extra: An exact ﬁrst-o rder algorithm for decentralized consensus optimization,

W. Shi, Q. Ling, G. Wu, and W. Yin, “Extra: An exact ﬁrst-o rder algorithm for decentralized consensus optimization, ” SIAM J. Optim. , vol. 25, no. 2, pp. 944–966, 2015

work page 2015
[24]

Dextra: A fast algorithm for optimi zation over directed graphs,

C. Xi and U. A. Khan, “Dextra: A fast algorithm for optimi zation over directed graphs,” IEEE Trans. Autom. Control , vol. 62, no. 10, pp. 4980–4993, Oct. 2017

work page 2017
[25]

Augmented distributed optimization for networ ked systems,

J. Xu, “Augmented distributed optimization for networ ked systems,” PhD thesis, Nanyang Tech. Univ., 2016. July 19, 2019 DRAFT 24

work page 2016
[26]

Augmented distribut ed gradient methods for multi-agent optimization under uncoordinated constant stepsizes,

J. Xu, S. Zhu, Y . C. Soh, and L. Xie, “Augmented distribut ed gradient methods for multi-agent optimization under uncoordinated constant stepsizes,” in Proc. 54th IEEE Conf. Decis. Control (CDC) , Dec. 2015, pp. 2055–2060

work page 2015
[27]

Achieving geometri c convergence for distributed optimization over time-vary ing graphs,

A. Nedic, A. Olshevsky, and W. Shi, “Achieving geometri c convergence for distributed optimization over time-vary ing graphs,” SIAM J. Optim. , vol. 27, no. 4, pp. 2597–2633, 2017

work page 2017
[28]

Harnessing smoothness to accelerate di stributed optimization,

G. Qu and N. Li, “Harnessing smoothness to accelerate di stributed optimization,” IEEE Trans. Control Netw. Syst. , vol. 5, no. 3, pp. 1245–1260, Sep. 2018

work page 2018
[29]

Geometr ically convergent distributed optimization with uncoordi nated step-sizes,

A. Nedic, A. Olshevsky, W. Shi, and C. A. Uribe, “Geometr ically convergent distributed optimization with uncoordi nated step-sizes,” in 2017 Amer . Control Conf. (ACC) , May 2017, pp. 3950–3955

work page 2017
[30]

Geometrical convergence rate for distributed optimization with time-varying directed g raphs and uncoordinated step-sizes,

Q. L¨ u, H. Li, and D. Xia, “Geometrical convergence rate for distributed optimization with time-varying directed g raphs and uncoordinated step-sizes,” Inform. Sci. , vol. 422, pp. 516–530, 2018

work page 2018
[31]

Convergence of async hronous distributed gradient methods over stochastic netw orks,

J. Xu, S. Zhu, Y . C. Soh, and L. Xie, “Convergence of async hronous distributed gradient methods over stochastic netw orks,” IEEE Trans. Autom. Control , vol. 63, no. 2, pp. 434–448, Feb. 2018

work page 2018
[32]

Frost–fast row-stochasti c optimization with uncoordinated step-sizes,

R. Xin, C. Xi, and U. A. Khan, “Frost–fast row-stochasti c optimization with uncoordinated step-sizes,” J. Adv. Signal Process., vol. 2019, no. 1, 2019

work page 2019
[33]

Discrete-time dynamic average consensus,

M. Zhu and S. Martinez, “Discrete-time dynamic average consensus,” Automatica, vol. 46, no. 2, pp. 322–329, 2010

work page 2010
[34]

Two-point step size grad ient methods,

J. Barzilai and J. M. Borwein, “Two-point step size grad ient methods,” IMA J. Numer . Anal. , vol. 8, no. 1, pp. 141–148, 1988

work page 1988
[35]

The barzilai and borwein gradient method fo r the large scale unconstrained minimization problem,

M. Raydan, “The barzilai and borwein gradient method fo r the large scale unconstrained minimization problem,” SIAM J. Optim., vol. 7, no. 1, pp. 26–33, 1997

work page 1997
[36]

Projected barzilai-borwein methods for large-scale box-constrained quadratic progra mming,

Y . H. Dai and R. Fletcher, “Projected barzilai-borwein methods for large-scale box-constrained quadratic progra mming,” Numer . Math., vol. 100, no. 1, pp. 21–47, 2005

work page 2005
[37]

The cyclic barzilai-borwein method for unconstrained optimization,

Y . H. Dai, W. W. Hager, K. Schittkowski, and H. C. Zhang, “ The cyclic barzilai-borwein method for unconstrained optimization,” IMA J. Numer . Anal. , vol. 26, no. 3, pp. 604–627, 2006

work page 2006
[38]

A new analysis on the barzilai-borwein gradi ent method,

Y . H. Dai, “A new analysis on the barzilai-borwein gradi ent method,” J. oper . Res. Soc. China , vol. 1, no. 2, pp. 187–198, 2013

work page 2013
[39]

A family of spectral gradient methods for optimization,

Y . H. Dai, Y . K. Huang, and X. W. Liu, “A family of spectral gradient methods for optimization,” Comput. Optim. Appl. , pp. 1–23, 2018

work page 2018
[40]

Projected barzilai-borwein met hod for large-scale nonnegative image restoration,

Y . F. Wang and S. Q. Ma, “Projected barzilai-borwein met hod for large-scale nonnegative image restoration,” Inverse Probl. Sci. Eng. , vol. 15, no. 6, pp. 559–583, 2007

work page 2007
[41]

Spars e reconstruction by separable approximation,

S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, “Spars e reconstruction by separable approximation,” IEEE Trans. Signal Process. , vol. 57, no. 7, pp. 2479–2493, Jul. 2009

work page 2009
[42]

A fast algor ithm for sparse reconstruction based on shrinkage, subspac e optimization, and continuation,

Z. W. Wen, W. Yin, D. Goldfarb, and Y . Zhang, “A fast algor ithm for sparse reconstruction based on shrinkage, subspac e optimization, and continuation,” SIAM J. Sci. Comput. , vol. 32, no. 4, pp. 1832–1857, 2010

work page 2010
[43]

Coordinated beamforming for miso interference channel: Complexity analysis and efﬁcie nt algorithms,

Y . Liu, Y . Dai, and Z. Luo, “Coordinated beamforming for miso interference channel: Complexity analysis and efﬁcie nt algorithms,” IEEE Trans. Signal Process. , vol. 59, no. 3, pp. 1142–1157, Mar. 2011

work page 2011
[44]

Quadratic regular ization projected barzilai–borwein method for nonnegativ e matrix factorization,

Y . K. Huang, H. W. Liu, and S. S. Zhou, “Quadratic regular ization projected barzilai–borwein method for nonnegativ e matrix factorization,” Data min. knowl. discov. , vol. 29, no. 6, pp. 1665–1684, 2015

work page 2015
[45]

Barzilai-bo rwein step size for stochastic gradient descent,

C. H. Tan, S. Q. Ma, Y . H. Dai, and Y . Q. Qian, “Barzilai-bo rwein step size for stochastic gradient descent,” in Adv. Neur . Inform. Process. Syst. , 2016, pp. 685–693

work page 2016
[46]

Projected barzila i-borwein methods applied to distributed compressive spec trum sensing,

L. T. Tan, H. Y . Kong, and V . N. Q. Bao, “Projected barzila i-borwein methods applied to distributed compressive spec trum sensing,” in 2010 IEEE Symp. New Front. Dynamic Spectrum (DySPAN) , Apr. 2010, pp. 1–7. July 19, 2019 DRAFT 25

work page 2010
[47]

A ccelerated iterative distributed controller synthesis wi th a barzilai-borwein step size,

F. Deroo, M. Ulbrich, B. D. O. Anderson, and S. Hirche, “A ccelerated iterative distributed controller synthesis wi th a barzilai-borwein step size,” in 2012 IEEE 51st IEEE Conf. Decis. Control (CDC) , Dec. 2012, pp. 4864–4870

work page 2012
[48]

Convex optimization: Algorithms and compl exity,

S. Bubeck, “Convex optimization: Algorithms and compl exity,” F ound. Trends Mach. Learn., vol. 8, no. 3-4, pp. 231–357, Nov. 2015

work page 2015
[49]

R. A. Horn and C. R. Johnson, Matrix analysis . Cambridge university press, 2012

work page 2012
[50]

Add-opt: Accelerated dist ributed directed optimization,

C. Xi, R. Xin, and U. A. Khan, “Add-opt: Accelerated dist ributed directed optimization,” IEEE Transactions on Automatic Control, vol. 63, no. 5, pp. 1329–1339, May 2018

work page 2018
[51]

On random graphs i,

P . Erdos and A. Renyi, “On random graphs i,” Publ. Math. Debrecen , vol. 6, pp. 290–297, 1959

work page 1959
[52]

Fastest mixing marko v chain on a graph,

S. Boyd, P . Diaconis, and L. Xiao, “Fastest mixing marko v chain on a graph,” SIAM rev., vol. 46, no. 4, pp. 667–689, 2004

work page 2004
[53]

A decentralized proximal-gra dient method with network independent step-sizes and separ ated convergence rates,

Z. Li, W. Shi, and M. Yan, “A decentralized proximal-gra dient method with network independent step-sizes and separ ated convergence rates,” IEEE Trans. Signal Process. , Apr. 2019. July 19, 2019 DRAFT

work page 2019

[1] [1]

Convex optimizati on for big data: Scalable, randomized, and parallel algorit hms for big data analytics,

V . Cevher, S. Becker, and M. Schmidt, “Convex optimizati on for big data: Scalable, randomized, and parallel algorit hms for big data analytics,” IEEE Signal Process. Mag. , vol. 31, no. 5, pp. 32–43, Sep. 2014

work page 2014

[2] [2]

D istributed optimization and statistical learning via the a lternating direction method of multipliers,

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “D istributed optimization and statistical learning via the a lternating direction method of multipliers,” F ound. Trends Mach. Learn., vol. 3, no. 1, pp. 11–22, Jan. 2011

work page 2011

[3] [3]

Application of a smoothi ng technique to decomposition in convex optimization,

I. Necoara and J. A. K. Suykens, “Application of a smoothi ng technique to decomposition in convex optimization,” IEEE Trans. Autom. Control , vol. 53, no. 11, pp. 2674–2679, Dec. 2008. July 19, 2019 DRAFT 23

work page 2008

[4] [4]

Distributed spectru m sensing for cognitive radio networks by exploiting sparsi ty,

J. A. Bazerque and G. B. Giannakis, “Distributed spectru m sensing for cognitive radio networks by exploiting sparsi ty,” IEEE Trans. Signal Process. , vol. 58, no. 3, pp. 1847–1862, Mar. 2010

work page 2010

[5] [5]

Efﬁcient information aggregation strat egies for distributed control and signal processing,

A. Olshevsky, “Efﬁcient information aggregation strat egies for distributed control and signal processing,” PhD t hesis, Massachusetts Inst. Tech., 2010

work page 2010

[6] [6]

Consensus based formation control strategies f or multi-vehicle systems,

W. Ren, “Consensus based formation control strategies f or multi-vehicle systems,” in 2006 Amer . Control Conf., Jun. 2006, pp. 4237–4242

work page 2006

[7] [7]

Noise reduction by swarming in social foraging,

S. Pu, A. Garcia, and Z. Lin, “Noise reduction by swarming in social foraging,” IEEE Trans. Autom. Control , vol. 61, no. 12, pp. 4007–4013, Dec. 2016

work page 2016

[8] [8]

Distributed learni ng algorithms for spectrum sharing in spatial random access wireless networks,

K. Cohen, A. Nedi´ c, and R. Srikant, “Distributed learni ng algorithms for spectrum sharing in spatial random access wireless networks,” IEEE Trans. Autom. Control , vol. 62, no. 6, pp. 2854–2869, Jun. 2017

work page 2017

[9] [9]

Decentralized sparse signal recove ry for compressive sleeping wireless sensor networks,

Q. Ling and Z. Tian, “Decentralized sparse signal recove ry for compressive sleeping wireless sensor networks,” IEEE Trans. Signal Process. , vol. 58, no. 7, pp. 3816–3827, Jul. 2010

work page 2010

[10] [10]

Optimal decentralized p rotocol for electric vehicle charging,

L. Gan, U. Topcu, and S. H. Low, “Optimal decentralized p rotocol for electric vehicle charging,” IEEE Trans. Power Syst. , vol. 28, no. 2, pp. 940–951, May 2013

work page 2013

[11] [11]

Distributed optimization in se nsor networks,

M. Rabbat and R. Nowak, “Distributed optimization in se nsor networks,” in Proc. 3rd int. symp. on Inform. process. sensor netw., Apr. 2004, pp. 20–27

work page 2004

[12] [12]

Distributed asynchronous computation of ﬁxed points,

D. Bertsekas, “Distributed asynchronous computation of ﬁxed points,” Math. Program., vol. 27, no. 1, pp. 107–120, 1983

work page 1983

[13] [13]

Distribut ed asynchronous deterministic and stochastic gradient opt imization algorithms,

J. Tsitsiklis, D. Bertsekas, and M. Athans, “Distribut ed asynchronous deterministic and stochastic gradient opt imization algorithms,” IEEE Trans. Autom. Control , vol. 31, no. 9, pp. 803–812, Sep. 1986

work page 1986

[14] [14]

Distributed subgradient me thods for multi-agent optimization,

A. Nedi´ c and A. Ozdaglar, “Distributed subgradient me thods for multi-agent optimization,” IEEE Trans. Autom. Control , vol. 54, no. 1, pp. 48–61, Jan. 2009

work page 2009

[15] [15]

Fast distr ibuted gradient methods,

D. Jakoveti´ c, J. Xavier, and J. M. F. Moura, “Fast distr ibuted gradient methods,” IEEE Trans. Autom. Control , vol. 59, no. 5, pp. 1131–1146, May 2014

work page 2014

[16] [16]

On the convergence of decen tralized gradient descent,

K. Y uan, Q. Ling, and W. Yin, “On the convergence of decen tralized gradient descent,” SIAM J. Optim. , vol. 26, no. 3, pp. 1835–1854, 2016

work page 2016

[17] [17]

Decentralized mu lti-agent optimization via dual decomposition,

H. Terelius, U. Topcu, and R. Murray, “Decentralized mu lti-agent optimization via dual decomposition,” in 18th IF AC Proceedings V olumes, vol. 44, no. 1, Aug. 2011, pp. 11 245–11 251

work page 2011

[18] [18]

D-admm: A communication-efﬁcient distributed algor ithm for separable optimization,

J. F. C. Mota, J. M. F. Xavier, P . M. Q. Aguiar, and M. P¨ usc hel, “D-admm: A communication-efﬁcient distributed algor ithm for separable optimization,” IEEE Trans. Signal Process. , vol. 61, no. 10, pp. 2718–2723, May 2013

work page 2013

[19] [19]

On the linear c onvergence of the admm in decentralized consensus optimization,

W. Shi, Q. Ling, K. Y uan, G. Wu, and W. Yin, “On the linear c onvergence of the admm in decentralized consensus optimization,” IEEE Trans. Signal Process. , vol. 62, no. 7, pp. 1750–1761, Apr. 2014

work page 2014

[20] [20]

A decentra lized second-order method with exact linear convergence ra te for consensus optimization,

A. Mokhtari, W. Shi, Q. Ling, and A. Ribeiro, “A decentra lized second-order method with exact linear convergence ra te for consensus optimization,” IEEE Trans. Signal Inform. Process. Netw. , vol. 2, no. 4, pp. 507–522, Dec. 2016

work page 2016

[21] [21]

Decentralized q uasi-newton methods,

M. Eisen, A. Mokhtari, and A. Ribeiro, “Decentralized q uasi-newton methods,” IEEE Trans. Signal Process. , vol. 65, no. 10, pp. 2613–2628, May 2017

work page 2017

[22] [22]

B alancing communication and computation in distributed optimization,

A. Berahas, R. Bollapragada, N. S. Keskar, and E. Wei, “B alancing communication and computation in distributed optimization,” IEEE Trans. Autom. Control , pp. 1–1, 2018

work page 2018

[23] [23]

Extra: An exact ﬁrst-o rder algorithm for decentralized consensus optimization,

W. Shi, Q. Ling, G. Wu, and W. Yin, “Extra: An exact ﬁrst-o rder algorithm for decentralized consensus optimization, ” SIAM J. Optim. , vol. 25, no. 2, pp. 944–966, 2015

work page 2015

[24] [24]

Dextra: A fast algorithm for optimi zation over directed graphs,

C. Xi and U. A. Khan, “Dextra: A fast algorithm for optimi zation over directed graphs,” IEEE Trans. Autom. Control , vol. 62, no. 10, pp. 4980–4993, Oct. 2017

work page 2017

[25] [25]

Augmented distributed optimization for networ ked systems,

J. Xu, “Augmented distributed optimization for networ ked systems,” PhD thesis, Nanyang Tech. Univ., 2016. July 19, 2019 DRAFT 24

work page 2016

[26] [26]

Augmented distribut ed gradient methods for multi-agent optimization under uncoordinated constant stepsizes,

J. Xu, S. Zhu, Y . C. Soh, and L. Xie, “Augmented distribut ed gradient methods for multi-agent optimization under uncoordinated constant stepsizes,” in Proc. 54th IEEE Conf. Decis. Control (CDC) , Dec. 2015, pp. 2055–2060

work page 2015

[27] [27]

Achieving geometri c convergence for distributed optimization over time-vary ing graphs,

A. Nedic, A. Olshevsky, and W. Shi, “Achieving geometri c convergence for distributed optimization over time-vary ing graphs,” SIAM J. Optim. , vol. 27, no. 4, pp. 2597–2633, 2017

work page 2017

[28] [28]

Harnessing smoothness to accelerate di stributed optimization,

G. Qu and N. Li, “Harnessing smoothness to accelerate di stributed optimization,” IEEE Trans. Control Netw. Syst. , vol. 5, no. 3, pp. 1245–1260, Sep. 2018

work page 2018

[29] [29]

Geometr ically convergent distributed optimization with uncoordi nated step-sizes,

A. Nedic, A. Olshevsky, W. Shi, and C. A. Uribe, “Geometr ically convergent distributed optimization with uncoordi nated step-sizes,” in 2017 Amer . Control Conf. (ACC) , May 2017, pp. 3950–3955

work page 2017

[30] [30]

Geometrical convergence rate for distributed optimization with time-varying directed g raphs and uncoordinated step-sizes,

Q. L¨ u, H. Li, and D. Xia, “Geometrical convergence rate for distributed optimization with time-varying directed g raphs and uncoordinated step-sizes,” Inform. Sci. , vol. 422, pp. 516–530, 2018

work page 2018

[31] [31]

Convergence of async hronous distributed gradient methods over stochastic netw orks,

J. Xu, S. Zhu, Y . C. Soh, and L. Xie, “Convergence of async hronous distributed gradient methods over stochastic netw orks,” IEEE Trans. Autom. Control , vol. 63, no. 2, pp. 434–448, Feb. 2018

work page 2018

[32] [32]

Frost–fast row-stochasti c optimization with uncoordinated step-sizes,

R. Xin, C. Xi, and U. A. Khan, “Frost–fast row-stochasti c optimization with uncoordinated step-sizes,” J. Adv. Signal Process., vol. 2019, no. 1, 2019

work page 2019

[33] [33]

Discrete-time dynamic average consensus,

M. Zhu and S. Martinez, “Discrete-time dynamic average consensus,” Automatica, vol. 46, no. 2, pp. 322–329, 2010

work page 2010

[34] [34]

Two-point step size grad ient methods,

J. Barzilai and J. M. Borwein, “Two-point step size grad ient methods,” IMA J. Numer . Anal. , vol. 8, no. 1, pp. 141–148, 1988

work page 1988

[35] [35]

The barzilai and borwein gradient method fo r the large scale unconstrained minimization problem,

M. Raydan, “The barzilai and borwein gradient method fo r the large scale unconstrained minimization problem,” SIAM J. Optim., vol. 7, no. 1, pp. 26–33, 1997

work page 1997

[36] [36]

Projected barzilai-borwein methods for large-scale box-constrained quadratic progra mming,

Y . H. Dai and R. Fletcher, “Projected barzilai-borwein methods for large-scale box-constrained quadratic progra mming,” Numer . Math., vol. 100, no. 1, pp. 21–47, 2005

work page 2005

[37] [37]

The cyclic barzilai-borwein method for unconstrained optimization,

Y . H. Dai, W. W. Hager, K. Schittkowski, and H. C. Zhang, “ The cyclic barzilai-borwein method for unconstrained optimization,” IMA J. Numer . Anal. , vol. 26, no. 3, pp. 604–627, 2006

work page 2006

[38] [38]

A new analysis on the barzilai-borwein gradi ent method,

Y . H. Dai, “A new analysis on the barzilai-borwein gradi ent method,” J. oper . Res. Soc. China , vol. 1, no. 2, pp. 187–198, 2013

work page 2013

[39] [39]

A family of spectral gradient methods for optimization,

Y . H. Dai, Y . K. Huang, and X. W. Liu, “A family of spectral gradient methods for optimization,” Comput. Optim. Appl. , pp. 1–23, 2018

work page 2018

[40] [40]

Projected barzilai-borwein met hod for large-scale nonnegative image restoration,

Y . F. Wang and S. Q. Ma, “Projected barzilai-borwein met hod for large-scale nonnegative image restoration,” Inverse Probl. Sci. Eng. , vol. 15, no. 6, pp. 559–583, 2007

work page 2007

[41] [41]

Spars e reconstruction by separable approximation,

S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, “Spars e reconstruction by separable approximation,” IEEE Trans. Signal Process. , vol. 57, no. 7, pp. 2479–2493, Jul. 2009

work page 2009

[42] [42]

A fast algor ithm for sparse reconstruction based on shrinkage, subspac e optimization, and continuation,

Z. W. Wen, W. Yin, D. Goldfarb, and Y . Zhang, “A fast algor ithm for sparse reconstruction based on shrinkage, subspac e optimization, and continuation,” SIAM J. Sci. Comput. , vol. 32, no. 4, pp. 1832–1857, 2010

work page 2010

[43] [43]

Coordinated beamforming for miso interference channel: Complexity analysis and efﬁcie nt algorithms,

Y . Liu, Y . Dai, and Z. Luo, “Coordinated beamforming for miso interference channel: Complexity analysis and efﬁcie nt algorithms,” IEEE Trans. Signal Process. , vol. 59, no. 3, pp. 1142–1157, Mar. 2011

work page 2011

[44] [44]

Quadratic regular ization projected barzilai–borwein method for nonnegativ e matrix factorization,

Y . K. Huang, H. W. Liu, and S. S. Zhou, “Quadratic regular ization projected barzilai–borwein method for nonnegativ e matrix factorization,” Data min. knowl. discov. , vol. 29, no. 6, pp. 1665–1684, 2015

work page 2015

[45] [45]

Barzilai-bo rwein step size for stochastic gradient descent,

C. H. Tan, S. Q. Ma, Y . H. Dai, and Y . Q. Qian, “Barzilai-bo rwein step size for stochastic gradient descent,” in Adv. Neur . Inform. Process. Syst. , 2016, pp. 685–693

work page 2016

[46] [46]

Projected barzila i-borwein methods applied to distributed compressive spec trum sensing,

L. T. Tan, H. Y . Kong, and V . N. Q. Bao, “Projected barzila i-borwein methods applied to distributed compressive spec trum sensing,” in 2010 IEEE Symp. New Front. Dynamic Spectrum (DySPAN) , Apr. 2010, pp. 1–7. July 19, 2019 DRAFT 25

work page 2010

[47] [47]

A ccelerated iterative distributed controller synthesis wi th a barzilai-borwein step size,

F. Deroo, M. Ulbrich, B. D. O. Anderson, and S. Hirche, “A ccelerated iterative distributed controller synthesis wi th a barzilai-borwein step size,” in 2012 IEEE 51st IEEE Conf. Decis. Control (CDC) , Dec. 2012, pp. 4864–4870

work page 2012

[48] [48]

Convex optimization: Algorithms and compl exity,

S. Bubeck, “Convex optimization: Algorithms and compl exity,” F ound. Trends Mach. Learn., vol. 8, no. 3-4, pp. 231–357, Nov. 2015

work page 2015

[49] [49]

R. A. Horn and C. R. Johnson, Matrix analysis . Cambridge university press, 2012

work page 2012

[50] [50]

Add-opt: Accelerated dist ributed directed optimization,

C. Xi, R. Xin, and U. A. Khan, “Add-opt: Accelerated dist ributed directed optimization,” IEEE Transactions on Automatic Control, vol. 63, no. 5, pp. 1329–1339, May 2018

work page 2018

[51] [51]

On random graphs i,

P . Erdos and A. Renyi, “On random graphs i,” Publ. Math. Debrecen , vol. 6, pp. 290–297, 1959

work page 1959

[52] [52]

Fastest mixing marko v chain on a graph,

S. Boyd, P . Diaconis, and L. Xiao, “Fastest mixing marko v chain on a graph,” SIAM rev., vol. 46, no. 4, pp. 667–689, 2004

work page 2004

[53] [53]

A decentralized proximal-gra dient method with network independent step-sizes and separ ated convergence rates,

Z. Li, W. Shi, and M. Yan, “A decentralized proximal-gra dient method with network independent step-sizes and separ ated convergence rates,” IEEE Trans. Signal Process. , Apr. 2019. July 19, 2019 DRAFT

work page 2019