Decentralized Inexact Cubic Newton Method with Consensus Procedure

Alexander Dyakonov; Alexander Gasnikov; Alexander Rogozin; Anton Novitskii; Artem Agafonov; Dmitry Kamzolov; Martin Tak\'a\v{c}; Yury Sokolov

arxiv: 2605.21169 · v2 · pith:ZPUM3EZDnew · submitted 2026-05-20 · 🧮 math.OC

Decentralized Inexact Cubic Newton Method with Consensus Procedure

Artem Agafonov , Anton Novitskii , Alexander Rogozin , Yury Sokolov , Dmitry Kamzolov , Alexander Dyakonov , Martin Tak\'a\v{c} , Alexander Gasnikov This is my paper

Pith reviewed 2026-05-22 09:32 UTC · model grok-4.3

classification 🧮 math.OC

keywords decentralized optimizationcubic Newton methodconsensus procedureinexact second-order methodsdistributed machine learningstrongly convex optimizationgeneralized linear modelscommunication efficiency

0 comments

The pith

Decentralized Cubic Newton method matches the iteration complexity of the exact version under gradient smoothness and Hessian Lipschitz conditions, adding only polylogarithmic communication rounds for consensus.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a decentralized second-order optimization approach where each agent holds a local objective and exchanges information only with network neighbors. It introduces an inexact consensus procedure to approximately average iterates, gradients, and Hessians, then proves that the resulting method retains the iteration count of the centralized exact Cubic Newton method for convex problems. The analysis explicitly bounds the errors from local disagreements. An accelerated variant is given for strongly convex cases that likewise matches the exact accelerated complexity. For generalized linear models the communication reduces to vectors rather than full matrices, improving practicality in high dimensions.

Core claim

Under L1-smoothness of gradients and L2-Lipschitz continuity of Hessians, the Decentralized Inexact Cubic Newton method with consensus achieves the same iteration complexity as the exact Cubic Newton method for convex optimization while requiring only additional polylogarithmic communication rounds to reach the necessary consensus accuracy; the theory tracks the inexactness terms arising from disagreement between local copies. The accelerated version does the same for strongly convex objectives.

What carries the argument

The inexact consensus procedure that approximately averages local iterates, gradients, and Hessians via neighbor-to-neighbor communications while controlling the resulting disagreement errors inside the cubic regularization step.

Load-bearing premise

The consensus procedure can be driven to sufficient accuracy using a number of rounds that is only polylogarithmic in the relevant parameters, without inflating the overall iteration complexity or breaking the cubic regularization guarantees.

What would settle it

A controlled numerical test that caps consensus rounds at a fixed constant (instead of polylog) and measures whether the number of outer iterations required to reach target accuracy exceeds that of the exact centralized Cubic Newton method.

Figures

Figures reproduced from arXiv: 2605.21169 by Alexander Dyakonov, Alexander Gasnikov, Alexander Rogozin, Anton Novitskii, Artem Agafonov, Dmitry Kamzolov, Martin Tak\'a\v{c}, Yury Sokolov.

read the original abstract

Distributed optimization is widely used in large-scale and privacy-preserving machine learning, where each agent stores a local objective and communicates only with its neighbors in a connected network. We study decentralized second-order optimization and focus on consensus procedures that approximately average local iterates, gradients, and Hessians through neighbor-to-neighbor communications. We propose a general Decentralized Cubic Newton method for convex optimization under $L_1$-smoothness of gradients and $L_2$-Lipschitz continuity of Hessians, and develop a theory that accurately tracks the inaccuracies caused by consensus and by disagreement between local iterates. Under these assumptions, the method matches the iteration complexity of the exact Cubic Newton method and requires only additional polylogarithmic communication-round overhead to reach the necessary consensus accuracy. We further propose an Accelerated Decentralized Cubic Newton method for strongly convex objectives and show that it matches the iteration complexity of the exact Accelerated Cubic Newton method, again with only additional polylogarithmic communication-round overhead. Finally, although the general method requires exchanging full $d \times d$ Hessian matrices, we show how it can be implemented for generalized linear models by transmitting only vectors, making the approach substantially more practical in high dimensions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Decentralized cubic Newton matches centralized iteration complexity with only polylog communication overhead via explicit consensus error tracking, plus a vector-only GLM version.

read the letter

The one or two things to know: this paper presents a decentralized inexact cubic Newton method for convex optimization that matches the iteration complexity of the exact centralized version while incurring only additional polylogarithmic communication rounds for consensus. They combine cubic regularization with a consensus procedure that approximately averages local information. The key is their analysis of how disagreement between agents affects the iterates, gradients, and Hessians, and how to choose the number of consensus rounds to keep those errors controlled. This works under L1-smoothness of gradients and L2-Lipschitz Hessians. They extend it to an accelerated version for strongly convex problems and give a vector-based implementation for generalized linear models. The paper does well in making the inexactness terms explicit and showing they can be absorbed without changing the main rate. The polylog overhead is a clean result that suggests the communication cost is asymptotically negligible. A soft spot is that the analysis relies on the network being connected and the consensus method achieving the required accuracy in polylog rounds, which holds in theory but the constants might depend on the graph's mixing time or diameter. Outside of GLMs, full Hessians still need to be exchanged, limiting scalability in very high dimensions. This work is for specialists in distributed second-order optimization. A reader focused on large-scale convex problems with communication limits would find the concrete complexity bounds and practical variant useful. It deserves serious peer review because the central claims are testable and the error tracking provides a clear path to verify the results.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a Decentralized Inexact Cubic Newton method for convex optimization over networks, where agents maintain local objectives and communicate only with neighbors. It develops an analysis that explicitly tracks inaccuracies arising from approximate consensus on iterates, gradients, and Hessians. Under L1-smoothness of gradients and L2-Lipschitz continuity of Hessians, the method is claimed to match the iteration complexity of the exact centralized Cubic Newton method while incurring only polylogarithmic additional communication-round overhead. An accelerated variant is presented for strongly convex objectives with analogous guarantees, and a practical implementation for generalized linear models is given that communicates only vectors rather than full Hessians.

Significance. If the central claims hold, the work is significant for distributed optimization because it shows that second-order methods can be decentralized with only logarithmic communication overhead while preserving the fast local convergence rates of cubic regularization. The explicit accounting for disagreement-induced inexactness in the analysis is a clear strength that distinguishes this from prior decentralized first-order work. The vector-only variant for GLMs also improves practicality in high dimensions.

minor comments (3)

The main theorem statement would benefit from an explicit display of the polylogarithmic factor in the communication rounds (e.g., dependence on network size, condition number, or target accuracy) rather than leaving it implicit in the proof sketch.
Notation for the consensus mixing matrix and the per-iteration accuracy parameter should be introduced in a dedicated preliminary subsection to improve readability before the complexity analysis begins.
In the GLM implementation section, a short remark comparing total bits communicated per iteration (vector vs. matrix) would help quantify the practical savings.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review, accurate summary of our contributions, and recommendation for minor revision. The significance assessment correctly highlights the value of tracking consensus-induced inexactness and the practical GLM implementation. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central claim is that an inexact decentralized cubic Newton method, with consensus-induced errors in iterates/gradients/Hessians bounded via polylog communication rounds, preserves the iteration complexity of the exact cubic Newton method under L1-smoothness and L2-Hessian Lipschitz assumptions. The provided abstract and reader extraction indicate that the analysis explicitly accounts for disagreement errors as additive inexactness terms inside the tolerance budget of the exact method, without reducing the claimed complexity to a fitted parameter, self-defined quantity, or load-bearing self-citation. No equations or steps are shown to collapse by construction to the inputs; the result is presented as an independent extension that tracks consensus inaccuracies while inheriting the exact-method guarantees. This is the standard structure of an inexact-oracle analysis and qualifies as self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard smoothness assumptions plus the existence of a consensus procedure whose error can be controlled independently of the optimization progress.

axioms (2)

domain assumption Gradients are L1-smooth and Hessians are L2-Lipschitz continuous
Stated in the abstract as the setting under which the complexity guarantees hold.
domain assumption A connected network allows neighbor-to-neighbor communication sufficient to achieve consensus accuracy
Implicit in the decentralized setup and the polylogarithmic overhead claim.

pith-pipeline@v0.9.0 · 5766 in / 1431 out tokens · 30157 ms · 2026-05-22T09:32:23.516176+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel (J-cost uniqueness) unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose a general Decentralized Cubic Newton method for convex optimization under L1-smoothness of gradients and L2-Lipschitz continuity of Hessians... matches the iteration complexity of the exact Cubic Newton method and requires only additional polylogarithmic communication-round overhead
IndisputableMonolith/Foundation/DimensionForcing.lean alexander_duality_circle_linking (D=3 forcing) unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Assumption 3 (Contraction property)... W^k_τ U − Ū_F ≤ (1−λ)∥U−Ū∥_F

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 2 internal anchors

[1]

Kairouz, H

P . Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummingset al., ‘‘Advances and open problems in federated learning,’’F oundations and trends®in machine learning, vol. 14, no. 1–2, pp. 1– 210, 2021. 10

work page 2021
[2]

Necoara, V

I. Necoara, V . Nedelcu, and I. Dumitrache, ‘‘Parallel and distributed optimization methods for estimation and control in networks,’’Journal of Process Control, vol. 21, no. 5, pp. 756–766, 2011

work page 2011
[3]

Distributed dual gradient methods and error bound conditions

I. Necoara and V . Nedelcu, ‘‘Distributed dual gradient methods and error bound conditions,’’arXiv preprint arXiv:1401.4398, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[4]

T. T. Doan and A. Olshevsky, ‘‘Distributed resource allocation on dynamic networks in quadratic time,’’Sys- tems & Control Letters, vol. 99, pp. 57–63, 2017

work page 2017
[5]

Nedić, A

A. Nedić, A. Olshevsky, and W. Shi, ‘‘Improved con- vergence rates for distributed resource allocation,’’ in 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018, pp. 172–177

work page 2018
[6]

Ren, ‘‘Consensus based formation control strategies for multi-vehicle systems,’’ in2006 American Control Conference

W. Ren, ‘‘Consensus based formation control strategies for multi-vehicle systems,’’ in2006 American Control Conference. IEEE, 2006, pp. 6–pp

work page 2006
[7]

J. N. Tsitsiklis, ‘‘Problems in decentralized decision making and computation.’’ Massachusetts Inst of Tech Cambridge Lab for Information and Decision Systems, Tech. Rep. LIDS-TH-1424, 1984

work page 1984
[8]

D. P . Bertsekas and J. N. Tsitsiklis,Parallel and dis- tributed computation: numerical methods. Prentice hall Englewood Cliffs, NJ, 1989, vol. 23

work page 1989
[9]

Nedić and A

A. Nedić and A. Ozdaglar, ‘‘Distributed subgradient methods for multi-agent optimization,’’IEEE Transac- tions on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009

work page 2009
[10]

Nedic and A

A. Nedic and A. Ozdaglar, ‘‘Subgradient methods for saddle-point problems,’’Journal of optimization theory and applications, vol. 142, no. 1, pp. 205–228, 2009

work page 2009
[11]

W. Shi, Q. Ling, G. Wu, and W. Yin, ‘‘Extra: An exact first-order algorithm for decentralized consensus op- timization,’’SIAM Journal on Optimization, vol. 25, no. 2, pp. 944–966, 2015

work page 2015
[12]

Koloskova, N

A. Koloskova, N. Loizou, S. Boreiri, M. Jaggi, and S. U. Stich, ‘‘A unified theory of decentralized sgd with changing topology and local updates,’’ICML 2020, arXiv preprint arXiv:2003.10422, 2020

work page arXiv 2020
[13]

H. Li, C. Fang, W. Yin, and Z. Lin, ‘‘Decentralized accelerated gradient methods with increasing penalty parameters,’’IEEE Transactions on Signal Processing, vol. 68, pp. 4855–4870, 2020

work page 2020
[14]

Nedic, A

A. Nedic, A. Olshevsky, and W. Shi, ‘‘Achieving ge- ometric convergence for distributed optimization over time-varying graphs,’’SIAM Journal on Optimization, vol. 27, no. 4, pp. 2597–2633, 2017

work page 2017
[15]

S. Pu, W. Shi, J. Xu, and A. Nedić, ‘‘Push–pull gradi- ent methods for distributed optimization in networks,’’ IEEE Transactions on Automatic Control, vol. 66, no. 1, pp. 1–16, 2020

work page 2020
[16]

Pu and A

S. Pu and A. Nedić, ‘‘Distributed stochastic gradient tracking methods,’’Mathematical Programming, vol. 187, no. 1, pp. 409–457, 2021

work page 2021
[17]

Li and Z

H. Li and Z. Lin, ‘‘Accelerated gradient tracking over time-varying graphs for decentralized optimization,’’ Journal of Machine Learning Research, vol. 25, no. 274, pp. 1–52, 2024

work page 2024
[18]

S. A. Alghunaim, E. K. Ryu, K. Y uan, and A. H. Sayed, ‘‘Decentralized proximal gradient algorithms with lin- ear convergence rates,’’IEEE Transactions on Auto- matic Control, vol. 66, no. 6, pp. 2787–2794, 2020

work page 2020
[19]

Scaman, F

K. Scaman, F. Bach, S. Bubeck, Y . T. Lee, and L. Mas- soulié, ‘‘Optimal algorithms for smooth and strongly convex distributed optimization in networks,’’ inPro- ceedings of the 34th International Conference on Ma- chine Learning-V olume 70. JMLR. org, 2017, pp. 3027–3036

work page 2017
[20]

Kovalev, A

D. Kovalev, A. Salim, and P . Richtárik, ‘‘Optimal and practical algorithms for smooth and strongly convex decentralized optimization,’’Advances in Neural Infor- mation Processing Systems, vol. 33, 2020

work page 2020
[21]

Kovalev, E

D. Kovalev, E. Shulgin, P . Richtárik, A. V . Rogozin, and A. Gasnikov, ‘‘Adom: Accelerated decentralized optimization method for time-varying networks,’’ inIn- ternational Conference on Machine Learning. PMLR, 2021, pp. 5784–5793

work page 2021
[22]

Kovalev, E

D. Kovalev, E. Gasanov, A. Gasnikov, and P . Richtarik, ‘‘Lower bounds and optimal algorithms for smooth and strongly convex decentralized optimization over time- varying networks,’’Advances in Neural Information Processing Systems, vol. 34, 2021

work page 2021
[23]

A. S. Nemirovski and D. B. Y udin,Problem Complexity and Method Efficiency in Optimization, ser. A Wiley- Interscience publication. Wiley, 1983

work page 1983
[24]

Nesterov,Introductory Lectures on Convex Optimiza- tion: a basic course

Y . Nesterov,Introductory Lectures on Convex Optimiza- tion: a basic course. Kluwer Academic Publishers, Massachusetts, 2004

work page 2004
[25]

S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Ecksteinet al., ‘‘Distributed optimization and statistical learning via the alternating direction method of multipliers,’’F ounda- tions and Trends®in Machine learning, vol. 3, no. 1, pp. 1–122, 2011

work page 2011
[26]

Chang, ‘‘A proximal dual consensus admm method for multi-agent constrained optimization,’’ IEEE Transactions on Signal Processing, vol

T.-H. Chang, ‘‘A proximal dual consensus admm method for multi-agent constrained optimization,’’ IEEE Transactions on Signal Processing, vol. 64, no. 14, pp. 3719–3734, 2016

work page 2016
[27]

Falsone, I

A. Falsone, I. Notarnicola, G. Notarstefano, and M. Prandini, ‘‘Tracking-admm for distributed constraint-coupled optimization,’’Automatica, vol. 117, p. 108962, 2020

work page 2020
[28]

X. Wu, H. Wang, and J. Lu, ‘‘Distributed optimization with coupling constraints,’’IEEE Transactions on Auto- matic Control, vol. 68, no. 3, pp. 1847–1854, 2022

work page 2022
[29]

arXiv preprint arXiv:2310.15596 , year=

K. Gong and L. Zhang, ‘‘Decentralized proximal method of multipliers for convex optimization with coupled constraints,’’arXiv preprint arXiv:2310.15596, 2023

work page arXiv 2023
[30]

Nesterov,Lectures on Convex Optimization, 2nd ed

Y . Nesterov,Lectures on Convex Optimization, 2nd ed. Springer Cham, 2018

work page 2018
[31]

Nesterov and B

Y . Nesterov and B. T. Polyak, ‘‘Cubic regularization of Newton method and its global performance,’’ 11 Mathematical Programming, vol. 108, pp. 177– 205, 2006. [Online]. Available: https://doi.org/10.1007/ s10107-006-0706-8

work page 2006
[32]

Doikov and Y

N. Doikov and Y . Nesterov, ‘‘Gradient regularization of newton method with bregman distances,’’Mathematical programming, vol. 204, no. 1, pp. 1–25, 2024

work page 2024
[33]

Mishchenko, ‘‘Regularized Newton method with globalO 1 k 2 convergence,’’SIAM Journal on Optimization, vol

K. Mishchenko, ‘‘Regularized Newton method with globalO 1 k 2 convergence,’’SIAM Journal on Optimization, vol. 33, pp. 1440–1462, 2023. [Online]. Available: https://doi.org/10.1137/22M1488752

work page doi:10.1137/22m1488752 2023
[34]

Jiang, C

Y . Jiang, C. He, C. Zhang, D. Ge, B. Jiang, and Y . Y e, ‘‘Beyond nonconvexity: A universal trust-region method with new analyses,’’Journal of Scientific Com- puting, vol. 106, no. 1, p. 28, 2026

work page 2026
[35]

Hanzely, D

S. Hanzely, D. Kamzolov, D. Pasechnyuk, A. Gasnikov, P . Richtárik, and M. Takáč, ‘‘A damped Newton method achieves globalO 1 k 2 and local quadratic convergence rate,’’ inAdvances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 25 320–25 334. [O...

work page 2022
[36]

Islamov, X

R. Islamov, X. Qian, and P . Richtárik, ‘‘Distributed sec- ond order methods with fast rates and compressed com- munication,’’ inInternational conference on machine learning. PMLR, 2021, pp. 4617–4628

work page 2021
[37]

S. Wang, F. Roosta, P . Xu, and M. W. Mahoney, ‘‘Giant: Globally improved approximate newton method for dis- tributed optimization,’’Advances in neural information processing systems, vol. 31, 2018

work page 2018
[38]

Safaryan, R

M. Safaryan, R. Islamov, X. Qian, and P . Richtarik, ‘‘Fednl: Making newton-type methods applicable to federated learning,’’ inInternational Conference on Ma- chine Learning. PMLR, 2022, pp. 18 959–19 010

work page 2022
[39]

Islamov, X

R. Islamov, X. Qian, S. Hanzely, M. Safaryan, and P . Richtárik, ‘‘Distributed newton-type methods with communication compression and bernoulli aggrega- tion,’’Transactions on Machine Learning Research, 2023

work page 2023
[40]

, author Fält, M

A. Agafonov, P . Dvurechensky, G. Scutari, A. Gasnikov, D. Kamzolov, A. Lukashevich, and A. Daneshmand, ‘‘An accelerated second-order method for distributed stochastic optimization,’’ in2021 60th IEEE Conference on Decision and Control (CDC), 2021, pp. 2407–2413. [Online]. Available: https://doi.org/10.1109/CDC45484.2021.9683400

work page doi:10.1109/cdc45484.2021.9683400 2021
[41]

Agafonov, D

A. Agafonov, D. Kamzolov, R. Tappenden, A. Gas- nikov, and M. Takáč, ‘‘Flecs: A federated learning second-order framework via compression and sketch- ing,’’Optimization Methods and Software, vol. 40, no. 5, pp. 1222–1248, 2025

work page 2025
[42]

Daneshmand, G

A. Daneshmand, G. Scutari, P . Dvurechensky, and A. Gasnikov, ‘‘Newton method over networks is fast up to the statistical precision,’’ inInternational Conference on Machine Learning. PMLR, 2021, pp. 2398–2409

work page 2021
[43]

Eisen, A

M. Eisen, A. Mokhtari, and A. Ribeiro, ‘‘Decentralized quasi-newton methods,’’IEEE Transactions on Signal Processing, vol. 65, no. 10, pp. 2613–2628, 2017

work page 2017
[44]

Zhang, H

J. Zhang, H. Liu, A. M.-C. So, and Q. Ling, ‘‘V ariance- reduced stochastic quasi-newton methods for decentral- ized learning,’’IEEE Transactions on Signal Process- ing, vol. 71, pp. 311–326, 2023

work page 2023
[45]

Zhang, Q

J. Zhang, Q. Ling, and A. M.-C. So, ‘‘A newton track- ing algorithm with exact linear convergence for decen- tralized consensus optimization,’’IEEE Transactions on Signal and Information Processing over Networks, vol. 7, pp. 346–358, 2021

work page 2021
[46]

C. A. Uribe and A. Jadbabaie, ‘‘A distributed cubic-regularized newton method for smooth convex optimization over networks,’’arXiv preprint arXiv:2007.03562, 2020

work page arXiv 2007
[47]

Zhang, K

Z. Zhang, K. Che, S. Y ang, and W. Xu, ‘‘Communication-efficient distributed cubic newton with compressed lazy hessian,’’Neural Networks, vol. 174, p. 106212, 2024

work page 2024
[48]

Mokhtari, Q

A. Mokhtari, Q. Ling, and A. Ribeiro, ‘‘Network newton distributed optimization methods,’’IEEE Transactions on Signal Processing, vol. 65, no. 1, pp. 146–161, 2016

work page 2016
[49]

Nesterov, ‘‘Accelerating the cubic regularization of Newton’s method on convex problems,’’Mathematical Programming, vol

Y . Nesterov, ‘‘Accelerating the cubic regularization of Newton’s method on convex problems,’’Mathematical Programming, vol. 112, pp. 159–181, 2008. [Online]. Available: https://doi.org/10.1007/s10107-006-0089-x

work page doi:10.1007/s10107-006-0089-x 2008
[50]

Salim, L

A. Salim, L. Condat, D. Kovalev, and P . Richtárik, ‘‘An optimal algorithm for strongly convex minimization un- der affine constraints,’’ inInternational conference on artificial intelligence and statistics. PMLR, 2022, pp. 4482–4498

work page 2022
[51]

Agafonov, D

A. Agafonov, D. Kamzolov, P . Dvurechensky, A. Gasnikov, and M. Takáč, ‘‘Inexact tensor methods and their application to stochastic convex optimization,’’Optimization Methods and Software, vol. 39, no. 1, pp. 42–83, 2024. [Online]. Available: https://doi.org/10.1080/10556788.2023.2261604

work page doi:10.1080/10556788.2023.2261604 2024
[52]

Kamzolov, D

D. Kamzolov, D. Pasechnyuk, A. Agafonov, A. Gas- nikov, and M. Takáč, ‘‘Optami: Global superlinear convergence of high-order methods,’’arXiv preprint arXiv:2410.04083, 2024

work page arXiv 2024
[53]

Second-Order Methods with Cubic Regularization Under Inexact Information

S. Ghadimi, H. Liu, and T. Zhang, ‘‘Second-order meth- ods with cubic regularization under inexact informa- tion,’’arXiv preprint arXiv:1710.05782, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[54]

Antonakopoulos, A

K. Antonakopoulos, A. Kavis, and V . Cevher, ‘‘Extra-newton: A first approach to noise-adaptive accelerated second-order methods,’’ inAdvances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 29 859–29 872. [Online]. Available: https: //proceeding...

work page 2022
[55]

Agafonov, D

A. Agafonov, D. Kamzolov, A. Gasnikov, A. Kavis, K. Antonakopoulos, V . Cevher, and M. Takáč, ‘‘Advanc- ing the lower bounds: an accelerated, stochastic, second- order method with optimal adaptation to inexactness,’’ inThe Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=otU31x3fus AP...

work page 2024
[56]

Ifεis sufficiently small, namelyε≤12(L+ ¯L2)D3, we set: N= q 108(L+¯L2)D3 ε −2,(61) ˆ∆ˆx ≤min √ 2ε 288¯L1D , √ 3ε(L+¯L2) 144¯L2 √ D , √ε 3 √ (L+¯L2)D .(62)

work page
[57]

Then after N+ 1iterations Algorithm 1 outputs anε-solution, i.e., f( ¯xN+1)−f(x ∗)≤ε

Otherwise, ifε >12(L+ ¯L2)D3, we set N= 1and: ˆ∆ˆx ≤min √ 2ε 288¯L1D , √ 3ε(L+¯L2) 144¯L2 √ D , ε1/3 (6(L+¯L2))1/3 ,D .(63) We also setγ= √ (N+1)(N+2) 6D . Then after N+ 1iterations Algorithm 1 outputs anε-solution, i.e., f( ¯xN+1)−f(x ∗)≤ε. Proof.By the conditions of Theorem 4, the regularization parameters must satisfyδ 1,k =δ 1 ≥max j∈[N] ∆1|ˆx,j andδ ...

work page
[58]

= ε 8 ≤ ε 6 , 3δ1D 5√ 6 = 15√ 6 D √ 2 72 ε D = 5 24 √ 3 ε≤ ε 6 , 9δ2D2 5 6 = 15 2 D2 √ 3 36 q ε(L+¯L2) D ≤ 15 2·72 ε≤ ε 6 . Then, using the third and the fourth bounds from (63), for (65) we obtain: L+¯L2 3 (3) ˆ∆3 ˆx ≤(L+ ¯L2) h ε 6(L+¯L2) i = ε 6 , 9 6D δ1 ˆ∆2 ˆx ≤ 3 2D √ 2 72 ε D D2 = √ 2 48 ε≤ ε 12 , 3δ2 ˆ∆2 ˆx ≤3 √ 3 36 q ε(L+¯L2) D D2 ≤ ε 24 . Summi...

work page

[1] [1]

Kairouz, H

P . Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummingset al., ‘‘Advances and open problems in federated learning,’’F oundations and trends®in machine learning, vol. 14, no. 1–2, pp. 1– 210, 2021. 10

work page 2021

[2] [2]

Necoara, V

I. Necoara, V . Nedelcu, and I. Dumitrache, ‘‘Parallel and distributed optimization methods for estimation and control in networks,’’Journal of Process Control, vol. 21, no. 5, pp. 756–766, 2011

work page 2011

[3] [3]

Distributed dual gradient methods and error bound conditions

I. Necoara and V . Nedelcu, ‘‘Distributed dual gradient methods and error bound conditions,’’arXiv preprint arXiv:1401.4398, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[4] [4]

T. T. Doan and A. Olshevsky, ‘‘Distributed resource allocation on dynamic networks in quadratic time,’’Sys- tems & Control Letters, vol. 99, pp. 57–63, 2017

work page 2017

[5] [5]

Nedić, A

A. Nedić, A. Olshevsky, and W. Shi, ‘‘Improved con- vergence rates for distributed resource allocation,’’ in 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018, pp. 172–177

work page 2018

[6] [6]

Ren, ‘‘Consensus based formation control strategies for multi-vehicle systems,’’ in2006 American Control Conference

W. Ren, ‘‘Consensus based formation control strategies for multi-vehicle systems,’’ in2006 American Control Conference. IEEE, 2006, pp. 6–pp

work page 2006

[7] [7]

J. N. Tsitsiklis, ‘‘Problems in decentralized decision making and computation.’’ Massachusetts Inst of Tech Cambridge Lab for Information and Decision Systems, Tech. Rep. LIDS-TH-1424, 1984

work page 1984

[8] [8]

D. P . Bertsekas and J. N. Tsitsiklis,Parallel and dis- tributed computation: numerical methods. Prentice hall Englewood Cliffs, NJ, 1989, vol. 23

work page 1989

[9] [9]

Nedić and A

A. Nedić and A. Ozdaglar, ‘‘Distributed subgradient methods for multi-agent optimization,’’IEEE Transac- tions on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009

work page 2009

[10] [10]

Nedic and A

A. Nedic and A. Ozdaglar, ‘‘Subgradient methods for saddle-point problems,’’Journal of optimization theory and applications, vol. 142, no. 1, pp. 205–228, 2009

work page 2009

[11] [11]

W. Shi, Q. Ling, G. Wu, and W. Yin, ‘‘Extra: An exact first-order algorithm for decentralized consensus op- timization,’’SIAM Journal on Optimization, vol. 25, no. 2, pp. 944–966, 2015

work page 2015

[12] [12]

Koloskova, N

A. Koloskova, N. Loizou, S. Boreiri, M. Jaggi, and S. U. Stich, ‘‘A unified theory of decentralized sgd with changing topology and local updates,’’ICML 2020, arXiv preprint arXiv:2003.10422, 2020

work page arXiv 2020

[13] [13]

H. Li, C. Fang, W. Yin, and Z. Lin, ‘‘Decentralized accelerated gradient methods with increasing penalty parameters,’’IEEE Transactions on Signal Processing, vol. 68, pp. 4855–4870, 2020

work page 2020

[14] [14]

Nedic, A

A. Nedic, A. Olshevsky, and W. Shi, ‘‘Achieving ge- ometric convergence for distributed optimization over time-varying graphs,’’SIAM Journal on Optimization, vol. 27, no. 4, pp. 2597–2633, 2017

work page 2017

[15] [15]

S. Pu, W. Shi, J. Xu, and A. Nedić, ‘‘Push–pull gradi- ent methods for distributed optimization in networks,’’ IEEE Transactions on Automatic Control, vol. 66, no. 1, pp. 1–16, 2020

work page 2020

[16] [16]

Pu and A

S. Pu and A. Nedić, ‘‘Distributed stochastic gradient tracking methods,’’Mathematical Programming, vol. 187, no. 1, pp. 409–457, 2021

work page 2021

[17] [17]

Li and Z

H. Li and Z. Lin, ‘‘Accelerated gradient tracking over time-varying graphs for decentralized optimization,’’ Journal of Machine Learning Research, vol. 25, no. 274, pp. 1–52, 2024

work page 2024

[18] [18]

S. A. Alghunaim, E. K. Ryu, K. Y uan, and A. H. Sayed, ‘‘Decentralized proximal gradient algorithms with lin- ear convergence rates,’’IEEE Transactions on Auto- matic Control, vol. 66, no. 6, pp. 2787–2794, 2020

work page 2020

[19] [19]

Scaman, F

K. Scaman, F. Bach, S. Bubeck, Y . T. Lee, and L. Mas- soulié, ‘‘Optimal algorithms for smooth and strongly convex distributed optimization in networks,’’ inPro- ceedings of the 34th International Conference on Ma- chine Learning-V olume 70. JMLR. org, 2017, pp. 3027–3036

work page 2017

[20] [20]

Kovalev, A

D. Kovalev, A. Salim, and P . Richtárik, ‘‘Optimal and practical algorithms for smooth and strongly convex decentralized optimization,’’Advances in Neural Infor- mation Processing Systems, vol. 33, 2020

work page 2020

[21] [21]

Kovalev, E

D. Kovalev, E. Shulgin, P . Richtárik, A. V . Rogozin, and A. Gasnikov, ‘‘Adom: Accelerated decentralized optimization method for time-varying networks,’’ inIn- ternational Conference on Machine Learning. PMLR, 2021, pp. 5784–5793

work page 2021

[22] [22]

Kovalev, E

D. Kovalev, E. Gasanov, A. Gasnikov, and P . Richtarik, ‘‘Lower bounds and optimal algorithms for smooth and strongly convex decentralized optimization over time- varying networks,’’Advances in Neural Information Processing Systems, vol. 34, 2021

work page 2021

[23] [23]

A. S. Nemirovski and D. B. Y udin,Problem Complexity and Method Efficiency in Optimization, ser. A Wiley- Interscience publication. Wiley, 1983

work page 1983

[24] [24]

Nesterov,Introductory Lectures on Convex Optimiza- tion: a basic course

Y . Nesterov,Introductory Lectures on Convex Optimiza- tion: a basic course. Kluwer Academic Publishers, Massachusetts, 2004

work page 2004

[25] [25]

S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Ecksteinet al., ‘‘Distributed optimization and statistical learning via the alternating direction method of multipliers,’’F ounda- tions and Trends®in Machine learning, vol. 3, no. 1, pp. 1–122, 2011

work page 2011

[26] [26]

Chang, ‘‘A proximal dual consensus admm method for multi-agent constrained optimization,’’ IEEE Transactions on Signal Processing, vol

T.-H. Chang, ‘‘A proximal dual consensus admm method for multi-agent constrained optimization,’’ IEEE Transactions on Signal Processing, vol. 64, no. 14, pp. 3719–3734, 2016

work page 2016

[27] [27]

Falsone, I

A. Falsone, I. Notarnicola, G. Notarstefano, and M. Prandini, ‘‘Tracking-admm for distributed constraint-coupled optimization,’’Automatica, vol. 117, p. 108962, 2020

work page 2020

[28] [28]

X. Wu, H. Wang, and J. Lu, ‘‘Distributed optimization with coupling constraints,’’IEEE Transactions on Auto- matic Control, vol. 68, no. 3, pp. 1847–1854, 2022

work page 2022

[29] [29]

arXiv preprint arXiv:2310.15596 , year=

K. Gong and L. Zhang, ‘‘Decentralized proximal method of multipliers for convex optimization with coupled constraints,’’arXiv preprint arXiv:2310.15596, 2023

work page arXiv 2023

[30] [30]

Nesterov,Lectures on Convex Optimization, 2nd ed

Y . Nesterov,Lectures on Convex Optimization, 2nd ed. Springer Cham, 2018

work page 2018

[31] [31]

Nesterov and B

Y . Nesterov and B. T. Polyak, ‘‘Cubic regularization of Newton method and its global performance,’’ 11 Mathematical Programming, vol. 108, pp. 177– 205, 2006. [Online]. Available: https://doi.org/10.1007/ s10107-006-0706-8

work page 2006

[32] [32]

Doikov and Y

N. Doikov and Y . Nesterov, ‘‘Gradient regularization of newton method with bregman distances,’’Mathematical programming, vol. 204, no. 1, pp. 1–25, 2024

work page 2024

[33] [33]

Mishchenko, ‘‘Regularized Newton method with globalO 1 k 2 convergence,’’SIAM Journal on Optimization, vol

K. Mishchenko, ‘‘Regularized Newton method with globalO 1 k 2 convergence,’’SIAM Journal on Optimization, vol. 33, pp. 1440–1462, 2023. [Online]. Available: https://doi.org/10.1137/22M1488752

work page doi:10.1137/22m1488752 2023

[34] [34]

Jiang, C

Y . Jiang, C. He, C. Zhang, D. Ge, B. Jiang, and Y . Y e, ‘‘Beyond nonconvexity: A universal trust-region method with new analyses,’’Journal of Scientific Com- puting, vol. 106, no. 1, p. 28, 2026

work page 2026

[35] [35]

Hanzely, D

S. Hanzely, D. Kamzolov, D. Pasechnyuk, A. Gasnikov, P . Richtárik, and M. Takáč, ‘‘A damped Newton method achieves globalO 1 k 2 and local quadratic convergence rate,’’ inAdvances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 25 320–25 334. [O...

work page 2022

[36] [36]

Islamov, X

R. Islamov, X. Qian, and P . Richtárik, ‘‘Distributed sec- ond order methods with fast rates and compressed com- munication,’’ inInternational conference on machine learning. PMLR, 2021, pp. 4617–4628

work page 2021

[37] [37]

S. Wang, F. Roosta, P . Xu, and M. W. Mahoney, ‘‘Giant: Globally improved approximate newton method for dis- tributed optimization,’’Advances in neural information processing systems, vol. 31, 2018

work page 2018

[38] [38]

Safaryan, R

M. Safaryan, R. Islamov, X. Qian, and P . Richtarik, ‘‘Fednl: Making newton-type methods applicable to federated learning,’’ inInternational Conference on Ma- chine Learning. PMLR, 2022, pp. 18 959–19 010

work page 2022

[39] [39]

Islamov, X

R. Islamov, X. Qian, S. Hanzely, M. Safaryan, and P . Richtárik, ‘‘Distributed newton-type methods with communication compression and bernoulli aggrega- tion,’’Transactions on Machine Learning Research, 2023

work page 2023

[40] [40]

, author Fält, M

A. Agafonov, P . Dvurechensky, G. Scutari, A. Gasnikov, D. Kamzolov, A. Lukashevich, and A. Daneshmand, ‘‘An accelerated second-order method for distributed stochastic optimization,’’ in2021 60th IEEE Conference on Decision and Control (CDC), 2021, pp. 2407–2413. [Online]. Available: https://doi.org/10.1109/CDC45484.2021.9683400

work page doi:10.1109/cdc45484.2021.9683400 2021

[41] [41]

Agafonov, D

A. Agafonov, D. Kamzolov, R. Tappenden, A. Gas- nikov, and M. Takáč, ‘‘Flecs: A federated learning second-order framework via compression and sketch- ing,’’Optimization Methods and Software, vol. 40, no. 5, pp. 1222–1248, 2025

work page 2025

[42] [42]

Daneshmand, G

A. Daneshmand, G. Scutari, P . Dvurechensky, and A. Gasnikov, ‘‘Newton method over networks is fast up to the statistical precision,’’ inInternational Conference on Machine Learning. PMLR, 2021, pp. 2398–2409

work page 2021

[43] [43]

Eisen, A

M. Eisen, A. Mokhtari, and A. Ribeiro, ‘‘Decentralized quasi-newton methods,’’IEEE Transactions on Signal Processing, vol. 65, no. 10, pp. 2613–2628, 2017

work page 2017

[44] [44]

Zhang, H

J. Zhang, H. Liu, A. M.-C. So, and Q. Ling, ‘‘V ariance- reduced stochastic quasi-newton methods for decentral- ized learning,’’IEEE Transactions on Signal Process- ing, vol. 71, pp. 311–326, 2023

work page 2023

[45] [45]

Zhang, Q

J. Zhang, Q. Ling, and A. M.-C. So, ‘‘A newton track- ing algorithm with exact linear convergence for decen- tralized consensus optimization,’’IEEE Transactions on Signal and Information Processing over Networks, vol. 7, pp. 346–358, 2021

work page 2021

[46] [46]

C. A. Uribe and A. Jadbabaie, ‘‘A distributed cubic-regularized newton method for smooth convex optimization over networks,’’arXiv preprint arXiv:2007.03562, 2020

work page arXiv 2007

[47] [47]

Zhang, K

Z. Zhang, K. Che, S. Y ang, and W. Xu, ‘‘Communication-efficient distributed cubic newton with compressed lazy hessian,’’Neural Networks, vol. 174, p. 106212, 2024

work page 2024

[48] [48]

Mokhtari, Q

A. Mokhtari, Q. Ling, and A. Ribeiro, ‘‘Network newton distributed optimization methods,’’IEEE Transactions on Signal Processing, vol. 65, no. 1, pp. 146–161, 2016

work page 2016

[49] [49]

Nesterov, ‘‘Accelerating the cubic regularization of Newton’s method on convex problems,’’Mathematical Programming, vol

Y . Nesterov, ‘‘Accelerating the cubic regularization of Newton’s method on convex problems,’’Mathematical Programming, vol. 112, pp. 159–181, 2008. [Online]. Available: https://doi.org/10.1007/s10107-006-0089-x

work page doi:10.1007/s10107-006-0089-x 2008

[50] [50]

Salim, L

A. Salim, L. Condat, D. Kovalev, and P . Richtárik, ‘‘An optimal algorithm for strongly convex minimization un- der affine constraints,’’ inInternational conference on artificial intelligence and statistics. PMLR, 2022, pp. 4482–4498

work page 2022

[51] [51]

Agafonov, D

A. Agafonov, D. Kamzolov, P . Dvurechensky, A. Gasnikov, and M. Takáč, ‘‘Inexact tensor methods and their application to stochastic convex optimization,’’Optimization Methods and Software, vol. 39, no. 1, pp. 42–83, 2024. [Online]. Available: https://doi.org/10.1080/10556788.2023.2261604

work page doi:10.1080/10556788.2023.2261604 2024

[52] [52]

Kamzolov, D

D. Kamzolov, D. Pasechnyuk, A. Agafonov, A. Gas- nikov, and M. Takáč, ‘‘Optami: Global superlinear convergence of high-order methods,’’arXiv preprint arXiv:2410.04083, 2024

work page arXiv 2024

[53] [53]

Second-Order Methods with Cubic Regularization Under Inexact Information

S. Ghadimi, H. Liu, and T. Zhang, ‘‘Second-order meth- ods with cubic regularization under inexact informa- tion,’’arXiv preprint arXiv:1710.05782, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[54] [54]

Antonakopoulos, A

K. Antonakopoulos, A. Kavis, and V . Cevher, ‘‘Extra-newton: A first approach to noise-adaptive accelerated second-order methods,’’ inAdvances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 29 859–29 872. [Online]. Available: https: //proceeding...

work page 2022

[55] [55]

Agafonov, D

A. Agafonov, D. Kamzolov, A. Gasnikov, A. Kavis, K. Antonakopoulos, V . Cevher, and M. Takáč, ‘‘Advanc- ing the lower bounds: an accelerated, stochastic, second- order method with optimal adaptation to inexactness,’’ inThe Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=otU31x3fus AP...

work page 2024

[56] [56]

Ifεis sufficiently small, namelyε≤12(L+ ¯L2)D3, we set: N= q 108(L+¯L2)D3 ε −2,(61) ˆ∆ˆx ≤min √ 2ε 288¯L1D , √ 3ε(L+¯L2) 144¯L2 √ D , √ε 3 √ (L+¯L2)D .(62)

work page

[57] [57]

Then after N+ 1iterations Algorithm 1 outputs anε-solution, i.e., f( ¯xN+1)−f(x ∗)≤ε

Otherwise, ifε >12(L+ ¯L2)D3, we set N= 1and: ˆ∆ˆx ≤min √ 2ε 288¯L1D , √ 3ε(L+¯L2) 144¯L2 √ D , ε1/3 (6(L+¯L2))1/3 ,D .(63) We also setγ= √ (N+1)(N+2) 6D . Then after N+ 1iterations Algorithm 1 outputs anε-solution, i.e., f( ¯xN+1)−f(x ∗)≤ε. Proof.By the conditions of Theorem 4, the regularization parameters must satisfyδ 1,k =δ 1 ≥max j∈[N] ∆1|ˆx,j andδ ...

work page

[58] [58]

= ε 8 ≤ ε 6 , 3δ1D 5√ 6 = 15√ 6 D √ 2 72 ε D = 5 24 √ 3 ε≤ ε 6 , 9δ2D2 5 6 = 15 2 D2 √ 3 36 q ε(L+¯L2) D ≤ 15 2·72 ε≤ ε 6 . Then, using the third and the fourth bounds from (63), for (65) we obtain: L+¯L2 3 (3) ˆ∆3 ˆx ≤(L+ ¯L2) h ε 6(L+¯L2) i = ε 6 , 9 6D δ1 ˆ∆2 ˆx ≤ 3 2D √ 2 72 ε D D2 = √ 2 48 ε≤ ε 12 , 3δ2 ˆ∆2 ˆx ≤3 √ 3 36 q ε(L+¯L2) D D2 ≤ ε 24 . Summi...

work page