Accelerated Decentralized Constraint-Coupled Optimization: A Dual$^2$ Approach

Jingwang Li; Vincent Lau

arxiv: 2505.03719 · v4 · submitted 2025-05-06 · 🧮 math.OC · cs.SY· eess.SY

Accelerated Decentralized Constraint-Coupled Optimization: A Dual² Approach

Jingwang Li , Vincent Lau This is my paper

Pith reviewed 2026-05-22 15:54 UTC · model grok-4.3

classification 🧮 math.OC cs.SYeess.SY

keywords decentralized optimizationconstraint-coupled problemsdual approachaccelerated gradient methodsasymptotic convergencelinear convergencenetworked systemscomplexity bounds

0 comments

The pith

A dual squared approach produces two accelerated algorithms for decentralized optimization with shared constraints that converge under milder conditions on the public cost function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper focuses on problems where separate agents each minimize their own private costs but must collectively satisfy one shared equality constraint that links all their decisions through a public variable. The authors build a dual squared method that dualizes the constraint twice, enabling two new accelerated gradient algorithms to coordinate without a central coordinator. If these algorithms work as described, agents could solve such problems with less data exchange and fewer calculations than before, which matters for applications like distributed resource allocation where networks of devices must agree on totals without revealing all private details. The key advance is that convergence holds even when the public function meets only weaker requirements than those demanded by earlier decentralized solvers.

Core claim

Building on a novel dual² approach, the paper develops the inexact Dual² Accelerated gradient method and the Multi-consensus inexact Dual² Accelerated gradient method. Both iD2A and MiD2A guarantee asymptotic convergence under a milder condition on h compared to existing algorithms. Under additional assumptions they establish linear convergence rates and significantly lower communication and computational complexity bounds.

What carries the argument

The dual² approach, which applies dualization twice to the constraint-coupled problem so that local dual updates can coordinate the shared equality constraint across the network while supporting acceleration and inexact steps.

If this is right

The algorithms reach the optimal solution asymptotically whenever h meets the stated milder condition.
Linear convergence holds when extra assumptions such as strong convexity are added.
Communication rounds and per-agent computations drop below the bounds reported for earlier decentralized solvers of the same problem class.
Multi-consensus steps in one variant further cut the number of neighbor exchanges needed for coordination.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The dual² construction could extend to problems with inequality constraints or time-varying networks if the connectivity requirement is relaxed.
Lower complexity bounds suggest these methods might run on battery-limited sensors or edge devices that cannot afford frequent messaging.
Similar dualization tricks may help other multi-agent problems where only partial information is shared among neighbors.

Load-bearing premise

The communication network must be undirected and connected so that information about the shared constraint propagates to every agent.

What would settle it

A numerical test or analytic counterexample in which iD2A or MiD2A fails to converge asymptotically on a problem where h satisfies only the paper's milder condition but not the stronger conditions used by prior methods.

Figures

Figures reproduced from arXiv: 2505.03719 by Jingwang Li, Vincent Lau.

**Figure 1.** Figure 1: Results of Experiment I: Decentralized Elastic Net Regression (n = 8, p = 20, d = 9, κC = 25, κf = 1, κpd = 98603) 10 . 0.0 0.2 0.4 0.6 0.8 1.0 Number of calls to f and proxg 1e5 10 4 10 3 10 2 10 1 10 0 10 1 Optimality gap DCPA NPGA-EXTRA NPGA-ATC tracking NPGA-II iD2A-PDPG, > 0 iD2A-iDAPG, > 0 MiD2A-iDAPG, > 0 0.0 0.2 0.4 0.6 0.8 1.0 Number of calls to A, A , and h * /proxh *1e5 10 4 10 3 10 2 10 1 10 0 … view at source ↗

**Figure 2.** Figure 2: Results of Experiment II: Decentralized Constrained Linear Regression (n = 8, p = 9, d = 9, κC = 25, κf = 1, κAρ = 8.24 × 1015, κA′ ρ = 4.89 × 1016). efficient algorithm, defined as the one with minimal oracle complexity. In Appendix S, we summarize the SOTA oracle complexities for different cases of (SPP), enabling us to conveniently choose the best algorithm and obtain its associated oracle complexity. R… view at source ↗

**Figure 3.** Figure 3: Results of Experiment III: Decentralized Resource Allocation (n = 20, p = 10, d = 40, κC = 99, κf = 1000, κA = 8)14 . and g in the ERM problem (96) can typically be decomposed into a sum of local functions. For instance, when θ = [θ1, · · · , θn] with θi ∈ R di , we have • ℓ2 regularizer: 1 2 ∥θ∥ 2 = Pn i=1 1 2 ∥θi∥ 2 ; • ℓ1 regularizer: ∥θ∥1 = Pn i=1 ∥θi∥1 ; • Nonnegative indicator function: ιRd + (θ) = P… view at source ↗

read the original abstract

In this paper, we focus on a class of decentralized constraint-coupled optimization problem: $\min_{x_i \in \mathbb{R}^{d_i}, i \in \mathcal{I}; y \in \mathbb{R}^p}$ $\sum_{i=1}^n\left(f_i(x_i) + g_i(x_i)\right) + h(y) \ \text{s.t.} \ \sum_{i=1}^{n}A_ix_i = y$, over an undirected and connected network of $n$ agents. Here, $f_i$, $g_i$, and $A_i$ represent private information of agent $i \in \mathcal{I} = \{1, \cdots, n\}$, while $h$ is public for all agents. Building on a novel dual$^2$ approach, we develop two accelerated algorithms to solve this problem: the inexact Dual$^2$ Accelerated (iD2A) gradient method and the Multi-consensus inexact Dual$^2$ Accelerated (MiD2A) gradient method. We demonstrate that both iD2A and MiD2A can guarantee asymptotic convergence under a milder condition on $h$ compared to existing algorithms. Furthermore, under additional assumptions, we establish linear convergence rates and derive significantly lower communication and computational complexity bounds than those of existing algorithms. Several numerical experiments validate our theoretical analysis and demonstrate the practical superiority of the proposed algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The dual² approach relaxes conditions on the public function h and yields algorithms with lower communication complexity for decentralized constraint-coupled problems.

read the letter

Hi colleague, The punchline on this paper is that the dual² approach allows for asymptotic convergence in decentralized constraint-coupled optimization under a milder condition on the public function h than previous methods, while also providing linear convergence rates and reduced communication and computational complexity under additional assumptions. What is actually new is the construction of the iD2A and MiD2A algorithms. These use an inexact accelerated gradient method on a dual formulation that incorporates a second dual layer to manage the shared constraint across agents. The paper derives the convergence results using standard but carefully applied Lyapunov-style arguments, and the complexity bounds show measurable improvements over cited existing algorithms. The numerical experiments validate the theory by showing practical superiority in terms of iterations needed for convergence in sample problems. The paper does well in clearly stating the problem setup with private functions f_i, g_i and public h, and in making the network assumptions explicit. The citation pattern builds on dual decomposition methods without reinventing the wheel, and the focus on communication efficiency is relevant for real-world multi-agent applications. Where the soft spots are, they are relatively minor. The requirement for an undirected and connected network is necessary for the dual coordination but means the results do not directly apply to directed or dynamic networks, which could be a limitation in some control scenarios. The significant complexity reductions appear only when extra assumptions like strong convexity are added, so the headline benefits are not unconditional. The soundness of the proofs seems okay from the internal checks, with no obvious circularity. This paper is aimed at researchers in distributed optimization and multi-agent control systems. Readers who deal with resource allocation or decentralized decision making will find the algorithms and bounds useful for designing faster methods. It shows clear thinking and honest engagement with the literature, so it deserves a serious referee. I recommend that a serious editor send this to peer review rather than desk rejecting it.

Referee Report

2 major / 3 minor

Summary. The manuscript addresses a class of decentralized constraint-coupled optimization problems: minimize sum_i (f_i(x_i) + g_i(x_i)) + h(y) subject to sum_i A_i x_i = y, where f_i, g_i, A_i are private to agents on an undirected connected network and h is public. It introduces a dual² approach yielding two accelerated methods—the inexact Dual² Accelerated (iD2A) gradient method and the Multi-consensus inexact Dual² Accelerated (MiD2A) gradient method. The central claims are asymptotic convergence under a milder condition on h than prior algorithms, plus linear convergence rates and improved communication/computational complexity bounds under additional assumptions, supported by Lyapunov-style analysis and numerical experiments.

Significance. If the derivations hold, the work is significant for distributed optimization because it relaxes conditions on the public coupling function h while delivering lower complexity, which is valuable for scalable networked systems. The reliance on standard Lyapunov arguments for inexact accelerated updates and the explicit focus on communication efficiency are strengths; the post-hoc numerical validation helps but would be stronger with quantitative baseline comparisons.

major comments (2)

[§4.2] §4.2, convergence theorem for iD2A: the milder condition on h is load-bearing for the headline claim; the proof sketch should explicitly isolate the step where the dual² construction relaxes the requirement relative to standard dual methods (e.g., by contrasting the Lipschitz or strong-convexity assumption on h).
[Table 2] Table 2 (complexity comparison): the reported communication complexity for MiD2A is O(1/ε) versus O(1/√ε) for baselines; the table should include the precise dependence on network size n and the condition number to substantiate the 'significantly lower' claim.

minor comments (3)

[Abstract] Abstract: the milder condition on h is mentioned but not stated explicitly; adding one sentence summarizing the precise relaxation would improve readability.
[§5] §5, numerical experiments: the figures show practical superiority but lack error bars or multiple random seeds; including these would strengthen the validation of the complexity claims.
[Notation] Notation section: the dual variables introduced in the dual² reformulation should be defined at first appearance rather than deferred to the algorithm box.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and positive assessment of our manuscript. We address each major comment point by point below, indicating the revisions we will incorporate to improve clarity.

read point-by-point responses

Referee: [§4.2] §4.2, convergence theorem for iD2A: the milder condition on h is load-bearing for the headline claim; the proof sketch should explicitly isolate the step where the dual² construction relaxes the requirement relative to standard dual methods (e.g., by contrasting the Lipschitz or strong-convexity assumption on h).

Authors: We agree that an explicit isolation of the relaxation step would strengthen the presentation. In the revised version, we will expand the proof sketch in §4.2 to include a direct contrast with standard dual methods. We will highlight the precise point in the Lyapunov analysis where the dual² construction permits milder conditions on h (without requiring strong convexity or Lipschitz continuity as in conventional dual approaches), thereby clarifying how the nested dual structure achieves the claimed relaxation. revision: yes
Referee: [Table 2] Table 2 (complexity comparison): the reported communication complexity for MiD2A is O(1/ε) versus O(1/√ε) for baselines; the table should include the precise dependence on network size n and the condition number to substantiate the 'significantly lower' claim.

Authors: We will update Table 2 to explicitly report the dependence on network size n and the condition number for all compared algorithms. This revision will provide a more detailed and substantiated comparison, confirming that MiD2A achieves O(1/ε) communication complexity with improved scaling in n and the condition number relative to the O(1/√ε) bounds of the baselines. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

full rationale

The paper develops iD2A and MiD2A via a novel dual² construction and proves asymptotic convergence under a milder condition on h (plus linear rates and complexity bounds under extra assumptions) using standard Lyapunov arguments for inexact accelerated updates. Network connectivity is invoked only to ensure dual coordination propagates, which is a conventional assumption rather than a fitted or self-referential step. No load-bearing claim reduces by construction to a parameter fit, self-citation chain, or renamed input; the central results remain independent of the target convergence statements.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard domain assumptions for decentralized optimization; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption The communication network is undirected and connected.
Invoked in the problem statement to ensure consensus and constraint satisfaction across agents.
domain assumption The local functions f_i, g_i and public function h satisfy conditions sufficient for the dual updates to converge (convexity or smoothness implied).
Required for the stated asymptotic and linear convergence guarantees.

pith-pipeline@v0.9.0 · 5789 in / 1321 out tokens · 34910 ms · 2026-05-22T15:54:23.199771+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Building on a novel dual² approach, we develop two accelerated algorithms... guarantee asymptotic convergence under a milder condition on h... linear convergence rates and significantly lower communication and computational complexity bounds
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean costAlphaLog_high_calibrated_iff unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

F_ρ(y) = H^*(−√C y) ... Condition 1: F_ρ is convex and L_Fρ-smooth

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 2 internal anchors

[1]

Dual consensus proximal algorithm for multi-agent sharing problems,

S. A. Alghunaim, Q. Lyu, M. Yan, and A. H. Sayed, “Dual consensus proximal algorithm for multi-agent sharing problems,” IEEE Transactions on Signal Processing, vol. 69, pp. 5568–5579, 2021

work page 2021
[2]

Npga: A unified algorithmic framework for decentralized constraint-coupled optimization,

J. Li and H. Su, “Npga: A unified algorithmic framework for decentralized constraint-coupled optimization,”IEEE Transactions on Control of Network Systems, vol. 11, no. 3, pp. 1655–1666, 2024

work page 2024
[3]

Tracking-admm for distributed constraint-coupled optimiza- tion,

A. Falsone, I. Notarnicola, G. Notarstefano, and M. Prandini, “Tracking-admm for distributed constraint-coupled optimiza- tion,”Automatica, vol. 117, p. 108962, 2020

work page 2020
[4]

Implicit tracking-based distributed constraint-coupled optimization,

J. Li and H. Su, “Implicit tracking-based distributed constraint-coupled optimization,”IEEE Transactions on Control of Network Systems, vol. 10, no. 1, pp. 479–490, 2022

work page 2022
[5]

Distributed energy resource management: All-time resource-demand feasibility, delay-tolerance, nonlinearity, and beyond,

M. Doostmohammadian, “Distributed energy resource management: All-time resource-demand feasibility, delay-tolerance, nonlinearity, and beyond,”IEEE Control Systems Letters, 2023

work page 2023
[6]

Distributed optimization of constraint-coupled systems via approximations of the dual function,

V . Yfantis, “Distributed optimization of constraint-coupled systems via approximations of the dual function,” Ph.D. dissertation, Rheinland-Pf ¨alzische Technische Universit ¨at Kaiserslautern-Landau, 2024

work page 2024
[7]

Vertical federated learning: Concepts, advances, and challenges,

Y . Liu, Y . Kang, T. Zou, Y . Pu, Y . He, X. Ye, Y . Ouyang, Y .-Q. Zhang, and Q. Yang, “Vertical federated learning: Concepts, advances, and challenges,”IEEE Transactions on Knowledge and Data Engineering, 2024

work page 2024
[8]

Nesterov,Lectures on Convex Optimization

Y . Nesterov,Lectures on Convex Optimization. Springer, 2018, vol. 137

work page 2018
[9]

Multi-agent distributed optimization via inexact consensus admm,

T.-H. Chang, M. Hong, and X. Wang, “Multi-agent distributed optimization via inexact consensus admm,”IEEE Transactions on Signal Processing, vol. 63, no. 2, pp. 482–497, 2014

work page 2014
[10]

Distributed primal-dual method for convex optimization with coupled constraints,

Y . Su, Q. Wang, and C. Sun, “Distributed primal-dual method for convex optimization with coupled constraints,”IEEE Transactions on Signal Processing, vol. 70, pp. 523–535, 2022

work page 2022
[11]

A proximal diffusion strategy for multiagent optimization with sparse affine constraints,

S. A. Alghunaim, K. Yuan, and A. H. Sayed, “A proximal diffusion strategy for multiagent optimization with sparse affine constraints,”IEEE Transactions on Automatic Control, vol. 65, no. 11, pp. 4554–4567, 2019

work page 2019
[12]

Proximal nested primal-dual gradient algorithms for distributed constraint-coupled composite optimization,

J. Li, Q. An, and H. Su, “Proximal nested primal-dual gradient algorithms for distributed constraint-coupled composite optimization,”Applied Mathematics and Computation, vol. 444, p. 127801, 2023

work page 2023
[13]

Improved convergence rates for distributed resource allocation,

A. Nedi ´c, A. Olshevsky, and W. Shi, “Improved convergence rates for distributed resource allocation,” in2018 IEEE Conference on Decision and Control. IEEE, 2018, pp. 172–177

work page 2018
[14]

Dual acceleration for minimax optimization: Linear convergence under relaxed assumptions,

J. Li and X. Li, “Dual acceleration for minimax optimization: Linear convergence under relaxed assumptions,”arXiv preprint arXiv:2505.02115, 2025

work page arXiv 2025
[15]

Distributed optimization and statistical learning via the alternating direction method of multipliers,

S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Ecksteinet al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,”Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011

work page 2011
[16]

K. J. Arrow, L. Hurwicz, and H. Uzawa,Studies in Linear and Nonlinear Programming. Stanford University Press, 1958

work page 1958
[17]

A first-order primal-dual algorithm for convex problems with applications to imaging,

A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,”Journal of Mathematical Imaging and Vision, vol. 40, pp. 120–145, 2011

work page 2011
[18]

Discrete-time dynamic average consensus,

M. Zhu and S. Mart ´ınez, “Discrete-time dynamic average consensus,”Automatica, vol. 46, no. 2, pp. 322–329, 2010

work page 2010
[19]

Distributed dual gradient tracking for resource allocation in unbalanced networks,

J. Zhang, K. You, and K. Cai, “Distributed dual gradient tracking for resource allocation in unbalanced networks,”IEEE Transactions on Signal Processing, vol. 68, pp. 2186–2198, 2020

work page 2020
[20]

Decentralized constraint-coupled optimization with inexact oracle,

J. Li and H. Su, “Decentralized constraint-coupled optimization with inexact oracle,”arXiv preprint arXiv:2309.06330, 2023. 25

work page arXiv 2023
[21]

A universal catalyst for first-order optimization,

H. Lin, J. Mairal, and Z. Harchaoui, “A universal catalyst for first-order optimization,”Advances in Neural Information Processing Systems, vol. 28, 2015

work page 2015
[22]

Optimal algorithms for smooth and strongly convex distributed optimization in networks,

K. Scaman, F. Bach, S. Bubeck, Y . T. Lee, and L. Massouli ´e, “Optimal algorithms for smooth and strongly convex distributed optimization in networks,” inInternational Conference on Machine Learning. PMLR, 2017, pp. 3027–3036

work page 2017
[23]

Ideal: Inexact decentralized accelerated augmented lagrangian method,

Y . Arjevani, J. Bruna, B. Can, M. Gurbuzbalaban, S. Jegelka, and H. Lin, “Ideal: Inexact decentralized accelerated augmented lagrangian method,”Advances in Neural Information Processing Systems, vol. 33, pp. 20 648–20 659, 2020

work page 2020
[24]

Deep learning with differential privacy,

M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” inProceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 308–318

work page 2016
[25]

Hiriart-Urruty and C

J.-B. Hiriart-Urruty and C. Lemar ´echal,Fundamentals of Convex Analysis. Springer Science & Business Media, 2004

work page 2004
[26]

Vandenberghe,Optimization Methods for Large-Scale Systems

L. Vandenberghe,Optimization Methods for Large-Scale Systems. Lecture Slides, UCLA, 2022. [Online]. Available: https://www.seas.ucla.edu/∼vandenbe/236C

work page 2022
[27]

R. T. Rockafellar,Convex Analysis. Princeton University Press, 1970

work page 1970
[28]

Extra: An exact first-order algorithm for decentralized consensus optimization,

W. Shi, Q. Ling, G. Wu, and W. Yin, “Extra: An exact first-order algorithm for decentralized consensus optimization,” SIAM Journal on Optimization, vol. 25, no. 2, pp. 944–966, 2015

work page 2015
[29]

Smoothness of the Augmented Lagrangian Dual in Convex Optimization

J. Li and V . Lau, “Smoothness of the augmented lagrangian dual in convex optimization,”arXiv preprint arXiv:2505.01824, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[30]

On accelerated proximal gradient methods for convex-concave optimization,

P. Tseng, “On accelerated proximal gradient methods for convex-concave optimization,”submitted to SIAM Journal on Optimization, 2008. [Online]. Available: https://www.mit.edu/ ∼dimitrib/PTseng/papers/apgm.pdf

work page 2008
[31]

Convergence rates of inexact proximal-gradient methods for convex optimization,

M. Schmidt, N. Roux, and F. Bach, “Convergence rates of inexact proximal-gradient methods for convex optimization,” Advances in Neural Information Processing Systems, vol. 24, 2011

work page 2011
[32]

Auzinger and J

W. Auzinger and J. Melenk,Iterative Solution of Large Linear Systems. 13 Lecture Notes, TU Wien, 2017. [Online]. Available: https://storage.jingwangli.com/doc/script.pdf

work page 2017
[33]

Chebyshev acceleration of iterative refinement,

M. Arioli and J. Scott, “Chebyshev acceleration of iterative refinement,”Numerical Algorithms, vol. 66, no. 3, pp. 591–608, 2014

work page 2014
[34]

Lifted primal-dual method for bilinearly coupled smooth minimax optimization,

K. K. Thekumparampil, N. He, and S. Oh, “Lifted primal-dual method for bilinearly coupled smooth minimax optimization,” inInternational Conference on Artificial Intelligence and Statistics. PMLR, 2022, pp. 4281–4308

work page 2022
[35]

Accelerated primal-dual proximal gradient splitting methods for convex-concave saddle-point problems,

H. Luo, “Accelerated primal-dual proximal gradient splitting methods for convex-concave saddle-point problems,”arXiv preprint arXiv:2407.20195, 2024

work page arXiv 2024
[36]

Accelerated primal-dual gradient method for smooth and convex-concave saddle-point problems with bilinear coupling,

D. Kovalev, A. Gasnikov, and P. Richt ´arik, “Accelerated primal-dual gradient method for smooth and convex-concave saddle-point problems with bilinear coupling,”Advances in Neural Information Processing Systems, vol. 35, pp. 21 725– 21 737, 2022

work page 2022
[37]

An optimal algorithm for strongly convex minimization under affine constraints,

A. Salim, L. Condat, D. Kovalev, and P. Richt ´arik, “An optimal algorithm for strongly convex minimization under affine constraints,” inInternational Conference on Artificial Intelligence and Statistics. PMLR, 2022, pp. 4482–4498

work page 2022
[38]

On the evolution of random graphs,

P. Erdos and A. R ´enyi, “On the evolution of random graphs,”Publications of the Mathematical Institute of the Hungarian Academy of Sciences, vol. 5, no. 1, pp. 17–60, 1960

work page 1960
[39]

Boyd and L

S. Boyd and L. Vandenberghe,Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares. Cambridge university press, 2018

work page 2018
[40]

H. H. Bauschke and P. L. Combettes,Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 2017

work page 2017
[41]

On the Fenchel Duality between Strong Convexity and Lipschitz Continuous Gradient

X. Zhou, “On the fenchel duality between strong convexity and lipschitz continuous gradient,”arXiv preprint arXiv:1803.06573, 2018. 13The original lecture notes (formerly at https://www.asc.tuwien.ac.at/ ∼winfried/) are now unavailable. With Prof. Winfried Auzinger’s permission, we host an archived copy at https://storage.jingwangli.com/doc/script.pdf for...

work page internal anchor Pith review Pith/arXiv arXiv 2018
[42]

Hiriart-Urruty and C

J.-B. Hiriart-Urruty and C. Lemar ´echal,Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods. Springer-Verlag Berlin Heidelberg, 1993, vol. 306

work page 1993
[43]

J. W. Hardin and J. M. Hilbe,Generalized Linear Models and Extensions. Stata Press, 2018

work page 2018
[44]

On the fitting of generalized linear models with nonnegativity parameter constraints,

J. W. McDonald and I. D. Diamond, “On the fitting of generalized linear models with nonnegativity parameter constraints,” Biometrics, pp. 201–206, 1990

work page 1990
[45]

R. A. Horn and C. R. Johnson,Matrix Analysis. Cambridge University Press, 2012

work page 2012
[46]

On lower iteration complexity bounds for the convex concave saddle point problems,

J. Zhang, M. Hong, and S. Zhang, “On lower iteration complexity bounds for the convex concave saddle point problems,” Mathematical Programming, vol. 194, no. 1, pp. 901–935, 2022. Jingwang Lireceived the B.Mgt. degree in Engineering Management from Huazhong Agricultural University (2019), and the M.Eng. degree in Control Science and Engineering from Huazh...

work page 2022
[47]

He is currently the Chair Professor and the Founding Director of the Huawei-HKUST Innovation Laboratory, HKUST. His current research interests include robust and delay-optimal cross layer optimization for MIMO/OFDM wireless systems, interference mitigation techniques for wireless networks, massive MIMO, M2M, and network control systems. 27 APPENDIXA PROOF...

work page
[48]

A specific class of decentralized resource allocation problem (DRAP), known as the decentralized energy resource management problem, aims to optimize energy costs [5]

Decentralized Resource Allocation:Consider a system comprising ofn p energy-production nodes andn r energy-reserving nodes. A specific class of decentralized resource allocation problem (DRAP), known as the decentralized energy resource management problem, aims to optimize energy costs [5]. This can be formulated 40 as min xi∈R,i=1,···,n p; yj ∈R,j=1,···,...

work page
[49]

41 Problem (94) is a special case of (P1)

Decentralized Model Predictive Control:Consider a class of decentralized linear model predictive control problems given in [6]: min x0 i ,···,x K i ∈Rdxi ; u0 i ,···,u K−1 i ∈Rdui , i∈I nX i=1 J f i (xK i ) + K−1X k=0 Ji(xk i , uk i ) ! s.t.x k+1 i =A ixk i +B iuk i , k= 0,· · ·, K−1, i∈ I, x0 i = ˜xi(t0), i∈ I, xk i ∈ X i ⊆R dxi , k= 1,· · ·, K, i∈ I, uk...

work page
[50]

Number of calls toAandA ⊤

Decentralized Learning:Consider a dataset with a raw feature matrixX ′ ∈R p×(d−1) and a label vector y∈R p, letX= [X ′,1 p]∈R p×d. When employing generalized linear models (GLMs) [43], such as linear regression, logistic regression, Poisson regression, or constrained GLMs [44] like nonnegative least squares, to fit the dataset, the associated regularized ...

work page

[1] [1]

Dual consensus proximal algorithm for multi-agent sharing problems,

S. A. Alghunaim, Q. Lyu, M. Yan, and A. H. Sayed, “Dual consensus proximal algorithm for multi-agent sharing problems,” IEEE Transactions on Signal Processing, vol. 69, pp. 5568–5579, 2021

work page 2021

[2] [2]

Npga: A unified algorithmic framework for decentralized constraint-coupled optimization,

J. Li and H. Su, “Npga: A unified algorithmic framework for decentralized constraint-coupled optimization,”IEEE Transactions on Control of Network Systems, vol. 11, no. 3, pp. 1655–1666, 2024

work page 2024

[3] [3]

Tracking-admm for distributed constraint-coupled optimiza- tion,

A. Falsone, I. Notarnicola, G. Notarstefano, and M. Prandini, “Tracking-admm for distributed constraint-coupled optimiza- tion,”Automatica, vol. 117, p. 108962, 2020

work page 2020

[4] [4]

Implicit tracking-based distributed constraint-coupled optimization,

J. Li and H. Su, “Implicit tracking-based distributed constraint-coupled optimization,”IEEE Transactions on Control of Network Systems, vol. 10, no. 1, pp. 479–490, 2022

work page 2022

[5] [5]

Distributed energy resource management: All-time resource-demand feasibility, delay-tolerance, nonlinearity, and beyond,

M. Doostmohammadian, “Distributed energy resource management: All-time resource-demand feasibility, delay-tolerance, nonlinearity, and beyond,”IEEE Control Systems Letters, 2023

work page 2023

[6] [6]

Distributed optimization of constraint-coupled systems via approximations of the dual function,

V . Yfantis, “Distributed optimization of constraint-coupled systems via approximations of the dual function,” Ph.D. dissertation, Rheinland-Pf ¨alzische Technische Universit ¨at Kaiserslautern-Landau, 2024

work page 2024

[7] [7]

Vertical federated learning: Concepts, advances, and challenges,

Y . Liu, Y . Kang, T. Zou, Y . Pu, Y . He, X. Ye, Y . Ouyang, Y .-Q. Zhang, and Q. Yang, “Vertical federated learning: Concepts, advances, and challenges,”IEEE Transactions on Knowledge and Data Engineering, 2024

work page 2024

[8] [8]

Nesterov,Lectures on Convex Optimization

Y . Nesterov,Lectures on Convex Optimization. Springer, 2018, vol. 137

work page 2018

[9] [9]

Multi-agent distributed optimization via inexact consensus admm,

T.-H. Chang, M. Hong, and X. Wang, “Multi-agent distributed optimization via inexact consensus admm,”IEEE Transactions on Signal Processing, vol. 63, no. 2, pp. 482–497, 2014

work page 2014

[10] [10]

Distributed primal-dual method for convex optimization with coupled constraints,

Y . Su, Q. Wang, and C. Sun, “Distributed primal-dual method for convex optimization with coupled constraints,”IEEE Transactions on Signal Processing, vol. 70, pp. 523–535, 2022

work page 2022

[11] [11]

A proximal diffusion strategy for multiagent optimization with sparse affine constraints,

S. A. Alghunaim, K. Yuan, and A. H. Sayed, “A proximal diffusion strategy for multiagent optimization with sparse affine constraints,”IEEE Transactions on Automatic Control, vol. 65, no. 11, pp. 4554–4567, 2019

work page 2019

[12] [12]

Proximal nested primal-dual gradient algorithms for distributed constraint-coupled composite optimization,

J. Li, Q. An, and H. Su, “Proximal nested primal-dual gradient algorithms for distributed constraint-coupled composite optimization,”Applied Mathematics and Computation, vol. 444, p. 127801, 2023

work page 2023

[13] [13]

Improved convergence rates for distributed resource allocation,

A. Nedi ´c, A. Olshevsky, and W. Shi, “Improved convergence rates for distributed resource allocation,” in2018 IEEE Conference on Decision and Control. IEEE, 2018, pp. 172–177

work page 2018

[14] [14]

Dual acceleration for minimax optimization: Linear convergence under relaxed assumptions,

J. Li and X. Li, “Dual acceleration for minimax optimization: Linear convergence under relaxed assumptions,”arXiv preprint arXiv:2505.02115, 2025

work page arXiv 2025

[15] [15]

Distributed optimization and statistical learning via the alternating direction method of multipliers,

S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Ecksteinet al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,”Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011

work page 2011

[16] [16]

K. J. Arrow, L. Hurwicz, and H. Uzawa,Studies in Linear and Nonlinear Programming. Stanford University Press, 1958

work page 1958

[17] [17]

A first-order primal-dual algorithm for convex problems with applications to imaging,

A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,”Journal of Mathematical Imaging and Vision, vol. 40, pp. 120–145, 2011

work page 2011

[18] [18]

Discrete-time dynamic average consensus,

M. Zhu and S. Mart ´ınez, “Discrete-time dynamic average consensus,”Automatica, vol. 46, no. 2, pp. 322–329, 2010

work page 2010

[19] [19]

Distributed dual gradient tracking for resource allocation in unbalanced networks,

J. Zhang, K. You, and K. Cai, “Distributed dual gradient tracking for resource allocation in unbalanced networks,”IEEE Transactions on Signal Processing, vol. 68, pp. 2186–2198, 2020

work page 2020

[20] [20]

Decentralized constraint-coupled optimization with inexact oracle,

J. Li and H. Su, “Decentralized constraint-coupled optimization with inexact oracle,”arXiv preprint arXiv:2309.06330, 2023. 25

work page arXiv 2023

[21] [21]

A universal catalyst for first-order optimization,

H. Lin, J. Mairal, and Z. Harchaoui, “A universal catalyst for first-order optimization,”Advances in Neural Information Processing Systems, vol. 28, 2015

work page 2015

[22] [22]

Optimal algorithms for smooth and strongly convex distributed optimization in networks,

K. Scaman, F. Bach, S. Bubeck, Y . T. Lee, and L. Massouli ´e, “Optimal algorithms for smooth and strongly convex distributed optimization in networks,” inInternational Conference on Machine Learning. PMLR, 2017, pp. 3027–3036

work page 2017

[23] [23]

Ideal: Inexact decentralized accelerated augmented lagrangian method,

Y . Arjevani, J. Bruna, B. Can, M. Gurbuzbalaban, S. Jegelka, and H. Lin, “Ideal: Inexact decentralized accelerated augmented lagrangian method,”Advances in Neural Information Processing Systems, vol. 33, pp. 20 648–20 659, 2020

work page 2020

[24] [24]

Deep learning with differential privacy,

M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” inProceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 308–318

work page 2016

[25] [25]

Hiriart-Urruty and C

J.-B. Hiriart-Urruty and C. Lemar ´echal,Fundamentals of Convex Analysis. Springer Science & Business Media, 2004

work page 2004

[26] [26]

Vandenberghe,Optimization Methods for Large-Scale Systems

L. Vandenberghe,Optimization Methods for Large-Scale Systems. Lecture Slides, UCLA, 2022. [Online]. Available: https://www.seas.ucla.edu/∼vandenbe/236C

work page 2022

[27] [27]

R. T. Rockafellar,Convex Analysis. Princeton University Press, 1970

work page 1970

[28] [28]

Extra: An exact first-order algorithm for decentralized consensus optimization,

W. Shi, Q. Ling, G. Wu, and W. Yin, “Extra: An exact first-order algorithm for decentralized consensus optimization,” SIAM Journal on Optimization, vol. 25, no. 2, pp. 944–966, 2015

work page 2015

[29] [29]

Smoothness of the Augmented Lagrangian Dual in Convex Optimization

J. Li and V . Lau, “Smoothness of the augmented lagrangian dual in convex optimization,”arXiv preprint arXiv:2505.01824, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[30] [30]

On accelerated proximal gradient methods for convex-concave optimization,

P. Tseng, “On accelerated proximal gradient methods for convex-concave optimization,”submitted to SIAM Journal on Optimization, 2008. [Online]. Available: https://www.mit.edu/ ∼dimitrib/PTseng/papers/apgm.pdf

work page 2008

[31] [31]

Convergence rates of inexact proximal-gradient methods for convex optimization,

M. Schmidt, N. Roux, and F. Bach, “Convergence rates of inexact proximal-gradient methods for convex optimization,” Advances in Neural Information Processing Systems, vol. 24, 2011

work page 2011

[32] [32]

Auzinger and J

W. Auzinger and J. Melenk,Iterative Solution of Large Linear Systems. 13 Lecture Notes, TU Wien, 2017. [Online]. Available: https://storage.jingwangli.com/doc/script.pdf

work page 2017

[33] [33]

Chebyshev acceleration of iterative refinement,

M. Arioli and J. Scott, “Chebyshev acceleration of iterative refinement,”Numerical Algorithms, vol. 66, no. 3, pp. 591–608, 2014

work page 2014

[34] [34]

Lifted primal-dual method for bilinearly coupled smooth minimax optimization,

K. K. Thekumparampil, N. He, and S. Oh, “Lifted primal-dual method for bilinearly coupled smooth minimax optimization,” inInternational Conference on Artificial Intelligence and Statistics. PMLR, 2022, pp. 4281–4308

work page 2022

[35] [35]

Accelerated primal-dual proximal gradient splitting methods for convex-concave saddle-point problems,

H. Luo, “Accelerated primal-dual proximal gradient splitting methods for convex-concave saddle-point problems,”arXiv preprint arXiv:2407.20195, 2024

work page arXiv 2024

[36] [36]

Accelerated primal-dual gradient method for smooth and convex-concave saddle-point problems with bilinear coupling,

D. Kovalev, A. Gasnikov, and P. Richt ´arik, “Accelerated primal-dual gradient method for smooth and convex-concave saddle-point problems with bilinear coupling,”Advances in Neural Information Processing Systems, vol. 35, pp. 21 725– 21 737, 2022

work page 2022

[37] [37]

An optimal algorithm for strongly convex minimization under affine constraints,

A. Salim, L. Condat, D. Kovalev, and P. Richt ´arik, “An optimal algorithm for strongly convex minimization under affine constraints,” inInternational Conference on Artificial Intelligence and Statistics. PMLR, 2022, pp. 4482–4498

work page 2022

[38] [38]

On the evolution of random graphs,

P. Erdos and A. R ´enyi, “On the evolution of random graphs,”Publications of the Mathematical Institute of the Hungarian Academy of Sciences, vol. 5, no. 1, pp. 17–60, 1960

work page 1960

[39] [39]

Boyd and L

S. Boyd and L. Vandenberghe,Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares. Cambridge university press, 2018

work page 2018

[40] [40]

H. H. Bauschke and P. L. Combettes,Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 2017

work page 2017

[41] [41]

On the Fenchel Duality between Strong Convexity and Lipschitz Continuous Gradient

X. Zhou, “On the fenchel duality between strong convexity and lipschitz continuous gradient,”arXiv preprint arXiv:1803.06573, 2018. 13The original lecture notes (formerly at https://www.asc.tuwien.ac.at/ ∼winfried/) are now unavailable. With Prof. Winfried Auzinger’s permission, we host an archived copy at https://storage.jingwangli.com/doc/script.pdf for...

work page internal anchor Pith review Pith/arXiv arXiv 2018

[42] [42]

Hiriart-Urruty and C

J.-B. Hiriart-Urruty and C. Lemar ´echal,Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods. Springer-Verlag Berlin Heidelberg, 1993, vol. 306

work page 1993

[43] [43]

J. W. Hardin and J. M. Hilbe,Generalized Linear Models and Extensions. Stata Press, 2018

work page 2018

[44] [44]

On the fitting of generalized linear models with nonnegativity parameter constraints,

J. W. McDonald and I. D. Diamond, “On the fitting of generalized linear models with nonnegativity parameter constraints,” Biometrics, pp. 201–206, 1990

work page 1990

[45] [45]

R. A. Horn and C. R. Johnson,Matrix Analysis. Cambridge University Press, 2012

work page 2012

[46] [46]

On lower iteration complexity bounds for the convex concave saddle point problems,

J. Zhang, M. Hong, and S. Zhang, “On lower iteration complexity bounds for the convex concave saddle point problems,” Mathematical Programming, vol. 194, no. 1, pp. 901–935, 2022. Jingwang Lireceived the B.Mgt. degree in Engineering Management from Huazhong Agricultural University (2019), and the M.Eng. degree in Control Science and Engineering from Huazh...

work page 2022

[47] [47]

He is currently the Chair Professor and the Founding Director of the Huawei-HKUST Innovation Laboratory, HKUST. His current research interests include robust and delay-optimal cross layer optimization for MIMO/OFDM wireless systems, interference mitigation techniques for wireless networks, massive MIMO, M2M, and network control systems. 27 APPENDIXA PROOF...

work page

[48] [48]

A specific class of decentralized resource allocation problem (DRAP), known as the decentralized energy resource management problem, aims to optimize energy costs [5]

Decentralized Resource Allocation:Consider a system comprising ofn p energy-production nodes andn r energy-reserving nodes. A specific class of decentralized resource allocation problem (DRAP), known as the decentralized energy resource management problem, aims to optimize energy costs [5]. This can be formulated 40 as min xi∈R,i=1,···,n p; yj ∈R,j=1,···,...

work page

[49] [49]

41 Problem (94) is a special case of (P1)

Decentralized Model Predictive Control:Consider a class of decentralized linear model predictive control problems given in [6]: min x0 i ,···,x K i ∈Rdxi ; u0 i ,···,u K−1 i ∈Rdui , i∈I nX i=1 J f i (xK i ) + K−1X k=0 Ji(xk i , uk i ) ! s.t.x k+1 i =A ixk i +B iuk i , k= 0,· · ·, K−1, i∈ I, x0 i = ˜xi(t0), i∈ I, xk i ∈ X i ⊆R dxi , k= 1,· · ·, K, i∈ I, uk...

work page

[50] [50]

Number of calls toAandA ⊤

Decentralized Learning:Consider a dataset with a raw feature matrixX ′ ∈R p×(d−1) and a label vector y∈R p, letX= [X ′,1 p]∈R p×d. When employing generalized linear models (GLMs) [43], such as linear regression, logistic regression, Poisson regression, or constrained GLMs [44] like nonnegative least squares, to fit the dataset, the associated regularized ...

work page