Convergence of the Safeguarded Augmented Lagrangian Method under the Polyak-Lojasiewicz constraint qualification for Constrained Composite Optimization
Pith reviewed 2026-06-25 20:53 UTC · model grok-4.3
The pith
The safeguarded augmented Lagrangian method converges globally to an M-stationary point under the Polyak-Lojasiewicz constraint qualification when the nonsmooth objective term is locally Lipschitz continuous.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the Polyak-Lojasiewicz constraint qualification, the safeguarded augmented Lagrangian method converges globally to an M-stationary point for constrained composite optimization provided the Lagrange multipliers stay bounded, which holds whenever the nonsmooth part of the objective is locally Lipschitz continuous.
What carries the argument
The Polyak-Lojasiewicz constraint qualification together with bounded Lagrange multipliers, where boundedness follows from local Lipschitz continuity of the nonsmooth objective term.
If this is right
- Global convergence to an M-stationary point holds whenever the nonsmooth objective term is locally Lipschitz continuous.
- The method produces better numerical results on sparse portfolio problems when the sparsity term satisfies the Lipschitz condition.
- Without local Lipschitz continuity one cannot expect bounded multipliers, as shown by an explicit counterexample.
Where Pith is reading between the lines
- The local Lipschitz requirement may be essential for reliable practical behavior whenever an algorithm depends on multiplier boundedness.
- Similar boundedness arguments could be developed for other augmented Lagrangian variants applied to composite problems.
- Portfolio problems using non-Lipschitz penalties may require alternative regularization strategies to restore multiplier stability.
Load-bearing premise
The nonsmooth part of the objective function is locally Lipschitz continuous, which is used to prove that the Lagrange multipliers remain bounded.
What would settle it
An instance of a constrained composite problem where the nonsmooth term fails to be locally Lipschitz continuous, the Polyak-Lojasiewicz constraint qualification holds, yet the Lagrange multipliers become unbounded and the method does not converge to an M-stationary point.
Figures
read the original abstract
In this work we provide theoretical and practical results of the Safeguarded Augmented Lagrangian Method (SALM) for constrained composite optimization problems whose objective is the sum of a smooth and a nonsmooth function. We obtain global convergence results to an M-stationary point for SALM under the Polyak-Lojasiewicz constraint qualification (PLCQ). For this result the boundedness of the Lagrange multipliers is crucial, and this is shown under the assumption that the nonsmooth part of the objective function is locally Lipschitz continuous. A counterexample shows that one cannot expect to get bounded multipliers without such an assumption. The performance of the algorithm is evaluated numerically on a set of sparse portfolio optimization problems with two different regularization terms, one being Lipschitz and the other one being non-Lipschitz. The results are significantly better for the Lipschitz sparsity term, whereas the underlying method generates seemingly unbounded multipliers in many instances when using the non-Lipschitz sparsity function.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes the Safeguarded Augmented Lagrangian Method (SALM) for constrained composite optimization min f(x) + g(x) s.t. h(x)=0, where f is smooth and g nonsmooth. It establishes global convergence of SALM iterates to an M-stationary point under the Polyak-Łojasiewicz constraint qualification (PLCQ), with the key step being a proof that Lagrange multipliers remain bounded when g is locally Lipschitz continuous. A counterexample demonstrates that boundedness fails without the local Lipschitz assumption on g. Numerical results on sparse portfolio problems compare a Lipschitz regularizer (better performance, bounded multipliers) against a non-Lipschitz one (unbounded multipliers in many runs).
Significance. If the convergence analysis holds, the work supplies a precise sufficient condition (local Lipschitz continuity of the nonsmooth term) for bounded multipliers and thus global convergence to M-stationarity under PLCQ, together with an explicit counterexample and reproducible numerical distinction between the two regimes. This strengthens the theoretical toolkit for safeguarded AL methods on composite problems and directly informs practical choice of regularizers.
minor comments (2)
- The abstract and introduction would benefit from an explicit statement of the precise form of the composite problem (smooth + nonsmooth objective, equality constraints) and the definition of M-stationarity used throughout.
- Notation for the safeguarded AL subproblem and the update rules for the penalty and multiplier parameters should be collected in a single preliminary section for easier reference.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work on the convergence of the Safeguarded Augmented Lagrangian Method under PLCQ, the recognition of the role of local Lipschitz continuity for bounded multipliers, and the recommendation to accept.
Circularity Check
No significant circularity
full rationale
The paper delivers a standard convergence proof for SALM to an M-stationary point under the external PLCQ assumption, with a separate bounded-multiplier result that holds only under the local Lipschitz condition on the nonsmooth term (plus an explicit counterexample showing necessity). No derivation step reduces by construction to a fitted quantity, self-definition, or load-bearing self-citation chain; the central claims rest on standard variational analysis and are externally falsifiable via the supplied counterexample and numerics. This is the normal non-circular outcome for a convergence analysis.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Polyak-Lojasiewicz constraint qualification (PLCQ)
Reference graph
Works this paper leans on
-
[1]
On augmented La- grangian methods with general lower-level constraints
R. Andreani, E. G. Birgin, J. M. Martínez, and M. L. Schuverdt. “On augmented La- grangian methods with general lower-level constraints”. In:SIAM Journal on Optimization 18.4 (2007), pp. 1286–1309.doi:10.1137/060654797
-
[2]
Andreani, G
R. Andreani, G. Haeser, R. Prado, and L. Secchin.Primal-dual global convergence of an augmented Lagrangian method under the error bound condition. July 22, 2025.url: https://optimization-online.org/?p=31199
2025
-
[3]
Andreani, G
R. Andreani, G. Haeser, M. L. Schuverdt, and L. D. Secchin.A relaxed quasinormality condition and the boundedness of dual augmented Lagrangian sequences. Oct. 8, 2024.url: https://optimization-online.org/?p=25207
2024
-
[4]
A new sequential optimality condition for constrained optimization and algorithmic consequences
R. Andreani, J. M. Martínez, and B. F. Svaiter. “A new sequential optimality condition for constrained optimization and algorithmic consequences”. In:SIAM Journal on Opti- mization20.6 (2010), pp. 3533–3554.doi:10.1137/090777189
-
[5]
H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. “Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka- Łojasiewicz inequality”. In:Mathematics of Operations Research35.2 (2010), pp. 438– 457.doi:10.1287/moor.1100.0449
-
[6]
Linearly constrained non-Lipschitz optimization for image restora- tion
W. Bian and X. Chen. “Linearly constrained non-Lipschitz optimization for image restora- tion”. In:SIAM Journal on Imaging Sciences8.4 (2015), pp. 2294–2322.doi:10.1137/ 140985639
2015
-
[7]
E. G. Birgin and J. M. Martínez.Practical augmented Lagrangian methods for constrained optimization. Vol. 10. Fundamentals of Algorithms. Philadelphia, PA: Society for Indus- trial and Applied Mathematics (SIAM), 2014.doi:10.1137/1.9781611973365
-
[8]
A singular value thresholding algorithm for matrix completion
J.-F. Cai, E. J. Candès, and Z. Shen. “A singular value thresholding algorithm for matrix completion”. In:SIAM Journal on Optimization20.4 (2010), pp. 1956–1982.doi:10 . 1137/080738970
2010
-
[9]
AnaugmentedLagrangianmethodfornon-Lipschitz nonconvex programming
X.Chen,L.Guo,Z.Lu,andJ.J.Ye.“AnaugmentedLagrangianmethodfornon-Lipschitz nonconvex programming”. In:SIAM Journal on Numerical Analysis55.1 (2017), pp. 168– 193.doi:10.1137/15M1052834
-
[10]
G. Cornuéjols, J. Peña, and R. Tütüncü.Optimization Methods in Finance.2nd edition. Cambridge: Cambridge University Press, 2018.doi:10.1017/9781107297340
-
[11]
Y. Cui and J.-S. Pang.Modern Nonconvex Nondifferentiable Optimization. Vol. 29. MOS- SIAM Series on Optimization. Philadelphia, PA: Society for Industrial and Applied Math- ematics (SIAM), 2021.doi:10.1137/1.9781611976748
-
[12]
S. vom Dahl, A. D. Marchi, and C. Kanzow.Proximal limited-memory quasi-Newton methods for nonsmooth monconvex optimization. 2026. arXiv:2605.11627 [math.OC]
Pith/arXiv arXiv 2026
-
[13]
Nonlinear Covariance Control via Differential Dynamic Programming,
A. De Marchi. “Constrained and sparse switching times optimization via augmented La- grangian proximal methods”. In:2020 American Control Conference, ACC 2020, Denver, CO, USA, July 1-3, 2020. IEEE, 2020, pp. 3633–3638.doi:10.23919/ACC45564.2020. 9147892. 21
-
[14]
Implicit augmented Lagrangian and generalized optimization
A. De Marchi. “Implicit augmented Lagrangian and generalized optimization”. In:Journal of Applied and Numerical Optimization6.2 (2024), pp. 291–320.doi:10.23952/jano.6. 2024.2.08
-
[15]
A.DeMarchi,T.Hoheisel,andP.Mehlitz.Augmented Lagrangian methods for fully convex composite optimization. 2026. arXiv:2511.07117 [math.OC]
Pith/arXiv arXiv 2026
-
[16]
Constrained composite optimization and augmented Lagrangian methods
A. De Marchi, X. Jia, C. Kanzow, and P. Mehlitz. “Constrained composite optimization and augmented Lagrangian methods”. In:Mathematical Programming201.1-2 (A) (2023), pp. 863–896.doi:10.1007/s10107-022-01922-4
-
[17]
A. De Marchi and A. Themelis. “Proximal gradient algorithms under local Lipschitz gra- dient continuity. A convergence and robustness analysis of PANOC”. In:Journal of Opti- mization Theory and Applications194.3 (2022), pp. 771–794.doi:10.1007/s10957-022- 02048-5
-
[18]
Error bounds, quadratic growth, and linear convergence of proximal methods
D. Drusvyatskiy and A. S. Lewis. “Error bounds, quadratic growth, and linear convergence of proximal methods”. In:Mathematics of Operations Research43.3 (2018), pp. 919–948. doi:10.1287/moor.2017.0889
-
[19]
Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties
J. Fan and R. Li. “Variable selection via nonconcave penalized likelihood and its oracle properties.” In:Journal of the American Statistical Association96.456 (2001), pp. 1348– 1360.doi:10.1198/016214501753382273
-
[20]
Complementarity formula- tions ofl 0-norm optimization problems
M. Feng, J. E. Mitchell, J.-S. Pang, X. Shen, and A. Wächter. “Complementarity formula- tions ofl 0-norm optimization problems”. In:Pacific Journal of Optimization14.2 (2018), pp. 273–305
2018
-
[21]
SDP diagonalizations and perspective cuts for a class of nonseparable MIQP
A. Frangioni and C. Gentile. “SDP diagonalizations and perspective cuts for a class of nonseparable MIQP”. In:Operations Research Letters35.2 (2007), pp. 181–185.doi:10. 1016/j.orl.2006.03.008
2007
-
[22]
Journal of Optimization Theory and Applications 4: 303--320
M. R. Hestenes. “Multiplier and gradient methods”. In:Journal of Optimization Theory and Applications4 (1969), pp. 303–320.doi:10.1007/BF00927673
-
[23]
On approximate solutions of systems of linear inequalities
A. J. Hoffman. “On approximate solutions of systems of linear inequalities”. In:Journal of research of the National Bureau of Standards49 (1952), p. 263.url:https://api. semanticscholar.org/CorpusID:17483872
1952
-
[24]
Convergence properties of monotone and nonmonotone prox- imal gradient methods revisited
C. Kanzow and P. Mehlitz. “Convergence properties of monotone and nonmonotone prox- imal gradient methods revisited”. In:Journal of Optimization Theory and Applications 195.2 (2022), pp. 624–646.doi:10.1007/s10957-022-02101-3
-
[25]
An example comparing the standard and safeguarded aug- mented Lagrangian methods
C. Kanzow and D. Steck. “An example comparing the standard and safeguarded aug- mented Lagrangian methods”. In:Operations Research Letters45.6 (2017), pp. 598–603. doi:10.1016/j.orl.2017.09.005
-
[26]
Linear convergence of gradient and proximal- gradient methods inder the Polyak-Łojasiewicz condition
H. Karimi, J. Nutini, and M. Schmidt. “Linear convergence of gradient and proximal- gradient methods inder the Polyak-Łojasiewicz condition”. In:Machine Learning and Knowledge Discovery in Databases. Ed. by P. Frasconi, N. Landwehr, G. Manco, and J. Vreeken. Cham: Springer International Publishing, 2016, pp. 795–811.doi:https : //doi.org/10.1007/978-3-319-46128-1
-
[27]
Statistical inverse problems: discretization, model reduction and inverse crimes
Y. Liu and R. Lin. “A bisection method for computing the proximal operator of the ℓp-norm for any0< p <1with application to Schattenp-norms”. In:Journal of Compu- tational and Applied Mathematics447 (2024). Id/No 115897, p. 9.doi:10.1016/j.cam. 2024.115897
-
[28]
A topological property of real analytic subsets
S. Łojasiewicz. “A topological property of real analytic subsets”. In:Les Équations aux Dérivées Partielles(1963), pp. 87–89. 22
1963
-
[29]
An augmented Lagrangian approach for sparse principal component analysis
Z. Lu and Y. Zhang. “An augmented Lagrangian approach for sparse principal component analysis”. In:Mathematical Programming135.1-2 (A) (2012), pp. 149–193.doi:10.1007/ s10107-011-0452-4
2012
-
[30]
B. S. Mordukhovich.Variational Analysis and Applications. Springer Monographs in Mathematics. Cham: Springer, 2018.doi:10.1007/978-3-319-92775-6
-
[31]
Linear convergence of first order methods for non-strongly convex optimization
I. Necoara, Y. Nesterov, and F. Glineur. “Linear convergence of first order methods for non-strongly convex optimization”. In:Mathematical Programming175.1-2 (A) (2019), pp. 69–107.doi:10.1007/s10107-018-1232-1
-
[32]
Error bounds in mathematical programming
J.-S. Pang. “Error bounds in mathematical programming”. In:Mathematical Programming 79.1-3 (B) (1997), pp. 299–332.doi:10.1007/BF02614322
-
[33]
copt: composite optimization in Python
F. Pedregosa, G. Negiar, and G. Dresdner. “copt: composite optimization in Python”. In: (2020).doi:10.5281/zenodo.1283339.url:http://openopt.github.io/copt/
work page doi:10.5281/zenodo.1283339.url:http://openopt.github.io/copt/ 2020
-
[34]
Gradient methods for the minimisation of functionals
B. T. Polyak. “Gradient methods for the minimisation of functionals”. In:Zhurnal Vychis- litel’noi Matematiki i Matematicheskoi Fiziki3 (1963), pp. 643–653
1963
-
[35]
M. J. D. Powell.A method for nonlinear constraints in minimization problems. Optimiza- tion, Sympos. Inst. Math. Appl. Univ. Keele, England. 1969
1969
-
[36]
The multiplier method of Hestenes and Powell applied to convex programming
R. T. Rockafellar. “The multiplier method of Hestenes and Powell applied to convex programming”. In:Journal of Optimization Theory and applications12.6 (1973), pp. 555– 562
1973
-
[37]
R. T. Rockafellar and R. J.-B. Wets.Variational Analysis. Vol. 317. Grundlehren der Mathematischen Wissenschaften. Berlin: Springer, 1998.doi:10 . 1007 / 978 - 3 - 642 - 02431-3
1998
-
[38]
Structured sparsity promoting functions
L. Shen, B. W. Suter, and E. E. Tripp. “Structured sparsity promoting functions”. In: Journal of Optimization Theory and Applications183.2 (2019), pp. 386–421.doi:10 . 1007/s10957-019-01565-0
2019
-
[39]
A dependence maximization view of clustering
L. Song, A. Smola, A. Gretton, and K. M. Borgwardt. “A dependence maximization view of clustering”. In:Proceedings of the 24th International Conference on Machine Learning. ICML ’07. Corvalis, Oregon, USA: Association for Computing Machinery, 2007, pp. 815– 822.doi:10 . 1145 / 1273496 . 1273599.url:https : / / doi . org / 10 . 1145 / 1273496 . 1273599
2007
-
[40]
A. Themelis, L. Stella, and P. Patrinos. “Forward-backward envelope for the sum of two nonconvex functions: further properties and nonmonotone linesearch algorithms”. In: SIAM Journal on Optimization28.3 (2018), pp. 2274–2303.doi:10.1137/16M1080240
-
[41]
Sparse reconstruction by separable approximation
S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo. “Sparse reconstruction by separable approximation”. In:IEEE Transactions on Signal Processing57.7 (2009), pp. 2479–2493. doi:10.1109/TSP.2009.2016892
-
[42]
Nearly unbiased variable selection under minimax concave penalty
C.-H. Zhang. “Nearly unbiased variable selection under minimax concave penalty”. In:The Annals of Statstics38.2 (2010), pp. 894–942. A Appendix This proof for CAM-stationarity as a necessary optimality condition is a direct extension of [4, Theorem 3.3] to the composite setting. Theorem A.1.Letx ∗ be a local minimizer of(1.1). Thenx ∗ is CAM-stationary. 2...
2010
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.