Convergence of the Safeguarded Augmented Lagrangian Method under the Polyak-Lojasiewicz constraint qualification for Constrained Composite Optimization

Christian Kanzow; Jannis Kr\"uger

arxiv: 2606.25567 · v1 · pith:YMZGTBK7new · submitted 2026-06-24 · 🧮 math.OC

Convergence of the Safeguarded Augmented Lagrangian Method under the Polyak-Lojasiewicz constraint qualification for Constrained Composite Optimization

Christian Kanzow , Jannis Kr\"uger This is my paper

Pith reviewed 2026-06-25 20:53 UTC · model grok-4.3

classification 🧮 math.OC

keywords safeguarded augmented Lagrangian methodPolyak-Lojasiewicz constraint qualificationM-stationary pointconstrained composite optimizationLipschitz continuityglobal convergenceportfolio optimization

0 comments

The pith

The safeguarded augmented Lagrangian method converges globally to an M-stationary point under the Polyak-Lojasiewicz constraint qualification when the nonsmooth objective term is locally Lipschitz continuous.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes global convergence of the safeguarded augmented Lagrangian method to M-stationary points for constrained composite optimization problems whose objective combines a smooth function with a nonsmooth function. The convergence result depends on the Polyak-Lojasiewicz constraint qualification together with bounded Lagrange multipliers, and the paper proves that local Lipschitz continuity of the nonsmooth term guarantees those multipliers remain bounded. A counterexample demonstrates that multipliers can become unbounded without the Lipschitz assumption. Numerical experiments on sparse portfolio optimization problems show markedly better performance when the regularization term satisfies the Lipschitz condition than when it does not.

Core claim

Under the Polyak-Lojasiewicz constraint qualification, the safeguarded augmented Lagrangian method converges globally to an M-stationary point for constrained composite optimization provided the Lagrange multipliers stay bounded, which holds whenever the nonsmooth part of the objective is locally Lipschitz continuous.

What carries the argument

The Polyak-Lojasiewicz constraint qualification together with bounded Lagrange multipliers, where boundedness follows from local Lipschitz continuity of the nonsmooth objective term.

If this is right

Global convergence to an M-stationary point holds whenever the nonsmooth objective term is locally Lipschitz continuous.
The method produces better numerical results on sparse portfolio problems when the sparsity term satisfies the Lipschitz condition.
Without local Lipschitz continuity one cannot expect bounded multipliers, as shown by an explicit counterexample.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The local Lipschitz requirement may be essential for reliable practical behavior whenever an algorithm depends on multiplier boundedness.
Similar boundedness arguments could be developed for other augmented Lagrangian variants applied to composite problems.
Portfolio problems using non-Lipschitz penalties may require alternative regularization strategies to restore multiplier stability.

Load-bearing premise

The nonsmooth part of the objective function is locally Lipschitz continuous, which is used to prove that the Lagrange multipliers remain bounded.

What would settle it

An instance of a constrained composite problem where the nonsmooth term fails to be locally Lipschitz continuous, the Polyak-Lojasiewicz constraint qualification holds, yet the Lagrange multipliers become unbounded and the method does not converge to an M-stationary point.

Figures

Figures reproduced from arXiv: 2606.25567 by Christian Kanzow, Jannis Kr\"uger.

**Figure 1.** Figure 1: Plot of φ h2(x) = x 2 . The resulting optimization problem has the following properties. Lemma 4.1. Consider the optimization problem min x∈R φ(x) s.t. x = 0, x2 = 0. (4.2) Then the following statements hold. (a) For all n ∈ N, we have ∂φ 1 n = R. (b) PŁCQ holds in x ∗ = 0. Proof. (a) Let n ∈ N. Then we have φ [PITH_FULL_IMAGE:figures/full_fig_p013_1.png] view at source ↗

**Figure 2.** Figure 2: Two surrogate sparsity functions: MCP and SCAD (with [PITH_FULL_IMAGE:figures/full_fig_p016_2.png] view at source ↗

**Figure 3.** Figure 3: Performance profile of cumulated inner iterations [PITH_FULL_IMAGE:figures/full_fig_p019_3.png] view at source ↗

**Figure 4.** Figure 4: Performance profile of for computation time in seconds [PITH_FULL_IMAGE:figures/full_fig_p020_4.png] view at source ↗

read the original abstract

In this work we provide theoretical and practical results of the Safeguarded Augmented Lagrangian Method (SALM) for constrained composite optimization problems whose objective is the sum of a smooth and a nonsmooth function. We obtain global convergence results to an M-stationary point for SALM under the Polyak-Lojasiewicz constraint qualification (PLCQ). For this result the boundedness of the Lagrange multipliers is crucial, and this is shown under the assumption that the nonsmooth part of the objective function is locally Lipschitz continuous. A counterexample shows that one cannot expect to get bounded multipliers without such an assumption. The performance of the algorithm is evaluated numerically on a set of sparse portfolio optimization problems with two different regularization terms, one being Lipschitz and the other one being non-Lipschitz. The results are significantly better for the Lipschitz sparsity term, whereas the underlying method generates seemingly unbounded multipliers in many instances when using the non-Lipschitz sparsity function.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SALM converges to M-stationary points under PLCQ when multipliers stay bounded, which the paper proves requires local Lipschitz continuity on the nonsmooth term and supports with a counterexample plus numerics.

read the letter

The paper shows global convergence of the safeguarded augmented Lagrangian method to an M-stationary point under the Polyak-Lojasiewicz constraint qualification. Bounded multipliers are required for the result, and the authors prove this boundedness only when the nonsmooth part of the objective is locally Lipschitz continuous. They also give an explicit counterexample where the multipliers become unbounded without that assumption.

The new pieces are the convergence statement under PLCQ and the counterexample that establishes the Lipschitz condition cannot be removed. The numerical tests on sparse portfolio problems with two different regularizers line up with the theory: the Lipschitz case performs better and keeps multipliers bounded, while the non-Lipschitz case often produces unbounded multipliers.

The argument follows standard lines for this literature. It uses the external PLCQ to reach stationarity and treats the boundedness step separately. No circularity appears in the setup, and the counterexample is a direct way to show necessity rather than an afterthought.

The main limitation is that the bounded-multiplier guarantee still needs the local Lipschitz condition on the nonsmooth term. This excludes some common non-Lipschitz penalties, but the paper states the restriction clearly and tests it numerically instead of glossing over it. The overall scope stays inside augmented Lagrangian theory for composite problems.

This is for people working on convergence analysis of AL-type methods for constrained composite optimization. A reader tracking refinements to constraint qualifications or multiplier boundedness would get direct value from the theorem, the counterexample, and the matching experiments. It deserves a serious referee because the central claim is new, the supporting pieces are explicit, and the limitations are acknowledged rather than hidden.

Referee Report

0 major / 2 minor

Summary. The paper analyzes the Safeguarded Augmented Lagrangian Method (SALM) for constrained composite optimization min f(x) + g(x) s.t. h(x)=0, where f is smooth and g nonsmooth. It establishes global convergence of SALM iterates to an M-stationary point under the Polyak-Łojasiewicz constraint qualification (PLCQ), with the key step being a proof that Lagrange multipliers remain bounded when g is locally Lipschitz continuous. A counterexample demonstrates that boundedness fails without the local Lipschitz assumption on g. Numerical results on sparse portfolio problems compare a Lipschitz regularizer (better performance, bounded multipliers) against a non-Lipschitz one (unbounded multipliers in many runs).

Significance. If the convergence analysis holds, the work supplies a precise sufficient condition (local Lipschitz continuity of the nonsmooth term) for bounded multipliers and thus global convergence to M-stationarity under PLCQ, together with an explicit counterexample and reproducible numerical distinction between the two regimes. This strengthens the theoretical toolkit for safeguarded AL methods on composite problems and directly informs practical choice of regularizers.

minor comments (2)

The abstract and introduction would benefit from an explicit statement of the precise form of the composite problem (smooth + nonsmooth objective, equality constraints) and the definition of M-stationarity used throughout.
Notation for the safeguarded AL subproblem and the update rules for the penalty and multiplier parameters should be collected in a single preliminary section for easier reference.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work on the convergence of the Safeguarded Augmented Lagrangian Method under PLCQ, the recognition of the role of local Lipschitz continuity for bounded multipliers, and the recommendation to accept.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper delivers a standard convergence proof for SALM to an M-stationary point under the external PLCQ assumption, with a separate bounded-multiplier result that holds only under the local Lipschitz condition on the nonsmooth term (plus an explicit counterexample showing necessity). No derivation step reduces by construction to a fitted quantity, self-definition, or load-bearing self-citation chain; the central claims rest on standard variational analysis and are externally falsifiable via the supplied counterexample and numerics. This is the normal non-circular outcome for a convergence analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review supplies insufficient detail to enumerate free parameters or invented entities; the PLCQ is treated as an external domain assumption.

axioms (1)

domain assumption Polyak-Lojasiewicz constraint qualification (PLCQ)
Invoked to obtain the global convergence result to an M-stationary point.

pith-pipeline@v0.9.1-grok · 5696 in / 1123 out tokens · 19859 ms · 2026-06-25T20:53:04.413627+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 24 canonical work pages

[1]

On augmented La- grangian methods with general lower-level constraints

R. Andreani, E. G. Birgin, J. M. Martínez, and M. L. Schuverdt. “On augmented La- grangian methods with general lower-level constraints”. In:SIAM Journal on Optimization 18.4 (2007), pp. 1286–1309.doi:10.1137/060654797

work page doi:10.1137/060654797 2007
[2]

Andreani, G

R. Andreani, G. Haeser, R. Prado, and L. Secchin.Primal-dual global convergence of an augmented Lagrangian method under the error bound condition. July 22, 2025.url: https://optimization-online.org/?p=31199

2025
[3]

Andreani, G

R. Andreani, G. Haeser, M. L. Schuverdt, and L. D. Secchin.A relaxed quasinormality condition and the boundedness of dual augmented Lagrangian sequences. Oct. 8, 2024.url: https://optimization-online.org/?p=25207

2024
[4]

A new sequential optimality condition for constrained optimization and algorithmic consequences

R. Andreani, J. M. Martínez, and B. F. Svaiter. “A new sequential optimality condition for constrained optimization and algorithmic consequences”. In:SIAM Journal on Opti- mization20.6 (2010), pp. 3533–3554.doi:10.1137/090777189

work page doi:10.1137/090777189 2010
[5]

Proximal alternating mini- mization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality

H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. “Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka- Łojasiewicz inequality”. In:Mathematics of Operations Research35.2 (2010), pp. 438– 457.doi:10.1287/moor.1100.0449

work page doi:10.1287/moor.1100.0449 2010
[6]

Linearly constrained non-Lipschitz optimization for image restora- tion

W. Bian and X. Chen. “Linearly constrained non-Lipschitz optimization for image restora- tion”. In:SIAM Journal on Imaging Sciences8.4 (2015), pp. 2294–2322.doi:10.1137/ 140985639

2015
[7]

E. G. Birgin and J. M. Martínez.Practical augmented Lagrangian methods for constrained optimization. Vol. 10. Fundamentals of Algorithms. Philadelphia, PA: Society for Indus- trial and Applied Mathematics (SIAM), 2014.doi:10.1137/1.9781611973365

work page doi:10.1137/1.9781611973365 2014
[8]

A singular value thresholding algorithm for matrix completion

J.-F. Cai, E. J. Candès, and Z. Shen. “A singular value thresholding algorithm for matrix completion”. In:SIAM Journal on Optimization20.4 (2010), pp. 1956–1982.doi:10 . 1137/080738970

2010
[9]

AnaugmentedLagrangianmethodfornon-Lipschitz nonconvex programming

X.Chen,L.Guo,Z.Lu,andJ.J.Ye.“AnaugmentedLagrangianmethodfornon-Lipschitz nonconvex programming”. In:SIAM Journal on Numerical Analysis55.1 (2017), pp. 168– 193.doi:10.1137/15M1052834

work page doi:10.1137/15m1052834 2017
[10]

Cornuéjols, J

G. Cornuéjols, J. Peña, and R. Tütüncü.Optimization Methods in Finance.2nd edition. Cambridge: Cambridge University Press, 2018.doi:10.1017/9781107297340

work page doi:10.1017/9781107297340 2018
[11]

Cui and J.-S

Y. Cui and J.-S. Pang.Modern Nonconvex Nondifferentiable Optimization. Vol. 29. MOS- SIAM Series on Optimization. Philadelphia, PA: Society for Industrial and Applied Math- ematics (SIAM), 2021.doi:10.1137/1.9781611976748

work page doi:10.1137/1.9781611976748 2021
[12]

vom Dahl, A

S. vom Dahl, A. D. Marchi, and C. Kanzow.Proximal limited-memory quasi-Newton methods for nonsmooth monconvex optimization. 2026. arXiv:2605.11627 [math.OC]

Pith/arXiv arXiv 2026
[13]

Nonlinear Covariance Control via Differential Dynamic Programming,

A. De Marchi. “Constrained and sparse switching times optimization via augmented La- grangian proximal methods”. In:2020 American Control Conference, ACC 2020, Denver, CO, USA, July 1-3, 2020. IEEE, 2020, pp. 3633–3638.doi:10.23919/ACC45564.2020. 9147892. 21

work page doi:10.23919/acc45564.2020 2020
[14]

Implicit augmented Lagrangian and generalized optimization

A. De Marchi. “Implicit augmented Lagrangian and generalized optimization”. In:Journal of Applied and Numerical Optimization6.2 (2024), pp. 291–320.doi:10.23952/jano.6. 2024.2.08

work page doi:10.23952/jano.6 2024
[15]

A.DeMarchi,T.Hoheisel,andP.Mehlitz.Augmented Lagrangian methods for fully convex composite optimization. 2026. arXiv:2511.07117 [math.OC]

Pith/arXiv arXiv 2026
[16]

Constrained composite optimization and augmented Lagrangian methods

A. De Marchi, X. Jia, C. Kanzow, and P. Mehlitz. “Constrained composite optimization and augmented Lagrangian methods”. In:Mathematical Programming201.1-2 (A) (2023), pp. 863–896.doi:10.1007/s10107-022-01922-4

work page doi:10.1007/s10107-022-01922-4 2023
[17]

Proximal gradient algorithms under local Lipschitz gra- dient continuity. A convergence and robustness analysis of PANOC

A. De Marchi and A. Themelis. “Proximal gradient algorithms under local Lipschitz gra- dient continuity. A convergence and robustness analysis of PANOC”. In:Journal of Opti- mization Theory and Applications194.3 (2022), pp. 771–794.doi:10.1007/s10957-022- 02048-5

work page doi:10.1007/s10957-022- 2022
[18]

Error bounds, quadratic growth, and linear convergence of proximal methods

D. Drusvyatskiy and A. S. Lewis. “Error bounds, quadratic growth, and linear convergence of proximal methods”. In:Mathematics of Operations Research43.3 (2018), pp. 919–948. doi:10.1287/moor.2017.0889

work page doi:10.1287/moor.2017.0889 2018
[19]

Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties

J. Fan and R. Li. “Variable selection via nonconcave penalized likelihood and its oracle properties.” In:Journal of the American Statistical Association96.456 (2001), pp. 1348– 1360.doi:10.1198/016214501753382273

work page doi:10.1198/016214501753382273 2001
[20]

Complementarity formula- tions ofl 0-norm optimization problems

M. Feng, J. E. Mitchell, J.-S. Pang, X. Shen, and A. Wächter. “Complementarity formula- tions ofl 0-norm optimization problems”. In:Pacific Journal of Optimization14.2 (2018), pp. 273–305

2018
[21]

SDP diagonalizations and perspective cuts for a class of nonseparable MIQP

A. Frangioni and C. Gentile. “SDP diagonalizations and perspective cuts for a class of nonseparable MIQP”. In:Operations Research Letters35.2 (2007), pp. 181–185.doi:10. 1016/j.orl.2006.03.008

2007
[22]

Journal of Optimization Theory and Applications 4: 303--320

M. R. Hestenes. “Multiplier and gradient methods”. In:Journal of Optimization Theory and Applications4 (1969), pp. 303–320.doi:10.1007/BF00927673

work page doi:10.1007/bf00927673 1969
[23]

On approximate solutions of systems of linear inequalities

A. J. Hoffman. “On approximate solutions of systems of linear inequalities”. In:Journal of research of the National Bureau of Standards49 (1952), p. 263.url:https://api. semanticscholar.org/CorpusID:17483872

1952
[24]

Convergence properties of monotone and nonmonotone prox- imal gradient methods revisited

C. Kanzow and P. Mehlitz. “Convergence properties of monotone and nonmonotone prox- imal gradient methods revisited”. In:Journal of Optimization Theory and Applications 195.2 (2022), pp. 624–646.doi:10.1007/s10957-022-02101-3

work page doi:10.1007/s10957-022-02101-3 2022
[25]

An example comparing the standard and safeguarded aug- mented Lagrangian methods

C. Kanzow and D. Steck. “An example comparing the standard and safeguarded aug- mented Lagrangian methods”. In:Operations Research Letters45.6 (2017), pp. 598–603. doi:10.1016/j.orl.2017.09.005

work page doi:10.1016/j.orl.2017.09.005 2017
[26]

Linear convergence of gradient and proximal- gradient methods inder the Polyak-Łojasiewicz condition

H. Karimi, J. Nutini, and M. Schmidt. “Linear convergence of gradient and proximal- gradient methods inder the Polyak-Łojasiewicz condition”. In:Machine Learning and Knowledge Discovery in Databases. Ed. by P. Frasconi, N. Landwehr, G. Manco, and J. Vreeken. Cham: Springer International Publishing, 2016, pp. 795–811.doi:https : //doi.org/10.1007/978-3-319-46128-1

work page doi:10.1007/978-3-319-46128-1 2016
[27]

Statistical inverse problems: discretization, model reduction and inverse crimes

Y. Liu and R. Lin. “A bisection method for computing the proximal operator of the ℓp-norm for any0< p <1with application to Schattenp-norms”. In:Journal of Compu- tational and Applied Mathematics447 (2024). Id/No 115897, p. 9.doi:10.1016/j.cam. 2024.115897

work page doi:10.1016/j.cam 2024
[28]

A topological property of real analytic subsets

S. Łojasiewicz. “A topological property of real analytic subsets”. In:Les Équations aux Dérivées Partielles(1963), pp. 87–89. 22

1963
[29]

An augmented Lagrangian approach for sparse principal component analysis

Z. Lu and Y. Zhang. “An augmented Lagrangian approach for sparse principal component analysis”. In:Mathematical Programming135.1-2 (A) (2012), pp. 149–193.doi:10.1007/ s10107-011-0452-4

2012
[30]

B. S. Mordukhovich.Variational Analysis and Applications. Springer Monographs in Mathematics. Cham: Springer, 2018.doi:10.1007/978-3-319-92775-6

work page doi:10.1007/978-3-319-92775-6 2018
[31]

Linear convergence of first order methods for non-strongly convex optimization

I. Necoara, Y. Nesterov, and F. Glineur. “Linear convergence of first order methods for non-strongly convex optimization”. In:Mathematical Programming175.1-2 (A) (2019), pp. 69–107.doi:10.1007/s10107-018-1232-1

work page doi:10.1007/s10107-018-1232-1 2019
[32]

Error bounds in mathematical programming

J.-S. Pang. “Error bounds in mathematical programming”. In:Mathematical Programming 79.1-3 (B) (1997), pp. 299–332.doi:10.1007/BF02614322

work page doi:10.1007/bf02614322 1997
[33]

copt: composite optimization in Python

F. Pedregosa, G. Negiar, and G. Dresdner. “copt: composite optimization in Python”. In: (2020).doi:10.5281/zenodo.1283339.url:http://openopt.github.io/copt/

work page doi:10.5281/zenodo.1283339.url:http://openopt.github.io/copt/ 2020
[34]

Gradient methods for the minimisation of functionals

B. T. Polyak. “Gradient methods for the minimisation of functionals”. In:Zhurnal Vychis- litel’noi Matematiki i Matematicheskoi Fiziki3 (1963), pp. 643–653

1963
[35]

M. J. D. Powell.A method for nonlinear constraints in minimization problems. Optimiza- tion, Sympos. Inst. Math. Appl. Univ. Keele, England. 1969

1969
[36]

The multiplier method of Hestenes and Powell applied to convex programming

R. T. Rockafellar. “The multiplier method of Hestenes and Powell applied to convex programming”. In:Journal of Optimization Theory and applications12.6 (1973), pp. 555– 562

1973
[37]

R. T. Rockafellar and R. J.-B. Wets.Variational Analysis. Vol. 317. Grundlehren der Mathematischen Wissenschaften. Berlin: Springer, 1998.doi:10 . 1007 / 978 - 3 - 642 - 02431-3

1998
[38]

Structured sparsity promoting functions

L. Shen, B. W. Suter, and E. E. Tripp. “Structured sparsity promoting functions”. In: Journal of Optimization Theory and Applications183.2 (2019), pp. 386–421.doi:10 . 1007/s10957-019-01565-0

2019
[39]

A dependence maximization view of clustering

L. Song, A. Smola, A. Gretton, and K. M. Borgwardt. “A dependence maximization view of clustering”. In:Proceedings of the 24th International Conference on Machine Learning. ICML ’07. Corvalis, Oregon, USA: Association for Computing Machinery, 2007, pp. 815– 822.doi:10 . 1145 / 1273496 . 1273599.url:https : / / doi . org / 10 . 1145 / 1273496 . 1273599

2007
[40]

Forward-backward envelope for the sum of two nonconvex functions: further properties and nonmonotone linesearch algorithms

A. Themelis, L. Stella, and P. Patrinos. “Forward-backward envelope for the sum of two nonconvex functions: further properties and nonmonotone linesearch algorithms”. In: SIAM Journal on Optimization28.3 (2018), pp. 2274–2303.doi:10.1137/16M1080240

work page doi:10.1137/16m1080240 2018
[41]

Sparse reconstruction by separable approximation

S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo. “Sparse reconstruction by separable approximation”. In:IEEE Transactions on Signal Processing57.7 (2009), pp. 2479–2493. doi:10.1109/TSP.2009.2016892

work page doi:10.1109/tsp.2009.2016892 2009
[42]

Nearly unbiased variable selection under minimax concave penalty

C.-H. Zhang. “Nearly unbiased variable selection under minimax concave penalty”. In:The Annals of Statstics38.2 (2010), pp. 894–942. A Appendix This proof for CAM-stationarity as a necessary optimality condition is a direct extension of [4, Theorem 3.3] to the composite setting. Theorem A.1.Letx ∗ be a local minimizer of(1.1). Thenx ∗ is CAM-stationary. 2...

2010

[1] [1]

On augmented La- grangian methods with general lower-level constraints

R. Andreani, E. G. Birgin, J. M. Martínez, and M. L. Schuverdt. “On augmented La- grangian methods with general lower-level constraints”. In:SIAM Journal on Optimization 18.4 (2007), pp. 1286–1309.doi:10.1137/060654797

work page doi:10.1137/060654797 2007

[2] [2]

Andreani, G

R. Andreani, G. Haeser, R. Prado, and L. Secchin.Primal-dual global convergence of an augmented Lagrangian method under the error bound condition. July 22, 2025.url: https://optimization-online.org/?p=31199

2025

[3] [3]

Andreani, G

R. Andreani, G. Haeser, M. L. Schuverdt, and L. D. Secchin.A relaxed quasinormality condition and the boundedness of dual augmented Lagrangian sequences. Oct. 8, 2024.url: https://optimization-online.org/?p=25207

2024

[4] [4]

A new sequential optimality condition for constrained optimization and algorithmic consequences

R. Andreani, J. M. Martínez, and B. F. Svaiter. “A new sequential optimality condition for constrained optimization and algorithmic consequences”. In:SIAM Journal on Opti- mization20.6 (2010), pp. 3533–3554.doi:10.1137/090777189

work page doi:10.1137/090777189 2010

[5] [5]

Proximal alternating mini- mization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality

H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. “Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka- Łojasiewicz inequality”. In:Mathematics of Operations Research35.2 (2010), pp. 438– 457.doi:10.1287/moor.1100.0449

work page doi:10.1287/moor.1100.0449 2010

[6] [6]

Linearly constrained non-Lipschitz optimization for image restora- tion

W. Bian and X. Chen. “Linearly constrained non-Lipschitz optimization for image restora- tion”. In:SIAM Journal on Imaging Sciences8.4 (2015), pp. 2294–2322.doi:10.1137/ 140985639

2015

[7] [7]

E. G. Birgin and J. M. Martínez.Practical augmented Lagrangian methods for constrained optimization. Vol. 10. Fundamentals of Algorithms. Philadelphia, PA: Society for Indus- trial and Applied Mathematics (SIAM), 2014.doi:10.1137/1.9781611973365

work page doi:10.1137/1.9781611973365 2014

[8] [8]

A singular value thresholding algorithm for matrix completion

J.-F. Cai, E. J. Candès, and Z. Shen. “A singular value thresholding algorithm for matrix completion”. In:SIAM Journal on Optimization20.4 (2010), pp. 1956–1982.doi:10 . 1137/080738970

2010

[9] [9]

AnaugmentedLagrangianmethodfornon-Lipschitz nonconvex programming

X.Chen,L.Guo,Z.Lu,andJ.J.Ye.“AnaugmentedLagrangianmethodfornon-Lipschitz nonconvex programming”. In:SIAM Journal on Numerical Analysis55.1 (2017), pp. 168– 193.doi:10.1137/15M1052834

work page doi:10.1137/15m1052834 2017

[10] [10]

Cornuéjols, J

G. Cornuéjols, J. Peña, and R. Tütüncü.Optimization Methods in Finance.2nd edition. Cambridge: Cambridge University Press, 2018.doi:10.1017/9781107297340

work page doi:10.1017/9781107297340 2018

[11] [11]

Cui and J.-S

Y. Cui and J.-S. Pang.Modern Nonconvex Nondifferentiable Optimization. Vol. 29. MOS- SIAM Series on Optimization. Philadelphia, PA: Society for Industrial and Applied Math- ematics (SIAM), 2021.doi:10.1137/1.9781611976748

work page doi:10.1137/1.9781611976748 2021

[12] [12]

vom Dahl, A

S. vom Dahl, A. D. Marchi, and C. Kanzow.Proximal limited-memory quasi-Newton methods for nonsmooth monconvex optimization. 2026. arXiv:2605.11627 [math.OC]

Pith/arXiv arXiv 2026

[13] [13]

Nonlinear Covariance Control via Differential Dynamic Programming,

A. De Marchi. “Constrained and sparse switching times optimization via augmented La- grangian proximal methods”. In:2020 American Control Conference, ACC 2020, Denver, CO, USA, July 1-3, 2020. IEEE, 2020, pp. 3633–3638.doi:10.23919/ACC45564.2020. 9147892. 21

work page doi:10.23919/acc45564.2020 2020

[14] [14]

Implicit augmented Lagrangian and generalized optimization

A. De Marchi. “Implicit augmented Lagrangian and generalized optimization”. In:Journal of Applied and Numerical Optimization6.2 (2024), pp. 291–320.doi:10.23952/jano.6. 2024.2.08

work page doi:10.23952/jano.6 2024

[15] [15]

A.DeMarchi,T.Hoheisel,andP.Mehlitz.Augmented Lagrangian methods for fully convex composite optimization. 2026. arXiv:2511.07117 [math.OC]

Pith/arXiv arXiv 2026

[16] [16]

Constrained composite optimization and augmented Lagrangian methods

A. De Marchi, X. Jia, C. Kanzow, and P. Mehlitz. “Constrained composite optimization and augmented Lagrangian methods”. In:Mathematical Programming201.1-2 (A) (2023), pp. 863–896.doi:10.1007/s10107-022-01922-4

work page doi:10.1007/s10107-022-01922-4 2023

[17] [17]

Proximal gradient algorithms under local Lipschitz gra- dient continuity. A convergence and robustness analysis of PANOC

A. De Marchi and A. Themelis. “Proximal gradient algorithms under local Lipschitz gra- dient continuity. A convergence and robustness analysis of PANOC”. In:Journal of Opti- mization Theory and Applications194.3 (2022), pp. 771–794.doi:10.1007/s10957-022- 02048-5

work page doi:10.1007/s10957-022- 2022

[18] [18]

Error bounds, quadratic growth, and linear convergence of proximal methods

D. Drusvyatskiy and A. S. Lewis. “Error bounds, quadratic growth, and linear convergence of proximal methods”. In:Mathematics of Operations Research43.3 (2018), pp. 919–948. doi:10.1287/moor.2017.0889

work page doi:10.1287/moor.2017.0889 2018

[19] [19]

Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties

J. Fan and R. Li. “Variable selection via nonconcave penalized likelihood and its oracle properties.” In:Journal of the American Statistical Association96.456 (2001), pp. 1348– 1360.doi:10.1198/016214501753382273

work page doi:10.1198/016214501753382273 2001

[20] [20]

Complementarity formula- tions ofl 0-norm optimization problems

M. Feng, J. E. Mitchell, J.-S. Pang, X. Shen, and A. Wächter. “Complementarity formula- tions ofl 0-norm optimization problems”. In:Pacific Journal of Optimization14.2 (2018), pp. 273–305

2018

[21] [21]

SDP diagonalizations and perspective cuts for a class of nonseparable MIQP

A. Frangioni and C. Gentile. “SDP diagonalizations and perspective cuts for a class of nonseparable MIQP”. In:Operations Research Letters35.2 (2007), pp. 181–185.doi:10. 1016/j.orl.2006.03.008

2007

[22] [22]

Journal of Optimization Theory and Applications 4: 303--320

M. R. Hestenes. “Multiplier and gradient methods”. In:Journal of Optimization Theory and Applications4 (1969), pp. 303–320.doi:10.1007/BF00927673

work page doi:10.1007/bf00927673 1969

[23] [23]

On approximate solutions of systems of linear inequalities

A. J. Hoffman. “On approximate solutions of systems of linear inequalities”. In:Journal of research of the National Bureau of Standards49 (1952), p. 263.url:https://api. semanticscholar.org/CorpusID:17483872

1952

[24] [24]

Convergence properties of monotone and nonmonotone prox- imal gradient methods revisited

C. Kanzow and P. Mehlitz. “Convergence properties of monotone and nonmonotone prox- imal gradient methods revisited”. In:Journal of Optimization Theory and Applications 195.2 (2022), pp. 624–646.doi:10.1007/s10957-022-02101-3

work page doi:10.1007/s10957-022-02101-3 2022

[25] [25]

An example comparing the standard and safeguarded aug- mented Lagrangian methods

C. Kanzow and D. Steck. “An example comparing the standard and safeguarded aug- mented Lagrangian methods”. In:Operations Research Letters45.6 (2017), pp. 598–603. doi:10.1016/j.orl.2017.09.005

work page doi:10.1016/j.orl.2017.09.005 2017

[26] [26]

Linear convergence of gradient and proximal- gradient methods inder the Polyak-Łojasiewicz condition

H. Karimi, J. Nutini, and M. Schmidt. “Linear convergence of gradient and proximal- gradient methods inder the Polyak-Łojasiewicz condition”. In:Machine Learning and Knowledge Discovery in Databases. Ed. by P. Frasconi, N. Landwehr, G. Manco, and J. Vreeken. Cham: Springer International Publishing, 2016, pp. 795–811.doi:https : //doi.org/10.1007/978-3-319-46128-1

work page doi:10.1007/978-3-319-46128-1 2016

[27] [27]

Statistical inverse problems: discretization, model reduction and inverse crimes

Y. Liu and R. Lin. “A bisection method for computing the proximal operator of the ℓp-norm for any0< p <1with application to Schattenp-norms”. In:Journal of Compu- tational and Applied Mathematics447 (2024). Id/No 115897, p. 9.doi:10.1016/j.cam. 2024.115897

work page doi:10.1016/j.cam 2024

[28] [28]

A topological property of real analytic subsets

S. Łojasiewicz. “A topological property of real analytic subsets”. In:Les Équations aux Dérivées Partielles(1963), pp. 87–89. 22

1963

[29] [29]

An augmented Lagrangian approach for sparse principal component analysis

Z. Lu and Y. Zhang. “An augmented Lagrangian approach for sparse principal component analysis”. In:Mathematical Programming135.1-2 (A) (2012), pp. 149–193.doi:10.1007/ s10107-011-0452-4

2012

[30] [30]

B. S. Mordukhovich.Variational Analysis and Applications. Springer Monographs in Mathematics. Cham: Springer, 2018.doi:10.1007/978-3-319-92775-6

work page doi:10.1007/978-3-319-92775-6 2018

[31] [31]

Linear convergence of first order methods for non-strongly convex optimization

I. Necoara, Y. Nesterov, and F. Glineur. “Linear convergence of first order methods for non-strongly convex optimization”. In:Mathematical Programming175.1-2 (A) (2019), pp. 69–107.doi:10.1007/s10107-018-1232-1

work page doi:10.1007/s10107-018-1232-1 2019

[32] [32]

Error bounds in mathematical programming

J.-S. Pang. “Error bounds in mathematical programming”. In:Mathematical Programming 79.1-3 (B) (1997), pp. 299–332.doi:10.1007/BF02614322

work page doi:10.1007/bf02614322 1997

[33] [33]

copt: composite optimization in Python

F. Pedregosa, G. Negiar, and G. Dresdner. “copt: composite optimization in Python”. In: (2020).doi:10.5281/zenodo.1283339.url:http://openopt.github.io/copt/

work page doi:10.5281/zenodo.1283339.url:http://openopt.github.io/copt/ 2020

[34] [34]

Gradient methods for the minimisation of functionals

B. T. Polyak. “Gradient methods for the minimisation of functionals”. In:Zhurnal Vychis- litel’noi Matematiki i Matematicheskoi Fiziki3 (1963), pp. 643–653

1963

[35] [35]

M. J. D. Powell.A method for nonlinear constraints in minimization problems. Optimiza- tion, Sympos. Inst. Math. Appl. Univ. Keele, England. 1969

1969

[36] [36]

The multiplier method of Hestenes and Powell applied to convex programming

R. T. Rockafellar. “The multiplier method of Hestenes and Powell applied to convex programming”. In:Journal of Optimization Theory and applications12.6 (1973), pp. 555– 562

1973

[37] [37]

R. T. Rockafellar and R. J.-B. Wets.Variational Analysis. Vol. 317. Grundlehren der Mathematischen Wissenschaften. Berlin: Springer, 1998.doi:10 . 1007 / 978 - 3 - 642 - 02431-3

1998

[38] [38]

Structured sparsity promoting functions

L. Shen, B. W. Suter, and E. E. Tripp. “Structured sparsity promoting functions”. In: Journal of Optimization Theory and Applications183.2 (2019), pp. 386–421.doi:10 . 1007/s10957-019-01565-0

2019

[39] [39]

A dependence maximization view of clustering

L. Song, A. Smola, A. Gretton, and K. M. Borgwardt. “A dependence maximization view of clustering”. In:Proceedings of the 24th International Conference on Machine Learning. ICML ’07. Corvalis, Oregon, USA: Association for Computing Machinery, 2007, pp. 815– 822.doi:10 . 1145 / 1273496 . 1273599.url:https : / / doi . org / 10 . 1145 / 1273496 . 1273599

2007

[40] [40]

Forward-backward envelope for the sum of two nonconvex functions: further properties and nonmonotone linesearch algorithms

A. Themelis, L. Stella, and P. Patrinos. “Forward-backward envelope for the sum of two nonconvex functions: further properties and nonmonotone linesearch algorithms”. In: SIAM Journal on Optimization28.3 (2018), pp. 2274–2303.doi:10.1137/16M1080240

work page doi:10.1137/16m1080240 2018

[41] [41]

Sparse reconstruction by separable approximation

S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo. “Sparse reconstruction by separable approximation”. In:IEEE Transactions on Signal Processing57.7 (2009), pp. 2479–2493. doi:10.1109/TSP.2009.2016892

work page doi:10.1109/tsp.2009.2016892 2009

[42] [42]

Nearly unbiased variable selection under minimax concave penalty

C.-H. Zhang. “Nearly unbiased variable selection under minimax concave penalty”. In:The Annals of Statstics38.2 (2010), pp. 894–942. A Appendix This proof for CAM-stationarity as a necessary optimality condition is a direct extension of [4, Theorem 3.3] to the composite setting. Theorem A.1.Letx ∗ be a local minimizer of(1.1). Thenx ∗ is CAM-stationary. 2...

2010