Recognition: 2 theorem links · Lean Theorem
Proximal Limited-Memory Quasi-Newton Methods for Nonsmooth Nonconvex Optimization
Pith reviewed 2026-05-13 01:25 UTC · model grok-4.3
The pith
A proximal limited-memory quasi-Newton scheme converges globally for nonsmooth nonconvex problems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce a proximal limited-memory quasi-Newton scheme for minimizing the sum of a continuously differentiable function and a proper, lower semicontinuous and prox-bounded, possibly nonsmooth, function. Both functions might be nonconvex. The method builds upon the computation of scaled proximal operators and is globalized by adaptively updating a regularization parameter based on a criterion of sufficient decrease. We prove global convergence under mild assumptions and then establish convergence of the entire sequence (with rates) under the Kurdyka–Łojasiewicz property. To efficiently solve the subproblems, we exploit the compact representation of limited-memory quasi-Newton updates. We also derive a compact representation of the limited-memory Kleinmichel formula, a rank-one quasi-Newton scheme that preserves positive definiteness under the same condition as the BFGS update. Numerical results show a significant speedup compared to other methods.
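In standard notation (the paper's own symbols may differ), the problem and the scaled proximal step read as follows; the regularized metric H_k = B_k + mu_k I is an assumed form, inferred from the abstract rather than quoted from the paper.

```latex
% Standard composite formulation; the paper's exact notation may differ.
\min_{x \in \mathbb{R}^n} \; \varphi(x) := f(x) + g(x),
\qquad f \in C^1 \text{ (possibly nonconvex)}, \quad
g \text{ proper, lsc, prox-bounded.}

% Scaled proximal operator in the metric induced by H \succ 0:
\operatorname{prox}^{H}_{g}(z) \in \operatorname*{arg\,min}_{w \in \mathbb{R}^n}
  \Bigl\{ g(w) + \tfrac{1}{2}\,\lVert w - z \rVert_H^2 \Bigr\},
\qquad \lVert d \rVert_H^2 := d^{\top} H d.

% One plausible iteration with adaptive regularization \mu_k
% (assumed form, inferred from the abstract):
x^{k+1} \in \operatorname{prox}^{H_k}_{g}\!\bigl(x^k - H_k^{-1}\nabla f(x^k)\bigr),
\qquad H_k := B_k + \mu_k I .
```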
What carries the argument
The proximal limited-memory quasi-Newton update that uses a compact representation to approximate the inverse Hessian of the smooth part and compute the scaled proximal operator at each iteration.
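For orientation, here is a minimal NumPy sketch of the Byrd-Nocedal-Schnabel compact representation [11] of the L-BFGS matrix B_k; the paper's analogous compact form for the Kleinmichel rank-one update is not reproduced here.

```python
import numpy as np

def lbfgs_compact_matvec(v, S, Y, gamma):
    """Apply the L-BFGS Hessian approximation B_k to a vector v via the
    Byrd-Nocedal-Schnabel compact representation

        B_k = gamma*I - Psi @ inv(M) @ Psi.T,
        Psi = [gamma*S, Y],
        M   = [[gamma*S.T@S, L], [L.T, -D]],

    where the columns of S and Y are the stored curvature pairs, L is the
    strictly lower-triangular part of S.T@Y and D its diagonal. Storage and
    work are O(m*n) with m << n; no n-by-n matrix is ever formed.
    Textbook sketch, not the paper's implementation.
    """
    StY = S.T @ Y
    L = np.tril(StY, k=-1)             # strictly lower-triangular part
    D = np.diag(np.diag(StY))          # diagonal of S^T Y
    Psi = np.hstack([gamma * S, Y])    # n x 2m
    M = np.block([[gamma * (S.T @ S), L],
                  [L.T, -D]])          # 2m x 2m, invertible if s_i^T y_i > 0
    return gamma * v - Psi @ np.linalg.solve(M, Psi.T @ v)

# toy call on random curvature pairs (s_i^T y_i > 0 with high probability)
rng = np.random.default_rng(0)
n, m = 40, 5
S = rng.standard_normal((n, m))
Y = S + 0.1 * rng.standard_normal((n, m))
Bv = lbfgs_compact_matvec(rng.standard_normal(n), S, Y, gamma=1.0)
```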
If this is right
- Global convergence to stationary points holds under the stated mild assumptions on the smooth and nonsmooth parts.
- Convergence of the entire sequence together with explicit rates follows when the objective satisfies the Kurdyka–Łojasiewicz property.
- Subproblems remain tractable because the limited-memory updates admit a compact representation that avoids storing full matrices.
- A rank-one limited-memory Kleinmichel formula preserves positive definiteness under the same condition required for BFGS.
Where Pith is reading between the lines
- The compact representation may extend the method to problem dimensions where storing full matrices is infeasible.
- The observed numerical speedups suggest the scheme could replace slower first-order proximal methods on large-scale instances.
- Similar limited-memory proximal ideas might improve other nonconvex algorithms that rely on proximal mappings.
Load-bearing premise
The nonsmooth function is proper, lower semicontinuous and prox-bounded, the smooth part is continuously differentiable, and the adaptive regularization guarantees sufficient decrease at every step.
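For reference, prox-boundedness in the sense of Rockafellar and Wets [33] requires the Moreau envelope to be finite somewhere for sufficiently small parameters; this is what makes the scaled proximal subproblems well posed:

```latex
% g is prox-bounded if there exists \lambda > 0 such that
e_{\lambda} g(x) := \inf_{w \in \mathbb{R}^n}
  \Bigl\{ g(w) + \tfrac{1}{2\lambda}\,\lVert w - x \rVert^2 \Bigr\} > -\infty
\quad \text{for some } x \in \mathbb{R}^n .
% The threshold \lambda_g is the supremum of such \lambda; the scaled
% proximal operator is nonempty-valued whenever the eigenvalues of the
% metric are large enough relative to 1/\lambda_g.
```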
What would settle it
A concrete example or counterexample in which the iterates satisfy the sufficient-decrease condition at every iteration yet do not approach a stationary point, or in which the full sequence fails to converge when the objective satisfies the Kurdyka-Lojasiewicz property.
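To make the tested condition concrete, the following is a generic decrease-driven regularization loop of the kind the abstract describes. It is a minimal sketch under assumed update rules, not the authors' algorithm; scaled_prox(x, g, mu) is a hypothetical callback standing in for the paper's subproblem solver.

```python
import numpy as np

def prox_qn_template(x, phi, grad_f, scaled_prox,
                     mu=1.0, eta=0.5, sigma=1e-4, tol=1e-8, max_iter=500):
    """Generic proximal quasi-Newton globalization template.

    phi         full objective f + g (used only in the acceptance test)
    grad_f      gradient of the smooth part f
    scaled_prox hypothetical callback returning the subproblem solution
                for the current point, gradient, and regularization mu

    A trial step is accepted only under sufficient decrease; otherwise mu
    is increased and the subproblem re-solved. This mirrors the mechanism
    described in the abstract, not the paper's exact rules.
    """
    for _ in range(max_iter):
        g = grad_f(x)
        while True:
            x_trial = scaled_prox(x, g, mu)
            d = x_trial - x
            if np.linalg.norm(d) <= tol:            # stationarity proxy
                return x
            if phi(x_trial) <= phi(x) - sigma * mu * (d @ d):
                break                               # sufficient decrease: accept
            mu /= eta                               # reject: strengthen regularization
        x = x_trial
        mu = max(mu * eta, 1e-12)                   # cautiously relax for next step
    return x

# toy usage: phi(x) = 0.5*||A x - b||^2 + ||x||_1 with an identity-scaled prox
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
phi = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + np.abs(x).sum()
grad_f = lambda x: A.T @ (A @ x - b)
soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
scaled_prox = lambda x, g, mu: soft(x - g / mu, 1.0 / mu)
print(prox_qn_template(np.zeros(2), phi, grad_f, scaled_prox))
```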
read the original abstract
We introduce a proximal limited-memory quasi-Newton scheme for minimizing the sum of a continuously differentiable function and a proper, lower semicontinuous and prox-bounded, possibly nonsmooth, function. Both functions might be nonconvex. The method builds upon the computation of scaled proximal operators and is globalized by adaptively updating a regularization parameter based on a criterion of sufficient decrease. We prove global convergence under mild assumptions and then establish convergence of the entire sequence (with rates) under the Kurdyka–Łojasiewicz property. To efficiently solve the subproblems, we exploit the compact representation of limited-memory quasi-Newton updates. We derive also a compact representation of the limited-memory Kleinmichel formula, a rank-one quasi-Newton scheme that preserves positive definiteness under the same condition as the BFGS update. Numerical results show a significant speed up compared to other methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a proximal limited-memory quasi-Newton algorithm for minimizing f(x) + g(x) where f is continuously differentiable (possibly nonconvex) and g is proper, lower semicontinuous, and prox-bounded (possibly nonsmooth and nonconvex). The method employs scaled proximal operators with limited-memory quasi-Newton updates, globalized by an adaptive regularization parameter enforcing sufficient decrease. It claims global convergence to stationary points under mild assumptions, followed by convergence of the full sequence (with rates) under the Kurdyka-Łojasiewicz property. Compact representations of the limited-memory updates are derived, including a new compact form of the Kleinmichel formula that preserves positive definiteness under the same conditions as BFGS. Numerical experiments report significant speedups relative to other methods.
Significance. If the convergence results hold, the work supplies a practical, theoretically supported extension of limited-memory quasi-Newton methods to nonsmooth nonconvex problems, addressing a gap where such approximations are often needed for scalability. The compact representations and Kleinmichel variant are concrete contributions that could aid implementation, and the reported numerical gains indicate potential practical impact in large-scale optimization.
major comments (2)
- [convergence analysis] The global convergence argument (abstract and convergence section) rests on the adaptive regularization ensuring sufficient decrease at each step while using the limited-memory quasi-Newton direction; however, the manuscript must explicitly verify that the compact representation preserves the descent property required for the decrease condition, as the curvature approximation could otherwise violate the assumptions used in the proximal-operator analysis.
- [KL convergence theorem] The claim of full-sequence convergence with rates under the KL property (abstract) depends on the subproblem solver and the properties of the scaled proximal operator; the interaction between the limited-memory update and the KL inequality needs a self-contained argument showing that the sequence remains in the region where the KL exponent applies, without additional hidden restrictions on the prox-boundedness constant.
minor comments (2)
- [abstract] The abstract states that both functions 'might be nonconvex', but the assumptions list only continuous differentiability (C¹) for the smooth part; clarify whether the nonconvexity of f is fully covered by the stated hypotheses or whether an additional local Lipschitz-gradient assumption is required.
- [algorithm description] Notation for the limited-memory matrices (e.g., the compact form of the Kleinmichel update) should be introduced with an explicit reference to the standard L-BFGS two-loop recursion, to aid readers unfamiliar with the rank-one variant (a reference sketch of the two-loop recursion follows below).
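For the record, here is the standard L-BFGS two-loop recursion [32] that the comment refers to; this is a textbook sketch of applying the inverse-Hessian approximation, not the paper's Kleinmichel variant.

```python
import numpy as np

def lbfgs_two_loop(grad, S, Y, gamma):
    """Standard L-BFGS two-loop recursion (Nocedal, 1980): returns H_k @ grad,
    where H_k approximates the INVERSE Hessian and H_0 = gamma*I. Columns of
    S and Y hold the stored curvature pairs, oldest first.
    """
    m = S.shape[1]
    rho = [1.0 / (Y[:, i] @ S[:, i]) for i in range(m)]
    alpha = np.empty(m)
    q = grad.astype(float).copy()
    for i in reversed(range(m)):       # newest pair first
        alpha[i] = rho[i] * (S[:, i] @ q)
        q -= alpha[i] * Y[:, i]
    r = gamma * q                      # apply the seed matrix H_0
    for i in range(m):                 # oldest pair first
        beta = rho[i] * (Y[:, i] @ r)
        r += (alpha[i] - beta) * S[:, i]
    return r
```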
Simulated Author's Rebuttal
We thank the referee for the careful reading of our manuscript and the constructive comments. We are pleased that the referee recognizes the significance of extending limited-memory quasi-Newton methods to nonsmooth nonconvex problems. We address the major comments below and will revise the manuscript accordingly to incorporate the suggested clarifications.
read point-by-point responses
- Referee: [convergence analysis] The global convergence argument (abstract and convergence section) rests on the adaptive regularization ensuring sufficient decrease at each step while using the limited-memory quasi-Newton direction; however, the manuscript must explicitly verify that the compact representation preserves the descent property required for the decrease condition, as the curvature approximation could otherwise violate the assumptions used in the proximal-operator analysis.
Authors: We agree that an explicit verification is beneficial for clarity. In the revised manuscript, we will add a short lemma (or remark) in the global convergence section demonstrating that the compact representations of the limited-memory updates (including the new Kleinmichel form) are mathematically equivalent to the standard updates and therefore preserve the curvature conditions, positive definiteness, and descent property required for the sufficient-decrease condition. This equivalence follows directly from the derivations in Section 3, and the added lemma will make the connection to the proximal-operator analysis explicit without altering the existing proof structure.
revision: yes
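The claimed equivalence is straightforward to sanity-check numerically for the BFGS case: build B_k by applying the dense BFGS recursion to B_0 = gamma*I and compare against the compact-form product from the sketch given earlier (the Kleinmichel form would need its own, analogous test).

```python
import numpy as np
# assumes lbfgs_compact_matvec from the sketch in "What carries the argument"

def bfgs_dense(S, Y, gamma):
    """Apply m dense BFGS updates to B_0 = gamma*I (O(n^2) storage)."""
    n, m = S.shape
    B = gamma * np.eye(n)
    for i in range(m):
        s, y = S[:, i], Y[:, i]
        Bs = B @ s
        B += np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
    return B

rng = np.random.default_rng(1)
n, m = 30, 4
S = rng.standard_normal((n, m))
Y = S + 0.1 * rng.standard_normal((n, m))   # keeps s_i^T y_i > 0 w.h.p.
v = rng.standard_normal(n)

# dense recursion and compact representation agree to rounding error
assert np.allclose(bfgs_dense(S, Y, 1.0) @ v,
                   lbfgs_compact_matvec(v, S, Y, 1.0))
```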
- Referee: [KL convergence theorem] The claim of full-sequence convergence with rates under the KL property (abstract) depends on the subproblem solver and the properties of the scaled proximal operator; the interaction between the limited-memory update and the KL inequality needs a self-contained argument showing that the sequence remains in the region where the KL exponent applies, without additional hidden restrictions on the prox-boundedness constant.
Authors: We thank the referee for this observation. In the revised version, we will expand the KL convergence section with a self-contained argument that explicitly shows the sequence remains in the relevant neighborhood for the KL inequality. This will use the adaptive regularization and the prox-boundedness assumption to bound the iterates, confirming that the KL exponent applies uniformly without imposing any new or hidden restrictions on the prox-boundedness constant. The argument will also tie the subproblem solver properties back to the scaled proximal operator to support the full-sequence convergence and rates as stated.
revision: yes
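For readers checking the rate claims, the KL property and the standard exponent-to-rate dictionary used in this literature (see, e.g., [3]) are:

```latex
% f has the KL property at x* if there exist \eta > 0, a neighborhood U of
% x*, and a desingularizer \varphi(s) = c\, s^{1-\theta} with
% \theta \in [0, 1), such that for all x \in U with
% f(x^*) < f(x) < f(x^*) + \eta:
\varphi'\bigl(f(x) - f(x^*)\bigr)\,
\operatorname{dist}\bigl(0, \partial f(x)\bigr) \ge 1 .

% Standard rate dictionary for descent-type methods:
%   \theta = 0          : finite termination,
%   \theta \in (0, 1/2] : R-linear convergence,
%   \theta \in (1/2, 1) : sublinear, O(k^{-(1-\theta)/(2\theta - 1)}).
```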
Circularity Check
No significant circularity detected
full rationale
The derivation chain consists of a standard proximal quasi-Newton framework globalized via adaptive regularization to enforce sufficient decrease, followed by a separate KL-property argument for full-sequence convergence and rates. These steps rely on established proximal-operator properties, lower semicontinuity, and prox-boundedness assumptions that are stated independently of the algorithm's limited-memory representation or Kleinmichel formula. The compact representations are introduced solely for subproblem efficiency and do not enter the convergence proofs. No equation or claim reduces to a fitted parameter, self-defined quantity, or load-bearing self-citation whose validity depends on the present paper.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: the nonsmooth function is proper, lower semicontinuous, and prox-bounded
- domain assumption: the objective satisfies the Kurdyka–Łojasiewicz property
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel · link status: unclear · matched claim: "We prove global convergence under mild assumptions and then establish convergence of the entire sequence (with rates) under the Kurdyka–Łojasiewicz property."
- IndisputableMonolith/Foundation/BranchSelection · branch_selection · link status: unclear · matched claim: "The method builds upon the computation of scaled proximal operators and is globalized by adaptively updating a regularization parameter based on a criterion of sufficient decrease."
Reference graph
Works this paper leans on
- [1] F. J. Aragón Artacho, R. M. T. Fleming, and P. T. Vuong. Accelerating the DC algorithm for smooth functions. Mathematical Programming, 169(1):95–118, 2018. doi:10.1007/s10107-017-1180-1
- [2] A. Y. Aravkin, R. Baraldi, and D. Orban. A proximal quasi-Newton trust-region method for nonsmooth regularized optimization. SIAM Journal on Optimization, 32(2):900–929, 2022. doi:10.1137/21M1409536
- [3] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka–Łojasiewicz inequality. Mathematics of Operations Research, 35:438–457, 2010. doi:10.1287/moor.1100.0449
- [4] S. Becker and J. Fadili. A quasi-Newton proximal splitting method. In Advances in Neural Information Processing Systems, volume 25, 2012
- [5] S. Becker, J. Fadili, and P. Ochs. On quasi-Newton forward-backward splitting: Proximal calculus and convergence. SIAM Journal on Optimization, 29(4):2445–2481, 2019. doi:10.1137/18M1167152
- [6] W. Bian and X. Chen. Linearly constrained non-Lipschitz optimization for image restoration. SIAM Journal on Imaging Sciences, 8(4):2294–2322, 2015. doi:10.1137/140985639
- [7] E. G. Birgin, J. M. Martínez, and M. Raydan. Nonmonotone spectral projected gradient methods on convex sets. SIAM Journal on Optimization, 10(4):1196–1211, 2000. doi:10.1137/S1052623497330963
- [8] E. G. Birgin, J. M. Martínez, and M. Raydan. Spectral projected gradient methods: Review and perspectives. Journal of Statistical Software, 60(3):1–21, 2014. doi:10.18637/jss.v060.i03
- [9] J. Bolte, A. Daniilidis, A. Lewis, and M. Shiota. Clarke subgradients of stratifiable functions. SIAM Journal on Optimization, 18:556–572, 2007. doi:10.1137/060670080
- [10] J. Bolte, T. Nguyen, J. Peypouquet, and B. Suter. From error bounds to the complexity of first-order descent methods for convex functions. Mathematical Programming, 165, 2016. doi:10.1007/s10107-016-1091-6
- [11] R. H. Byrd, J. Nocedal, and R. B. Schnabel. Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1):129–156, 1994. doi:10.1007/BF01582063
- [12] A. De Marchi. Proximal gradient methods beyond monotony. Journal of Nonsmooth Analysis and Optimization, 4, 2023. doi:10.46298/jnsao-2023-10290
- [13] A. De Marchi and A. Themelis. Proximal gradient algorithms under local Lipschitz gradient continuity. Journal of Optimization Theory and Applications, 194(3):771–794, 2022. doi:10.1007/s10957-022-02048-5
- [14] Y. Diouane, M. L. Habiboullah, and D. Orban. A proximal modified quasi-Newton method for nonsmooth regularized optimization. SIAM Journal on Optimization, 36(2):534–563, 2026. doi:10.1137/24M169761X
- [15] M. Gollier, M. L. Habiboullah, G. Leconte, R. Baraldi, A. De Marchi, D. Orban, and Y. Diouane. RegularizedOptimization.jl: A Julia framework for regularized and nonsmooth optimization. Journal of Open Source Software, 11(118):9344, 2026. doi:10.21105/joss.09344
- [16] L. Grippo, F. Lampariello, and S. Lucidi. A nonmonotone line search technique for Newton's method. SIAM Journal on Numerical Analysis, 23(4):707–716, 1986. doi:10.1137/0723046
- [17] X. Jia, C. Kanzow, and P. Mehlitz. Convergence analysis of the proximal gradient method in the presence of the Kurdyka–Łojasiewicz property without global Lipschitz assumptions. SIAM Journal on Optimization, 33(4):3038–3056, 2023. doi:10.1137/23M1548293
- [18] Q. Jin. Lectures on Nonsmooth Optimization. Springer, 2025. doi:10.1007/978-3-031-91417-1
- [19] C. Kanzow and T. Lechner. Efficient regularized proximal quasi-Newton methods for large-scale nonconvex composite optimization problems. Pacific Journal of Optimization, 20:537–568, 2024. doi:10.61208/pjo-2023-036
- [20] C. Kanzow and L. Lehmann. Convergence of nonmonotone proximal gradient methods under the Kurdyka–Łojasiewicz property without a global Lipschitz assumption. Journal of Optimization Theory and Applications, 207(1):4, 2025. doi:10.1007/s10957-025-02762-w
- [21] C. Kanzow and P. Mehlitz. Convergence properties of monotone and nonmonotone proximal gradient methods revisited. Journal of Optimization Theory and Applications, 195:1–23, 2022. doi:10.1007/s10957-022-02101-3
- [22] C. Kanzow and D. Steck. Regularization of limited memory quasi-Newton methods for large-scale nonconvex minimization. Mathematical Programming Computation, 15(3):417–444, 2023. doi:10.1007/s12532-023-00238-4
- [23] H. Kleinmichel. Quasi-Newton-Verfahren vom Rang-Eins-Typ zur Lösung unrestringierter Minimierungsprobleme, Teil 1. Numerische Mathematik, 38(2):219–228, 1981. doi:10.1007/BF01397091
- [24] H. Kleinmichel. Quasi-Newton-Verfahren vom Rang-Eins-Typ zur Lösung unrestringierter Minimierungsprobleme, Teil 2. Numerische Mathematik, 38(2):229–244, 1981. doi:10.1007/BF01397092
- [25]
- [26] T. Lechner. Proximal Methods for Nonconvex Composite Optimization Problems. PhD thesis, Institute of Mathematics, University of Würzburg, 2022. doi:10.25972/OPUS-28907
- [27] J. Lee, Y. Sun, and M. Saunders. Proximal Newton-type methods for minimizing composite functions. SIAM Journal on Optimization, 24:1420–1443, 2014. doi:10.1137/130921428
- [28]
- [29] D. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45:503–528, 1989. doi:10.1007/BF01589116
- [30] R. Liu, S. Pan, Y. Wu, and X. Yang. An inexact regularized proximal Newton method for nonconvex and nonsmooth optimization. Computational Optimization and Applications, 88(2):603–641, 2024. doi:10.1007/s10589-024-00560-0
- [31] B. Mordukhovich. Variational Analysis and Applications, volume 30. Springer, 2018. doi:10.1007/978-3-319-92775-6
- [32] J. Nocedal. Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35(151):773–782, 1980. doi:10.1090/S0025-5718-1980-0572855-7
- [33] R. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag, Heidelberg, Berlin, New York, 1998
- [34] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60:259–268, 1992. doi:10.1016/0167-2789(92)90242-F
- [35] K. Scheinberg and X. Tang. Practical inexact proximal quasi-Newton method with global complexity analysis. Mathematical Programming, 160(1–2):495–529, 2016. doi:10.1007/s10107-016-0997-3
- [36] P. Spellucci. A modified rank one update which converges Q-superlinearly. Computational Optimization and Applications, 19:273–296, 2001. doi:10.1023/A:1011259905470
- [37] L. Stella, A. Themelis, P. Sopasakis, and P. Patrinos. A simple and efficient algorithm for nonlinear model predictive control. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 1939–1944, 2017. doi:10.1109/CDC.2017.8263933
- [38] A. Themelis, L. Stella, and P. Patrinos. Forward-backward envelope for the sum of two nonconvex functions: Further properties and nonmonotone linesearch algorithms. SIAM Journal on Optimization, 28(3):2274–2303, 2018. doi:10.1137/16M1080240
- [39] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1):267–288, 1996. doi:10.1111/j.2517-6161.1996.tb02080.x
- [40] K. Ueda and N. Yamashita. A regularized Newton method without line search for unconstrained optimization. Computational Optimization and Applications, 59(1):321–351, 2014. doi:10.1007/s10589-014-9656-x
- [41] S. vom Dahl and C. Kanzow. An inexact regularized proximal Newton method without line search. Computational Optimization and Applications, 89(3):585–624, 2024. doi:10.1007/s10589-024-00600-9
- [42] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo. Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing, 57(7):2479–2493, 2009. doi:10.1109/TSP.2009.2016892
- [43] H. Zhang and W. W. Hager. A nonmonotone line search technique and its application to unconstrained optimization. SIAM Journal on Optimization, 14(4):1043–1056, 2004. doi:10.1137/S1052623403428208