A Nonmonotone Descent Method for Optimization Problems Defined by Upper-mathcal{C}² Functions over Submanifolds
Pith reviewed 2026-06-29 15:44 UTC · model grok-4.3
The pith
A nonmonotone subgradient method finds stationary points for upper-C^2 optimization on submanifolds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The descent property of upper-C^2 functions carries over to submanifolds when the recent notion of projectional subdifferentials is used; this permits a nonmonotone subgradient method whose generated sequences have stationary accumulation points, with convergence and rate-of-convergence guarantees under the Kurdyka-Lojasiewicz property.
What carries the argument
Nonmonotone subgradient iteration that employs projectional subdifferentials to enforce the manifold constraint while preserving the upper-C^2 descent inequality.
Load-bearing premise
The descent property of upper-C^2 functions carries over to submanifolds when projectional subdifferentials are used.
What would settle it
An explicit upper-C^2 function on a submanifold together with an initial point for which every sequence generated by the algorithm has a non-stationary accumulation point.
Figures
read the original abstract
We consider the optimization problem of minimizing a nonsmooth function characterized by a nonsmooth formulation of the descent lemma over a manifold. In the unconstrained case over a Euclidean space, this class of functions is called upper-$\mathcal{C}^2$. Using the recent notion of projectional subdifferentials, we show that their descent property carries over to submanifolds. We propose a nonmonotone subgradient method to solve these problems and prove stationarity of accumulation points of the generated sequence as well as convergence and rate-of-convergence results under the Kurdyka-Lojasiewicz property. We also perform numerical experiments and show how our approach can be applied to a certain type of difference of convex functions as well as clustering problems on manifolds.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript addresses minimization of nonsmooth functions satisfying a nonsmooth descent lemma (upper-C² functions) over submanifolds. It extends the descent property to this setting via projectional subdifferentials, proposes a nonmonotone subgradient algorithm, proves that accumulation points are stationary, and establishes convergence and rate results under the Kurdyka-Łojasiewicz property. Numerical experiments illustrate the method on difference-of-convex problems and manifold clustering.
Significance. If the extension of the descent property and the subsequent convergence analysis hold, the work supplies a new class of tractable nonsmooth problems on manifolds together with a practical algorithm and KL-based guarantees. This could enlarge the scope of nonmonotone methods beyond Euclidean upper-C² functions. The inclusion of concrete numerical tests on clustering is a positive feature.
minor comments (3)
- §2: the definition and basic properties of the projectional subdifferential on the manifold should be stated explicitly rather than only referenced, to make the descent-property proof self-contained.
- Algorithm 1: the precise rule for choosing the nonmonotone parameter sequence (e.g., the window length or the update of the reference value) is not fully specified; a short pseudocode block would improve reproducibility.
- Numerical section: the manifold clustering experiments would benefit from a brief description of the retraction and the projectional-subdifferential oracle used in the implementation.
Simulated Author's Rebuttal
We thank the referee for the careful reading and positive assessment of our manuscript. The referee correctly identifies the core contributions: extension of the nonsmooth descent lemma to submanifolds via projectional subdifferentials, the nonmonotone subgradient algorithm, stationarity of accumulation points, and KL-based convergence and rate results. We are pleased that the numerical experiments on difference-of-convex problems and manifold clustering are viewed favorably. Since the report lists no specific major comments, we interpret the minor_revision recommendation as an invitation to incorporate any editorial or minor technical suggestions that may appear in a subsequent round; we will prepare a revised version accordingly.
Circularity Check
No significant circularity
full rationale
The paper's core claims rest on extending the descent property of upper-C² functions to submanifolds via the external notion of projectional subdifferentials, then constructing a nonmonotone subgradient method whose stationarity and KL-convergence results follow from standard arguments in nonsmooth optimization on manifolds. No equations, fitted parameters, or self-citations are shown to reduce any prediction or theorem to a tautology by construction. The derivation chain remains independent of its own outputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Projectional subdifferentials preserve the nonsmooth descent lemma on submanifolds
- domain assumption Kurdyka-Lojasiewicz property holds for the objective
Reference graph
Works this paper leans on
-
[1]
Absil, R
P.-A. Absil, R. Mahony, and R. Sepulchre.Optimization Algorithms on Matrix Man- ifolds. Princeton University Press, 2008
2008
-
[2]
Nonmonotone subgradient methods based on a local descent lemma
F. J. Aragón-Artacho, R. Campoy, P. Pérez-Aros, and D. Torregrosa-Belén. “Non- monotone subgradient methods based on a local descent lemma”. In: (2025). arXiv: 2510.19341 [math.OC]
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[3]
Accelerating the DC al- gorithm for smooth functions
F. J. Aragón-Artacho, R. M. T. Fleming, and P. T. Vuong. “Accelerating the DC al- gorithm for smooth functions”. In:Mathematical Programming169.1 (2018), pp. 95– 118.doi:10.1007/s10107-017-1180-1
-
[4]
TheBoostedDouble– proximalSubgradientAlgorithmfornonconvexoptimization
F.J.Aragón-Artacho,P.Pérez-Aros,andD.Torregrosa-Belén.“TheBoostedDouble– proximalSubgradientAlgorithmfornonconvexoptimization”.In:Mathematical Pro- gramming214 (1 2025), pp. 491–537.doi:10.1007/s10107-024-02190-0
-
[5]
The Boosted Difference of Convex Func- tions Algorithm for nonsmooth functions
F. J. Aragón-Artacho and P. T. Vuong. “The Boosted Difference of Convex Func- tions Algorithm for nonsmooth functions”. In:SIAM Journal on Optimization30.1 (2020), pp. 980–1006.doi:10.1137/18M123339X
-
[6]
MinimizationoffunctionshavingLipschitz-continuousfirstpartialderiva- tives
L.Armijo.“MinimizationoffunctionshavingLipschitz-continuousfirstpartialderiva- tives”. In:Pacific Journal of Mathematics16 (1966), pp. 1–3
1966
-
[7]
On the convergence of the proximal algorithm for nons- mooth functions involving analytic features
H. Attouch and J. Bolte. “On the convergence of the proximal algorithm for nons- mooth functions involving analytic features”. In:Mathematical Programming116.1 (2009), pp. 5–16.doi:10.1007/s10107-007-0133-5
-
[8]
H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. “Proximal alternating mini- mization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality”. In:Mathematics of Operations Research35.2 (2010), pp. 438–457.doi:10.1287/moor.1100.0449. 28
-
[9]
Two-Point step size gradient methods
J. Barzilai and J. M. Borwein. “Two-Point step size gradient methods”. In:IMA Journal of Numerical Analysis8.1 (Jan. 1988), pp. 141–148.issn: 0272-4979.doi: 10.1093/imanum/8.1.141
-
[10]
A Grassmann manifold handbook: basic geometry and computational aspects
T. Bendokat, R. Zimmermann, and P.-A. Absil. “A Grassmann manifold handbook: basic geometry and computational aspects”. In:Advances in Computational Math- ematics50 (1 2024).doi:10.1007/s10444-023-10090-8
-
[11]
Nonmonotone Spectral Projected Gradient Methods on Convex Sets
E. G. Birgin, J. M. Martínez, and M. Raydan. “Nonmonotone Spectral Projected Gradient Methods on Convex Sets”. In:SIAM Journal on Optimization10.4 (2000), pp. 1196–1211.doi:10.1137/S105262349733096
-
[12]
J.Bolte,A.Daniilidis,andA.Lewis.“TheŁojasiewiczinequalityfornonsmoothsub- analytic functions with applications to subgradient dynamical systems”. In:SIAM Journal on Optimization17.4 (2007), pp. 1205–1223.doi:10.1137/050644641
-
[13]
Clarke subgradients of stratifiable functions
J. Bolte, A. Daniilidis, A. Lewis, and M. Shiota. “Clarke subgradients of stratifiable functions”. In:SIAM Journal on Optimization18.2 (2007), pp. 556–572.doi:10. 1137/060670080
2007
-
[14]
Boumal.An Introduction to Optimization on Smooth Manifolds
N. Boumal.An Introduction to Optimization on Smooth Manifolds. Cambridge Uni- versity Press, 2023.doi:10.1017/9781009166164
-
[15]
Global rates of convergence for nonconvex optimizationonmanifolds
N. Boumal, P.-A. Absil, and C. Cartis. “Global rates of convergence for nonconvex optimizationonmanifolds”.In:IMA Journal of Numerical Analysis39.1(Feb.2018), pp. 1–33.issn: 0272-4979.doi:10.1093/imanum/drx080
-
[17]
Intrinsic mean shift for clustering on Stiefel and Grassmann manifolds
H. E. Cetingul and R. Vidal. “Intrinsic mean shift for clustering on Stiefel and Grassmann manifolds”. In:2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, pp. 1896–1902.doi:10.1109/CVPR.2009.5206806
-
[18]
Chikuse.Statistics on Special Manifolds
Y. Chikuse.Statistics on Special Manifolds. Springer New York, NY, 2003.isbn: 978-0-387-00160-9.doi:10.1007/978-0-387-21540-2
-
[19]
F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and R. R. Wolenski.Nonsmooth Analysis and Control Theory. Springer New York, NY, 1998.doi:10.1007/b97650
-
[20]
Proximal gradient methods beyond monotony
A. De Marchi. “Proximal gradient methods beyond monotony”. In:Journal of Non- smooth Analysis and Optimization4 (2023)
2023
-
[21]
I. S. Dhillon and D. S. Modha. “Concept decompositions for large sparse text data using clustering”. In:Machine Learning42 (1 2001), pp. 143–175.doi:10.1023/A: 1007612920971
work page doi:10.1023/a: 2001
- [22]
-
[23]
Algorithms for clustering on the sphere: Advances & applications
M. Golzy, M. Markatou, and A. Shivram. “Algorithms for clustering on the sphere: Advances & applications”. In:WCECS 2016 - World Congress on Engineering and Computer Science 2016. Lecture Notes in Engineering and Computer Science. News- wood Limited, 2016, pp. 420–425
2016
-
[24]
A nonmonotone line search technique for Newton’s method
L. Grippo, F. Lampariello, and S. Lucidi. “A nonmonotone line search technique for Newton’s method”. In:SIAM Journal on Numerical Analysis23.4 (1986), pp. 707– 716. 29
1986
-
[25]
Grassmann clustering
P. Gruber and F. J. Theis. “Grassmann clustering”. In:European Signal Processing Conference(2006). 14th European Signal Processing Conference, EUSIPCO 2006 ; Conference date: 04-09-2006 Through 08-09-2006.issn: 2219-5491
2006
-
[26]
Line search algorithms for locally Lip- schitz functions on Riemannian manifolds
S. Hosseini, W. Huang, and R. Yousefpour. “Line search algorithms for locally Lip- schitz functions on Riemannian manifolds”. In:SIAM Journal on Optimization28.1 (2018), pp. 596–619.doi:10.1137/16M1108145
-
[27]
A brief introduction to manifold op- timization
J. Hu, X. Liu, Z.-W. Wen, and Y.-X. Yuan. “A brief introduction to manifold op- timization”. In:Journal of the Operations Research Society of China8 (2 20200), pp. 199–248.doi:10.1007/s40305-020-00295-9
-
[28]
A Riemannian BFGS method for non- convex optimization problems
W. Huang, P.-A. Absil, and K. A. Gallivan. “A Riemannian BFGS method for non- convex optimization problems”. In:Numerical Mathematics and Advanced Applica- tions ENUMATH 2015. Cham: Springer International Publishing, 2016, pp. 627– 634.isbn: 978-3-319-39929-4
2015
-
[29]
Riemannian proximal gradient methods
W. Huang and K. Wei. “Riemannian proximal gradient methods”. In:Mathematical Programming194 (1 2022), pp. 371–413.doi:10.1007/s10107-021-01632-3
-
[30]
L. Hubert and P. Arabie. “Comparing partitions”. In:Journal of Classification2 (1 1985), pp. 193–218.doi:10.1007/BF01908075
-
[31]
Clustering on the torus by conformal prediction
S. Jung, K. Park, and B. Kim. “Clustering on the torus by conformal prediction”. In:The Annals of Applied Statistics15.4 (2021), pp. 1583 –1603.doi:10.1214/21- AOAS1459
work page doi:10.1214/21- 2021
-
[32]
C. Kanzow and L. Lehmann. “Convergence of nonmonotone proximal gradient methods under the Kurdyka-Łojasiewicz property without a global Lipschitz as- sumption”. In:Journal of Optimization Theory and Applications207.4 (2025).doi: 10.1007/s10957-025-02762-w
-
[33]
Kelly, R
M. Kelly, R. Longjohn, and K. Nottingham.The UCI Machine Learning Repository. url:https://archive.ics.uci.edu
-
[34]
On projectional subdifferentials and projectional coderivatives with respect to smooth manifolds
P. Q. Khanh and T. T. Minh. “On projectional subdifferentials and projectional coderivatives with respect to smooth manifolds”. In:Journal of Optimization Theory and Applications208 (3 2026), p. 113.doi:10.1007/s10957-026-02941-3
-
[35]
Numerical solution for optimiza- tion over the efficient set by d.c. optimization algorithms
H. A. Le Thi, T. Pham Dinh, and L. D. Muu. “Numerical solution for optimiza- tion over the efficient set by d.c. optimization algorithms”. In:Operations Research Letters19.3 (1996), pp. 117–128.issn: 0167-6377.doi:10.1016/0167-6377(96) 00022-3
-
[36]
Finding stationary points on bounded-rank matrices: a geometric hurdle and a smooth remedy
E. Levin, J. Kileel, and N Boumal. “Finding stationary points on bounded-rank matrices: a geometric hurdle and a smooth remedy”. In:Mathematical Programming 199 (1 2023), pp. 831–864.doi:10.1007/s10107-022-01851-2
-
[37]
On the limited memory BFGS method for large scale optimization
D. C. Liu and J. Nocedal. “On the limited memory BFGS method for large scale optimization”. In:Mathematical Programming45 (1 1989), pp. 503–528.doi:10. 1007/BF01589116
1989
-
[38]
Lipschitz-like property relative toasetandthegeneralizedMordukhovichcriterion
K. W. Meng, M. H. Li, W. F. Yao, and X. Q. Yang. “Lipschitz-like property relative toasetandthegeneralizedMordukhovichcriterion”.In:Mathematical Programming 189 (1 2021), pp. 455–489.doi:10.1007/s10107-020-01568-0
-
[39]
B. S. Mordukhovich.Variational Analysis and Applications. Springer, 2018.doi: 10.1007/978-3-319-92775-6. 30
-
[40]
Grassmannian clustering for multi- variate time sequences
B.-S. Oh, A. B. J. Teoh, K.-A. Toh, and Z. Lin. “Grassmannian clustering for multi- variate time sequences”. In:New Trends in Computer Technologies and Applications. Springer Singapore, 2019, pp. 654–664.isbn: 978-981-13-9190-3
2019
-
[41]
A heuristic algorithm for solving the minimum sum- of-squares clustering problems
B. Ordin and A. M. Bagirov. “A heuristic algorithm for solving the minimum sum- of-squares clustering problems”. In:Journal of Global Optimization61 (2 2015), pp. 341–361.doi:10.1007/s10898-014-0171-5
-
[42]
Global convergence of Riemannian line search methods with a Zhang- Hager-type condition
H. Oviedo. “Global convergence of Riemannian line search methods with a Zhang- Hager-type condition”. In:Numerical Algorithms91 (3 2022), pp. 1183–1203.doi: 10.1007/s11075-022-01298-8
-
[43]
Riemannian BFGS Algorithm with Ap- plications
C. Qi, K. A. Gallivan, and P.-A. Absil. “Riemannian BFGS Algorithm with Ap- plications”. In:Recent advances in optimization and its applications in engineering. Springer Berlin Heidelberg, 2010, pp. 183–192.isbn: 978-3-642-12598-0
2010
-
[44]
Convergence of ZH-Type nonmonotone de- scent method for Kurdyka–Łojasiewicz optimization problems
Y. Qian, T. Tao, S. Pan, and H. Qi. “Convergence of ZH-Type nonmonotone de- scent method for Kurdyka–Łojasiewicz optimization problems”. In:SIAM Journal on Optimization35.2 (2025), pp. 1089–1109.doi:10.1137/24M1669153
-
[45]
Objective criteria for the evaluation of clustering methods
W. M. Rand. “Objective criteria for the evaluation of clustering methods”. In: Journal of the American Statistical Association66.336 (1971), pp. 846–850.doi: 10.1080/01621459.1971.10482356
-
[46]
R. T. Rockafellar and R. J.-B. Wets.Variational Analysis. Springer, 2009.doi: 10.1007/978-3-642-02431-3
-
[47]
V-Measure: A conditional entropy-based exter- nal cluster evaluation measure
A. Rosenberg and J. Hirschberg. “V-Measure: A conditional entropy-based exter- nal cluster evaluation measure”. In:Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Lan- guage Learning (EMNLP-CoNLL).AssociationforComputationalLinguistics,2007, pp. 410–420
2007
-
[48]
A Bayesian Mixture Model for Clustering on the Stiefel Manifold
S. Sengupta, S. Pal, R. Mitra, Y. Guo, A. Banerjee, and Y. Ji.A Bayesian mixture model for clustering on the stiefel manifold. 2017. arXiv:1708.07196 [stat.ME]
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[49]
Pymanopt: A Python Toolbox for Op- timization on Manifolds using Automatic Differentiation
J. Townsend, N. Koep, and S. Weichwald. “Pymanopt: A Python Toolbox for Op- timization on Manifolds using Automatic Differentiation”. In:Journal of Machine Learning Research17.137 (2016), 1–5.url:http://jmlr.org/papers/v17/16- 177.html
2016
-
[50]
H. Wiechers, B. Eltzner, S. F. Huckemann, and K. V. Mardia.Clustering schemes on the torus with application to RNA clashes. 2021. arXiv:2104.00094 [q-bio.BM]
-
[51]
A nonmonotone line search technique and its ap- plication to unconstrained optimization
H. Zhang and W. W. Hager. “A nonmonotone line search technique and its ap- plication to unconstrained optimization”. In:SIAM Journal on Optimization14.4 (2004), pp. 1043–1056. A Complete Proofs under the KL Property Let us begin with a full proof of Lemma 4.3: Lemma A.1.For all sufficiently largek∈NwithR k < φ(x ∗) +ηandx k ∈ Bϑ(x∗), the following inequal...
2004
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.