pith. sign in

arxiv: 2605.26909 · v1 · pith:QMKGEHCSnew · submitted 2026-05-26 · 🧮 math.OC

A Nonmonotone Descent Method for Optimization Problems Defined by Upper-mathcal{C}² Functions over Submanifolds

Pith reviewed 2026-06-29 15:44 UTC · model grok-4.3

classification 🧮 math.OC
keywords upper-C^2 functionssubmanifoldsprojectional subdifferentialsnonmonotone subgradient methodKurdyka-Lojasiewicz propertydifference of convex functionsmanifold clustering
0
0 comments X

The pith

A nonmonotone subgradient method finds stationary points for upper-C^2 optimization on submanifolds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper considers minimization of nonsmooth functions that satisfy a nonsmooth descent lemma, a class called upper-C^2 in Euclidean space, but now defined over submanifolds. It establishes that the descent property extends to this setting when subdifferentials are taken in the projectional sense. A nonmonotone subgradient algorithm is introduced for these problems. The method produces sequences whose accumulation points are stationary, and further convergence and rate results hold when the objective satisfies the Kurdyka-Lojasiewicz property. The approach is illustrated on difference-of-convex objectives and on clustering tasks posed directly on manifolds.

Core claim

The descent property of upper-C^2 functions carries over to submanifolds when the recent notion of projectional subdifferentials is used; this permits a nonmonotone subgradient method whose generated sequences have stationary accumulation points, with convergence and rate-of-convergence guarantees under the Kurdyka-Lojasiewicz property.

What carries the argument

Nonmonotone subgradient iteration that employs projectional subdifferentials to enforce the manifold constraint while preserving the upper-C^2 descent inequality.

Load-bearing premise

The descent property of upper-C^2 functions carries over to submanifolds when projectional subdifferentials are used.

What would settle it

An explicit upper-C^2 function on a submanifold together with an initial point for which every sequence generated by the algorithm has a non-stationary accumulation point.

Figures

Figures reproduced from arXiv: 2605.26909 by Christian Kanzow, Leo Lehmann.

Figure 1
Figure 1. Figure 1: Performance profiles of CPU times for the Minimum Sum-of-Squares Clustering [PITH_FULL_IMAGE:figures/full_fig_p021_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Performance profiles of CPU times for the Multidimensional Scaling problem, [PITH_FULL_IMAGE:figures/full_fig_p023_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Performance profiles of CPU times for clustering on the sphere with artificial [PITH_FULL_IMAGE:figures/full_fig_p025_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Performance profiles of CPU times for document clustering on the sphere ( [PITH_FULL_IMAGE:figures/full_fig_p026_4.png] view at source ↗
read the original abstract

We consider the optimization problem of minimizing a nonsmooth function characterized by a nonsmooth formulation of the descent lemma over a manifold. In the unconstrained case over a Euclidean space, this class of functions is called upper-$\mathcal{C}^2$. Using the recent notion of projectional subdifferentials, we show that their descent property carries over to submanifolds. We propose a nonmonotone subgradient method to solve these problems and prove stationarity of accumulation points of the generated sequence as well as convergence and rate-of-convergence results under the Kurdyka-Lojasiewicz property. We also perform numerical experiments and show how our approach can be applied to a certain type of difference of convex functions as well as clustering problems on manifolds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript addresses minimization of nonsmooth functions satisfying a nonsmooth descent lemma (upper-C² functions) over submanifolds. It extends the descent property to this setting via projectional subdifferentials, proposes a nonmonotone subgradient algorithm, proves that accumulation points are stationary, and establishes convergence and rate results under the Kurdyka-Łojasiewicz property. Numerical experiments illustrate the method on difference-of-convex problems and manifold clustering.

Significance. If the extension of the descent property and the subsequent convergence analysis hold, the work supplies a new class of tractable nonsmooth problems on manifolds together with a practical algorithm and KL-based guarantees. This could enlarge the scope of nonmonotone methods beyond Euclidean upper-C² functions. The inclusion of concrete numerical tests on clustering is a positive feature.

minor comments (3)
  1. §2: the definition and basic properties of the projectional subdifferential on the manifold should be stated explicitly rather than only referenced, to make the descent-property proof self-contained.
  2. Algorithm 1: the precise rule for choosing the nonmonotone parameter sequence (e.g., the window length or the update of the reference value) is not fully specified; a short pseudocode block would improve reproducibility.
  3. Numerical section: the manifold clustering experiments would benefit from a brief description of the retraction and the projectional-subdifferential oracle used in the implementation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading and positive assessment of our manuscript. The referee correctly identifies the core contributions: extension of the nonsmooth descent lemma to submanifolds via projectional subdifferentials, the nonmonotone subgradient algorithm, stationarity of accumulation points, and KL-based convergence and rate results. We are pleased that the numerical experiments on difference-of-convex problems and manifold clustering are viewed favorably. Since the report lists no specific major comments, we interpret the minor_revision recommendation as an invitation to incorporate any editorial or minor technical suggestions that may appear in a subsequent round; we will prepare a revised version accordingly.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's core claims rest on extending the descent property of upper-C² functions to submanifolds via the external notion of projectional subdifferentials, then constructing a nonmonotone subgradient method whose stationarity and KL-convergence results follow from standard arguments in nonsmooth optimization on manifolds. No equations, fitted parameters, or self-citations are shown to reduce any prediction or theorem to a tautology by construction. The derivation chain remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review performed on abstract only; full technical assumptions, parameter choices, and any invented constructs cannot be audited.

axioms (2)
  • domain assumption Projectional subdifferentials preserve the nonsmooth descent lemma on submanifolds
    Invoked in abstract paragraph 2 to extend the Euclidean upper-C^2 property.
  • domain assumption Kurdyka-Lojasiewicz property holds for the objective
    Used to obtain convergence rates; standard in nonsmooth optimization literature.

pith-pipeline@v0.9.1-grok · 5657 in / 1287 out tokens · 21857 ms · 2026-06-29T15:44:58.395887+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

50 extracted references · 35 canonical work pages · 2 internal anchors

  1. [1]

    Absil, R

    P.-A. Absil, R. Mahony, and R. Sepulchre.Optimization Algorithms on Matrix Man- ifolds. Princeton University Press, 2008

  2. [2]

    Nonmonotone subgradient methods based on a local descent lemma

    F. J. Aragón-Artacho, R. Campoy, P. Pérez-Aros, and D. Torregrosa-Belén. “Non- monotone subgradient methods based on a local descent lemma”. In: (2025). arXiv: 2510.19341 [math.OC]

  3. [3]

    Accelerating the DC al- gorithm for smooth functions

    F. J. Aragón-Artacho, R. M. T. Fleming, and P. T. Vuong. “Accelerating the DC al- gorithm for smooth functions”. In:Mathematical Programming169.1 (2018), pp. 95– 118.doi:10.1007/s10107-017-1180-1

  4. [4]

    TheBoostedDouble– proximalSubgradientAlgorithmfornonconvexoptimization

    F.J.Aragón-Artacho,P.Pérez-Aros,andD.Torregrosa-Belén.“TheBoostedDouble– proximalSubgradientAlgorithmfornonconvexoptimization”.In:Mathematical Pro- gramming214 (1 2025), pp. 491–537.doi:10.1007/s10107-024-02190-0

  5. [5]

    The Boosted Difference of Convex Func- tions Algorithm for nonsmooth functions

    F. J. Aragón-Artacho and P. T. Vuong. “The Boosted Difference of Convex Func- tions Algorithm for nonsmooth functions”. In:SIAM Journal on Optimization30.1 (2020), pp. 980–1006.doi:10.1137/18M123339X

  6. [6]

    MinimizationoffunctionshavingLipschitz-continuousfirstpartialderiva- tives

    L.Armijo.“MinimizationoffunctionshavingLipschitz-continuousfirstpartialderiva- tives”. In:Pacific Journal of Mathematics16 (1966), pp. 1–3

  7. [7]

    On the convergence of the proximal algorithm for nons- mooth functions involving analytic features

    H. Attouch and J. Bolte. “On the convergence of the proximal algorithm for nons- mooth functions involving analytic features”. In:Mathematical Programming116.1 (2009), pp. 5–16.doi:10.1007/s10107-007-0133-5

  8. [8]

    Proximal alternating mini- mization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality

    H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. “Proximal alternating mini- mization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality”. In:Mathematics of Operations Research35.2 (2010), pp. 438–457.doi:10.1287/moor.1100.0449. 28

  9. [9]

    Two-Point step size gradient methods

    J. Barzilai and J. M. Borwein. “Two-Point step size gradient methods”. In:IMA Journal of Numerical Analysis8.1 (Jan. 1988), pp. 141–148.issn: 0272-4979.doi: 10.1093/imanum/8.1.141

  10. [10]

    A Grassmann manifold handbook: basic geometry and computational aspects

    T. Bendokat, R. Zimmermann, and P.-A. Absil. “A Grassmann manifold handbook: basic geometry and computational aspects”. In:Advances in Computational Math- ematics50 (1 2024).doi:10.1007/s10444-023-10090-8

  11. [11]

    Nonmonotone Spectral Projected Gradient Methods on Convex Sets

    E. G. Birgin, J. M. Martínez, and M. Raydan. “Nonmonotone Spectral Projected Gradient Methods on Convex Sets”. In:SIAM Journal on Optimization10.4 (2000), pp. 1196–1211.doi:10.1137/S105262349733096

  12. [12]

    TheŁojasiewiczinequalityfornonsmoothsub- analytic functions with applications to subgradient dynamical systems

    J.Bolte,A.Daniilidis,andA.Lewis.“TheŁojasiewiczinequalityfornonsmoothsub- analytic functions with applications to subgradient dynamical systems”. In:SIAM Journal on Optimization17.4 (2007), pp. 1205–1223.doi:10.1137/050644641

  13. [13]

    Clarke subgradients of stratifiable functions

    J. Bolte, A. Daniilidis, A. Lewis, and M. Shiota. “Clarke subgradients of stratifiable functions”. In:SIAM Journal on Optimization18.2 (2007), pp. 556–572.doi:10. 1137/060670080

  14. [14]

    Boumal.An Introduction to Optimization on Smooth Manifolds

    N. Boumal.An Introduction to Optimization on Smooth Manifolds. Cambridge Uni- versity Press, 2023.doi:10.1017/9781009166164

  15. [15]

    Global rates of convergence for nonconvex optimizationonmanifolds

    N. Boumal, P.-A. Absil, and C. Cartis. “Global rates of convergence for nonconvex optimizationonmanifolds”.In:IMA Journal of Numerical Analysis39.1(Feb.2018), pp. 1–33.issn: 0272-4979.doi:10.1093/imanum/drx080

  16. [17]

    Intrinsic mean shift for clustering on Stiefel and Grassmann manifolds

    H. E. Cetingul and R. Vidal. “Intrinsic mean shift for clustering on Stiefel and Grassmann manifolds”. In:2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, pp. 1896–1902.doi:10.1109/CVPR.2009.5206806

  17. [18]

    Chikuse.Statistics on Special Manifolds

    Y. Chikuse.Statistics on Special Manifolds. Springer New York, NY, 2003.isbn: 978-0-387-00160-9.doi:10.1007/978-0-387-21540-2

  18. [19]

    F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and R. R. Wolenski.Nonsmooth Analysis and Control Theory. Springer New York, NY, 1998.doi:10.1007/b97650

  19. [20]

    Proximal gradient methods beyond monotony

    A. De Marchi. “Proximal gradient methods beyond monotony”. In:Journal of Non- smooth Analysis and Optimization4 (2023)

  20. [21]

    doi: 10.1023/A: 1013689704352

    I. S. Dhillon and D. S. Modha. “Concept decompositions for large sparse text data using clustering”. In:Machine Learning42 (1 2001), pp. 143–175.doi:10.1023/A: 1007612920971

  21. [22]

    Godaz, B

    R. Godaz, B. Ghojogh, R. Hosseini, R. Monsefi, F. Karray, and M. Crowley.Vector transport free Riemannian LBFGS for optimization on symmetric positive definite matrix manifolds. 2021. arXiv:2108.11019 [math.OC]

  22. [23]

    Algorithms for clustering on the sphere: Advances & applications

    M. Golzy, M. Markatou, and A. Shivram. “Algorithms for clustering on the sphere: Advances & applications”. In:WCECS 2016 - World Congress on Engineering and Computer Science 2016. Lecture Notes in Engineering and Computer Science. News- wood Limited, 2016, pp. 420–425

  23. [24]

    A nonmonotone line search technique for Newton’s method

    L. Grippo, F. Lampariello, and S. Lucidi. “A nonmonotone line search technique for Newton’s method”. In:SIAM Journal on Numerical Analysis23.4 (1986), pp. 707– 716. 29

  24. [25]

    Grassmann clustering

    P. Gruber and F. J. Theis. “Grassmann clustering”. In:European Signal Processing Conference(2006). 14th European Signal Processing Conference, EUSIPCO 2006 ; Conference date: 04-09-2006 Through 08-09-2006.issn: 2219-5491

  25. [26]

    Line search algorithms for locally Lip- schitz functions on Riemannian manifolds

    S. Hosseini, W. Huang, and R. Yousefpour. “Line search algorithms for locally Lip- schitz functions on Riemannian manifolds”. In:SIAM Journal on Optimization28.1 (2018), pp. 596–619.doi:10.1137/16M1108145

  26. [27]

    A brief introduction to manifold op- timization

    J. Hu, X. Liu, Z.-W. Wen, and Y.-X. Yuan. “A brief introduction to manifold op- timization”. In:Journal of the Operations Research Society of China8 (2 20200), pp. 199–248.doi:10.1007/s40305-020-00295-9

  27. [28]

    A Riemannian BFGS method for non- convex optimization problems

    W. Huang, P.-A. Absil, and K. A. Gallivan. “A Riemannian BFGS method for non- convex optimization problems”. In:Numerical Mathematics and Advanced Applica- tions ENUMATH 2015. Cham: Springer International Publishing, 2016, pp. 627– 634.isbn: 978-3-319-39929-4

  28. [29]

    Riemannian proximal gradient methods

    W. Huang and K. Wei. “Riemannian proximal gradient methods”. In:Mathematical Programming194 (1 2022), pp. 371–413.doi:10.1007/s10107-021-01632-3

  29. [30]

    Comparing partitions

    L. Hubert and P. Arabie. “Comparing partitions”. In:Journal of Classification2 (1 1985), pp. 193–218.doi:10.1007/BF01908075

  30. [31]

    Clustering on the torus by conformal prediction

    S. Jung, K. Park, and B. Kim. “Clustering on the torus by conformal prediction”. In:The Annals of Applied Statistics15.4 (2021), pp. 1583 –1603.doi:10.1214/21- AOAS1459

  31. [32]

    Convergence of nonmonotone proximal gradient methods under the Kurdyka-Łojasiewicz property without a global Lipschitz as- sumption

    C. Kanzow and L. Lehmann. “Convergence of nonmonotone proximal gradient methods under the Kurdyka-Łojasiewicz property without a global Lipschitz as- sumption”. In:Journal of Optimization Theory and Applications207.4 (2025).doi: 10.1007/s10957-025-02762-w

  32. [33]

    Kelly, R

    M. Kelly, R. Longjohn, and K. Nottingham.The UCI Machine Learning Repository. url:https://archive.ics.uci.edu

  33. [34]

    On projectional subdifferentials and projectional coderivatives with respect to smooth manifolds

    P. Q. Khanh and T. T. Minh. “On projectional subdifferentials and projectional coderivatives with respect to smooth manifolds”. In:Journal of Optimization Theory and Applications208 (3 2026), p. 113.doi:10.1007/s10957-026-02941-3

  34. [35]

    Numerical solution for optimiza- tion over the efficient set by d.c. optimization algorithms

    H. A. Le Thi, T. Pham Dinh, and L. D. Muu. “Numerical solution for optimiza- tion over the efficient set by d.c. optimization algorithms”. In:Operations Research Letters19.3 (1996), pp. 117–128.issn: 0167-6377.doi:10.1016/0167-6377(96) 00022-3

  35. [36]

    Finding stationary points on bounded-rank matrices: a geometric hurdle and a smooth remedy

    E. Levin, J. Kileel, and N Boumal. “Finding stationary points on bounded-rank matrices: a geometric hurdle and a smooth remedy”. In:Mathematical Programming 199 (1 2023), pp. 831–864.doi:10.1007/s10107-022-01851-2

  36. [37]

    On the limited memory BFGS method for large scale optimization

    D. C. Liu and J. Nocedal. “On the limited memory BFGS method for large scale optimization”. In:Mathematical Programming45 (1 1989), pp. 503–528.doi:10. 1007/BF01589116

  37. [38]

    Lipschitz-like property relative toasetandthegeneralizedMordukhovichcriterion

    K. W. Meng, M. H. Li, W. F. Yao, and X. Q. Yang. “Lipschitz-like property relative toasetandthegeneralizedMordukhovichcriterion”.In:Mathematical Programming 189 (1 2021), pp. 455–489.doi:10.1007/s10107-020-01568-0

  38. [39]

    B. S. Mordukhovich.Variational Analysis and Applications. Springer, 2018.doi: 10.1007/978-3-319-92775-6. 30

  39. [40]

    Grassmannian clustering for multi- variate time sequences

    B.-S. Oh, A. B. J. Teoh, K.-A. Toh, and Z. Lin. “Grassmannian clustering for multi- variate time sequences”. In:New Trends in Computer Technologies and Applications. Springer Singapore, 2019, pp. 654–664.isbn: 978-981-13-9190-3

  40. [41]

    A heuristic algorithm for solving the minimum sum- of-squares clustering problems

    B. Ordin and A. M. Bagirov. “A heuristic algorithm for solving the minimum sum- of-squares clustering problems”. In:Journal of Global Optimization61 (2 2015), pp. 341–361.doi:10.1007/s10898-014-0171-5

  41. [42]

    Global convergence of Riemannian line search methods with a Zhang- Hager-type condition

    H. Oviedo. “Global convergence of Riemannian line search methods with a Zhang- Hager-type condition”. In:Numerical Algorithms91 (3 2022), pp. 1183–1203.doi: 10.1007/s11075-022-01298-8

  42. [43]

    Riemannian BFGS Algorithm with Ap- plications

    C. Qi, K. A. Gallivan, and P.-A. Absil. “Riemannian BFGS Algorithm with Ap- plications”. In:Recent advances in optimization and its applications in engineering. Springer Berlin Heidelberg, 2010, pp. 183–192.isbn: 978-3-642-12598-0

  43. [44]

    Convergence of ZH-Type nonmonotone de- scent method for Kurdyka–Łojasiewicz optimization problems

    Y. Qian, T. Tao, S. Pan, and H. Qi. “Convergence of ZH-Type nonmonotone de- scent method for Kurdyka–Łojasiewicz optimization problems”. In:SIAM Journal on Optimization35.2 (2025), pp. 1089–1109.doi:10.1137/24M1669153

  44. [45]

    Objective criteria for the evaluation of clustering methods

    W. M. Rand. “Objective criteria for the evaluation of clustering methods”. In: Journal of the American Statistical Association66.336 (1971), pp. 846–850.doi: 10.1080/01621459.1971.10482356

  45. [46]

    R. T. Rockafellar and R. J.-B. Wets.Variational Analysis. Springer, 2009.doi: 10.1007/978-3-642-02431-3

  46. [47]

    V-Measure: A conditional entropy-based exter- nal cluster evaluation measure

    A. Rosenberg and J. Hirschberg. “V-Measure: A conditional entropy-based exter- nal cluster evaluation measure”. In:Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Lan- guage Learning (EMNLP-CoNLL).AssociationforComputationalLinguistics,2007, pp. 410–420

  47. [48]

    A Bayesian Mixture Model for Clustering on the Stiefel Manifold

    S. Sengupta, S. Pal, R. Mitra, Y. Guo, A. Banerjee, and Y. Ji.A Bayesian mixture model for clustering on the stiefel manifold. 2017. arXiv:1708.07196 [stat.ME]

  48. [49]

    Pymanopt: A Python Toolbox for Op- timization on Manifolds using Automatic Differentiation

    J. Townsend, N. Koep, and S. Weichwald. “Pymanopt: A Python Toolbox for Op- timization on Manifolds using Automatic Differentiation”. In:Journal of Machine Learning Research17.137 (2016), 1–5.url:http://jmlr.org/papers/v17/16- 177.html

  49. [50]

    Wiechers, B

    H. Wiechers, B. Eltzner, S. F. Huckemann, and K. V. Mardia.Clustering schemes on the torus with application to RNA clashes. 2021. arXiv:2104.00094 [q-bio.BM]

  50. [51]

    A nonmonotone line search technique and its ap- plication to unconstrained optimization

    H. Zhang and W. W. Hager. “A nonmonotone line search technique and its ap- plication to unconstrained optimization”. In:SIAM Journal on Optimization14.4 (2004), pp. 1043–1056. A Complete Proofs under the KL Property Let us begin with a full proof of Lemma 4.3: Lemma A.1.For all sufficiently largek∈NwithR k < φ(x ∗) +ηandx k ∈ Bϑ(x∗), the following inequal...