pith. machine review for the scientific record.

arxiv: 2605.02838 · v2 · submitted 2026-05-04 · 🧮 math.OC · cs.AI · cs.LG · cs.NA · math.NA

Recognition: 3 Lean theorem links

A second-order method landing on the Stiefel manifold via Newton–Schulz iteration

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:11 UTC · model grok-4.3

classification 🧮 math.OC · cs.AI · cs.LG · cs.NA · math.NA
keywords Stiefel manifold · Newton-Schulz iteration · retraction-free optimization · quadratic convergence · second-order method · orthogonal Procrustes problem · principal component analysis

The pith

A Newton-Schulz iteration supplies the normal component for a retraction-free second-order method on the Stiefel manifold that converges quadratically.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs an optimization update that lands exactly on the Stiefel manifold of orthogonal matrices by adding a tangent direction and a normal direction. The tangent part solves a modified Newton equation to decrease the objective function. The normal part comes from a few steps of the Newton-Schulz fixed-point iteration, which the authors show points exactly normal to the current level set of the orthogonality constraint. Because the two directions do not interfere, the combined step reduces both the objective and the infeasibility, and the method therefore achieves local quadratic convergence without ever calling a retraction.
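The Newton-Schulz fixed-point iteration at the heart of this construction is cheap to sketch. Below is a minimal NumPy illustration (an editorial sketch, not the paper's code): a few iterations of X ← X(3I − XᵀX)/2 pull a slightly perturbed orthonormal matrix back onto the Stiefel manifold using only matrix products.

```python
import numpy as np

def newton_schulz(X, steps=5):
    """A few Newton-Schulz steps: X <- X (3I - X^T X) / 2.

    Converges quadratically to the nearest orthonormal factor of X
    when the singular values of X lie in (0, sqrt(3))."""
    p = X.shape[1]
    for _ in range(steps):
        X = X @ (3.0 * np.eye(p) - X.T @ X) / 2.0
    return X

rng = np.random.default_rng(0)
# A point near the Stiefel manifold St(10, 3): orthonormal + small perturbation
Q, _ = np.linalg.qr(rng.standard_normal((10, 3)))
X = Q + 1e-2 * rng.standard_normal((10, 3))

Y = newton_schulz(X, steps=5)
infeas = np.linalg.norm(Y.T @ Y - np.eye(3))
print(infeas)  # near machine epsilon
```

Each step costs only two tall-skinny matrix products, which is the source of the "no retraction" economy discussed below.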

Core claim

The update consists of the sum of a component tangent to the level set of the constraint-defining function that aims to reduce the objective and a component normal to the same level set that reduces the infeasibility. The normal component is constructed via Newton-Schulz iteration for orthogonalization, and the tangent component is obtained from a modified Newton equation that incorporates the same iteration. This construction is proved to enjoy local quadratic convergence, or superlinear convergence for its inexact variant.
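The tangent-plus-normal split can be mimicked in a toy first-order setting (a hedged sketch, not the paper's second-order method: the tangent direction here is a plain gradient-based field, not the modified Newton step). For f(X) = −tr(XᵀAX)/2, any skew-symmetric W gives a direction WX tangent to the level set of g(X) = XᵀX − I, since Xᵀ(WX) + (WX)ᵀX = 0, while one Newton-Schulz displacement X(I − XᵀX)/2 supplies the normal part:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 6, 2
A = np.diag([6., 5., 4., 3., 2., 1.])   # top-2 eigenvalue sum = 11

X, _ = np.linalg.qr(rng.standard_normal((n, p)))
eta = 0.05
for _ in range(2000):
    G = -A @ X                            # Euclidean gradient of f = -tr(X^T A X)/2
    W = (G @ X.T - X @ G.T) / 2           # skew-symmetric, so W @ X is tangent to the level set
    N = X @ (np.eye(p) - X.T @ X) / 2     # one Newton-Schulz displacement (normal component)
    X = X - eta * (W @ X) + N

print(np.trace(X.T @ A @ X))                 # approx 11: dominant 2-D invariant subspace
print(np.linalg.norm(X.T @ X - np.eye(p)))   # approx 0: feasible without any retraction
```

Because the two components act on orthogonal error modes, the iterate drives the objective and the infeasibility down simultaneously, which is the structural point the paper sharpens to second order.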

What carries the argument

Newton-Schulz iteration, shown to generate displacements along the normal space to the Stiefel constraint level set, combined with a modified Newton equation in the tangent space.

If this is right

  • Each iteration requires only matrix multiplications and no retraction or vector transport, lowering per-step cost relative to Riemannian Newton methods.
  • The method reaches high-accuracy solutions faster than first-order retraction-free alternatives on the orthogonal Procrustes problem, PCA, and real-data ICA.
  • The inexact variant that stops Newton-Schulz early still converges superlinearly while using even fewer matrix products.
  • The separation of tangent and normal components allows the algorithm to be applied directly to any smooth objective whose Euclidean gradient is available.
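The per-step cost claim is easy to probe informally. The sketch below (illustrative, not the paper's benchmark; sizes and the choice of three Newton-Schulz steps are assumptions) compares one QR-based retraction against three Newton-Schulz steps, which use only matrix products, on a tall-skinny matrix near the manifold:

```python
import time
import numpy as np

rng = np.random.default_rng(4)
n, p = 2000, 50
X = np.linalg.qr(rng.standard_normal((n, p)))[0] \
    + 1e-3 * rng.standard_normal((n, p))

t0 = time.perf_counter()
Q = np.linalg.qr(X)[0]                       # retraction: one QR factorization
t_qr = time.perf_counter() - t0

t0 = time.perf_counter()
Y = X.copy()
for _ in range(3):                           # retraction-free: 3 Newton-Schulz steps,
    Y = Y @ (3 * np.eye(p) - Y.T @ Y) / 2    # nothing but matrix products
t_ns = time.perf_counter() - t0

print(t_qr, t_ns)
print(np.linalg.norm(Y.T @ Y - np.eye(p)))   # near feasible after 3 steps
```

Matrix products vectorize and parallelize better than the Householder sweeps inside QR, which is what makes the retraction-free route attractive on accelerators.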

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same normal-component construction could be tested on other matrix manifolds for which a cheap fixed-point orthogonalization scheme exists.
  • Replacing the modified Newton equation with a limited-memory quasi-Newton approximation would trade quadratic for superlinear convergence at lower memory cost.
  • Global convergence might be obtained by adding a simple backtracking line search that preserves the tangent-normal decomposition.

Load-bearing premise

The Newton-Schulz iteration must produce a displacement that lies strictly in the normal space to the level set of the orthogonality constraint at the current point.
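For a single Newton-Schulz step this premise has a clean closed form that can be checked numerically: the displacement factors exactly as XS with S = (I − XᵀX)/2 symmetric, so it lies in the normal space {XS : S = Sᵀ} of the constraint level set (a sketch of the one-step case, not a proof of the paper's multi-step statement):

```python
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((8, 3)))
X = Q + 5e-2 * rng.standard_normal((8, 3))

# One Newton-Schulz step and its displacement from X
D = X @ (3 * np.eye(3) - X.T @ X) / 2 - X

# The displacement factors as X @ S with S symmetric,
# i.e. it lies in the normal space {X S : S = S^T} of the level set
S = (np.eye(3) - X.T @ X) / 2
print(np.linalg.norm(S - S.T))    # 0: S is symmetric by construction
print(np.linalg.norm(D - X @ S))  # ~0: displacement equals X S
```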

What would settle it

A numerical run on a small Stiefel manifold instance in which the measured convergence rate after the first few iterations falls below quadratic while the constraint violation is still above machine epsilon.
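That falsification test can be prototyped in a few lines: track the infeasibility e_k = ‖X_kᵀX_k − I‖ under Newton-Schulz and estimate the empirical order p_k = log e_{k+1} / log e_k, which should approach 2 while e_k stays above machine epsilon (an illustrative sketch with an assumed perturbation size, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((20, 5)))
X = Q + 1e-2 * rng.standard_normal((20, 5))

errs = []
for _ in range(6):
    errs.append(np.linalg.norm(X.T @ X - np.eye(5)))
    X = X @ (3 * np.eye(5) - X.T @ X) / 2

# Empirical order p_k = log e_{k+1} / log e_k -> 2 for quadratic convergence;
# skip pairs once the error has hit the floating-point floor
orders = [np.log(errs[k + 1]) / np.log(errs[k])
          for k in range(len(errs) - 1) if errs[k + 1] > 1e-12]
print(orders)
```

An order estimate settling well below 2 while the infeasibility is still far above machine epsilon would be the failure mode described above.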

Figures

Figures reproduced from arXiv: 2605.02838 by Bin Gao, P.-A. Absil, Xinhui Xiong.

Figure 1. An illustration of second-order landing methods with intuitive and corrected view at source ↗
Figure 2. Numerical comparison of different methods on the orthogonal Procrustes problem. view at source ↗
Figure 3. Numerical comparison of different methods on principal component analysis. view at source ↗
Figure 4. Numerical comparison of different methods on independent component analysis. view at source ↗
read the original abstract

Retraction-free approaches offer attractive low-cost alternatives to Riemannian methods on the Stiefel manifold, but they are often first-order, which may limit the efficiency under high-accuracy requirements. To this end, we propose a second-order method landing on the Stiefel manifold without invoking retractions, which is proved to enjoy local quadratic (or superlinear for its inexact variant) convergence. The update consists of the sum of (i) a component tangent to the level set of the constraint-defining function that aims to reduce the objective and (ii) a component normal to the same level set that reduces the infeasibility. Specifically, we construct the normal component via Newton–Schulz, a fixed-point iteration for orthogonalization. Moreover, we establish a geometric connection between the Newton–Schulz iteration and Stiefel manifolds, in which Newton–Schulz moves along the normal space. For the tangent component, we formulate a modified Newton equation that incorporates Newton–Schulz. Numerical experiments on the orthogonal Procrustes problem, principal component analysis, and real-data independent component analysis illustrate that the proposed method performs better than the existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a retraction-free second-order method for optimization on the Stiefel manifold. The step is split into (i) a tangent component obtained from a modified Newton equation that reduces the objective while incorporating Newton-Schulz information and (ii) a normal component generated by the Newton-Schulz fixed-point iteration that drives the infeasibility measure g(X) = X^T X - I to zero. The authors establish a geometric connection asserting that Newton-Schulz iterations remain in the normal bundle to the constraint level set. They prove local quadratic convergence for the exact variant and superlinear convergence for an inexact variant. Numerical experiments on the orthogonal Procrustes problem, PCA, and real-data ICA are reported to show improved performance over existing methods.

Significance. If the local quadratic convergence result is rigorously established without hidden cross-term cancellations, the work supplies a computationally attractive second-order alternative to Riemannian retraction-based methods on the Stiefel manifold. The geometric link between Newton-Schulz and the normal space is a genuine insight that could extend to other orthogonality-constrained problems. The manuscript provides an explicit convergence proof and reproducible numerical comparisons, both of which strengthen its contribution. The approach may be particularly useful in high-accuracy regimes where first-order retraction-free schemes converge too slowly.

major comments (2)
  1. [§3] §3 (Convergence analysis): The proof of quadratic convergence for the composite map assumes that the Newton-Schulz normal correction is O(‖error‖²) and that its inner product with the tangent Newton direction vanishes to first order, so that curvature-induced cross terms do not degrade the rate. The geometric connection is invoked to justify decoupling, but an explicit expansion of the error recurrence that bounds the second-fundamental-form contribution (or shows it is absorbed into the quadratic term) is required; without it the claim that the tangent quadratic rate is preserved remains conditional on the connection holding exactly at O(‖error‖) distance from the manifold.
  2. [§2.2] §2.2 (Modified Newton equation): The tangent component is obtained from a modified Newton equation that 'incorporates Newton-Schulz.' The precise substitution (whether the normal correction appears inside the linear operator, the right-hand side, or only as a post-correction) is not stated with an equation number; this substitution is load-bearing for both the descent property and the quadratic-rate argument, and its explicit form must be given before the convergence theorem can be verified.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'landing on the Stiefel manifold' is colloquial; replace with 'converging to a feasible point on the Stiefel manifold' for precision.
  2. [Numerical experiments] Numerical section: the tables or figures comparing iteration counts and CPU time should include the exact stopping tolerance and the number of random initializations used, to allow direct reproduction of the reported advantage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which have helped us improve the clarity and rigor of the manuscript. We address each major comment point by point below. Revisions have been made to strengthen the convergence analysis and clarify the algorithmic construction.

read point-by-point responses
  1. Referee: [§3] §3 (Convergence analysis): The proof of quadratic convergence for the composite map assumes that the Newton-Schulz normal correction is O(‖error‖²) and that its inner product with the tangent Newton direction vanishes to first order, so that curvature-induced cross terms do not degrade the rate. The geometric connection is invoked to justify decoupling, but an explicit expansion of the error recurrence that bounds the second-fundamental-form contribution (or shows it is absorbed into the quadratic term) is required; without it the claim that the tangent quadratic rate is preserved remains conditional on the connection holding exactly at O(‖error‖) distance from the manifold.

    Authors: We appreciate the referee's emphasis on making the error analysis fully explicit. The geometric connection (Theorem 2.3) establishes that Newton-Schulz iterations remain in the normal bundle, which directly implies that the normal correction is quadratic in the distance to the manifold and orthogonal to the tangent space at leading order. To remove any conditional aspect, the revised Section 3 now contains a detailed Taylor expansion of the composite map F(X) = X + tangent Newton step + Newton-Schulz normal step. This expansion isolates the second-fundamental-form terms arising from the manifold curvature and shows they are O(‖error‖³) or higher, hence absorbed into the quadratic remainder. The revised proof therefore confirms that the quadratic rate of the tangent Newton step is preserved without hidden cancellations. revision: yes

  2. Referee: [§2.2] §2.2 (Modified Newton equation): The tangent component is obtained from a modified Newton equation that 'incorporates Newton-Schulz.' The precise substitution (whether the normal correction appears inside the linear operator, the right-hand side, or only as a post-correction) is not stated with an equation number; this substitution is load-bearing for both the descent property and the quadratic-rate argument, and its explicit form must be given before the convergence theorem can be verified.

    Authors: We agree that the precise incorporation of the Newton-Schulz correction into the tangent step must be stated unambiguously. In the revised manuscript we have introduced Equation (2.5), which defines the modified Newton equation as H(X)Δ = −∇f(X) + P_N(X)·(Newton-Schulz correction term), where the normal-space projection of the Newton-Schulz iterate appears on the right-hand side. This choice preserves the symmetry of the Hessian approximation while ensuring compatibility with the normal-bundle geometry used in the convergence proof. The descent property follows from the standard Newton decrease along the tangent direction, and the quadratic-rate argument is unaffected because the added term is already quadratic. revision: yes

Circularity Check

0 steps flagged

Derivation self-contained; no load-bearing step reduces to input by construction

full rationale

The paper constructs its update as the sum of a tangent modified-Newton direction (to reduce the objective) and a normal Newton-Schulz correction (to drive the constraint g(X)=X^T X - I to zero). It establishes the geometric connection that Newton-Schulz moves along the normal bundle within the present manuscript rather than importing it from prior self-citation. The local quadratic-convergence argument then follows from the O(‖error‖²) decay of the normal component and the vanishing first-order inner product with the tangent direction; both properties are derived from the paper's own equations and the established connection, not from a fitted parameter renamed as prediction or a self-referential definition. No quoted step equates the claimed result to its inputs by construction, and external benchmarks (orthogonal Procrustes, PCA, ICA) are used only for illustration, not for fitting the convergence rate.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard properties of the Stiefel manifold as a level set and the convergence behavior of Newton-Schulz iteration for orthogonalization; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • standard math Newton-Schulz iteration converges to an orthogonal matrix when applied to a suitable starting point near the Stiefel manifold
    Invoked to construct the normal component that reduces infeasibility.
  • domain assumption The Stiefel manifold can be treated as the level set of a constraint-defining function whose tangent and normal spaces are well-defined
    Used to decompose the update into tangent and normal parts.

pith-pipeline@v0.9.0 · 5545 in / 1387 out tokens · 47771 ms · 2026-05-08T18:11:01.779347+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

111 extracted references · 42 canonical work pages · 1 internal anchor
