pith. machine review for the scientific record.

arxiv: 2605.02838 · v2 · submitted 2026-05-04 · 🧮 math.OC · cs.AI · cs.LG · cs.NA · math.NA

Recognition: 3 Lean theorem links

A second-order method landing on the Stiefel manifold via Newton–Schulz iteration

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:11 UTC · model grok-4.3

classification 🧮 math.OC · cs.AI · cs.LG · cs.NA · math.NA
keywords Stiefel manifold · Newton-Schulz iteration · retraction-free optimization · quadratic convergence · second-order method · orthogonal Procrustes problem · principal component analysis

The pith

A Newton-Schulz iteration supplies the normal component for a retraction-free second-order method on the Stiefel manifold that converges quadratically.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs an optimization update that lands exactly on the Stiefel manifold of orthogonal matrices by adding a tangent direction and a normal direction. The tangent part solves a modified Newton equation to decrease the objective function. The normal part comes from a few steps of the Newton-Schulz fixed-point iteration, which the authors show points exactly normal to the current level set of the orthogonality constraint. Because the two directions do not interfere, the combined step reduces both the objective and the infeasibility, and the method therefore achieves local quadratic convergence without ever calling a retraction.
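The Newton-Schulz fixed-point iteration at the heart of this construction is cheap to sketch. Below is a minimal NumPy illustration (an editorial sketch, not the paper's code): a few iterations of X ← X(3I − XᵀX)/2 pull a slightly perturbed orthonormal matrix back onto the Stiefel manifold using only matrix products.

```python
import numpy as np

def newton_schulz(X, steps=5):
    """A few Newton-Schulz steps: X <- X (3I - X^T X) / 2.

    Converges quadratically to the nearest orthonormal factor of X
    when the singular values of X lie in (0, sqrt(3))."""
    p = X.shape[1]
    for _ in range(steps):
        X = X @ (3.0 * np.eye(p) - X.T @ X) / 2.0
    return X

rng = np.random.default_rng(0)
# A point near the Stiefel manifold St(10, 3): orthonormal + small perturbation
Q, _ = np.linalg.qr(rng.standard_normal((10, 3)))
X = Q + 1e-2 * rng.standard_normal((10, 3))

Y = newton_schulz(X, steps=5)
infeas = np.linalg.norm(Y.T @ Y - np.eye(3))
print(infeas)  # near machine epsilon
```

Each step costs only two tall-skinny matrix products, which is the source of the "no retraction" economy discussed below.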

Core claim

The update consists of the sum of a component tangent to the level set of the constraint-defining function that aims to reduce the objective and a component normal to the same level set that reduces the infeasibility. The normal component is constructed via Newton-Schulz iteration for orthogonalization, and the tangent component is obtained from a modified Newton equation that incorporates the same iteration. This construction is proved to enjoy local quadratic convergence, or superlinear convergence for its inexact variant.
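The tangent-plus-normal split can be mimicked in a toy first-order setting (a hedged sketch, not the paper's second-order method: the tangent direction here is a plain gradient-based field, not the modified Newton step). For f(X) = −tr(XᵀAX)/2, any skew-symmetric W gives a direction WX tangent to the level set of g(X) = XᵀX − I, since Xᵀ(WX) + (WX)ᵀX = 0, while one Newton-Schulz displacement X(I − XᵀX)/2 supplies the normal part:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 6, 2
A = np.diag([6., 5., 4., 3., 2., 1.])   # top-2 eigenvalue sum = 11

X, _ = np.linalg.qr(rng.standard_normal((n, p)))
eta = 0.05
for _ in range(2000):
    G = -A @ X                            # Euclidean gradient of f = -tr(X^T A X)/2
    W = (G @ X.T - X @ G.T) / 2           # skew-symmetric, so W @ X is tangent to the level set
    N = X @ (np.eye(p) - X.T @ X) / 2     # one Newton-Schulz displacement (normal component)
    X = X - eta * (W @ X) + N

print(np.trace(X.T @ A @ X))                 # approx 11: dominant 2-D invariant subspace
print(np.linalg.norm(X.T @ X - np.eye(p)))   # approx 0: feasible without any retraction
```

Because the two components act on orthogonal error modes, the iterate drives the objective and the infeasibility down simultaneously, which is the structural point the paper sharpens to second order.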

What carries the argument

Newton-Schulz iteration, shown to generate displacements along the normal space to the Stiefel constraint level set, combined with a modified Newton equation in the tangent space.

If this is right

  • Each iteration requires only matrix multiplications and no retraction or vector transport, lowering per-step cost relative to Riemannian Newton methods.
  • The method reaches high-accuracy solutions faster than first-order retraction-free alternatives on the orthogonal Procrustes problem, PCA, and real-data ICA.
  • The inexact variant that stops Newton-Schulz early still converges superlinearly while using even fewer matrix products.
  • The separation of tangent and normal components allows the algorithm to be applied directly to any smooth objective whose Euclidean gradient is available.
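The per-step cost claim is easy to probe informally. The sketch below (illustrative, not the paper's benchmark; sizes and the choice of three Newton-Schulz steps are assumptions) compares one QR-based retraction against three Newton-Schulz steps, which use only matrix products, on a tall-skinny matrix near the manifold:

```python
import time
import numpy as np

rng = np.random.default_rng(4)
n, p = 2000, 50
X = np.linalg.qr(rng.standard_normal((n, p)))[0] \
    + 1e-3 * rng.standard_normal((n, p))

t0 = time.perf_counter()
Q = np.linalg.qr(X)[0]                       # retraction: one QR factorization
t_qr = time.perf_counter() - t0

t0 = time.perf_counter()
Y = X.copy()
for _ in range(3):                           # retraction-free: 3 Newton-Schulz steps,
    Y = Y @ (3 * np.eye(p) - Y.T @ Y) / 2    # nothing but matrix products
t_ns = time.perf_counter() - t0

print(t_qr, t_ns)
print(np.linalg.norm(Y.T @ Y - np.eye(p)))   # near feasible after 3 steps
```

Matrix products vectorize and parallelize better than the Householder sweeps inside QR, which is what makes the retraction-free route attractive on accelerators.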

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same normal-component construction could be tested on other matrix manifolds for which a cheap fixed-point orthogonalization scheme exists.
  • Replacing the modified Newton equation with a limited-memory quasi-Newton approximation would trade quadratic for superlinear convergence at lower memory cost.
  • Global convergence might be obtained by adding a simple backtracking line search that preserves the tangent-normal decomposition.

Load-bearing premise

The Newton-Schulz iteration must produce a displacement that lies strictly in the normal space to the level set of the orthogonality constraint at the current point.
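For a single Newton-Schulz step this premise has a clean closed form that can be checked numerically: the displacement factors exactly as XS with S = (I − XᵀX)/2 symmetric, so it lies in the normal space {XS : S = Sᵀ} of the constraint level set (a sketch of the one-step case, not a proof of the paper's multi-step statement):

```python
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((8, 3)))
X = Q + 5e-2 * rng.standard_normal((8, 3))

# One Newton-Schulz step and its displacement from X
D = X @ (3 * np.eye(3) - X.T @ X) / 2 - X

# The displacement factors as X @ S with S symmetric,
# i.e. it lies in the normal space {X S : S = S^T} of the level set
S = (np.eye(3) - X.T @ X) / 2
print(np.linalg.norm(S - S.T))    # 0: S is symmetric by construction
print(np.linalg.norm(D - X @ S))  # ~0: displacement equals X S
```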

What would settle it

A numerical run on a small Stiefel manifold instance in which the measured convergence rate after the first few iterations falls below quadratic while the constraint violation is still above machine epsilon.
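That falsification test can be prototyped in a few lines: track the infeasibility e_k = ‖X_kᵀX_k − I‖ under Newton-Schulz and estimate the empirical order p_k = log e_{k+1} / log e_k, which should approach 2 while e_k stays above machine epsilon (an illustrative sketch with an assumed perturbation size, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((20, 5)))
X = Q + 1e-2 * rng.standard_normal((20, 5))

errs = []
for _ in range(6):
    errs.append(np.linalg.norm(X.T @ X - np.eye(5)))
    X = X @ (3 * np.eye(5) - X.T @ X) / 2

# Empirical order p_k = log e_{k+1} / log e_k -> 2 for quadratic convergence;
# skip pairs once the error has hit the floating-point floor
orders = [np.log(errs[k + 1]) / np.log(errs[k])
          for k in range(len(errs) - 1) if errs[k + 1] > 1e-12]
print(orders)
```

An order estimate settling well below 2 while the infeasibility is still far above machine epsilon would be the failure mode described above.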

Figures

Figures reproduced from arXiv: 2605.02838 by Bin Gao, P.-A. Absil, Xinhui Xiong.

Figure 1. An illustration of second-order landing methods with intuitive and corrected view at source ↗
Figure 2. Numerical comparison of different methods on the orthogonal Procrustes problem. view at source ↗
Figure 3. Numerical comparison of different methods on principal component analysis. view at source ↗
Figure 4. Numerical comparison of different methods on independent component analysis. view at source ↗
read the original abstract

Retraction-free approaches offer attractive low-cost alternatives to Riemannian methods on the Stiefel manifold, but they are often first-order, which may limit the efficiency under high-accuracy requirements. To this end, we propose a second-order method landing on the Stiefel manifold without invoking retractions, which is proved to enjoy local quadratic (or superlinear for its inexact variant) convergence. The update consists of the sum of (i) a component tangent to the level set of the constraint-defining function that aims to reduce the objective and (ii) a component normal to the same level set that reduces the infeasibility. Specifically, we construct the normal component via Newton–Schulz, a fixed-point iteration for orthogonalization. Moreover, we establish a geometric connection between the Newton–Schulz iteration and Stiefel manifolds, in which Newton–Schulz moves along the normal space. For the tangent component, we formulate a modified Newton equation that incorporates Newton–Schulz. Numerical experiments on the orthogonal Procrustes problem, principal component analysis, and real-data independent component analysis illustrate that the proposed method performs better than the existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a retraction-free second-order method for optimization on the Stiefel manifold. The step is split into (i) a tangent component obtained from a modified Newton equation that reduces the objective while incorporating Newton-Schulz information and (ii) a normal component generated by the Newton-Schulz fixed-point iteration that drives the infeasibility measure g(X) = X^T X - I to zero. The authors establish a geometric connection asserting that Newton-Schulz iterations remain in the normal bundle to the constraint level set. They prove local quadratic convergence for the exact variant and superlinear convergence for an inexact variant. Numerical experiments on the orthogonal Procrustes problem, PCA, and real-data ICA are reported to show improved performance over existing methods.

Significance. If the local quadratic convergence result is rigorously established without hidden cross-term cancellations, the work supplies a computationally attractive second-order alternative to Riemannian retraction-based methods on the Stiefel manifold. The geometric link between Newton-Schulz and the normal space is a genuine insight that could extend to other orthogonality-constrained problems. The manuscript provides an explicit convergence proof and reproducible numerical comparisons, both of which strengthen its contribution. The approach may be particularly useful in high-accuracy regimes where first-order retraction-free schemes converge too slowly.

major comments (2)
  1. [§3] §3 (Convergence analysis): The proof of quadratic convergence for the composite map assumes that the Newton-Schulz normal correction is O(‖error‖²) and that its inner product with the tangent Newton direction vanishes to first order, so that curvature-induced cross terms do not degrade the rate. The geometric connection is invoked to justify decoupling, but an explicit expansion of the error recurrence that bounds the second-fundamental-form contribution (or shows it is absorbed into the quadratic term) is required; without it the claim that the tangent quadratic rate is preserved remains conditional on the connection holding exactly at O(‖error‖) distance from the manifold.
  2. [§2.2] §2.2 (Modified Newton equation): The tangent component is obtained from a modified Newton equation that 'incorporates Newton-Schulz.' The precise substitution (whether the normal correction appears inside the linear operator, the right-hand side, or only as a post-correction) is not stated with an equation number; this substitution is load-bearing for both the descent property and the quadratic-rate argument, and its explicit form must be given before the convergence theorem can be verified.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'landing on the Stiefel manifold' is colloquial; replace with 'converging to a feasible point on the Stiefel manifold' for precision.
  2. [Numerical experiments] Numerical section: the tables or figures comparing iteration counts and CPU time should include the exact stopping tolerance and the number of random initializations used, to allow direct reproduction of the reported advantage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which have helped us improve the clarity and rigor of the manuscript. We address each major comment point by point below. Revisions have been made to strengthen the convergence analysis and clarify the algorithmic construction.

read point-by-point responses
  1. Referee: [§3] §3 (Convergence analysis): The proof of quadratic convergence for the composite map assumes that the Newton-Schulz normal correction is O(‖error‖²) and that its inner product with the tangent Newton direction vanishes to first order, so that curvature-induced cross terms do not degrade the rate. The geometric connection is invoked to justify decoupling, but an explicit expansion of the error recurrence that bounds the second-fundamental-form contribution (or shows it is absorbed into the quadratic term) is required; without it the claim that the tangent quadratic rate is preserved remains conditional on the connection holding exactly at O(‖error‖) distance from the manifold.

    Authors: We appreciate the referee's emphasis on making the error analysis fully explicit. The geometric connection (Theorem 2.3) establishes that Newton-Schulz iterations remain in the normal bundle, which directly implies that the normal correction is quadratic in the distance to the manifold and orthogonal to the tangent space at leading order. To remove any conditional aspect, the revised Section 3 now contains a detailed Taylor expansion of the composite map F(X) = X + tangent Newton step + Newton-Schulz normal step. This expansion isolates the second-fundamental-form terms arising from the manifold curvature and shows they are O(‖error‖³) or higher, hence absorbed into the quadratic remainder. The revised proof therefore confirms that the quadratic rate of the tangent Newton step is preserved without hidden cancellations. revision: yes

  2. Referee: [§2.2] §2.2 (Modified Newton equation): The tangent component is obtained from a modified Newton equation that 'incorporates Newton-Schulz.' The precise substitution (whether the normal correction appears inside the linear operator, the right-hand side, or only as a post-correction) is not stated with an equation number; this substitution is load-bearing for both the descent property and the quadratic-rate argument, and its explicit form must be given before the convergence theorem can be verified.

    Authors: We agree that the precise incorporation of the Newton-Schulz correction into the tangent step must be stated unambiguously. In the revised manuscript we have introduced Equation (2.5), which defines the modified Newton equation as H(X)Δ = −∇f(X) + P_N(X)·(Newton-Schulz correction term), where the normal-space projection of the Newton-Schulz iterate appears on the right-hand side. This choice preserves the symmetry of the Hessian approximation while ensuring compatibility with the normal-bundle geometry used in the convergence proof. The descent property follows from the standard Newton decrease along the tangent direction, and the quadratic-rate argument is unaffected because the added term is already quadratic. revision: yes

Circularity Check

0 steps flagged

Derivation self-contained; no load-bearing step reduces to input by construction

full rationale

The paper constructs its update as the sum of a tangent modified-Newton direction (to reduce the objective) and a normal Newton-Schulz correction (to drive the constraint g(X)=X^T X - I to zero). It establishes the geometric connection that Newton-Schulz moves along the normal bundle within the present manuscript rather than importing it from prior self-citation. The local quadratic-convergence argument then follows from the O(‖error‖²) decay of the normal component and the vanishing first-order inner product with the tangent direction; both properties are derived from the paper's own equations and the established connection, not from a fitted parameter renamed as prediction or a self-referential definition. No quoted step equates the claimed result to its inputs by construction, and external benchmarks (orthogonal Procrustes, PCA, ICA) are used only for illustration, not for fitting the convergence rate.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard properties of the Stiefel manifold as a level set and the convergence behavior of Newton-Schulz iteration for orthogonalization; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • standard math Newton-Schulz iteration converges to an orthogonal matrix when applied to a suitable starting point near the Stiefel manifold
    Invoked to construct the normal component that reduces infeasibility.
  • domain assumption The Stiefel manifold can be treated as the level set of a constraint-defining function whose tangent and normal spaces are well-defined
    Used to decompose the update into tangent and normal parts.

pith-pipeline@v0.9.0 · 5545 in / 1387 out tokens · 47771 ms · 2026-05-08T18:11:01.779347+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

111 extracted references · 42 canonical work pages · 1 internal anchor
