Recognition: unknown
Improved Penalty Function Approaches for Optimization Problems with General Orthogonality
Pith reviewed 2026-05-07 13:15 UTC · model grok-4.3
The pith
A constraint-dissolving penalty function preserves stationary points for generalized orthogonal optimization problems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We define the generalized orthogonal constraint dissolving function (GOCDF) for problems that minimize f(X) subject to X lying in a linear subspace F and satisfying the quadratic orthogonality condition X^T ϕ(X) = I_p. We prove that every first-order stationary point of GOCDF is a first-order stationary point of the original GOOCP and vice versa; the same equivalence holds for second-order stationary points. We also show that the arithmetic cost of a first-order step on GOCDF is strictly lower than the cost of the corresponding Riemannian gradient step on the embedded submanifold.
What carries the argument
The generalized orthogonal constraint dissolving function (GOCDF), an unconstrained penalty that replaces the quadratic orthogonality constraint while preserving the location and type of stationary points.
If this is right
- First- and second-order stationary points coincide between the penalty function and the original constrained problem.
- Gradient or proximal-gradient steps on the penalty function cost less arithmetic work than Riemannian gradient steps on the manifold.
- Standard unconstrained solvers can be applied directly and still locate the same critical points.
- The method covers every manifold whose defining constraint matches the given subspace-plus-quadratic form, including all listed Stiefel-type examples.
Where Pith is reading between the lines
- Implementations become simpler because no manifold projection or retraction code is required.
- The same penalty construction may apply to other quadratic matrix constraints once the embedded-submanifold property is verified.
- Hybrid schemes that start with the penalty function and switch to a Riemannian correction only near convergence could combine the speed of the unconstrained phase with the final accuracy of manifold methods.
Load-bearing premise
The set of points satisfying both the linear subspace membership and the quadratic orthogonality condition forms a closed embedded submanifold of R^{n x p}.
What would settle it
Exhibit a point that is first-order stationary for the penalty function but fails the first-order stationarity condition of the original constrained problem.
Figures
read the original abstract
In this paper, we consider a class of generalized orthogonal optimization constraint problems (GOOCP) over $\mathbb{R}^{n \times p}$, where the variable $X$ is restricted within the intersection of a certain subspace $\mathcal{F}$ and satisfies the quadratic constraint $\{X \in \mathbb{R}^{n \times p}: X^{\top} \phi(X) = I_p\}$. Such constraints generalize a wide range of structured matrix manifolds, such as the Stiefel manifold, the symplectic Stiefel manifold, the indefinite Stiefel manifold, the third-order tensor Stiefel manifold, etc. We show that the feasible region of GOOCP is a closed embedded submanifold of $\mathbb{R}^{n \times p}$ and characterize the necessary geometric materials for the existing Riemannian optimization frameworks. Based on the constraint dissolving approach for Riemannian optimization problems, we propose the constraint dissolving penalty function (GOCDF) for the constrained optimization problem GOOCP with easy-to-compute formulations. We further establish the equivalence between GOCDF and GOOCP in the aspects of first-order and second-order stationary points. We also analyze the computational complexity of applying first-order methods to minimize GOOCP, which could be significantly lower than those of first-order Riemannian optimization methods. Numerical experiments demonstrate that solving GOOCP through applying unconstrained optimization methods to minimize constraint dissolving function demonstrates superior efficiency to existing Riemannian optimization methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces generalized orthogonal optimization constraint problems (GOOCP) over R^{n x p} where X lies in a subspace F and satisfies the quadratic constraint X^T phi(X) = I_p, generalizing manifolds such as Stiefel, symplectic Stiefel, indefinite Stiefel, and tensor Stiefel. It claims that the feasible set is always a closed embedded submanifold of R^{n x p}, develops the associated Riemannian geometry, proposes a constraint-dissolving penalty function (GOCDF) with explicit formulations, proves equivalence of first- and second-order stationary points between GOCDF and GOOCP, and shows that first-order methods applied to the penalty have significantly lower complexity than Riemannian first-order methods. Numerical experiments are presented to demonstrate efficiency advantages.
Significance. If the manifold property and equivalence results hold in the stated generality, the work provides a unified, implementable penalty framework that avoids manifold-specific retractions and projections for a broad class of structured matrix problems, potentially enabling simpler and faster first-order solvers while preserving stationarity guarantees. The complexity comparison, if rigorously established, would be a notable practical contribution over existing Riemannian toolboxes.
major comments (2)
- [Abstract / feasible-region characterization] Abstract and the section characterizing the feasible region: the claim that {X in F : X^T phi(X) = I_p} is always a closed embedded submanifold of R^{n x p} (required for the Riemannian geometry and for the equivalence of stationary points) rests on the regular level-set theorem, i.e., surjectivity of Dg(X) where g(X) = X^T phi(X) - I_p at every feasible point. No explicit general condition on phi (differentiability plus a uniform rank bound on Dg) is stated that would guarantee this for arbitrary phi and F; the abstract lists examples but does not derive the rank condition from the general definition. If the proof proceeds only by case-by-case verification, the manifold property fails to hold in the claimed generality and the equivalence plus complexity results cannot be established for the full class.
- [Equivalence of stationary points] The equivalence theorem for first- and second-order stationary points between GOCDF and GOOCP: this equivalence is derived under the embedded-submanifold assumption. Because the surjectivity of Dg is not shown to hold uniformly, the stationary-point correspondence may require additional hypotheses on phi that are not stated, undermining the claim that GOCDF can be minimized by unconstrained first-order methods while retaining the same critical points as the original constrained problem.
minor comments (2)
- [Introduction / notation] Notation for the map phi and the subspace F should be introduced with explicit differentiability assumptions and an example of how the general definition specializes to the symplectic case.
- [Complexity analysis] The complexity analysis section should include a precise operation-count comparison (e.g., per-iteration cost of the penalty gradient versus a Riemannian retraction) rather than a qualitative statement.
Simulated Author's Rebuttal
We thank the referee for the careful reading of our manuscript and the constructive major comments. We respond to each point below and indicate the revisions that will be incorporated.
read point-by-point responses
-
Referee: Abstract and the section characterizing the feasible region: the claim that {X in F : X^T phi(X) = I_p} is always a closed embedded submanifold of R^{n x p} (required for the Riemannian geometry and for the equivalence of stationary points) rests on the regular level-set theorem, i.e., surjectivity of Dg(X) where g(X) = X^T phi(X) - I_p at every feasible point. No explicit general condition on phi (differentiability plus a uniform rank bound on Dg) is stated that would guarantee this for arbitrary phi and F; the abstract lists examples but does not derive the rank condition from the general definition. If the proof proceeds only by case-by-case verification, the manifold property fails to hold in the claimed generality and the equivalence plus complexity results cannot be established for the full class.
Authors: We agree that the manuscript would benefit from an explicit statement of the conditions on phi that ensure the surjectivity of Dg. Although the GOOCP is defined for phi corresponding to generalized orthogonality constraints (not arbitrary phi), and the manifold property is established via the regular level set theorem in the paper, we did not isolate the rank condition as a standing hypothesis. In the revised manuscript, we will add to the abstract and the feasible region section the assumption that phi is continuously differentiable and Dg(X) has full rank at feasible X. We will also provide a general argument or verification that this holds for the class of problems considered, including the examples. This addresses the concern while preserving the generality within the intended scope. revision: yes
-
Referee: The equivalence theorem for first- and second-order stationary points between GOCDF and GOOCP: this equivalence is derived under the embedded-submanifold assumption. Because the surjectivity of Dg is not shown to hold uniformly, the stationary-point correspondence may require additional hypotheses on phi that are not stated, undermining the claim that GOCDF can be minimized by unconstrained first-order methods while retaining the same critical points as the original constrained problem.
Authors: The equivalence results are indeed based on the embedded submanifold property of the feasible set. By incorporating the explicit rank condition on Dg as described in our response to the first comment, the theorems on first- and second-order stationary point equivalence will be valid under the stated hypotheses. We will revise the theorem statements to include these assumptions on phi. This ensures that unconstrained first-order methods applied to the GOCDF retain the stationarity guarantees for the original GOOCP within the class of problems where the manifold property holds. revision: yes
Circularity Check
No circularity; derivation is self-contained with independent proofs
full rationale
The paper defines GOOCP via the intersection of subspace F and the quadratic constraint X^T phi(X) = I_p for general phi, then states it shows the feasible set is a closed embedded submanifold of R^{n x p} (abstract). It characterizes the necessary Riemannian geometry, constructs the penalty GOCDF from the prior constraint-dissolving framework, and establishes equivalence of first- and second-order stationary points between GOCDF and GOOCP. The complexity comparison follows directly from GOCDF being unconstrained. No quoted step reduces any claimed result (equivalence, manifold property, or complexity bound) to a fitted parameter, self-definition, or unverified self-citation chain; the central equivalence is presented as a derived theorem rather than an input. Self-citation of the base dissolving approach, if present, is not load-bearing for the new general-phi results.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The set defined by the intersection of subspace F and the quadratic constraint X^T phi(X) = I_p forms a closed embedded submanifold of R^{n x p}.
Reference graph
Works this paper leans on
-
[1]
B. Gao, G. Hu, Y . Kuang, X. Liu, An orthogonalization-free parallelizable framework for all-electron calculations in density functional theory, SIAM Journal on Scientific Computing 44 (3) (2022) B723–B745
2022
-
[2]
X. Liu, Z. Wen, X. Wang, M. Ulbrich, Y . Y uan, On the analysi s of the discretized kohn–sham density functional theory, S IAM Journal on Numerical Analysis 53 (4) (2015) 1758–1785
2015
-
[3]
FISCHLER AND, Random sample consensus: a paradigm for model fitting with applications to image analysis and automa ted cartogra- phy, Commun
M. FISCHLER AND, Random sample consensus: a paradigm for model fitting with applications to image analysis and automa ted cartogra- phy, Commun. ACM 24 (6) (1981) 381–395
1981
-
[4]
H. Zou, T. Hastie, R. Tibshirani, Sparse principal compo nent analysis, Journal of computational and graphical stat istics 15 (2) (2006) 265– 286
2006
-
[5]
Huang, X
L. Huang, X. Liu, B. Lang, A. Y u, Y . Wang, B. Li, Orthogonal weight normalization: Solution to optimization over multi ple dependent stiefel manifolds in deep neural networks, in: Proceedings of the AA AI Conference on Artificial Intelligence, V ol. 32, 2018
2018
-
[6]
Mackey, V
L. Mackey, V . Syrgkanis, I. Zadik, Orthogonal machine le arning: Power and limitations, in: International Conferen ce on Machine Learning, PMLR, 2018, pp. 3375–3383
2018
-
[7]
J. A. Tropp, I. S. Dhillon, R. W. Heath, T. Strohmer, Desig ning structured tight frames via an alternating projection method, IEEE Transactions on information theory 51 (1) (2005) 188–209
2005
-
[8]
Absil, R
P .-A. Absil, R. Mahony, R. Sepulchre, Optimization Algo rithms on Matrix Manifolds (2008)
2008
-
[9]
Boumal, An introduction to optimization on smooth man ifolds, Cambridge University Press, 2023
N. Boumal, An introduction to optimization on smooth man ifolds, Cambridge University Press, 2023
2023
-
[10]
H. Sato, K. Aihara, Cholesky qr-based retraction on the generalized stiefel manifold, Computational Optimization and Applications 72 (2019) 293–308
2019
-
[11]
Shustin, H
B. Shustin, H. Avron, Riemannian optimization with a pr econditioning scheme on the generalized stiefel manifold, Journal of Computational and Applied Mathematics 423 (2023) 114953
2023
-
[12]
B. Gao, N. T. Son, P .-A. Absil, T. Stykel, Riemannian opt imization on the symplectic stiefel manifold, SIAM Journal on Optimization 31 (2) (2021) 1546–1575
2021
-
[13]
B. Gao, N. T. Son, P .-A. Absil, T. Stykel, Geometry of the symplectic stiefel manifold endowed with the euclidean met ric, in: Geometric Science of Information: 5th International Conference, GSI 2021, Paris, France, July 21–23, 2021, Proceedings 5, Sprin ger, 2021, pp. 789– 796
2021
-
[14]
V an Tiep, N
D. V an Tiep, N. T. Son, A riemannian optimization method on the indefinite stiefel manifold, arXiv preprint arXiv:24 10.22068 (2024)
2024
-
[15]
Bai, R.-C
Z. Bai, R.-C. Li, Minimization principles and computat ion for the generalized linear response eigenvalue problem , BIT Numerical Mathe- matics 54 (1) (2014) 31–54. 28
2014
-
[16]
X.-P . Mao, Y . Wang, Y .-N. Y ang, Computation over t-product based tensor stiefel manifold: A preliminary study, Jou rnal of the Operations Research Society of China (2024) 1–49
2024
-
[17]
Kernfeld, M
E. Kernfeld, M. Kilmer, S. Aeron, Tensor–tensor produc ts with invertible linear transforms, Linear Algebra and it s Applications 485 (2015) 545–570
2015
-
[18]
C. Qi, K. A. Gallivan, P .-A. Absil, Riemannian bfgs algo rithm with applications, in: Recent Advances in Optimizati on and its Applications in Engineering: The 14th Belgian-French-German Conferenc e on Optimization, Springer, 2010, pp. 183–192
2010
-
[19]
T. E. Abrudan, J. Eriksson, V . Koivunen, Steepest desce nt algorithms for optimization under unitary matrix constr aint, IEEE Transactions on Signal Processing 56 (3) (2008) 1134–1147
2008
-
[20]
Sato, Riemannian conjugate gradient methods: Gener al framework and specific algorithms with convergence analy ses, SIAM Journal on Optimization 32 (4) (2022) 2690–2717
H. Sato, Riemannian conjugate gradient methods: Gener al framework and specific algorithms with convergence analy ses, SIAM Journal on Optimization 32 (4) (2022) 2690–2717
2022
-
[21]
C. Tang, W. Tan, S. Xing, H. Zheng, A class of spectral con jugate gradient methods for riemannian optimization, Nume rical Algorithms 94 (1) (2023) 131–147
2023
-
[22]
Zhu, A riemannian conjugate gradient method for opti mization on the stiefel manifold, Computational optimizat ion and Applications 67 (2017) 73–110
X. Zhu, A riemannian conjugate gradient method for opti mization on the stiefel manifold, Computational optimizat ion and Applications 67 (2017) 73–110
2017
-
[23]
A. Han, B. Mishra, P . Jawanpuria, J. Gao, Riemannian acc elerated gradient methods via extrapolation, in: Internat ional Conference on Artificial Intelligence and Statistics, PMLR, 2023, pp. 155 4–1585
2023
-
[24]
Y . Liu, F. Shang, J. Cheng, H. Cheng, L. Jiao, Accelerate d first-order methods for geodesically convex optimization on riemannian manifolds, Advances in Neural Information Processing Systems 30 (2017 )
2017
- [25]
- [26]
-
[27]
N. Xiao, X. Liu, K.-C. Toh, Dissolving constraints for r iemannian optimization, Mathematics of Operations Resear ch 49 (1) (2024) 366–397
2024
- [28]
-
[29]
T. G. Kolda, B. W. Bader, Tensor decompositions and appl ications, SIAM review 51 (3) (2009) 455–500
2009
-
[30]
N. Xiao, X. Hu, X. Liu, K.-C. Toh, Cdopt: a python package for a class of riemannian optimization, Mathematical Programming Computation (2025) 1–58
2025
-
[31]
Virtanen, R
P . Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T . Reddy, D. Cournapeau, E. Burovski, P . Peterson, W. Weckesser, J. Bright, et al., Scipy 1.0: fundamental algorithms for scientific computing in python, Nature methods 17 (3) (2020) 261–272
2020
-
[32]
Nocedal, S
J. Nocedal, S. J. Wright, Numerical optimization, Spri nger, 1999
1999
-
[33]
R. H. Byrd, P . Lu, J. Nocedal, C. Zhu, A limited memory alg orithm for bound constrained optimization, SIAM Journal on scientific computing 16 (5) (1995) 1190–1208
1995
-
[34]
Fletcher, On the barzilai-borwein method, in: Optim ization and control with applications, Springer, 2005, pp
R. Fletcher, On the barzilai-borwein method, in: Optim ization and control with applications, Springer, 2005, pp. 235–256
2005
-
[35]
Townsend, N
J. Townsend, N. Koep, S. Weichwald, Pymanopt: A python t oolbox for optimization on manifolds using automatic di fferentiation, Journal of Machine Learning Research 17 (137) (2016) 1–5
2016
-
[36]
Iannazzo, M
B. Iannazzo, M. Porcelli, The riemannian barzilai–bor wein method with nonmonotone line search and the matrix geom etric mean computa- tion, IMA Journal of Numerical Analysis 38 (1) (2018) 495–51 7
2018
-
[37]
Z. Wen, W. Yin, A feasible method for optimization with o rthogonality constraints, Mathematical Programming 142 ( 1) (2013) 397–434
2013
-
[38]
D. C. Sorensen, Y . Zhou, Direct methods for matrix sylve ster and lyapunov equations, Journal of Applied Mathematic s 2003 (6) (2003) 277–303
2003
-
[39]
M. E. Kilmer, K. Braman, N. Hao, R. C. Hoover, Third-orde r tensors as operators on matrices: A theoretical and comput ational framework with applications in imaging, SIAM Journal on Matrix Analys is and Applications 34 (1) (2013) 148–172
2013
-
[40]
Bhattacharya, V
R. Bhattacharya, V . Patrangenaru, Large sample theory of intrinsic and extrinsic sample means on manifolds, The An nals of Statistics 31 (1) (2003) 1–29. 29
2003
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.