arxiv: 2604.26484 · v1 · submitted 2026-04-29 · 🧮 math.OC

Recognition: unknown

Improved Penalty Function Approaches for Optimization Problems with General Orthogonality

Yongshen Zhang , Xin Liu , Nachuan Xiao , Chunming Tang

Authors on Pith no claims yet

Pith reviewed 2026-05-07 13:15 UTC · model grok-4.3

classification 🧮 math.OC

keywords generalized orthogonal constraintsconstraint dissolving penaltystationary points equivalenceStiefel manifoldRiemannian optimizationunconstrained reformulationfirst-order methods

0 comments

The pith

A constraint-dissolving penalty function preserves stationary points for generalized orthogonal optimization problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a penalty function that removes the need to enforce orthogonality constraints explicitly during optimization. It proves that the first-order and second-order stationary points of this unconstrained penalty function match those of the original constrained problem exactly. Because the penalty version can be minimized with standard first-order methods, the per-iteration cost drops compared with Riemannian methods that must project or retract onto the manifold at every step. The approach covers a broad family of problems whose feasible sets include the Stiefel manifold and its symplectic and indefinite variants. Experiments show the resulting unconstrained solvers run faster in practice.

Core claim

We define the generalized orthogonal constraint dissolving function (GOCDF) for problems that minimize f(X) subject to X lying in a linear subspace F and satisfying the quadratic orthogonality condition X^T ϕ(X) = I_p. We prove that every first-order stationary point of GOCDF is a first-order stationary point of the original GOOCP and vice versa; the same equivalence holds for second-order stationary points. We also show that the arithmetic cost of a first-order step on GOCDF is strictly lower than the cost of the corresponding Riemannian gradient step on the embedded submanifold.

What carries the argument

The generalized orthogonal constraint dissolving function (GOCDF), an unconstrained penalty that replaces the quadratic orthogonality constraint while preserving the location and type of stationary points.

If this is right

First- and second-order stationary points coincide between the penalty function and the original constrained problem.
Gradient or proximal-gradient steps on the penalty function cost less arithmetic work than Riemannian gradient steps on the manifold.
Standard unconstrained solvers can be applied directly and still locate the same critical points.
The method covers every manifold whose defining constraint matches the given subspace-plus-quadratic form, including all listed Stiefel-type examples.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Implementations become simpler because no manifold projection or retraction code is required.
The same penalty construction may apply to other quadratic matrix constraints once the embedded-submanifold property is verified.
Hybrid schemes that start with the penalty function and switch to a Riemannian correction only near convergence could combine the speed of the unconstrained phase with the final accuracy of manifold methods.

Load-bearing premise

The set of points satisfying both the linear subspace membership and the quadratic orthogonality condition forms a closed embedded submanifold of R^{n x p}.

What would settle it

Exhibit a point that is first-order stationary for the penalty function but fails the first-order stationarity condition of the original constrained problem.

Figures

Figures reproduced from arXiv: 2604.26484 by Chunming Tang, Nachuan Xiao, Xin Liu, Yongshen Zhang.

**Figure 1.** Figure 1: Computational time comparison for CDFGD, CDFCG, R view at source ↗

**Figure 2.** Figure 2: (n, p) = (1000, 10). Comparison of initial and final errors of the CDFCG Method in solving the extrinsic mean problem for indefinite Stiefel manifold. A total of 100 randomly generated samples were taken. The blue bars represent the error between the initial iterate and each sample, while the orange bars depict the error between the convergence point and each sample. 23 view at source ↗

**Figure 3.** Figure 3: Computational time comparison for CDFGD, CDFCG, R view at source ↗

read the original abstract

In this paper, we consider a class of generalized orthogonal optimization constraint problems (GOOCP) over $\mathbb{R}^{n \times p}$, where the variable $X$ is restricted within the intersection of a certain subspace $\mathcal{F}$ and satisfies the quadratic constraint $\{X \in \mathbb{R}^{n \times p}: X^{\top} \phi(X) = I_p\}$. Such constraints generalize a wide range of structured matrix manifolds, such as the Stiefel manifold, the symplectic Stiefel manifold, the indefinite Stiefel manifold, the third-order tensor Stiefel manifold, etc. We show that the feasible region of GOOCP is a closed embedded submanifold of $\mathbb{R}^{n \times p}$ and characterize the necessary geometric materials for the existing Riemannian optimization frameworks. Based on the constraint dissolving approach for Riemannian optimization problems, we propose the constraint dissolving penalty function (GOCDF) for the constrained optimization problem GOOCP with easy-to-compute formulations. We further establish the equivalence between GOCDF and GOOCP in the aspects of first-order and second-order stationary points. We also analyze the computational complexity of applying first-order methods to minimize GOOCP, which could be significantly lower than those of first-order Riemannian optimization methods. Numerical experiments demonstrate that solving GOOCP through applying unconstrained optimization methods to minimize constraint dissolving function demonstrates superior efficiency to existing Riemannian optimization methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper unifies several orthogonal manifolds under one penalty framework and claims matching stationary points plus lower complexity, but the manifold regularity step looks like it may only be verified case-by-case rather than for arbitrary phi.

read the letter

The main thing here is a constraint-dissolving penalty (GOCDF) that turns GOOCP into an unconstrained problem while preserving first- and second-order stationary points. They define the general setting as X in some subspace F with X^T phi(X) = I_p, list the usual examples (Stiefel, symplectic, indefinite, tensor), and say the feasible set is a closed embedded submanifold so Riemannian tools apply. Then they show equivalence and argue that plain first-order methods on the penalty are cheaper than Riemannian first-order methods. Numerical tests are mentioned as showing better efficiency.

Referee Report

2 major / 2 minor

Summary. The paper introduces generalized orthogonal optimization constraint problems (GOOCP) over R^{n x p} where X lies in a subspace F and satisfies the quadratic constraint X^T phi(X) = I_p, generalizing manifolds such as Stiefel, symplectic Stiefel, indefinite Stiefel, and tensor Stiefel. It claims that the feasible set is always a closed embedded submanifold of R^{n x p}, develops the associated Riemannian geometry, proposes a constraint-dissolving penalty function (GOCDF) with explicit formulations, proves equivalence of first- and second-order stationary points between GOCDF and GOOCP, and shows that first-order methods applied to the penalty have significantly lower complexity than Riemannian first-order methods. Numerical experiments are presented to demonstrate efficiency advantages.

Significance. If the manifold property and equivalence results hold in the stated generality, the work provides a unified, implementable penalty framework that avoids manifold-specific retractions and projections for a broad class of structured matrix problems, potentially enabling simpler and faster first-order solvers while preserving stationarity guarantees. The complexity comparison, if rigorously established, would be a notable practical contribution over existing Riemannian toolboxes.

major comments (2)

[Abstract / feasible-region characterization] Abstract and the section characterizing the feasible region: the claim that {X in F : X^T phi(X) = I_p} is always a closed embedded submanifold of R^{n x p} (required for the Riemannian geometry and for the equivalence of stationary points) rests on the regular level-set theorem, i.e., surjectivity of Dg(X) where g(X) = X^T phi(X) - I_p at every feasible point. No explicit general condition on phi (differentiability plus a uniform rank bound on Dg) is stated that would guarantee this for arbitrary phi and F; the abstract lists examples but does not derive the rank condition from the general definition. If the proof proceeds only by case-by-case verification, the manifold property fails to hold in the claimed generality and the equivalence plus complexity results cannot be established for the full class.
[Equivalence of stationary points] The equivalence theorem for first- and second-order stationary points between GOCDF and GOOCP: this equivalence is derived under the embedded-submanifold assumption. Because the surjectivity of Dg is not shown to hold uniformly, the stationary-point correspondence may require additional hypotheses on phi that are not stated, undermining the claim that GOCDF can be minimized by unconstrained first-order methods while retaining the same critical points as the original constrained problem.

minor comments (2)

[Introduction / notation] Notation for the map phi and the subspace F should be introduced with explicit differentiability assumptions and an example of how the general definition specializes to the symplectic case.
[Complexity analysis] The complexity analysis section should include a precise operation-count comparison (e.g., per-iteration cost of the penalty gradient versus a Riemannian retraction) rather than a qualitative statement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and the constructive major comments. We respond to each point below and indicate the revisions that will be incorporated.

read point-by-point responses

Referee: Abstract and the section characterizing the feasible region: the claim that {X in F : X^T phi(X) = I_p} is always a closed embedded submanifold of R^{n x p} (required for the Riemannian geometry and for the equivalence of stationary points) rests on the regular level-set theorem, i.e., surjectivity of Dg(X) where g(X) = X^T phi(X) - I_p at every feasible point. No explicit general condition on phi (differentiability plus a uniform rank bound on Dg) is stated that would guarantee this for arbitrary phi and F; the abstract lists examples but does not derive the rank condition from the general definition. If the proof proceeds only by case-by-case verification, the manifold property fails to hold in the claimed generality and the equivalence plus complexity results cannot be established for the full class.

Authors: We agree that the manuscript would benefit from an explicit statement of the conditions on phi that ensure the surjectivity of Dg. Although the GOOCP is defined for phi corresponding to generalized orthogonality constraints (not arbitrary phi), and the manifold property is established via the regular level set theorem in the paper, we did not isolate the rank condition as a standing hypothesis. In the revised manuscript, we will add to the abstract and the feasible region section the assumption that phi is continuously differentiable and Dg(X) has full rank at feasible X. We will also provide a general argument or verification that this holds for the class of problems considered, including the examples. This addresses the concern while preserving the generality within the intended scope. revision: yes
Referee: The equivalence theorem for first- and second-order stationary points between GOCDF and GOOCP: this equivalence is derived under the embedded-submanifold assumption. Because the surjectivity of Dg is not shown to hold uniformly, the stationary-point correspondence may require additional hypotheses on phi that are not stated, undermining the claim that GOCDF can be minimized by unconstrained first-order methods while retaining the same critical points as the original constrained problem.

Authors: The equivalence results are indeed based on the embedded submanifold property of the feasible set. By incorporating the explicit rank condition on Dg as described in our response to the first comment, the theorems on first- and second-order stationary point equivalence will be valid under the stated hypotheses. We will revise the theorem statements to include these assumptions on phi. This ensures that unconstrained first-order methods applied to the GOCDF retain the stationarity guarantees for the original GOOCP within the class of problems where the manifold property holds. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation is self-contained with independent proofs

full rationale

The paper defines GOOCP via the intersection of subspace F and the quadratic constraint X^T phi(X) = I_p for general phi, then states it shows the feasible set is a closed embedded submanifold of R^{n x p} (abstract). It characterizes the necessary Riemannian geometry, constructs the penalty GOCDF from the prior constraint-dissolving framework, and establishes equivalence of first- and second-order stationary points between GOCDF and GOOCP. The complexity comparison follows directly from GOCDF being unconstrained. No quoted step reduces any claimed result (equivalence, manifold property, or complexity bound) to a fitted parameter, self-definition, or unverified self-citation chain; the central equivalence is presented as a derived theorem rather than an input. Self-citation of the base dissolving approach, if present, is not load-bearing for the new general-phi results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the manifold property of the feasible set and the validity of the constraint-dissolving construction for the general quadratic constraint.

axioms (1)

domain assumption The set defined by the intersection of subspace F and the quadratic constraint X^T phi(X) = I_p forms a closed embedded submanifold of R^{n x p}.
Explicitly stated as shown in the paper; required for applying Riemannian geometry and for the penalty equivalence.

pith-pipeline@v0.9.0 · 5559 in / 1181 out tokens · 58337 ms · 2026-05-07T13:15:56.766730+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

40 extracted references · 3 canonical work pages

[1]

B. Gao, G. Hu, Y . Kuang, X. Liu, An orthogonalization-free parallelizable framework for all-electron calculations in density functional theory, SIAM Journal on Scientiﬁc Computing 44 (3) (2022) B723–B745

2022
[2]

X. Liu, Z. Wen, X. Wang, M. Ulbrich, Y . Y uan, On the analysi s of the discretized kohn–sham density functional theory, S IAM Journal on Numerical Analysis 53 (4) (2015) 1758–1785

2015
[3]

FISCHLER AND, Random sample consensus: a paradigm for model ﬁtting with applications to image analysis and automa ted cartogra- phy, Commun

M. FISCHLER AND, Random sample consensus: a paradigm for model ﬁtting with applications to image analysis and automa ted cartogra- phy, Commun. ACM 24 (6) (1981) 381–395

1981
[4]

H. Zou, T. Hastie, R. Tibshirani, Sparse principal compo nent analysis, Journal of computational and graphical stat istics 15 (2) (2006) 265– 286

2006
[5]

Huang, X

L. Huang, X. Liu, B. Lang, A. Y u, Y . Wang, B. Li, Orthogonal weight normalization: Solution to optimization over multi ple dependent stiefel manifolds in deep neural networks, in: Proceedings of the AA AI Conference on Artiﬁcial Intelligence, V ol. 32, 2018

2018
[6]

Mackey, V

L. Mackey, V . Syrgkanis, I. Zadik, Orthogonal machine le arning: Power and limitations, in: International Conferen ce on Machine Learning, PMLR, 2018, pp. 3375–3383

2018
[7]

J. A. Tropp, I. S. Dhillon, R. W. Heath, T. Strohmer, Desig ning structured tight frames via an alternating projection method, IEEE Transactions on information theory 51 (1) (2005) 188–209

2005
[8]

Absil, R

P .-A. Absil, R. Mahony, R. Sepulchre, Optimization Algo rithms on Matrix Manifolds (2008)

2008
[9]

Boumal, An introduction to optimization on smooth man ifolds, Cambridge University Press, 2023

N. Boumal, An introduction to optimization on smooth man ifolds, Cambridge University Press, 2023

2023
[10]

H. Sato, K. Aihara, Cholesky qr-based retraction on the generalized stiefel manifold, Computational Optimization and Applications 72 (2019) 293–308

2019
[11]

Shustin, H

B. Shustin, H. Avron, Riemannian optimization with a pr econditioning scheme on the generalized stiefel manifold, Journal of Computational and Applied Mathematics 423 (2023) 114953

2023
[12]

B. Gao, N. T. Son, P .-A. Absil, T. Stykel, Riemannian opt imization on the symplectic stiefel manifold, SIAM Journal on Optimization 31 (2) (2021) 1546–1575

2021
[13]

B. Gao, N. T. Son, P .-A. Absil, T. Stykel, Geometry of the symplectic stiefel manifold endowed with the euclidean met ric, in: Geometric Science of Information: 5th International Conference, GSI 2021, Paris, France, July 21–23, 2021, Proceedings 5, Sprin ger, 2021, pp. 789– 796

2021
[14]

V an Tiep, N

D. V an Tiep, N. T. Son, A riemannian optimization method on the indeﬁnite stiefel manifold, arXiv preprint arXiv:24 10.22068 (2024)

2024
[15]

Bai, R.-C

Z. Bai, R.-C. Li, Minimization principles and computat ion for the generalized linear response eigenvalue problem , BIT Numerical Mathe- matics 54 (1) (2014) 31–54. 28

2014
[16]

X.-P . Mao, Y . Wang, Y .-N. Y ang, Computation over t-product based tensor stiefel manifold: A preliminary study, Jou rnal of the Operations Research Society of China (2024) 1–49

2024
[17]

Kernfeld, M

E. Kernfeld, M. Kilmer, S. Aeron, Tensor–tensor produc ts with invertible linear transforms, Linear Algebra and it s Applications 485 (2015) 545–570

2015
[18]

C. Qi, K. A. Gallivan, P .-A. Absil, Riemannian bfgs algo rithm with applications, in: Recent Advances in Optimizati on and its Applications in Engineering: The 14th Belgian-French-German Conferenc e on Optimization, Springer, 2010, pp. 183–192

2010
[19]

T. E. Abrudan, J. Eriksson, V . Koivunen, Steepest desce nt algorithms for optimization under unitary matrix constr aint, IEEE Transactions on Signal Processing 56 (3) (2008) 1134–1147

2008
[20]

Sato, Riemannian conjugate gradient methods: Gener al framework and speciﬁc algorithms with convergence analy ses, SIAM Journal on Optimization 32 (4) (2022) 2690–2717

H. Sato, Riemannian conjugate gradient methods: Gener al framework and speciﬁc algorithms with convergence analy ses, SIAM Journal on Optimization 32 (4) (2022) 2690–2717

2022
[21]

C. Tang, W. Tan, S. Xing, H. Zheng, A class of spectral con jugate gradient methods for riemannian optimization, Nume rical Algorithms 94 (1) (2023) 131–147

2023
[22]

Zhu, A riemannian conjugate gradient method for opti mization on the stiefel manifold, Computational optimizat ion and Applications 67 (2017) 73–110

X. Zhu, A riemannian conjugate gradient method for opti mization on the stiefel manifold, Computational optimizat ion and Applications 67 (2017) 73–110

2017
[23]

A. Han, B. Mishra, P . Jawanpuria, J. Gao, Riemannian acc elerated gradient methods via extrapolation, in: Internat ional Conference on Artiﬁcial Intelligence and Statistics, PMLR, 2023, pp. 155 4–1585

2023
[24]

Y . Liu, F. Shang, J. Cheng, H. Cheng, L. Jiao, Accelerate d ﬁrst-order methods for geodesically convex optimization on riemannian manifolds, Advances in Neural Information Processing Systems 30 (2017 )

2017
[25]

J. W. Siegel, Accelerated optimization with orthogona lity constraints, arXiv preprint arXiv:1903.05204 (2019)

work page arXiv 1903
[26]

Zhang, S

H. Zhang, S. Sra, Towards riemannian accelerated gradi ent methods, arXiv preprint arXiv:1806.02812 (2018)

work page arXiv 2018
[27]

N. Xiao, X. Liu, K.-C. Toh, Dissolving constraints for r iemannian optimization, Mathematics of Operations Resear ch 49 (1) (2024) 366–397

2024
[28]

Jiang, N

L. Jiang, N. Xiao, X. Liu, A smooth locally exact penalty method for optimization problems over generalized stiefel manifolds, arXiv preprint arXiv:2602.05631 (2026)

work page arXiv 2026
[29]

T. G. Kolda, B. W. Bader, Tensor decompositions and appl ications, SIAM review 51 (3) (2009) 455–500

2009
[30]

N. Xiao, X. Hu, X. Liu, K.-C. Toh, Cdopt: a python package for a class of riemannian optimization, Mathematical Programming Computation (2025) 1–58

2025
[31]

Virtanen, R

P . Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T . Reddy, D. Cournapeau, E. Burovski, P . Peterson, W. Weckesser, J. Bright, et al., Scipy 1.0: fundamental algorithms for scientiﬁc computing in python, Nature methods 17 (3) (2020) 261–272

2020
[32]

Nocedal, S

J. Nocedal, S. J. Wright, Numerical optimization, Spri nger, 1999

1999
[33]

R. H. Byrd, P . Lu, J. Nocedal, C. Zhu, A limited memory alg orithm for bound constrained optimization, SIAM Journal on scientiﬁc computing 16 (5) (1995) 1190–1208

1995
[34]

Fletcher, On the barzilai-borwein method, in: Optim ization and control with applications, Springer, 2005, pp

R. Fletcher, On the barzilai-borwein method, in: Optim ization and control with applications, Springer, 2005, pp. 235–256

2005
[35]

Townsend, N

J. Townsend, N. Koep, S. Weichwald, Pymanopt: A python t oolbox for optimization on manifolds using automatic di ﬀerentiation, Journal of Machine Learning Research 17 (137) (2016) 1–5

2016
[36]

Iannazzo, M

B. Iannazzo, M. Porcelli, The riemannian barzilai–bor wein method with nonmonotone line search and the matrix geom etric mean computa- tion, IMA Journal of Numerical Analysis 38 (1) (2018) 495–51 7

2018
[37]

Z. Wen, W. Yin, A feasible method for optimization with o rthogonality constraints, Mathematical Programming 142 ( 1) (2013) 397–434

2013
[38]

D. C. Sorensen, Y . Zhou, Direct methods for matrix sylve ster and lyapunov equations, Journal of Applied Mathematic s 2003 (6) (2003) 277–303

2003
[39]

M. E. Kilmer, K. Braman, N. Hao, R. C. Hoover, Third-orde r tensors as operators on matrices: A theoretical and comput ational framework with applications in imaging, SIAM Journal on Matrix Analys is and Applications 34 (1) (2013) 148–172

2013
[40]

Bhattacharya, V

R. Bhattacharya, V . Patrangenaru, Large sample theory of intrinsic and extrinsic sample means on manifolds, The An nals of Statistics 31 (1) (2003) 1–29. 29

2003