pith. machine review for the scientific record. sign in

arxiv: 2604.26484 · v1 · submitted 2026-04-29 · 🧮 math.OC

Recognition: unknown

Improved Penalty Function Approaches for Optimization Problems with General Orthogonality

Authors on Pith no claims yet

Pith reviewed 2026-05-07 13:15 UTC · model grok-4.3

classification 🧮 math.OC
keywords generalized orthogonal constraintsconstraint dissolving penaltystationary points equivalenceStiefel manifoldRiemannian optimizationunconstrained reformulationfirst-order methods
0
0 comments X

The pith

A constraint-dissolving penalty function preserves stationary points for generalized orthogonal optimization problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a penalty function that removes the need to enforce orthogonality constraints explicitly during optimization. It proves that the first-order and second-order stationary points of this unconstrained penalty function match those of the original constrained problem exactly. Because the penalty version can be minimized with standard first-order methods, the per-iteration cost drops compared with Riemannian methods that must project or retract onto the manifold at every step. The approach covers a broad family of problems whose feasible sets include the Stiefel manifold and its symplectic and indefinite variants. Experiments show the resulting unconstrained solvers run faster in practice.

Core claim

We define the generalized orthogonal constraint dissolving function (GOCDF) for problems that minimize f(X) subject to X lying in a linear subspace F and satisfying the quadratic orthogonality condition X^T ϕ(X) = I_p. We prove that every first-order stationary point of GOCDF is a first-order stationary point of the original GOOCP and vice versa; the same equivalence holds for second-order stationary points. We also show that the arithmetic cost of a first-order step on GOCDF is strictly lower than the cost of the corresponding Riemannian gradient step on the embedded submanifold.

What carries the argument

The generalized orthogonal constraint dissolving function (GOCDF), an unconstrained penalty that replaces the quadratic orthogonality constraint while preserving the location and type of stationary points.

If this is right

  • First- and second-order stationary points coincide between the penalty function and the original constrained problem.
  • Gradient or proximal-gradient steps on the penalty function cost less arithmetic work than Riemannian gradient steps on the manifold.
  • Standard unconstrained solvers can be applied directly and still locate the same critical points.
  • The method covers every manifold whose defining constraint matches the given subspace-plus-quadratic form, including all listed Stiefel-type examples.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Implementations become simpler because no manifold projection or retraction code is required.
  • The same penalty construction may apply to other quadratic matrix constraints once the embedded-submanifold property is verified.
  • Hybrid schemes that start with the penalty function and switch to a Riemannian correction only near convergence could combine the speed of the unconstrained phase with the final accuracy of manifold methods.

Load-bearing premise

The set of points satisfying both the linear subspace membership and the quadratic orthogonality condition forms a closed embedded submanifold of R^{n x p}.

What would settle it

Exhibit a point that is first-order stationary for the penalty function but fails the first-order stationarity condition of the original constrained problem.

Figures

Figures reproduced from arXiv: 2604.26484 by Chunming Tang, Nachuan Xiao, Xin Liu, Yongshen Zhang.

Figure 1
Figure 1. Figure 1: Computational time comparison for CDFGD, CDFCG, R view at source ↗
Figure 2
Figure 2. Figure 2: (n, p) = (1000, 10). Comparison of initial and final errors of the CDFCG Method in solving the extrinsic mean problem for indefinite Stiefel manifold. A total of 100 randomly generated samples were taken. The blue bars represent the error between the initial iterate and each sample, while the orange bars depict the error between the convergence point and each sample. 23 view at source ↗
Figure 3
Figure 3. Figure 3: Computational time comparison for CDFGD, CDFCG, R view at source ↗
read the original abstract

In this paper, we consider a class of generalized orthogonal optimization constraint problems (GOOCP) over $\mathbb{R}^{n \times p}$, where the variable $X$ is restricted within the intersection of a certain subspace $\mathcal{F}$ and satisfies the quadratic constraint $\{X \in \mathbb{R}^{n \times p}: X^{\top} \phi(X) = I_p\}$. Such constraints generalize a wide range of structured matrix manifolds, such as the Stiefel manifold, the symplectic Stiefel manifold, the indefinite Stiefel manifold, the third-order tensor Stiefel manifold, etc. We show that the feasible region of GOOCP is a closed embedded submanifold of $\mathbb{R}^{n \times p}$ and characterize the necessary geometric materials for the existing Riemannian optimization frameworks. Based on the constraint dissolving approach for Riemannian optimization problems, we propose the constraint dissolving penalty function (GOCDF) for the constrained optimization problem GOOCP with easy-to-compute formulations. We further establish the equivalence between GOCDF and GOOCP in the aspects of first-order and second-order stationary points. We also analyze the computational complexity of applying first-order methods to minimize GOOCP, which could be significantly lower than those of first-order Riemannian optimization methods. Numerical experiments demonstrate that solving GOOCP through applying unconstrained optimization methods to minimize constraint dissolving function demonstrates superior efficiency to existing Riemannian optimization methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces generalized orthogonal optimization constraint problems (GOOCP) over R^{n x p} where X lies in a subspace F and satisfies the quadratic constraint X^T phi(X) = I_p, generalizing manifolds such as Stiefel, symplectic Stiefel, indefinite Stiefel, and tensor Stiefel. It claims that the feasible set is always a closed embedded submanifold of R^{n x p}, develops the associated Riemannian geometry, proposes a constraint-dissolving penalty function (GOCDF) with explicit formulations, proves equivalence of first- and second-order stationary points between GOCDF and GOOCP, and shows that first-order methods applied to the penalty have significantly lower complexity than Riemannian first-order methods. Numerical experiments are presented to demonstrate efficiency advantages.

Significance. If the manifold property and equivalence results hold in the stated generality, the work provides a unified, implementable penalty framework that avoids manifold-specific retractions and projections for a broad class of structured matrix problems, potentially enabling simpler and faster first-order solvers while preserving stationarity guarantees. The complexity comparison, if rigorously established, would be a notable practical contribution over existing Riemannian toolboxes.

major comments (2)
  1. [Abstract / feasible-region characterization] Abstract and the section characterizing the feasible region: the claim that {X in F : X^T phi(X) = I_p} is always a closed embedded submanifold of R^{n x p} (required for the Riemannian geometry and for the equivalence of stationary points) rests on the regular level-set theorem, i.e., surjectivity of Dg(X) where g(X) = X^T phi(X) - I_p at every feasible point. No explicit general condition on phi (differentiability plus a uniform rank bound on Dg) is stated that would guarantee this for arbitrary phi and F; the abstract lists examples but does not derive the rank condition from the general definition. If the proof proceeds only by case-by-case verification, the manifold property fails to hold in the claimed generality and the equivalence plus complexity results cannot be established for the full class.
  2. [Equivalence of stationary points] The equivalence theorem for first- and second-order stationary points between GOCDF and GOOCP: this equivalence is derived under the embedded-submanifold assumption. Because the surjectivity of Dg is not shown to hold uniformly, the stationary-point correspondence may require additional hypotheses on phi that are not stated, undermining the claim that GOCDF can be minimized by unconstrained first-order methods while retaining the same critical points as the original constrained problem.
minor comments (2)
  1. [Introduction / notation] Notation for the map phi and the subspace F should be introduced with explicit differentiability assumptions and an example of how the general definition specializes to the symplectic case.
  2. [Complexity analysis] The complexity analysis section should include a precise operation-count comparison (e.g., per-iteration cost of the penalty gradient versus a Riemannian retraction) rather than a qualitative statement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and the constructive major comments. We respond to each point below and indicate the revisions that will be incorporated.

read point-by-point responses
  1. Referee: Abstract and the section characterizing the feasible region: the claim that {X in F : X^T phi(X) = I_p} is always a closed embedded submanifold of R^{n x p} (required for the Riemannian geometry and for the equivalence of stationary points) rests on the regular level-set theorem, i.e., surjectivity of Dg(X) where g(X) = X^T phi(X) - I_p at every feasible point. No explicit general condition on phi (differentiability plus a uniform rank bound on Dg) is stated that would guarantee this for arbitrary phi and F; the abstract lists examples but does not derive the rank condition from the general definition. If the proof proceeds only by case-by-case verification, the manifold property fails to hold in the claimed generality and the equivalence plus complexity results cannot be established for the full class.

    Authors: We agree that the manuscript would benefit from an explicit statement of the conditions on phi that ensure the surjectivity of Dg. Although the GOOCP is defined for phi corresponding to generalized orthogonality constraints (not arbitrary phi), and the manifold property is established via the regular level set theorem in the paper, we did not isolate the rank condition as a standing hypothesis. In the revised manuscript, we will add to the abstract and the feasible region section the assumption that phi is continuously differentiable and Dg(X) has full rank at feasible X. We will also provide a general argument or verification that this holds for the class of problems considered, including the examples. This addresses the concern while preserving the generality within the intended scope. revision: yes

  2. Referee: The equivalence theorem for first- and second-order stationary points between GOCDF and GOOCP: this equivalence is derived under the embedded-submanifold assumption. Because the surjectivity of Dg is not shown to hold uniformly, the stationary-point correspondence may require additional hypotheses on phi that are not stated, undermining the claim that GOCDF can be minimized by unconstrained first-order methods while retaining the same critical points as the original constrained problem.

    Authors: The equivalence results are indeed based on the embedded submanifold property of the feasible set. By incorporating the explicit rank condition on Dg as described in our response to the first comment, the theorems on first- and second-order stationary point equivalence will be valid under the stated hypotheses. We will revise the theorem statements to include these assumptions on phi. This ensures that unconstrained first-order methods applied to the GOCDF retain the stationarity guarantees for the original GOOCP within the class of problems where the manifold property holds. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation is self-contained with independent proofs

full rationale

The paper defines GOOCP via the intersection of subspace F and the quadratic constraint X^T phi(X) = I_p for general phi, then states it shows the feasible set is a closed embedded submanifold of R^{n x p} (abstract). It characterizes the necessary Riemannian geometry, constructs the penalty GOCDF from the prior constraint-dissolving framework, and establishes equivalence of first- and second-order stationary points between GOCDF and GOOCP. The complexity comparison follows directly from GOCDF being unconstrained. No quoted step reduces any claimed result (equivalence, manifold property, or complexity bound) to a fitted parameter, self-definition, or unverified self-citation chain; the central equivalence is presented as a derived theorem rather than an input. Self-citation of the base dissolving approach, if present, is not load-bearing for the new general-phi results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the manifold property of the feasible set and the validity of the constraint-dissolving construction for the general quadratic constraint.

axioms (1)
  • domain assumption The set defined by the intersection of subspace F and the quadratic constraint X^T phi(X) = I_p forms a closed embedded submanifold of R^{n x p}.
    Explicitly stated as shown in the paper; required for applying Riemannian geometry and for the penalty equivalence.

pith-pipeline@v0.9.0 · 5559 in / 1181 out tokens · 58337 ms · 2026-05-07T13:15:56.766730+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

40 extracted references · 3 canonical work pages

  1. [1]

    B. Gao, G. Hu, Y . Kuang, X. Liu, An orthogonalization-free parallelizable framework for all-electron calculations in density functional theory, SIAM Journal on Scientific Computing 44 (3) (2022) B723–B745

  2. [2]

    X. Liu, Z. Wen, X. Wang, M. Ulbrich, Y . Y uan, On the analysi s of the discretized kohn–sham density functional theory, S IAM Journal on Numerical Analysis 53 (4) (2015) 1758–1785

  3. [3]

    FISCHLER AND, Random sample consensus: a paradigm for model fitting with applications to image analysis and automa ted cartogra- phy, Commun

    M. FISCHLER AND, Random sample consensus: a paradigm for model fitting with applications to image analysis and automa ted cartogra- phy, Commun. ACM 24 (6) (1981) 381–395

  4. [4]

    H. Zou, T. Hastie, R. Tibshirani, Sparse principal compo nent analysis, Journal of computational and graphical stat istics 15 (2) (2006) 265– 286

  5. [5]

    Huang, X

    L. Huang, X. Liu, B. Lang, A. Y u, Y . Wang, B. Li, Orthogonal weight normalization: Solution to optimization over multi ple dependent stiefel manifolds in deep neural networks, in: Proceedings of the AA AI Conference on Artificial Intelligence, V ol. 32, 2018

  6. [6]

    Mackey, V

    L. Mackey, V . Syrgkanis, I. Zadik, Orthogonal machine le arning: Power and limitations, in: International Conferen ce on Machine Learning, PMLR, 2018, pp. 3375–3383

  7. [7]

    J. A. Tropp, I. S. Dhillon, R. W. Heath, T. Strohmer, Desig ning structured tight frames via an alternating projection method, IEEE Transactions on information theory 51 (1) (2005) 188–209

  8. [8]

    Absil, R

    P .-A. Absil, R. Mahony, R. Sepulchre, Optimization Algo rithms on Matrix Manifolds (2008)

  9. [9]

    Boumal, An introduction to optimization on smooth man ifolds, Cambridge University Press, 2023

    N. Boumal, An introduction to optimization on smooth man ifolds, Cambridge University Press, 2023

  10. [10]

    H. Sato, K. Aihara, Cholesky qr-based retraction on the generalized stiefel manifold, Computational Optimization and Applications 72 (2019) 293–308

  11. [11]

    Shustin, H

    B. Shustin, H. Avron, Riemannian optimization with a pr econditioning scheme on the generalized stiefel manifold, Journal of Computational and Applied Mathematics 423 (2023) 114953

  12. [12]

    B. Gao, N. T. Son, P .-A. Absil, T. Stykel, Riemannian opt imization on the symplectic stiefel manifold, SIAM Journal on Optimization 31 (2) (2021) 1546–1575

  13. [13]

    B. Gao, N. T. Son, P .-A. Absil, T. Stykel, Geometry of the symplectic stiefel manifold endowed with the euclidean met ric, in: Geometric Science of Information: 5th International Conference, GSI 2021, Paris, France, July 21–23, 2021, Proceedings 5, Sprin ger, 2021, pp. 789– 796

  14. [14]

    V an Tiep, N

    D. V an Tiep, N. T. Son, A riemannian optimization method on the indefinite stiefel manifold, arXiv preprint arXiv:24 10.22068 (2024)

  15. [15]

    Bai, R.-C

    Z. Bai, R.-C. Li, Minimization principles and computat ion for the generalized linear response eigenvalue problem , BIT Numerical Mathe- matics 54 (1) (2014) 31–54. 28

  16. [16]

    X.-P . Mao, Y . Wang, Y .-N. Y ang, Computation over t-product based tensor stiefel manifold: A preliminary study, Jou rnal of the Operations Research Society of China (2024) 1–49

  17. [17]

    Kernfeld, M

    E. Kernfeld, M. Kilmer, S. Aeron, Tensor–tensor produc ts with invertible linear transforms, Linear Algebra and it s Applications 485 (2015) 545–570

  18. [18]

    C. Qi, K. A. Gallivan, P .-A. Absil, Riemannian bfgs algo rithm with applications, in: Recent Advances in Optimizati on and its Applications in Engineering: The 14th Belgian-French-German Conferenc e on Optimization, Springer, 2010, pp. 183–192

  19. [19]

    T. E. Abrudan, J. Eriksson, V . Koivunen, Steepest desce nt algorithms for optimization under unitary matrix constr aint, IEEE Transactions on Signal Processing 56 (3) (2008) 1134–1147

  20. [20]

    Sato, Riemannian conjugate gradient methods: Gener al framework and specific algorithms with convergence analy ses, SIAM Journal on Optimization 32 (4) (2022) 2690–2717

    H. Sato, Riemannian conjugate gradient methods: Gener al framework and specific algorithms with convergence analy ses, SIAM Journal on Optimization 32 (4) (2022) 2690–2717

  21. [21]

    C. Tang, W. Tan, S. Xing, H. Zheng, A class of spectral con jugate gradient methods for riemannian optimization, Nume rical Algorithms 94 (1) (2023) 131–147

  22. [22]

    Zhu, A riemannian conjugate gradient method for opti mization on the stiefel manifold, Computational optimizat ion and Applications 67 (2017) 73–110

    X. Zhu, A riemannian conjugate gradient method for opti mization on the stiefel manifold, Computational optimizat ion and Applications 67 (2017) 73–110

  23. [23]

    A. Han, B. Mishra, P . Jawanpuria, J. Gao, Riemannian acc elerated gradient methods via extrapolation, in: Internat ional Conference on Artificial Intelligence and Statistics, PMLR, 2023, pp. 155 4–1585

  24. [24]

    Y . Liu, F. Shang, J. Cheng, H. Cheng, L. Jiao, Accelerate d first-order methods for geodesically convex optimization on riemannian manifolds, Advances in Neural Information Processing Systems 30 (2017 )

  25. [25]

    J. W. Siegel, Accelerated optimization with orthogona lity constraints, arXiv preprint arXiv:1903.05204 (2019)

  26. [26]

    Zhang, S

    H. Zhang, S. Sra, Towards riemannian accelerated gradi ent methods, arXiv preprint arXiv:1806.02812 (2018)

  27. [27]

    N. Xiao, X. Liu, K.-C. Toh, Dissolving constraints for r iemannian optimization, Mathematics of Operations Resear ch 49 (1) (2024) 366–397

  28. [28]

    Jiang, N

    L. Jiang, N. Xiao, X. Liu, A smooth locally exact penalty method for optimization problems over generalized stiefel manifolds, arXiv preprint arXiv:2602.05631 (2026)

  29. [29]

    T. G. Kolda, B. W. Bader, Tensor decompositions and appl ications, SIAM review 51 (3) (2009) 455–500

  30. [30]

    N. Xiao, X. Hu, X. Liu, K.-C. Toh, Cdopt: a python package for a class of riemannian optimization, Mathematical Programming Computation (2025) 1–58

  31. [31]

    Virtanen, R

    P . Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T . Reddy, D. Cournapeau, E. Burovski, P . Peterson, W. Weckesser, J. Bright, et al., Scipy 1.0: fundamental algorithms for scientific computing in python, Nature methods 17 (3) (2020) 261–272

  32. [32]

    Nocedal, S

    J. Nocedal, S. J. Wright, Numerical optimization, Spri nger, 1999

  33. [33]

    R. H. Byrd, P . Lu, J. Nocedal, C. Zhu, A limited memory alg orithm for bound constrained optimization, SIAM Journal on scientific computing 16 (5) (1995) 1190–1208

  34. [34]

    Fletcher, On the barzilai-borwein method, in: Optim ization and control with applications, Springer, 2005, pp

    R. Fletcher, On the barzilai-borwein method, in: Optim ization and control with applications, Springer, 2005, pp. 235–256

  35. [35]

    Townsend, N

    J. Townsend, N. Koep, S. Weichwald, Pymanopt: A python t oolbox for optimization on manifolds using automatic di fferentiation, Journal of Machine Learning Research 17 (137) (2016) 1–5

  36. [36]

    Iannazzo, M

    B. Iannazzo, M. Porcelli, The riemannian barzilai–bor wein method with nonmonotone line search and the matrix geom etric mean computa- tion, IMA Journal of Numerical Analysis 38 (1) (2018) 495–51 7

  37. [37]

    Z. Wen, W. Yin, A feasible method for optimization with o rthogonality constraints, Mathematical Programming 142 ( 1) (2013) 397–434

  38. [38]

    D. C. Sorensen, Y . Zhou, Direct methods for matrix sylve ster and lyapunov equations, Journal of Applied Mathematic s 2003 (6) (2003) 277–303

  39. [39]

    M. E. Kilmer, K. Braman, N. Hao, R. C. Hoover, Third-orde r tensors as operators on matrices: A theoretical and comput ational framework with applications in imaging, SIAM Journal on Matrix Analys is and Applications 34 (1) (2013) 148–172

  40. [40]

    Bhattacharya, V

    R. Bhattacharya, V . Patrangenaru, Large sample theory of intrinsic and extrinsic sample means on manifolds, The An nals of Statistics 31 (1) (2003) 1–29. 29