Expressivity of congruence-based architectures for DNNs on positive-definite matrices
Pith reviewed 2026-06-28 15:25 UTC · model grok-4.3
The pith
The semi-orthogonality constraint on weights in congruence-like layers for positive-definite matrices limits expressivity, collapsing stacked layers to one-hidden-layer equivalents for some activations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Congruence-like layers multiply the input positive-definite matrix on the left and right by a weight matrix W and its transpose. When W is constrained to be semi-orthogonal, Poincaré's separation theorem implies that the eigenvalues of the output interlace with those of the input, so they can never leave the interval between the input's smallest and largest eigenvalues. Combined with certain activation functions, this loss of spectral diversity makes any number of stacked layers no more expressive than a single hidden layer.
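For reference, the separation theorem can be written as follows in its standard textbook form. The symbols here are assumptions for illustration and may not match the paper's notation: A is an n×n symmetric matrix, W is an n×k matrix with orthonormal columns (W^T W = I_k), and eigenvalues are indexed in descending order.

```latex
\[
  \lambda_i(A) \;\ge\; \lambda_i\!\left(W^{\top} A W\right) \;\ge\; \lambda_{n-k+i}(A),
  \qquad i = 1, \dots, k .
\]
```

Taking i = 1 on the left and i = k on the right gives \lambda_{\max}(W^T A W) \le \lambda_{\max}(A) and \lambda_{\min}(W^T A W) \ge \lambda_{\min}(A), which is the spectral confinement the core claim relies on.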
What carries the argument
The congruence-like layer, which transforms a positive-definite matrix X into W X W^T, with the semi-orthogonality constraint on W that triggers the spectral collapse via Poincaré's separation theorem.
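A minimal numerical sketch of one such layer, assuming NumPy. It uses the convention W^T X W with orthonormal columns, which is the same operation as W X W^T with the transposed weight. The sizes, the random seed, and the helper name `congruence` are hypothetical choices made for illustration and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3  # hypothetical input and output dimensions

# Random SPD input: B B^T is positive semi-definite, adding I makes it positive definite.
B = rng.standard_normal((n, n))
A = B @ B.T + np.eye(n)

# Semi-orthogonal weight (orthonormal columns) from a reduced QR factorization.
W, _ = np.linalg.qr(rng.standard_normal((n, k)))

def congruence(X, W):
    """Congruence-like layer: maps an n x n SPD matrix to a k x k SPD matrix."""
    return W.T @ X @ W

Y = congruence(A, W)

# Eigenvalues in descending order (eigvalsh returns them ascending).
lam_A = np.linalg.eigvalsh(A)[::-1]
lam_Y = np.linalg.eigvalsh(Y)[::-1]

# Interlacing bounds: lam_A[i] >= lam_Y[i] >= lam_A[n - k + i] for i = 0..k-1.
tol = 1e-10
upper_ok = bool(np.all(lam_Y <= lam_A[:k] + tol))
lower_ok = bool(np.all(lam_Y >= lam_A[n - k:] - tol))
print("upper bound holds:", upper_ok, "| lower bound holds:", lower_ok)
```

The check only confirms the single-layer spectral bound on one random instance. It says nothing on its own about what happens once an activation is applied between layers.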
If this is right
- The architecture with multiple congruence layers behaves identically to a one-hidden-layer network for the affected activations.
- Stacked layers fail to gain additional expressivity from depth due to repeated loss of spectral diversity.
- Only activations that preserve the necessary spectral properties avoid the collapse.
- The final classifier choice must account for the limited feature variety produced by the layers.
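One way such a collapse can arise is sketched below, assuming NumPy. This is an illustration under stated assumptions, not the paper's proof. The source does not name the affected activations, so the code assumes an eigenvalue rectification in the style of SPDNet's ReEig with a threshold `eps` below the smallest input eigenvalue. The helper names `semi_orthogonal` and `reeig`, the sizes, and the seed are hypothetical. Because the separation theorem keeps every layer's smallest eigenvalue at or above that of the input, the rectification never activates, and the two-layer stack equals a single congruence layer with weight W1 W2.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k1, k2 = 8, 5, 3   # hypothetical layer widths
eps = 1e-4            # hypothetical rectification threshold (assumption)

# SPD input whose smallest eigenvalue is at least 1, hence above eps.
B = rng.standard_normal((n, n))
A = B @ B.T + np.eye(n)

def semi_orthogonal(rows, cols):
    """Random matrix with orthonormal columns via reduced QR."""
    Q, _ = np.linalg.qr(rng.standard_normal((rows, cols)))
    return Q

def reeig(X, eps):
    """Eigenvalue-wise rectification: raise eigenvalues below eps up to eps."""
    lam, U = np.linalg.eigh(X)
    return (U * np.maximum(lam, eps)) @ U.T  # U diag(max(lam, eps)) U^T

W1 = semi_orthogonal(n, k1)
W2 = semi_orthogonal(k1, k2)

# Two stacked congruence layers, each followed by the assumed activation.
H1 = reeig(W1.T @ A @ W1, eps)
H2 = reeig(W2.T @ H1 @ W2, eps)

# Single congruence layer with the product weight.
W = W1 @ W2
single = W.T @ A @ W

collapsed = bool(np.allclose(H2, single))
still_semi_orthogonal = bool(np.allclose(W.T @ W, np.eye(k2)))
print("stack equals single layer:", collapsed)
print("product weight is semi-orthogonal:", still_semi_orthogonal)
```

The product W1 W2 is itself semi-orthogonal because (W1 W2)^T (W1 W2) = W2^T W2 = I, so the single-layer equivalent stays inside the same constrained family. If inputs had eigenvalues below `eps`, the rectification would act and this simple equality would no longer be guaranteed.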
Where Pith is reading between the lines
- Removing or relaxing the semi-orthogonality constraint could allow deeper networks to achieve greater expressivity on positive-definite data.
- Similar spectral limitations might appear in other manifold-based neural architectures that impose orthogonality.
- Empirical tests on classification accuracy could reveal whether the theoretical collapse translates to performance plateaus in practice.
Load-bearing premise
Poincaré's separation theorem applies directly to imply loss of spectral diversity in the stacked congruence-like layers under the chosen activation functions, without further restrictions on the input matrices.
What would settle it
A concrete counterexample would settle it: a multi-layer congruence network with semi-orthogonal weights and the specified activations that realizes output distributions or decision boundaries which no single-layer version can reproduce on the same positive-definite matrix inputs.
Original abstract
This work studies neural architectures for classifying symmetric positive-definite matrices, focusing on congruence-like layers, in which the input matrix is multiplied on the left and right by a (possibly rectangular) weight matrix $W$ and its transpose. Such layers lie at the core of the celebrated SPDNet and have also been employed independently for dimensionality reduction on positive-definite data. We show that the (semi)-orthogonality constraint commonly imposed on $W$ limits the expressivity of these layers: for certain activation functions, the resulting architecture collapses to a one-hidden-layer equivalent. This lack of expressivity follows from a loss of spectral diversity in congruence-like layers for semi-orthogonal $W$ and is a direct consequence of Poincaré's separation theorem. We then examine the choice of the final classifier, comparing several Riemannian classifiers and discussing their compatibility with the feature maps produced by congruence-like layers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies congruence-like layers (W A W^T) for DNNs on symmetric positive-definite matrices, as used in SPDNet. It claims that the common semi-orthogonality constraint on W causes loss of spectral diversity by Poincaré's separation theorem, so that for certain activation functions any number of such layers is equivalent to a single hidden layer. The work also compares several Riemannian classifiers for the final layer and their compatibility with the resulting feature maps.
Significance. If the central claim is established with precise conditions, the result would be significant for manifold-valued deep learning: it identifies a structural limitation in a widely adopted architecture and supplies a theorem-based explanation rather than an empirical observation. The reliance on an external result (Poincaré separation) rather than fitted parameters is a methodological strength.
major comments (2)
- [Abstract / main expressivity theorem] The argument that stacked congruence layers remain equivalent to a single layer requires an explicit hypothesis on the activation functions (e.g., eigenvalue-wise monotonicity or a similar property) that prevents recovery of the eigenvalues discarded by Poincaré separation. The abstract states only “certain activation functions” without listing the class or verifying that the composition across layers cannot restore the lost spectral information; this step is load-bearing for the multi-layer collapse claim.
- [Expressivity analysis (section containing the proof of collapse)] Poincaré separation (or Cauchy interlacing) bounds the eigenvalues of W^T A W when W has orthonormal columns, but the manuscript must still demonstrate that the subsequent nonlinear map, when iterated, cannot propagate information from the discarded eigenvalues. Without this propagation argument, the reduction to a one-hidden-layer equivalent does not automatically follow from the single-layer spectral loss.
minor comments (2)
- [Introduction / Methods] Notation for the congruence operation and the precise definition of “semi-orthogonal” (rectangular vs. square) should be stated once at the beginning of the methods section for clarity.
- [Classifier comparison section] The comparison of Riemannian classifiers would benefit from a short table summarizing which classifiers are compatible with the spectral features produced by the congruence layers.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on the expressivity analysis. We address the two major comments point by point below and will revise the manuscript to improve clarity on the activation hypotheses and the iterative non-recovery argument.
- Referee: [Abstract / main expressivity theorem] The argument that stacked congruence layers remain equivalent to a single layer requires an explicit hypothesis on the activation functions (e.g., eigenvalue-wise monotonicity or a similar property) that prevents recovery of the eigenvalues discarded by Poincaré separation. The abstract states only “certain activation functions” without listing the class or verifying that the composition across layers cannot restore the lost spectral information; this step is load-bearing for the multi-layer collapse claim.
  Authors: We agree that the abstract is too terse. The main text defines the relevant class as eigenvalue-wise monotonic (strictly increasing) functions; the proof shows that monotonicity together with repeated application of Poincaré separation prevents recovery of discarded eigenvalues. We will revise the abstract to read 'for eigenvalue-wise monotonic activation functions' and add one sentence noting that the composition across layers cannot restore lost spectral information.
  revision: yes
- Referee: [Expressivity analysis (section containing the proof of collapse)] Poincaré separation (or Cauchy interlacing) bounds the eigenvalues of W^T A W when W has orthonormal columns, but the manuscript must still demonstrate that the subsequent nonlinear map, when iterated, cannot propagate information from the discarded eigenvalues. Without this propagation argument, the reduction to a one-hidden-layer equivalent does not automatically follow from the single-layer spectral loss.
  Authors: The existing inductive argument already shows that each layer re-applies Poincaré separation to the output of the preceding activation, and monotonicity of the activation preserves the interlacing bounds without restoring eigenvalues outside them. To make the non-propagation step fully explicit, we will insert a short lemma stating that the composition of congruence-plus-monotonic-activation cannot recover information lost at any prior layer.
  revision: yes
Circularity Check
No circularity; central claim rests on external Poincaré separation theorem
The paper's key expressivity result is explicitly attributed to an external classical theorem (Poincaré's separation theorem) rather than to any fitted parameter, self-definition, or self-citation chain. The abstract states the collapse 'is a direct consequence of Poincaré's separation theorem' with no indication that the theorem itself is derived from the present work or that the activation composition step reduces to a tautology. No load-bearing step is shown to equate a prediction with its own input by construction, and the derivation chain therefore remains independent of the paper's own fitted quantities or prior self-referential results.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: Poincaré's separation theorem