Davis-Kahan Theorem in the two-to-infinity norm and its application to perfect clustering
Pith reviewed 2026-05-23 16:59 UTC · model grok-4.3
The pith
Upper bounds on eigenvector deviations in the two-to-infinity norm hold under varied error assumptions and yield conditions for perfect clustering.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Variants of the Davis-Kahan theorem produce explicit upper bounds on the two-to-infinity norm of U minus hat U across several regimes of the perturbation matrix, from deterministic or mild probabilistic settings through heavy-tailed and exponentially decaying errors, and these bounds translate directly into sufficient conditions for perfect clustering in a general statistical model.
What carries the argument
Two-to-infinity norm Davis-Kahan bounds that control the maximum absolute entry in any row of the eigenvector difference matrix.
If this is right
- Bounds hold initially with no or mild probabilistic assumptions on the error.
- Tighter bounds follow once generic probabilistic assumptions on the errors are added.
- Separate corrections apply to heavy-tailed errors and to exponentially fast decaying errors.
- The bounds supply sufficient conditions for perfect clustering in a generic setting.
- Alternative constructions of hat U can be compared directly for accuracy.
Where Pith is reading between the lines
- The same bounding techniques could be inserted into analyses of eigenvector estimation in principal component analysis or matrix completion.
- The clustering conditions could be turned into practical finite-sample checks by estimating the eigenvalue gap and error size from data.
- Different choices among the suggested estimators for hat U might produce strictly smaller two-to-infinity errors in particular applications.
Load-bearing premise
The population matrix must have an eigenvalue gap, and the error matrix must meet the technical requirements that let the Davis-Kahan variants apply.
What would settle it
A concrete matrix whose population version has a verified eigenvalue gap but whose sample eigenvectors exceed the derived two-to-infinity bound under the matching error conditions would falsify the claims.
read the original abstract
Many statistical applications, such as the Principal Component Analysis, matrix completion, tensor regression and many others, rely on accurate estimation of leading eigenvectors of a matrix. The Davis-Kahan theorem is known to be instrumental for bounding above the distances between matrices $U$ and $\widehat{U}$ of population eigenvectors and their sample versions. While those distances can be measured in various metrics, the recent developments have shown advantages of evaluation of the deviation in the two-to-infinity norm. The purpose of this paper is to develop a toolbox for derivation of upper bounds for the distances between $U$ and $\widehat{U}$ in the two-to-infinity norm for a variety of possible scenarios. Although this problem has been studied by several authors, the difference between this paper and its predecessors is that the upper bounds are obtained under various sets of assumptions. The upper bounds are initially derived with no or mild probabilistic assumptions on the error, and are subsequently refined, when some generic probabilistic assumptions on the errors hold. The paper also provides rectification of the upper bounds in the cases of heavy-tailed or exponentially fast decaying errors. In addition, the paper suggests alternative methods for evaluation of $\widehat{U}$ and, therefore, enables one to compare the resulting accuracies. As an example of an application of the techniques in the paper, we derive sufficient conditions for perfect clustering in a generic setting, and then employ them in various scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a toolbox of upper bounds on the two-to-infinity norm distance between population eigenvectors U and sample eigenvectors Ũ under successively stronger assumptions on the perturbation (error) matrix: first with no or mild probabilistic assumptions, then under generic probabilistic assumptions, and with rectifications for heavy-tailed and sub-exponential error regimes. It also proposes alternative estimators for Ũ and derives sufficient conditions for perfect clustering as an application, relying on an eigenvalue gap in the population matrix and suitable control on the error matrix to apply a Davis-Kahan variant.
Significance. If the derivations hold, the work supplies a systematic collection of 2-to-∞ eigenvector perturbation bounds that can be directly applied in PCA, matrix completion, tensor regression, and clustering analyses. The explicit refinement across error regimes and the generic clustering application are useful; the manuscript also supplies alternative Ũ constructions that allow accuracy comparisons.
minor comments (3)
- The abstract and introduction would benefit from a short table or enumerated list that explicitly maps each assumption regime (no/mild, generic probabilistic, heavy-tailed, sub-exponential) to the corresponding bound statement and the section where it appears.
- Notation for the alternative estimators of Ũ is introduced without a dedicated subsection; a brief comparison paragraph or small table summarizing their respective 2-to-∞ bounds would improve readability.
- In the clustering application, the sufficient condition that the bound is smaller than half the eigenvalue gap is stated generically; an explicit numerical illustration with one of the derived bounds would help readers verify the translation from perturbation bound to perfect recovery.
Simulated Author's Rebuttal
We thank the referee for the positive summary, recognition of the paper's significance, and recommendation of minor revision. No specific major comments were raised in the report.
Circularity Check
No significant circularity; derivations rest on external eigenvalue gap and error assumptions
full rationale
The paper develops a toolbox of 2-to-infinity norm bounds by applying variants of the Davis-Kahan theorem under explicit assumption regimes on the perturbation matrix (none/mild, generic probabilistic, heavy-tailed, sub-exponential). These start from a population eigenvalue gap plus error-matrix conditions that are stated as inputs, not fitted or self-defined quantities. The clustering application is a direct, standard consequence of the derived bounds being smaller than half the gap. No self-citation chain, ansatz smuggling, or renaming of known results is load-bearing; the abstract and high-level program describe independent derivations from the invoked theorem variants. This is the common honest case of a self-contained perturbation analysis.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Existence of a sufficient eigenvalue gap between leading and remaining eigenvalues of the population matrix
- domain assumption The error matrix satisfies conditions allowing application of the theorem under the successive assumption regimes (mild probabilistic, generic, heavy-tailed, exponential decay)
Forward citations
Cited by 1 Pith paper
-
Consistent line clustering using geometric hypergraphs
Derives information-theoretic recovery thresholds for two intersecting lines with polynomial mass concentration near the intersection and matches them (up to polylog factors) via a spectral algorithm on a hypergraph b...
Reference graph
Works this paper leans on
-
[1]
Community detection and stochastic bloc k models: Recent developments
Emmanuel Abbe. Community detection and stochastic bloc k models: Recent developments. J. Mach. Learn. Res. , 18(177):1–86, 2018
work page 2018
-
[2]
Emmanuel Abbe, Afonso S. Bandeira, and Georgina Hall. Ex act recovery in the stochastic block model. IEEE Transactions on Information Theory , 62(1):471–487, 2016. ISSN 0018-9448. doi: 10.1109/TIT.2015.2490670
-
[3]
An \ ell\_p\ theory of PCA and spectral clustering
Emmanuel Abbe, Jianqing Fan, and Kaizheng Wang. An ℓp theory of PCA and spectral clustering. The Annals of Statistics , 50(4):2359 – 2385, 2022. doi: 10.1214/22-AOS2196. URL https://doi.org/10.1214/22-AOS2196
-
[4]
Joshua Agterberg and Joshua Cape. An overview of asympto tic normality in stochas- tic blockmodels: Cluster analysis and inference. ArXiv:2305.06353, 2023. URL https://arxiv.org/abs/2305.06353
-
[5]
Arash A. Amini and Elizaveta Levina. On semidefinite rela xations for the block model. Ann. Statist., 46(1):149–179, 2018. doi: 10.1214/17-AOS1545
-
[6]
Changxiao Cai, Gen Li, Yuejie Chi, H. Vincent Poor, and Yu xin Chen. Subspace estimation from unbalanced and incomplete data matrices: ℓ2,∞ statistical guarantees. The Annals of Statistics , 49(2):944 – 967, 2021. doi: 10.1214/20-AOS1986. URL https://doi.org/10.1214/20-AOS1986
-
[7]
T. Tony Cai and Anru Zhang. Rate-optimal perturbation bo unds for singular subspaces with applications to high-dimensional statistics. The Annals of Statistics , 46(1):60 – 89, 2018. doi: 10.1214/17-AOS1541. URL https://doi.org/10.1214/17-AOS1541
-
[8]
Joshua Cape, Minh Tang, and Carey E. Priebe. The two-to-i nfinity norm and singular subspace geometry with applications to high-dimensional statistic s. The Annals of Statistics , 47(5):2405 – 2439, 2019. doi: 10.1214/18-AOS1752. URL https://doi.org/10.1214/18-AOS1752
-
[9]
Sub sampling based community detec- tion for large networks
Sayan Chakrabarty, Srijan Sengupta, and Yuguo Chen. Sub sampling based community detec- tion for large networks. Statistica Sinica, in press, 2023. doi: 10.5705/ss.202022.0108
-
[10]
Infer ence and uncertainty quantification for noisy matrix completion
Yuxin Chen, Jianqing Fan, Cong Ma, and Yuling Yan. Infer ence and uncertainty quantification for noisy matrix completion. Proceedings of the National Academy of Sciences , 116(46):22931–22937, 2019. doi: 10.1073/pnas.19100531 16. URL https://www.pnas.org/doi/abs/10.1073/pnas.1910053116
-
[11]
Asymmetry helps: Eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices
Yuxin Chen, Chen Cheng, and Jianqing Fan. Asymmetry hel ps: Eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices. The Annals of Statistics , 49(1):435 – 458, 2021. doi: 10.1214/20-AOS1963. URL https://doi.org/10.1214/20-AOS1963
-
[12]
Spectral Methods for Data Science : A Statistical Perspective
Yuxin Chen, Yuejie Chi, Jianqing Fan, and Cong Ma. Spect ral methods for data science: A statistical perspective. Foundations and Trends in Machine Learning , 14(5):566–806, 2021. ISSN 1935-8245. doi: 10.1561/2200000079. URL http://dx.doi.org/10.1561/2200000079
-
[13]
Chandler Davis and W. M. Kahan. The rotation of eigenvec tors by a perturbation. iii. SIAM Journal on Numerical Analysis , 7(1):1–46, 1970. doi: 10.1137/0707001. URL https://doi.org/10.1137/0707001. 24
-
[14]
Partial recovery bounds for clustering with the relaxed $K$means
Christophe Giraud and Nicolas Verzelen. Partial recov ery bounds for clustering with the relaxed kmeans. ArXiv:1807.07547, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[15]
John C. Gower and Garmt B. Dijksterhuis. Procrustes problems, volume 30 of Oxford Statistical Science Series . Oxford University Press, Oxford, UK, January 2004. ISBN 01 98510586. URL http://www.oup.com/uk/catalogue/?ci=9780198510581
work page 2004
-
[16]
A. Kumar, Y. Sabharwal, and S. Sen. A simple linear time ( 1 + epsiv;)-approximation algorithm for k-means clustering in any dimensions. In 45th Annual IEEE Symposium on Foundations of Computer Science , pages 454–462, Oct 2004. doi: 10.1109/FOCS.2004.7
- [17]
-
[18]
Consistency of spectr al clustering in stochastic block models
Jing Lei and Alessandro Rinaldo. Consistency of spectr al clustering in stochastic block models. Ann. Statist. , 43(1):215–237, 2015. doi: 10.1214/14-AOS1274
-
[19]
Unified \ ell\_\ 2 rightarrow infty\ \ Eigenspace Perturbation Theory for Symmetric Random Matrices
Lihua Lei. Unified ℓ2→∞ eigenspace perturbation theory for symmetric random matri ces. ArXiv: 1909.04798, 2020. URL https://arxiv.org/abs/1909.04798
-
[20]
Matthias Loffler, Anderson Y. Zhang, and Harrison H. Zhou . Optimality of spectral clustering in the Gaussian mixture model. The Annals of Statistics , 49(5):2506 – 2530, 2021. doi: 10. 1214/20-AOS2044. URL https://doi.org/10.1214/20-AOS2044
-
[21]
Two provably consis- tent divide-and-conquer clustering algorithms for large n etworks
Soumendu Sundar Mukherjee, Purnamrita Sarkar, and Pet er J Bickel. Two provably consis- tent divide-and-conquer clustering algorithms for large n etworks. Proceedings of the National Academy of Sciences , 118(44):e2100482118, 2021
work page 2021
-
[22]
Sharp optimal recovery in the two compo nent Gaussian mixture model
Mohamed Ndaoud. Sharp optimal recovery in the two compo nent Gaussian mixture model. The Annals of Statistics , 50(4):2096 – 2126, 2022. doi: 10.1214/22-AOS2178. URL https://doi.org/10.1214/22-AOS2178
-
[23]
Signed diverse multiplex networks: C lustering and inference
Marianna Pensky. Signed diverse multiplex networks: C lustering and inference. ArXiv:2204.12087, 2024
-
[24]
Clustering of diverse multiplex networks
Marianna Pensky and Yaxuan Wang. Clustering of diverse multiplex networks. IEEE Transac- tions on Network Science and Engineering , 11(4):3441–3454, 2024. doi: 10.1109/TNSE.2024. 3374102
-
[25]
Spectral clus tering and the high-dimensional stochastic blockmodel
Karl Rohe, Sourav Chatterjee, and Bin Yu. Spectral clus tering and the high-dimensional stochastic blockmodel. Ann. Statist. , 39(4):1878–1915, 2011
work page 1915
-
[26]
Adaptive clustering through semidefinit e programming
Martin Royer. Adaptive clustering through semidefinit e programming. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. V ish- wanathan, and R. Garnett, editors, Advances in Neural Information Pro- cessing Systems 30 , pages 1795–1803. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/6776-adaptive-clusterin g-through-semidefinit...
work page 2017
-
[27]
Patrick Rubin-Delanchy, Joshua Cape, Minh Tang, and Ca rey E. Priebe. A Statistical Inter- pretation of Spectral Embedding: The Generalised Random Do t Product Graph. Journal of the Royal Statistical Society Series B: Statistical Method ology, 84(4):1446–1473, 2022. ISSN 1369-7412. doi: 10.1111/rssb.12509. URL https://doi.org/10.1111/rssb.12509. 25
-
[28]
Roman Vershynin. High-Dimensional Probability, volume 47 of Cambridge Series in Statistical and Probabilistic Mathematics . Cambridge University Press, 2018
work page 2018
- [30]
-
[31]
Entrywise limit theorems for eigenvect ors of signal-plus-noise matrix mod- els with weak signals
Fangzheng Xie. Entrywise limit theorems for eigenvect ors of signal-plus-noise matrix mod- els with weak signals. Bernoulli, 30(1):388 – 418, 2024. doi: 10.3150/23-BEJ1602. URL https://doi.org/10.3150/23-BEJ1602
-
[32]
Inference for heteroskedastic PCA with miss- ing data
Yuling Yan, Yuxin Chen, and Jianqing Fan. Inference for heteroskedastic PCA with miss- ing data. The Annals of Statistics , 52(2):729 – 756, 2024. doi: 10.1214/24-AOS2366. URL https://doi.org/10.1214/24-AOS2366
-
[33]
Y. Yu, T. Wang, and R. J. Samworth. A useful variant of the davis-kahan theorem for statisti- cians. Biometrika, 102(2):315–323, 2014. ISSN 0006-3444. doi: 10.1093/biom et/asv008. URL https://doi.org/10.1093/biomet/asv008
-
[34]
Fundamental limits of spectral clus tering in stochastic block models
Anderson Ye Zhang. Fundamental limits of spectral clus tering in stochastic block models. IEEE Transactions on Information Theory , 70(10):7320–7348, 2024. doi: 10.1109/TIT.2024.3425581
-
[35]
Runbing Zheng and Minh Tang. Limit results for distribu ted estimation of invariant subspaces in multiple networks inference and pca. ArXiv: 2206.04306 , 2022. doi: 10.48550/ARXIV.2206. 04306
-
[36]
Deflated HeteroPCA : Overcoming the curse of ill-conditioning in heteroskedastic PCA , March 2023
Yuchen Zhou and Yuxin Chen. Deflated heteropca: Overcom ing the curse of ill-conditioning in heteroskedastic pca. ArXiv: 2303.06198 , 2024. URL https://arxiv.org/abs/2303.06198. 26 7 Proofs 7.1 Proofs of statements in Section 2. Proof of Theorem 2. Note that, by Weyl’s theorem, one has ˆλ r = λ r( ˆY ) ≥ λ r − ∥ E ∥, so that ˆΛ− 1 = |ˆλ r|− 1 ≤ ...
-
[37]
Inequality (2.8) is the direct consequence of (2.6) and (2.7)
(7.6) Finally, combining (7.2)–(7.6) and taking into account tha t ∆ 0 ≤ 1/ 4, obtain (2.7). Inequality (2.8) is the direct consequence of (2.6) and (2.7). 27 Proof of Theorem 3. Denote the sets, on which (2.6) and (2.10) are true, by, respe ctively, Ω τ,1 and Ω τ,1. Denote Ω τ = Ωτ,1 ∩ Ωτ,1 and observe that P(Ωτ ) ≥ 1 − 2 n− τ . Note that, due to (2.12) ...
-
[38]
Also, ∥ ˆU ∥2,∞ ≤ ∥ ˆU − U WU ∥2,∞ + ∥U ∥2,∞ . Combining all inequalities above and recalling that ǫ0 = o(1), immediately obtain (7.10). In order to prove (7.11), we use the “leave one out” method. Sp ecifically, fix l ∈ [n] and let ˆY (l) = ˆU (l) ˆΛ(l)( ˆU (l))T + ˆU (l) ⊥ ˆΛ(l) ⊥ ( ˆU (l) ⊥ )T be the SVD of ˆY (l), where ˆU (l) ∈ O n,r and ˆU (l) ⊥ ∈ O n...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.