Consistent line clustering using geometric hypergraphs

arxiv: 2505.24868 · v3 · submitted 2025-05-30 · 🧮 math.ST · stat.ML· stat.TH

Consistent line clustering using geometric hypergraphs

Kalle Alaluusua , Konstantin Avrachenkov , B. R. Vinay Kumar , Lasse Leskel\"a This is my paper

Pith reviewed 2026-05-19 12:37 UTC · model grok-4.3

classification 🧮 math.ST stat.MLstat.TH

keywords subspace clusteringgeometric hypergraphsintersecting linesspectral clusteringinformation-theoretic boundsexact recoveryGaussian noise

0 comments p. Extension

The pith

A spectral algorithm on hypergraphs of nearly collinear triples recovers two intersecting lines at the information-theoretic limit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Subspace clustering becomes difficult near intersections because points from different subspaces lie close together. This paper studies the minimal case of two lines that cross, where the sampling distribution places polynomially large mass in small neighborhoods around the crossing point. The authors form a hypergraph whose hyperedges connect nearly collinear triples of points and derive a similarity matrix from it. Under a regularity condition on the latent distribution they show that spectral clustering on this matrix achieves exact or almost exact recovery at the information-theoretic threshold up to polylog factors, even under Gaussian noise. The result matters because it demonstrates that triplewise geometric information suffices where pairwise distances alone are uninformative.

Core claim

In the setting of two intersecting lines observed under Gaussian noise, when the latent sampling law places polynomially large mass near the intersection, a spectral algorithm that operates on the similarity matrix of a hypergraph whose hyperedges are nearly collinear triples achieves the information-theoretic bounds for exact and almost exact recovery up to polylogarithmic factors, provided a simple regularity condition holds on the latent distribution.

What carries the argument

The hypergraph similarity matrix built from triples of nearly collinear points, which supplies the higher-order geometric signal needed for spectral clustering to separate the two latent lines.

If this is right

The exact-recovery threshold is governed by the concentration rate of the latent law near the intersection.
Pairwise collinearity information is insufficient to decide cluster membership, but information from collinear triples is sufficient.
The method succeeds without the strong separation or sampling assumptions that avoid the intersection region in earlier work.
Recovery guarantees hold up to polylogarithmic factors of the information-theoretic lower bounds derived for both exact and almost exact recovery.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same hypergraph construction could be generalized to more than two lines or to subspaces of higher dimension by replacing collinearity with higher-order flatness conditions.
Empirical tests on real geometric data sets containing known intersections would reveal how far the theoretical guarantees translate to practice.
Analogous higher-order geometric hypergraphs might resolve similar intersection ambiguities in other noisy clustering tasks such as manifold learning or computer-vision line detection.

Load-bearing premise

The latent distribution must place polynomially large mass near the intersection together with a regularity condition that guarantees the spectral method succeeds.

What would settle it

Run the algorithm on samples from a distribution that places only sub-polynomial mass near the intersection or that violates the regularity condition and check whether exact recovery fails at the predicted noise level.

Figures

Figures reproduced from arXiv: 2505.24868 by B. R. Vinay Kumar, Kalle Alaluusua, Konstantin Avrachenkov, Lasse Leskel\"a.

**Figure 2.** Figure 2: Boxplots of the ARI values obtained by the three clustering methods on the [PITH_FULL_IMAGE:figures/full_fig_p013_2.png] view at source ↗

**Figure 3.** Figure 3: Boxplots of ARI values for Clique averaging and TLS thresholding on the twoline clustering problem. The mean ARI was approximately 0.90 for Clique averaging and 0.92 for TLS thresholding. The boxplots in [PITH_FULL_IMAGE:figures/full_fig_p015_3.png] view at source ↗

**Figure 4.** Figure 4: The decision region Ac ∗ where the density f2 dominates f1 is colored in red. Then the conditional laws F1 and F2 of Xi given Zi = 1 and Zi = 2 have Lebesgue densities f1(x1, x2) = Z ℓ/2 −ℓ/2 φσ(x1 − u cos β) φσ(x2 + u sin β) ν(du), f2(x1, x2) = Z ℓ/2 −ℓ/2 φσ(x1 − u cos β) φσ(x2 − u sin β) ν(du). Expanding the squares gives f1(x1, x2) = cσ(x1, x2) Z ℓ/2 −ℓ/2 exp − u 2 2σ 2 + u(x1 cos β − x2 sin β) σ 2 ! ν(… view at source ↗

**Figure 5.** Figure 5: Transverse (dashed) and direct (dotted) common tangents of circles [PITH_FULL_IMAGE:figures/full_fig_p054_5.png] view at source ↗

read the original abstract

Subspace clustering becomes inherently difficult near intersections, where points from different subspaces are barely separated. Most existing theoretical results address this issue by imposing separation or sampling assumptions that limit the statistical effect of points near the intersection. We study a minimal setting of two intersecting lines in which the latent sampling law places polynomially large mass in small neighborhoods of the intersection. We derive information-theoretic lower bounds for exact and almost exact recovery under Gaussian noise. In particular, we show that the exact-recovery threshold is determined by the rate at which the latent law concentrates near the intersection. Since any two points are collinear, pairwise information alone does not reveal whether they are sampled from the same latent line. We therefore construct a hypergraph in which nearly collinear triples form hyperedges, and study the resulting hypergraph similarity matrix. Under a simple regularity condition on the latent distribution, we introduce a spectral algorithm that achieves the information-theoretic bounds up to polylogarithmic factors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper sets up info-theoretic thresholds for exact recovery of two intersecting lines under polynomial concentration near the crossing and claims a hypergraph spectral algorithm matches them up to logs under one regularity condition.

read the letter

The main point is that they isolate the hard case of two lines crossing with sampling that puts polynomially large mass in small neighborhoods of the intersection. Pairwise distances or angles give no signal because any two points lie on a line, so they build a hypergraph whose edges are nearly collinear triples and then run spectral clustering on the resulting similarity matrix. The abstract says this recovers the lines at the information-theoretic threshold up to polylog factors once a simple regularity condition on the latent distribution is added. That construction and the link between concentration rate and the recovery threshold are the concrete advances here. The lower bounds are derived directly from the sampling model and Gaussian noise, which keeps the argument self-contained. The hypergraph step correctly avoids the triviality of pairs and focuses on the geometry that actually distinguishes the two lines. The main soft spot is that only the abstract is available. The regularity condition is doing heavy lifting for the spectral step, but its precise form and how restrictive it is remain unclear. Without the proofs or the error analysis for the hypergraph similarity matrix, it is impossible to check whether the claimed polylog gap is tight or whether hidden constants blow up. The abstract itself shows no internal contradiction, and the strategy of moving from pairs to triples is logically sound for this regime. This is for readers working on theoretical subspace clustering or geometric hypergraphs in statistics. Someone who already knows the pairwise literature and wants to see how higher-order information changes the thresholds near intersections will get a useful setup to build on. I would send it to a serious referee so the proofs and the strength of the regularity condition can be examined in detail.

Referee Report

1 major / 0 minor

Summary. The paper studies subspace clustering for points sampled from two intersecting lines under Gaussian noise, where the latent distribution places polynomially large mass near the intersection. It derives information-theoretic lower bounds for exact and almost-exact recovery, with the exact-recovery threshold determined by the concentration rate near the intersection. Pairwise information is insufficient since any two points are collinear, so the paper constructs a hypergraph from nearly collinear triples, forms a similarity matrix, and proposes a spectral algorithm that achieves the lower bounds up to polylogarithmic factors under a simple regularity condition on the latent distribution.

Significance. If the results hold, this would represent a meaningful contribution to subspace clustering by handling the difficult regime near intersections without strong separation or sampling assumptions. The hypergraph construction to capture geometric higher-order structure and the near-matching of information-theoretic bounds in this minimal setting could provide useful insights for more general problems in geometric clustering and recovery.

major comments (1)

Abstract: The abstract states that lower bounds are derived and that the algorithm achieves them up to polylog factors, but the provided manuscript consists solely of the abstract without the proofs, error analyses, or explicit regularity condition, so the support for the central claims cannot be verified.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed summary and for recognizing the potential significance of addressing subspace clustering near intersections via hypergraph methods. We address the major comment below.

read point-by-point responses

Referee: Abstract: The abstract states that lower bounds are derived and that the algorithm achieves them up to polylog factors, but the provided manuscript consists solely of the abstract without the proofs, error analyses, or explicit regularity condition, so the support for the central claims cannot be verified.

Authors: We agree that the version provided for review contains only the abstract. The full manuscript includes the derivation of the information-theoretic lower bounds for exact and almost-exact recovery, the hypergraph construction from nearly collinear triples, the similarity matrix, the spectral algorithm, all error analyses, and the explicit regularity condition on the latent distribution. We will revise the submission to include the complete manuscript (or a direct link to the full arXiv version) so that the claims can be verified. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The abstract derives information-theoretic lower bounds for exact and almost-exact recovery directly from the given latent sampling law (polynomial mass near the intersection) and Gaussian noise model, then constructs a hypergraph on nearly-collinear triples whose similarity matrix is analyzed by a spectral algorithm that succeeds under one explicitly invoked regularity condition on the latent distribution. No equations, fitted parameters, or self-citations are referenced that would reduce the claimed thresholds or recovery guarantees to inputs by construction; the regularity condition is presented as an external assumption enabling the spectral step rather than a tautology, and the overall strategy remains independent of the target results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claims rest on the polynomial mass concentration near the intersection and an unspecified simple regularity condition; no explicit free parameters or new physical entities are introduced in the abstract.

axioms (1)

domain assumption simple regularity condition on the latent distribution
Invoked to ensure the spectral algorithm on the hypergraph similarity matrix achieves the information-theoretic bounds.

invented entities (1)

hypergraph similarity matrix from nearly collinear triples no independent evidence
purpose: To encode higher-order collinearity information that pairwise distances cannot provide for distinguishing the two lines
Constructed explicitly to overcome the fact that any two points are collinear.

pith-pipeline@v0.9.0 · 5678 in / 1360 out tokens · 35693 ms · 2026-05-19T12:37:35.059412+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We therefore construct a hypergraph in which nearly collinear triples form hyperedges, and study the resulting hypergraph similarity matrix. Under a simple regularity condition on the latent distribution, we introduce a spectral algorithm...
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Assumption 1 (Concentration of mass at the intersection). There exists ρ ∈ (0,1] such that lim inf u↓0 ν([−u,u])/u^ρ > 0.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Planted clique detection and recovery from the hypergraph adjacency matrix
math.ST 2026-04 unverdicted novelty 6.0

Spectral norm test and leading-eigenvector method achieve detection and exact recovery of planted cliques from hypergraph adjacency matrices at the sqrt(n) scale.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1]

The Annals of Statistics48(3), 1452–1474 (2020)

Abbe, E., Fan, J., Wang, K., Zhong, Y.: Entrywise eigenvector analysis of random matrices with low expected rank. The Annals of Statistics48(3), 1452–1474 (2020)

work page 2020
[2]

In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)

Agarwal, S., Lim, J., Zelnik-Manor, L., Perona, P., Kriegman, D., Belongie, S.: Beyond pairwise clustering. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)

work page 2005
[3]

In: 2017 IEEE International Symposium on Information Theory (ISIT)

Ahn, K., Lee, K., Suh, C.: Information-theoretic limits of subspace clustering. In: 2017 IEEE International Symposium on Information Theory (ISIT). pp. 2473–2477. IEEE (2017)

work page 2017
[4]

IEEE Journal of Selected Topics in Signal Processing12(5), 959–974 (2018)

Ahn, K., Lee, K., Suh, C.: Hypergraph spectral clustering in the weighted stochastic block model. IEEE Journal of Selected Topics in Signal Processing12(5), 959–974 (2018)

work page 2018
[5]

IEEE Transactions on Information Theory65(10), 6561–6579 (2019)

Ahn, K., Lee, K., Suh, C.: Community recovery in hypergraphs. IEEE Transactions on Information Theory65(10), 6561–6579 (2019)

work page 2019
[6]

In: International Workshop on Algorithms and Models for the Web-Graph (WAW)

Alaluusua, K., Avrachenkov, K., Vinay Kumar, B.R., Leskelä, L.: Multilayer hyper- graph clustering using the aggregate similarity matrix. In: International Workshop on Algorithms and Models for the Web-Graph (WAW). pp. 83–98 (2023)

work page 2023
[7]

Electronic Journal of Statistics 5, 1537–1587 (2011)

Arias-Castro, E., Chen, G., Lerman, G.: Spectral clustering based on lo- cal linear approximations. Electronic Journal of Statistics 5, 1537–1587 (2011). https://doi.org/10.1214/11-EJS651

work page doi:10.1214/11-ejs651 2011
[8]

Journal of Machine Learning Research18, 1–57 (2017)

Arias-Castro, E., Lerman, G., Zhang, T.: Spectral clustering based on local PCA. Journal of Machine Learning Research18, 1–57 (2017)

work page 2017
[9]

Now Publishers (2022)

Avrachenkov, K., Dreveton, M.: Statistical analysis of networks. Now Publishers (2022)

work page 2022
[10]

Journal of Global Optimization 16(1), 23–32 (2000)

Bradley, P.S., Mangasarian, O.L.:k-plane clustering. Journal of Global Optimization 16(1), 23–32 (2000). https://doi.org/10.1023/A:1008324625522

work page doi:10.1023/a:1008324625522 2000
[11]

Scandinavian Journal of Statistics51(4), 1661–1684 (2024)

Brusa, L., Matias, C.: Model-based clustering in simple hypergraphs through a stochastic blockmodel. Scandinavian Journal of Statistics51(4), 1661–1684 (2024)

work page 2024
[12]

Foundations of Computational Mathematics9(5), 517–558 (2009)

Chen, G., Lerman, G.: Foundations of a multi-way spectral clustering framework for hybrid linear modeling. Foundations of Computational Mathematics9(5), 517–558 (2009). https://doi.org/10.1007/s10208-009-9043-7

work page doi:10.1007/s10208-009-9043-7 2009
[13]

International Journal of Computer Vision81(3), 317–330 (2009)

Chen, G., Lerman, G.: Spectral curvature clustering (SCC). International Journal of Computer Vision81(3), 317–330 (2009). https://doi.org/10.1007/s11263-008-0178-9

work page doi:10.1007/s11263-008-0178-9 2009
[14]

IEEE Transactions on Information Theory65(12), 8095–8118 (2019)

Chien, I.E., Lin, C.Y., Wang, I.H.: On the minimax misclassification ratio of hy- pergraph community detection. IEEE Transactions on Information Theory65(12), 8095–8118 (2019)

work page 2019
[15]

The Annals of Statistics54(1), 48–73 (2026) 57

Dumitriu, I., Wang, H.X.: Optimal and exact recovery on the general non-uniform hypergraph stochastic block model. The Annals of Statistics54(1), 48–73 (2026) 57

work page 2026
[16]

Combinatorics, Probability and Com- puting 34(1), 1–51 (2025)

Dumitriu, I., Wang, H.X., Zhu, Y.: Partial recovery and weak consistency in the non- uniform hypergraph stochastic block model. Combinatorics, Probability and Com- puting 34(1), 1–51 (2025). https://doi.org/10.1017/S0963548324000166

work page doi:10.1017/s0963548324000166 2025
[17]

In: Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Elhamifar, E., Vidal, R.: Clustering disjoint subspaces via sparse representation. In: Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 1926–1929 (2010)

work page 2010
[18]

In: Conference on Learning Theory (COLT)

Gaudio, J., Joshi, N.: Community detection in the hypergraph SBM: exact recovery given the similarity matrix. In: Conference on Learning Theory (COLT). pp. 469–510 (2023)

work page 2023
[19]

Journal of Statistical Theory and Practice15(2), 35 (2021)

Ghosh, M.: Exponential tail bounds for chi-squared random variables. Journal of Statistical Theory and Practice15(2), 35 (2021)

work page 2021
[20]

In: AAAI Conference on Artificial Intelligence (AAAI) (2015)

Ghoshdastidar, D., Dukkipati, A.: Spectral clustering using multilinear SVD: analy- sis, approximations and applications. In: AAAI Conference on Artificial Intelligence (AAAI) (2015)

work page 2015
[21]

In: International Conference on Machine Learning (ICML) (2021)

Gösgens, M.M., Tikhonov, A., Prokhorenkova, L.: Systematic analysis of cluster similarity indices: how to validate validation measures. In: International Conference on Machine Learning (ICML) (2021)

work page 2021
[22]

IEEE Transactions on Neural Networks and Learning Systems27(12), 2643–2655 (2015)

He, R., Wang, L., Sun, Z., Zhang, Y., Li, B.: Information-theoretic subspace cluster- ing. IEEE Transactions on Neural Networks and Learning Systems27(12), 2643–2655 (2015)

work page 2015
[23]

IEEE Transactions on Information Theory 61(11), 6320–6342 (2015)

Heckel, R., Bölcskei, H.: Robust subspace clustering via thresholding. IEEE Transactions on Information Theory 61(11), 6320–6342 (2015). https://doi.org/10.1109/TIT.2015.2472520

work page doi:10.1109/tit.2015.2472520 2015
[24]

Journal of Classification2(1), 193–218 (1985)

Hubert, L., Arabie, P.: Comparing partitions. Journal of Classification2(1), 193–218 (1985)

work page 1985
[25]

In: Pacific-Asia Conference on Knowledge Discovery and Data Mining

Hubig, N., Plant, C.: Information-theoretic non-redundant subspace clustering. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. pp. 198–209. Springer (2017)

work page 2017
[26]

PLOS ONE14(11), e0224307 (2019)

Kamiński, B., Poulin, V., Prałat, P., Szufel, P., Théberge, F.: Clustering via hyper- graph modularity. PLOS ONE14(11), e0224307 (2019)

work page 2019
[27]

Springer (2008)

Kosorok, M.R.: Introduction to empirical processes and semiparametric inference. Springer (2008)

work page 2008
[28]

Kumar, A., Sabharwal, Y., Sen, S.: A simple linear-time(1 +ε)-approximation algo- rithmfor k-meansclusteringinanydimensions.In: IEEESymposiumonFoundations of Computer Science (FOCS). pp. 454–462 (2004)

work page 2004
[29]

Springer (1998)

Lehmann, E.L., Casella, G.: Theory of point estimation. Springer (1998)

work page 1998
[30]

The Annals of Statistics pp

Lei, J., Rinaldo, A.: Consistency of spectral clustering in stochastic block models. The Annals of Statistics pp. 215–237 (2015) 58

work page 2015
[31]

In: International Conference on Artificial Intelligence and Statistics (AISTATS)

Leordeanu, M., Sminchisescu, C.: Efficient hypergraph clustering. In: International Conference on Artificial Intelligence and Statistics (AISTATS). pp. 676–684 (2012)

work page 2012
[32]

The Annals of Statistics 39(5), 2686–2715 (2011)

Lerman, G., Zhang, T.: Robust recovery of multiple subspaces by geo- metric ℓp minimization. The Annals of Statistics 39(5), 2686–2715 (2011). https://doi.org/10.1214/11-AOS914

work page doi:10.1214/11-aos914 2011
[33]

Information and Inference: A Journal of the IMA10(1), 73–107 (2021)

Lipor, J., Hong, D., Tan, Y.S., Balzano, L.: Subspace clustering using ensembles of K-subspaces. Information and Inference: A Journal of the IMA10(1), 73–107 (2021). https://doi.org/10.1093/imaiai/iaaa031

work page doi:10.1093/imaiai/iaaa031 2021
[34]

IEEE Transactions on Pattern Analysis and Ma- chine Intelligence35(1), 171–184 (2013)

Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., Ma, Y.: Robust recovery of subspace struc- tures by low-rank representation. IEEE Transactions on Pattern Analysis and Ma- chine Intelligence35(1), 171–184 (2013). https://doi.org/10.1109/TPAMI.2012.88

work page doi:10.1109/tpami.2012.88 2013
[35]

Discrete & Computational Geometry 24(1), 61–84 (2000)

Matoušek, J.: On approximate geometrick-clustering. Discrete & Computational Geometry 24(1), 61–84 (2000)

work page 2000
[36]

Journal of Multi- variate Analysis98(5), 873–895 (2007)

Meilă, M.: Comparing clusterings: an information-based distance. Journal of Multi- variate Analysis98(5), 873–895 (2007)

work page 2007
[37]

Artificial Intelligence Review 58, 346 (2025)

Miao, J., Zhang, X., Yang, T., Fan, C., Tian, Y., Shi, Y., Xu, M.: A comprehen- sive survey on subspace clustering: methods and applications. Artificial Intelligence Review 58, 346 (2025). https://doi.org/10.1007/s10462-025-11349-w

work page doi:10.1007/s10462-025-11349-w 2025
[38]

Mityagin, B.: The zero set of a real analytic function (2015),https://arxiv.org/ abs/1512.07276

work page internal anchor Pith review Pith/arXiv arXiv 2015
[39]

Molloy, M., Reed, B.: Graphcolouringandtheprobabilisticmethod, vol.23.Springer (2002)

work page 2002
[40]

In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Nasihatkon, B., Hartley, R.: Graph connectivity in sparse subspace clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2137–2144 (2011). https://doi.org/10.1109/CVPR.2011.5995679

work page doi:10.1109/cvpr.2011.5995679 2011
[41]

Pensky, M.: Davis–Kahan theorem in the two-to-infinity norm and its application to perfect clustering (2024),https://arxiv.org/abs/2411.11728

work page internal anchor Pith review Pith/arXiv arXiv 2024
[42]

In: International Conference on Machine Learn- ing

Pimentel-Alarcon, D., Nowak, R.: The information-theoretic requirements of sub- space clustering with missing data. In: International Conference on Machine Learn- ing. pp. 802–810. PMLR (2016)

work page 2016
[43]

Mathematics11(2), 436 (2023)

Qu, W., Xiu, X., Chen, H., Kong, L.: A survey on high-dimensional subspace clus- tering. Mathematics11(2), 436 (2023). https://doi.org/10.3390/math11020436

work page doi:10.3390/math11020436 2023
[44]

Springer (1995)

Schervish, M.J.: Theory of statistics. Springer (1995)

work page 1995
[45]

The Annals of Statistics 40(4), 2195–2238 (2012)

Soltanolkotabi, M., Candès, E.J.: A geometric analysis of subspace clus- tering with outliers. The Annals of Statistics 40(4), 2195–2238 (2012). https://doi.org/10.1214/12-AOS1034

work page doi:10.1214/12-aos1034 2012
[46]

The Annals of Statistics42(2), 669–699 (2014)

Soltanolkotabi, M., Elhamifar, E., Candès, E.J.: Robust subspace clustering. The Annals of Statistics42(2), 669–699 (2014). https://doi.org/10.1214/13-AOS1199 59

work page doi:10.1214/13-aos1199 2014
[47]

Springer (2013)

Van Huffel, S., Lemmerling, P.: Total least squares and errors-in-variables modeling: analysis, algorithms and applications. Springer (2013)

work page 2013
[48]

In: Computational Statis- tics, Handbook of Statistics, vol

Van Huffel, S., Zha, H.: The total least squares problem. In: Computational Statis- tics, Handbook of Statistics, vol. 9, pp. 377–408. Elsevier (1993)

work page 1993
[49]

IEEE Signal Processing Magazine 28(2), 52–68 (2011)

Vidal, R.: Subspace clustering. IEEE Signal Processing Magazine 28(2), 52–68 (2011). https://doi.org/10.1109/MSP.2010.939739

work page doi:10.1109/msp.2010.939739 2011
[50]

arXiv e-prints pp

Wang, H.X.: Information-theoretic limits and strong consistency on binary non- uniform hypergraph stochastic block models. arXiv e-prints pp. arXiv–2306 (2023)

work page 2023
[51]

In: Proceedings of the 39th International Conference on Machine Learning

Wang, P., Liu, H., So, A.M.C., Balzano, L.: Convergence and recovery guarantees of the K-subspaces method for subspace clustering. In: Proceedings of the 39th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 162, pp. 22884–22918 (2022)

work page 2022
[52]

In: Artificial Intelligence and Statistics

Wang, Y., Wang, Y.X., Singh, A.: Graph connectivity in noisy sparse subspace clustering. In: Artificial Intelligence and Statistics. pp. 538–546. PMLR (2016)

work page 2016
[53]

Wang, Y., Wang, Y.X., Singh, A.: A theoretical analysis of noisy sparse subspace clusteringondimensionality-reduceddata.IEEETransactionsonInformationTheory 65(2), 685–706 (2018)

work page 2018
[54]

Journal of Machine Learning Research 17(12), 1–41 (2016)

Wang, Y.X., Xu, H.: Noisy sparse subspace clustering. Journal of Machine Learning Research 17(12), 1–41 (2016)

work page 2016
[55]

IEEE Transactions on Information Theory69(1), 453–471 (2023)

Zhang, Q., Tan, V.Y.F.: Exact recovery in the general hypergraph stochastic block model. IEEE Transactions on Information Theory69(1), 453–471 (2023)

work page 2023
[56]

International Journal of Computer Vision 100(3), 217–240 (2012)

Zhang, T., Szlam, A., Wang, Y., Lerman, G.: Hybrid linear modeling via local best-fit flats. International Journal of Computer Vision 100(3), 217–240 (2012). https://doi.org/10.1007/s11263-012-0535-6 60

work page doi:10.1007/s11263-012-0535-6 2012

[1] [1]

The Annals of Statistics48(3), 1452–1474 (2020)

Abbe, E., Fan, J., Wang, K., Zhong, Y.: Entrywise eigenvector analysis of random matrices with low expected rank. The Annals of Statistics48(3), 1452–1474 (2020)

work page 2020

[2] [2]

In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)

Agarwal, S., Lim, J., Zelnik-Manor, L., Perona, P., Kriegman, D., Belongie, S.: Beyond pairwise clustering. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)

work page 2005

[3] [3]

In: 2017 IEEE International Symposium on Information Theory (ISIT)

Ahn, K., Lee, K., Suh, C.: Information-theoretic limits of subspace clustering. In: 2017 IEEE International Symposium on Information Theory (ISIT). pp. 2473–2477. IEEE (2017)

work page 2017

[4] [4]

IEEE Journal of Selected Topics in Signal Processing12(5), 959–974 (2018)

Ahn, K., Lee, K., Suh, C.: Hypergraph spectral clustering in the weighted stochastic block model. IEEE Journal of Selected Topics in Signal Processing12(5), 959–974 (2018)

work page 2018

[5] [5]

IEEE Transactions on Information Theory65(10), 6561–6579 (2019)

Ahn, K., Lee, K., Suh, C.: Community recovery in hypergraphs. IEEE Transactions on Information Theory65(10), 6561–6579 (2019)

work page 2019

[6] [6]

In: International Workshop on Algorithms and Models for the Web-Graph (WAW)

Alaluusua, K., Avrachenkov, K., Vinay Kumar, B.R., Leskelä, L.: Multilayer hyper- graph clustering using the aggregate similarity matrix. In: International Workshop on Algorithms and Models for the Web-Graph (WAW). pp. 83–98 (2023)

work page 2023

[7] [7]

Electronic Journal of Statistics 5, 1537–1587 (2011)

Arias-Castro, E., Chen, G., Lerman, G.: Spectral clustering based on lo- cal linear approximations. Electronic Journal of Statistics 5, 1537–1587 (2011). https://doi.org/10.1214/11-EJS651

work page doi:10.1214/11-ejs651 2011

[8] [8]

Journal of Machine Learning Research18, 1–57 (2017)

Arias-Castro, E., Lerman, G., Zhang, T.: Spectral clustering based on local PCA. Journal of Machine Learning Research18, 1–57 (2017)

work page 2017

[9] [9]

Now Publishers (2022)

Avrachenkov, K., Dreveton, M.: Statistical analysis of networks. Now Publishers (2022)

work page 2022

[10] [10]

Journal of Global Optimization 16(1), 23–32 (2000)

Bradley, P.S., Mangasarian, O.L.:k-plane clustering. Journal of Global Optimization 16(1), 23–32 (2000). https://doi.org/10.1023/A:1008324625522

work page doi:10.1023/a:1008324625522 2000

[11] [11]

Scandinavian Journal of Statistics51(4), 1661–1684 (2024)

Brusa, L., Matias, C.: Model-based clustering in simple hypergraphs through a stochastic blockmodel. Scandinavian Journal of Statistics51(4), 1661–1684 (2024)

work page 2024

[12] [12]

Foundations of Computational Mathematics9(5), 517–558 (2009)

Chen, G., Lerman, G.: Foundations of a multi-way spectral clustering framework for hybrid linear modeling. Foundations of Computational Mathematics9(5), 517–558 (2009). https://doi.org/10.1007/s10208-009-9043-7

work page doi:10.1007/s10208-009-9043-7 2009

[13] [13]

International Journal of Computer Vision81(3), 317–330 (2009)

Chen, G., Lerman, G.: Spectral curvature clustering (SCC). International Journal of Computer Vision81(3), 317–330 (2009). https://doi.org/10.1007/s11263-008-0178-9

work page doi:10.1007/s11263-008-0178-9 2009

[14] [14]

IEEE Transactions on Information Theory65(12), 8095–8118 (2019)

Chien, I.E., Lin, C.Y., Wang, I.H.: On the minimax misclassification ratio of hy- pergraph community detection. IEEE Transactions on Information Theory65(12), 8095–8118 (2019)

work page 2019

[15] [15]

The Annals of Statistics54(1), 48–73 (2026) 57

Dumitriu, I., Wang, H.X.: Optimal and exact recovery on the general non-uniform hypergraph stochastic block model. The Annals of Statistics54(1), 48–73 (2026) 57

work page 2026

[16] [16]

Combinatorics, Probability and Com- puting 34(1), 1–51 (2025)

Dumitriu, I., Wang, H.X., Zhu, Y.: Partial recovery and weak consistency in the non- uniform hypergraph stochastic block model. Combinatorics, Probability and Com- puting 34(1), 1–51 (2025). https://doi.org/10.1017/S0963548324000166

work page doi:10.1017/s0963548324000166 2025

[17] [17]

In: Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Elhamifar, E., Vidal, R.: Clustering disjoint subspaces via sparse representation. In: Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 1926–1929 (2010)

work page 2010

[18] [18]

In: Conference on Learning Theory (COLT)

Gaudio, J., Joshi, N.: Community detection in the hypergraph SBM: exact recovery given the similarity matrix. In: Conference on Learning Theory (COLT). pp. 469–510 (2023)

work page 2023

[19] [19]

Journal of Statistical Theory and Practice15(2), 35 (2021)

Ghosh, M.: Exponential tail bounds for chi-squared random variables. Journal of Statistical Theory and Practice15(2), 35 (2021)

work page 2021

[20] [20]

In: AAAI Conference on Artificial Intelligence (AAAI) (2015)

Ghoshdastidar, D., Dukkipati, A.: Spectral clustering using multilinear SVD: analy- sis, approximations and applications. In: AAAI Conference on Artificial Intelligence (AAAI) (2015)

work page 2015

[21] [21]

In: International Conference on Machine Learning (ICML) (2021)

Gösgens, M.M., Tikhonov, A., Prokhorenkova, L.: Systematic analysis of cluster similarity indices: how to validate validation measures. In: International Conference on Machine Learning (ICML) (2021)

work page 2021

[22] [22]

IEEE Transactions on Neural Networks and Learning Systems27(12), 2643–2655 (2015)

He, R., Wang, L., Sun, Z., Zhang, Y., Li, B.: Information-theoretic subspace cluster- ing. IEEE Transactions on Neural Networks and Learning Systems27(12), 2643–2655 (2015)

work page 2015

[23] [23]

IEEE Transactions on Information Theory 61(11), 6320–6342 (2015)

Heckel, R., Bölcskei, H.: Robust subspace clustering via thresholding. IEEE Transactions on Information Theory 61(11), 6320–6342 (2015). https://doi.org/10.1109/TIT.2015.2472520

work page doi:10.1109/tit.2015.2472520 2015

[24] [24]

Journal of Classification2(1), 193–218 (1985)

Hubert, L., Arabie, P.: Comparing partitions. Journal of Classification2(1), 193–218 (1985)

work page 1985

[25] [25]

In: Pacific-Asia Conference on Knowledge Discovery and Data Mining

Hubig, N., Plant, C.: Information-theoretic non-redundant subspace clustering. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. pp. 198–209. Springer (2017)

work page 2017

[26] [26]

PLOS ONE14(11), e0224307 (2019)

Kamiński, B., Poulin, V., Prałat, P., Szufel, P., Théberge, F.: Clustering via hyper- graph modularity. PLOS ONE14(11), e0224307 (2019)

work page 2019

[27] [27]

Springer (2008)

Kosorok, M.R.: Introduction to empirical processes and semiparametric inference. Springer (2008)

work page 2008

[28] [28]

Kumar, A., Sabharwal, Y., Sen, S.: A simple linear-time(1 +ε)-approximation algo- rithmfor k-meansclusteringinanydimensions.In: IEEESymposiumonFoundations of Computer Science (FOCS). pp. 454–462 (2004)

work page 2004

[29] [29]

Springer (1998)

Lehmann, E.L., Casella, G.: Theory of point estimation. Springer (1998)

work page 1998

[30] [30]

The Annals of Statistics pp

Lei, J., Rinaldo, A.: Consistency of spectral clustering in stochastic block models. The Annals of Statistics pp. 215–237 (2015) 58

work page 2015

[31] [31]

In: International Conference on Artificial Intelligence and Statistics (AISTATS)

Leordeanu, M., Sminchisescu, C.: Efficient hypergraph clustering. In: International Conference on Artificial Intelligence and Statistics (AISTATS). pp. 676–684 (2012)

work page 2012

[32] [32]

The Annals of Statistics 39(5), 2686–2715 (2011)

Lerman, G., Zhang, T.: Robust recovery of multiple subspaces by geo- metric ℓp minimization. The Annals of Statistics 39(5), 2686–2715 (2011). https://doi.org/10.1214/11-AOS914

work page doi:10.1214/11-aos914 2011

[33] [33]

Information and Inference: A Journal of the IMA10(1), 73–107 (2021)

Lipor, J., Hong, D., Tan, Y.S., Balzano, L.: Subspace clustering using ensembles of K-subspaces. Information and Inference: A Journal of the IMA10(1), 73–107 (2021). https://doi.org/10.1093/imaiai/iaaa031

work page doi:10.1093/imaiai/iaaa031 2021

[34] [34]

IEEE Transactions on Pattern Analysis and Ma- chine Intelligence35(1), 171–184 (2013)

Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., Ma, Y.: Robust recovery of subspace struc- tures by low-rank representation. IEEE Transactions on Pattern Analysis and Ma- chine Intelligence35(1), 171–184 (2013). https://doi.org/10.1109/TPAMI.2012.88

work page doi:10.1109/tpami.2012.88 2013

[35] [35]

Discrete & Computational Geometry 24(1), 61–84 (2000)

Matoušek, J.: On approximate geometrick-clustering. Discrete & Computational Geometry 24(1), 61–84 (2000)

work page 2000

[36] [36]

Journal of Multi- variate Analysis98(5), 873–895 (2007)

Meilă, M.: Comparing clusterings: an information-based distance. Journal of Multi- variate Analysis98(5), 873–895 (2007)

work page 2007

[37] [37]

Artificial Intelligence Review 58, 346 (2025)

Miao, J., Zhang, X., Yang, T., Fan, C., Tian, Y., Shi, Y., Xu, M.: A comprehen- sive survey on subspace clustering: methods and applications. Artificial Intelligence Review 58, 346 (2025). https://doi.org/10.1007/s10462-025-11349-w

work page doi:10.1007/s10462-025-11349-w 2025

[38] [38]

Mityagin, B.: The zero set of a real analytic function (2015),https://arxiv.org/ abs/1512.07276

work page internal anchor Pith review Pith/arXiv arXiv 2015

[39] [39]

Molloy, M., Reed, B.: Graphcolouringandtheprobabilisticmethod, vol.23.Springer (2002)

work page 2002

[40] [40]

In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Nasihatkon, B., Hartley, R.: Graph connectivity in sparse subspace clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2137–2144 (2011). https://doi.org/10.1109/CVPR.2011.5995679

work page doi:10.1109/cvpr.2011.5995679 2011

[41] [41]

Pensky, M.: Davis–Kahan theorem in the two-to-infinity norm and its application to perfect clustering (2024),https://arxiv.org/abs/2411.11728

work page internal anchor Pith review Pith/arXiv arXiv 2024

[42] [42]

In: International Conference on Machine Learn- ing

Pimentel-Alarcon, D., Nowak, R.: The information-theoretic requirements of sub- space clustering with missing data. In: International Conference on Machine Learn- ing. pp. 802–810. PMLR (2016)

work page 2016

[43] [43]

Mathematics11(2), 436 (2023)

Qu, W., Xiu, X., Chen, H., Kong, L.: A survey on high-dimensional subspace clus- tering. Mathematics11(2), 436 (2023). https://doi.org/10.3390/math11020436

work page doi:10.3390/math11020436 2023

[44] [44]

Springer (1995)

Schervish, M.J.: Theory of statistics. Springer (1995)

work page 1995

[45] [45]

The Annals of Statistics 40(4), 2195–2238 (2012)

Soltanolkotabi, M., Candès, E.J.: A geometric analysis of subspace clus- tering with outliers. The Annals of Statistics 40(4), 2195–2238 (2012). https://doi.org/10.1214/12-AOS1034

work page doi:10.1214/12-aos1034 2012

[46] [46]

The Annals of Statistics42(2), 669–699 (2014)

Soltanolkotabi, M., Elhamifar, E., Candès, E.J.: Robust subspace clustering. The Annals of Statistics42(2), 669–699 (2014). https://doi.org/10.1214/13-AOS1199 59

work page doi:10.1214/13-aos1199 2014

[47] [47]

Springer (2013)

Van Huffel, S., Lemmerling, P.: Total least squares and errors-in-variables modeling: analysis, algorithms and applications. Springer (2013)

work page 2013

[48] [48]

In: Computational Statis- tics, Handbook of Statistics, vol

Van Huffel, S., Zha, H.: The total least squares problem. In: Computational Statis- tics, Handbook of Statistics, vol. 9, pp. 377–408. Elsevier (1993)

work page 1993

[49] [49]

IEEE Signal Processing Magazine 28(2), 52–68 (2011)

Vidal, R.: Subspace clustering. IEEE Signal Processing Magazine 28(2), 52–68 (2011). https://doi.org/10.1109/MSP.2010.939739

work page doi:10.1109/msp.2010.939739 2011

[50] [50]

arXiv e-prints pp

Wang, H.X.: Information-theoretic limits and strong consistency on binary non- uniform hypergraph stochastic block models. arXiv e-prints pp. arXiv–2306 (2023)

work page 2023

[51] [51]

In: Proceedings of the 39th International Conference on Machine Learning

Wang, P., Liu, H., So, A.M.C., Balzano, L.: Convergence and recovery guarantees of the K-subspaces method for subspace clustering. In: Proceedings of the 39th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 162, pp. 22884–22918 (2022)

work page 2022

[52] [52]

In: Artificial Intelligence and Statistics

Wang, Y., Wang, Y.X., Singh, A.: Graph connectivity in noisy sparse subspace clustering. In: Artificial Intelligence and Statistics. pp. 538–546. PMLR (2016)

work page 2016

[53] [53]

Wang, Y., Wang, Y.X., Singh, A.: A theoretical analysis of noisy sparse subspace clusteringondimensionality-reduceddata.IEEETransactionsonInformationTheory 65(2), 685–706 (2018)

work page 2018

[54] [54]

Journal of Machine Learning Research 17(12), 1–41 (2016)

Wang, Y.X., Xu, H.: Noisy sparse subspace clustering. Journal of Machine Learning Research 17(12), 1–41 (2016)

work page 2016

[55] [55]

IEEE Transactions on Information Theory69(1), 453–471 (2023)

Zhang, Q., Tan, V.Y.F.: Exact recovery in the general hypergraph stochastic block model. IEEE Transactions on Information Theory69(1), 453–471 (2023)

work page 2023

[56] [56]

International Journal of Computer Vision 100(3), 217–240 (2012)

Zhang, T., Szlam, A., Wang, Y., Lerman, G.: Hybrid linear modeling via local best-fit flats. International Journal of Computer Vision 100(3), 217–240 (2012). https://doi.org/10.1007/s11263-012-0535-6 60

work page doi:10.1007/s11263-012-0535-6 2012