pith. sign in

arxiv: 2510.12365 · v2 · submitted 2025-10-14 · 🧮 math.PR · cs.DS

Planted clique recovery in random geometric graphs

Pith reviewed 2026-05-18 08:04 UTC · model grok-4.3

classification 🧮 math.PR cs.DS
keywords planted cliquerandom geometric graphexact recoverycommon neighborsvertex degreeanomaly detectionconnectivity regime
0
0 comments X

The pith

The common-neighbors algorithm recovers planted cliques exactly with high probability in random geometric graphs, even for cliques as small as single edges in the connectivity regime.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies the recovery of hidden cliques inserted into random geometric graphs, where points in space connect if they lie within a fixed distance. It compares a simple vertex-degree test against a common-neighbor count test under different average-degree and clique-size regimes. The analysis establishes that exact recovery of every planted vertex succeeds with high probability once the graph is large enough, for suitable parameter choices. The common-neighbor method works markedly better than degree-based detection and succeeds even when the planted structure is only an edge. The results matter for anomaly detection because they show how local neighbor statistics can isolate geometric outliers in large spatial networks.

Core claim

We demonstrate that exact recovery of a planted clique is achieved with high probability as the number of vertices tends to infinity, for appropriate ranges of average degree and clique size. The common-neighbor algorithm significantly outperforms the vertex-degree algorithm; in particular, in the connectivity regime it correctly identifies even tiny planted cliques, including single edges, with direct implications for anomaly detection. Numerical experiments on finite instances confirm that both algorithms are effective in practice.

What carries the argument

The common-neighbor counting algorithm that flags vertices whose number of shared neighbors deviates from the background geometric-graph expectation.

If this is right

  • Exact recovery of the entire planted clique holds with high probability once the graph size is large enough, for suitable average-degree and clique-size pairs.
  • The common-neighbor algorithm recovers planted cliques of size two (single edges) in the connectivity regime.
  • Superior performance of common-neighbor counting over degree counting directly improves anomaly detection in geometric networks.
  • Finite-graph simulations match the high-probability statements and show practical reliability of both algorithms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same neighbor-counting idea may extend to planted dense subgraphs other than cliques in geometric settings.
  • Recovery thresholds are likely to depend on the embedding dimension, suggesting a natural next calculation for higher-dimensional point clouds.
  • Real-world location networks could adopt the common-neighbor test as a lightweight filter for spatial outliers before more expensive global methods are applied.

Load-bearing premise

The graph must be generated exactly as a random geometric graph with fixed connection radius in the stated average-degree regime, and the planted clique must be embedded consistently with the underlying point process.

What would settle it

Observe a sequence of random geometric graphs of growing size, each containing a planted clique of the claimed size, in which the common-neighbor algorithm misclassifies at least one planted vertex with probability bounded away from zero.

Figures

Figures reproduced from arXiv: 2510.12365 by Andrei Bobu, Konstantin Avrachenkov, Nelly Litvak, Riccardo Michielan.

Figure 1
Figure 1. Figure 1: Representation in dimension d = 2 of the regions R1, R2. Vertices with coordinates in R1 cannot connect to vertices with coordinates in R2, unless they are part of the planted clique. larger than rn; thus, they are not naturally adjacent. In other words, if i and j have common neighbors in both R1 and R2, then their common neighbors cannot form a clique. Therefore, the algorithm is likely to skip the pair … view at source ↗
Figure 2
Figure 2. Figure 2: Qualitative representation of recovery results, in the [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Percentage of runs (out of 1000) in which the VD- and CN-algorithms [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Representation for d = 2 of the area that can contain common neighbors of i and j not all pairs of vertices in the intersection are necessarily connected to each other, as the diameter of the intersection is larger than r. For example, assume two vertices v, w ̸∈ K have positions Xv and Xw inside the regions R1 and R2, respectively (see [PITH_FULL_IMAGE:figures/full_fig_p018_4.png] view at source ↗
read the original abstract

We investigate the problem of identifying planted cliques in random geometric graphs, focusing on two distinct algorithmic approaches: the first based on vertex degrees (VD) and the other on common neighbors (CN). We analyze the performance of these methods under varying regimes of key parameters, namely the average degree of the graph and the size of the planted clique. We demonstrate that exact recovery is achieved with high probability as the graph size increases, in a specific set of parameters. Notably, our results reveal that the CN-algorithm significantly outperforms the VD-algorithm. In particular, in the connectivity regime, tiny planted cliques (even edges) are correctly identified by the CN-algorithm, yielding a significant impact on anomaly detection. Finally, our results are confirmed by a series of numerical experiments, showing that the devised algorithms are effective in practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript investigates recovery of planted cliques in random geometric graphs via two algorithms: vertex-degree (VD) thresholding and common-neighbor (CN) counting. It asserts that both achieve exact recovery with high probability in suitable regimes of average degree and clique size k, that CN strictly outperforms VD, and that in the connectivity regime (average degree ∼ log n) the CN statistic recovers even k=2 planted edges with high probability; the claims are supported by asymptotic analysis and numerical simulations.

Significance. If the separation arguments and tail bounds hold, the work would extend planted-clique results to the geometric setting and demonstrate that local geometric overlap can be exploited for anomaly detection even in sparse regimes where degree-based methods fail. The explicit comparison of CN versus VD and the numerical confirmation constitute concrete strengths.

major comments (2)
  1. [Main recovery theorem (connectivity-regime case)] The high-probability exact recovery claim for the CN algorithm on k=2 planted edges in the connectivity regime rests on a strict separation between the common-neighbor count of a planted pair and that of all background pairs. The tail bounds invoked for this separation (presumably in the main theorem or its proof) rely on Poisson approximation and independence from the background point process outside the planted r-ball; these steps require explicit uniformity of the intensity measure and must be checked when the planted points are inserted.
  2. [Analysis of CN statistic for k=2] The statement that CN recovers 'tiny planted cliques (even edges)' with high probability in the connectivity regime is load-bearing on the geometric embedding: two points at distance less than r share a larger intersection of their r-balls than a typical pair. If the planted clique is not inserted consistently with the same intensity, the Chernoff-type bounds used to show that the maximum CN statistic identifies the planted edge no longer apply uniformly.
minor comments (2)
  1. [Abstract] The abstract refers to 'a specific set of parameters' without naming the precise scaling of r or the range of k; the introduction or statement of results should list the exact asymptotic regimes considered.
  2. [Numerical section] Numerical experiments are mentioned but the text does not report the number of Monte-Carlo trials, the precise values of n and r used, or error bars on the success probabilities; these details are needed for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful and insightful comments on our manuscript. We address the major comments point by point below, providing clarifications on the proof techniques used for the connectivity regime and indicating the revisions we will implement to enhance the rigor of our arguments.

read point-by-point responses
  1. Referee: The high-probability exact recovery claim for the CN algorithm on k=2 planted edges in the connectivity regime rests on a strict separation between the common-neighbor count of a planted pair and that of all background pairs. The tail bounds invoked for this separation rely on Poisson approximation and independence from the background point process outside the planted r-ball; these steps require explicit uniformity of the intensity measure and must be checked when the planted points are inserted.

    Authors: In our analysis, we model the random geometric graph using a homogeneous Poisson point process with intensity proportional to the average degree. The planted clique of size k=2 is formed by inserting two points at a distance strictly less than the connection radius r. This insertion is local and does not affect the homogeneity of the intensity measure for the background points. The number of common neighbors for any pair is a Poisson random variable whose mean is the intensity times the area of the intersection of the two r-balls. For the planted pair, this area is larger, providing the necessary separation. The Poisson approximation and tail bounds (via Chernoff inequalities) hold uniformly because the intensity is constant across the domain, and the process in disjoint regions is independent by the properties of Poisson processes. We will revise the manuscript to include an explicit verification of these properties in the proof of the main theorem for the connectivity regime. revision: yes

  2. Referee: The statement that CN recovers 'tiny planted cliques (even edges)' with high probability in the connectivity regime is load-bearing on the geometric embedding: two points at distance less than r share a larger intersection of their r-balls than a typical pair. If the planted clique is not inserted consistently with the same intensity, the Chernoff-type bounds used to show that the maximum CN statistic identifies the planted edge no longer apply uniformly.

    Authors: The geometric embedding is indeed central to our results, as the increased overlap in the r-balls for nearby points leads to a higher expected common-neighbor count. We insert the planted points in a manner consistent with the underlying intensity measure, specifically by choosing their positions uniformly but ensuring their distance is less than r. The Chernoff bounds are applied to the Poisson counts, and since the mean depends only on the intersection area (which is deterministically larger for close pairs) and the intensity is uniform, the bounds apply uniformly to all pairs. This ensures that the maximum CN statistic correctly identifies the planted edge with high probability. To make this clearer, we will expand the discussion in the analysis of the CN statistic to detail the intersection area calculations and the uniform application of the bounds. revision: yes

Circularity Check

0 steps flagged

No circularity: asymptotic recovery claims rest on independent tail bounds and geometric separation arguments.

full rationale

The paper derives high-probability exact recovery for the CN algorithm (including k=2 planted edges in the connectivity regime) via Chernoff-type bounds on common-neighbor counts that exploit the fixed-radius ball intersection geometry and Poisson point process properties. These bounds are obtained from the model definition and standard concentration inequalities; they are not obtained by fitting parameters to the target recovery event or by renaming the input distribution. No self-citation chain is invoked to justify uniqueness or to smuggle an ansatz. The derivation therefore remains self-contained against the stated random geometric graph model and does not reduce to its own outputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the standard random geometric graph model (points in a metric space connected by distance threshold) and the planted-clique insertion procedure; these are domain assumptions drawn from prior literature rather than new postulates.

axioms (1)
  • domain assumption The input is a random geometric graph on n points in a bounded domain with connection radius chosen to achieve a target average degree.
    Invoked by the title and abstract when stating performance under varying average degree.

pith-pipeline@v0.9.0 · 5671 in / 1315 out tokens · 36468 ms · 2026-05-18T08:04:43.211895+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Planted clique detection and recovery from the hypergraph adjacency matrix

    math.ST 2026-04 unverdicted novelty 6.0

    Spectral norm test and leading-eigenvector method achieve detection and exact recovery of planted cliques from hypergraph adjacency matrices at the sqrt(n) scale.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · cited by 1 Pith paper

  1. [1]

    N. Alon, M. Krivelevich, and B. Sudakov,Finding a large hidden clique in a random graph, Random Structures & Algorithms13(1998), no. 3-4, 457–466. 22

  2. [2]

    Arias-Castro, S

    E. Arias-Castro, S. Bubeck, and G. Lugosi,Detecting positive correlations in a multivariate sample, Bernoulli21(2015), no. 1, 209 – 241. URLhttps: //doi.org/10.3150/13-BEJ565

  3. [3]

    Bubeck, J

    S. Bubeck, J. Ding, R. Eldan, and M. Z. R´ acz,Testing for high-dimensional geometry in random graphs, Random Structures & Algorithms49(2016), no. 3, 503–532

  4. [4]

    R. M. Corless, G. H. Gonnet, D. E. Hare, D. J. Jeffrey, and D. E. Knuth, On the lambert w function, Advances in Computational mathematics5(1996), 329–359

  5. [5]

    T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein,Introduction to algorithms, MIT press, 2022

  6. [6]

    Den Hollander,Probability theory: The coupling method, Lec- ture notes available online (http://websites

    F. Den Hollander,Probability theory: The coupling method, Lec- ture notes available online (http://websites. math. leidenuniv. nl/probability/lecturenotes/CouplingLectures. pdf)3(2012)

  7. [7]

    Devroye, A

    L. Devroye, A. Gy¨ orgy, G. Lugosi, and F. Udina,High-dimensional random geometric graphs and their clique number, Electronic Journal of Probability16 (2011), 2481–2508

  8. [8]

    Gamarnik and I

    D. Gamarnik and I. Zadik,The landscape of the planted clique problem: Dense subgraphs and the overlap gap property, The Annals of Applied Probability34 (2024), no. 4, 3375–3434

  9. [9]

    Guyon,Random fields on a network: modeling, statistics, and applications, Springer Science & Business Media, 1995

    X. Guyon,Random fields on a network: modeling, statistics, and applications, Springer Science & Business Media, 1995

  10. [10]

    Hazan and R

    E. Hazan and R. Krauthgamer,How hard is it to approximate the best nash equilibrium?, SIAM Journal on Computing40(2011), no. 1, 79–91

  11. [11]

    D. J. Higham, M. Raˇ sajski, and N. Prˇ zulj,Fitting a geometric graph to a protein–protein interaction network, Bioinformatics24(2008), no. 8, 1093– 1099

  12. [12]

    A. D. Ker,The square root law of steganography: Bringing theory closer to practice,Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security, 2017, 33–44

  13. [13]

    A. D. Ker, T. Pevn` y, J. Kodovsk` y, and J. Fridrich,The square root law of steganographic capacity,Proceedings of the 10th ACM workshop on Multimedia and security, 2008, 107–116

  14. [14]

    Kuˇ cera,Expected complexity of graph partitioning problems, Discrete Applied Mathematics57(1995), no

    L. Kuˇ cera,Expected complexity of graph partitioning problems, Discrete Applied Mathematics57(1995), no. 2-3, 193–212. 23

  15. [15]

    Liben-Nowell and J

    D. Liben-Nowell and J. Kleinberg,The link prediction problem for social net- works,Proceedings of the twelfth international conference on Information and knowledge management, 2003, 556–559

  16. [16]

    McDiarmid,Random channel assignment in the plane, Random Structures & Algorithms22(2003), no

    C. McDiarmid,Random channel assignment in the plane, Random Structures & Algorithms22(2003), no. 2, 187–212

  17. [17]

    McDiarmid and T

    C. McDiarmid and T. M¨ uller,On the chromatic number of random geometric graphs, Combinatorica31(2011), no. 4, 423–488

  18. [18]

    M¨ uller,Two-point concentration in random geometric graphs, Combinatorica 28(2008), 529–545

    T. M¨ uller,Two-point concentration in random geometric graphs, Combinatorica 28(2008), 529–545

  19. [19]

    Penrose,Random geometric graphs, vol

    M. Penrose,Random geometric graphs, vol. 5, OUP Oxford, 2003

  20. [20]

    V. M. Preciado and A. Jadbabaie,Spectral analysis of virus spreading in random geometric networks,Proceedings of the 48h IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference, IEEE, 2009, 4802–4807

  21. [21]

    T. Zhou, L. L¨ u, and Y.-C. Zhang,Predicting missing links via local information, The European Physical Journal B71(2009), 623–630. 24