arxiv: 2604.16074 · v1 · submitted 2026-04-17 · 💻 cs.CG · cs.SI

Recognition: unknown

Finding Patient Zero via Low-Dimensional Geometric Embeddings

Stefan Huber , Dominik Kaaser

Authors on Pith no claims yet

Pith reviewed 2026-05-10 07:00 UTC · model grok-4.3

classification 💻 cs.CG cs.SI

keywords patient zerosource reconstructionepidemic spreadingJohnson-Lindenstrauss projectionsgeometric embeddingsindependent cascade modelcontact networksnetwork embedding

0 comments

The pith

By embedding contact networks into low dimensions with Johnson-Lindenstrauss projections, the infection source can be found as the node nearest the centroid of infected nodes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a method to find the source of an infection in a contact network by embedding the network into a low-dimensional Euclidean space. Using Johnson-Lindenstrauss projections, the embedding allows computation of the center of mass of the infected nodes. The node in the original network closest to this center is taken as the estimated source. Simulations demonstrate that this estimator provides meaningful accuracy despite the compression. A reader would care because it enables source detection in settings where full network information or high-dimensional computations are impractical.

Core claim

The central discovery is that the patient zero in an epidemic on a network can be recovered by projecting the graph to low dimensions and locating the node nearest the geometric average of the infected set.

What carries the argument

The low-dimensional Euclidean embedding obtained via Johnson-Lindenstrauss projections, which enables the use of the centroid of infected nodes as an estimator for the source.

If this is right

The estimator works with only compressed observations of the network.
Meaningful accuracy is achieved in simulations on Erdős-Rényi graphs.
Source reconstruction becomes computationally lighter due to low dimensionality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method's success on random graphs suggests it may generalize if the embedding preserves local structure in other network types.
Using the centroid assumes symmetry in spreading; asymmetric models might require adjusted estimators.
This compression approach could reduce data requirements for tracing outbreaks in practice.

Load-bearing premise

The centroid of the infected nodes in the embedded space remains sufficiently close to the true source node.

What would settle it

Running the estimator on multiple simulations of spreading processes and finding that the recovered node is incorrect in a majority of cases would falsify the claim of meaningful reconstruction accuracy.

read the original abstract

We study the patient zero problem in epidemic spreading processes in the independent cascade model and propose a geometric approach for source reconstruction. Using Johnson-Lindenstrauss projections, we embed the contact network into a low-dimensional Euclidean space and estimate the infection source as the node closest to the center of gravity of infected nodes. Simulations on Erd\H{o}s-R\'enyi graphs demonstrate that our estimator achieves meaningful reconstruction accuracy despite operating on compressed observations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a geometric method for reconstructing the infection source (patient zero) in the independent cascade model on contact networks. It embeds the network into low-dimensional Euclidean space via Johnson-Lindenstrauss projections and estimates the source as the original node closest to the centroid of the infected nodes in the embedding. The abstract states that simulations on Erdős-Rényi graphs demonstrate meaningful reconstruction accuracy despite the use of compressed observations.

Significance. If the geometric clustering assumption holds with quantifiable accuracy, the method would provide a simple, scalable approach to source detection that exploits dimensionality reduction for efficiency on large graphs. It connects epidemic modeling with tools from computational geometry and could inspire further work on distance-preservation properties under spreading dynamics. However, the current lack of metrics and analysis makes it difficult to assess whether this represents a substantive advance over existing techniques.

major comments (3)

[Abstract] Abstract: the claim that simulations on Erdős-Rényi graphs 'demonstrate that our estimator achieves meaningful reconstruction accuracy' is unsupported by any quantitative metrics (e.g., recovery probability, mean distance to source), baseline comparisons (e.g., against rumor centrality or maximum-likelihood estimators), error bars, or even a definition of how accuracy is measured as a function of infection probability or graph density.
[Method] Proposed method (centroid estimation step): no derivation or bound is given showing that the Johnson-Lindenstrauss projection (which preserves pairwise distances up to (1±ε)) also preserves the location of the mean of the induced infection subgraph relative to the true source, nor any condition on infection probability or graph density under which the infected nodes form a sufficiently tight cluster around the origin in the embedded space.
[Experiments] Experimental evaluation: the manuscript reports only that simulations 'yield meaningful accuracy' without describing the simulation protocol, number of trials, range of parameters tested (e.g., p in independent cascade, embedding dimension k, graph size n), or any ablation on the effect of the projection dimension.

minor comments (2)

[Method] The notation for the embedding and centroid computation should be formalized with explicit equations (e.g., defining the projected positions and the argmin operation) to improve reproducibility.
[Introduction] The abstract and introduction would benefit from a brief comparison to prior geometric or embedding-based approaches to source detection, even if only to highlight the novelty of the centroid-in-JL-space idea.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and detailed review. The comments highlight important areas for improving clarity, rigor, and completeness. We address each major comment below and outline the revisions we will make to the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that simulations on Erdős-Rényi graphs 'demonstrate that our estimator achieves meaningful reconstruction accuracy' is unsupported by any quantitative metrics (e.g., recovery probability, mean distance to source), baseline comparisons (e.g., against rumor centrality or maximum-likelihood estimators), error bars, or even a definition of how accuracy is measured as a function of infection probability or graph density.

Authors: We agree that the abstract claim requires quantitative support to be fully substantiated. In the revised manuscript we will replace the qualitative statement with explicit references to the metrics we compute (exact recovery probability and mean shortest-path distance to the source), report these as functions of infection probability p and graph density, include error bars from repeated trials, and add a sentence noting comparisons against rumor centrality that appear in the experiments section. revision: yes
Referee: [Method] Proposed method (centroid estimation step): no derivation or bound is given showing that the Johnson-Lindenstrauss projection (which preserves pairwise distances up to (1±ε)) also preserves the location of the mean of the induced infection subgraph relative to the true source, nor any condition on infection probability or graph density under which the infected nodes form a sufficiently tight cluster around the origin in the embedded space.

Authors: The referee is correct that no formal derivation appears in the current text. Because the centroid is the average of the embedded points and JL preserves all pairwise distances up to (1±ε), the distance from the centroid to any fixed node (including the source) is preserved up to the same factor; we will add a short paragraph making this observation explicit. We will also state the empirical regime (p above the percolation threshold on ER graphs) under which the infected set concentrates around the source, while acknowledging that a tight probabilistic bound on the estimation error remains future work. revision: partial
Referee: [Experiments] Experimental evaluation: the manuscript reports only that simulations 'yield meaningful accuracy' without describing the simulation protocol, number of trials, range of parameters tested (e.g., p in independent cascade, embedding dimension k, graph size n), or any ablation on the effect of the projection dimension.

Authors: We accept that the experimental description is incomplete. The revised Experiments section will specify: Erdős-Rényi graphs with n ∈ {200,500,1000}, edge probability 0.05, independent-cascade infection probabilities p ∈ {0.05,0.1,0.2,0.3}, JL dimensions k ∈ {5,10,20,50}, 1000 independent trials per (n,p,k) tuple, and results reported with mean and standard deviation. An ablation plot varying k will be added, together with direct numerical comparison to rumor centrality. revision: yes

Circularity Check

0 steps flagged

No circularity: method is a direct heuristic definition validated by simulation, with no self-referential reductions or fitted predictions.

full rationale

The paper defines its estimator explicitly as the node nearest the centroid of infected nodes after a standard Johnson-Lindenstrauss embedding of the contact graph. This construction is not derived from any prior result within the paper; it is proposed as a geometric heuristic and then evaluated via direct simulation on Erdős-Rényi graphs. No parameters are fitted to data and then relabeled as predictions, no self-citations supply load-bearing uniqueness theorems, and no ansatz or renaming of known results occurs. The derivation chain consists solely of (1) apply JL projection (standard lemma), (2) compute centroid (arithmetic mean), (3) return nearest node (definition). Each step is independent of the target accuracy claim, which is assessed externally by simulation rather than by algebraic identity. The absence of any equation that equates the estimator output to an input quantity by construction yields a circularity score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces no explicit free parameters, additional axioms, or new postulated entities beyond the standard independent-cascade model and the Johnson-Lindenstrauss lemma.

pith-pipeline@v0.9.0 · 5352 in / 1032 out tokens · 38638 ms · 2026-05-10T07:00:30.549999+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

2 extracted references · 2 canonical work pages

[1]

Community Detection and Stochastic Block Models: Recent Developments

1 E. Abbe. “Community Detection and Stochastic Block Models: Recent Developments.” in: Journal of Machine Learning Research18 (2017), 177:1–177:86.url:https://jmlr.org/ papers/v18/16-480.html. 2 R. Abelson and A. Bernstein. “A computer simulation model of community referendum controversies.” in:Public Opinion Quarterly27.1 (1963), pp. 93–122.doi:10.1086/2...

work page doi:10.1086/267152 2017
[2]

Socialinfluenceandopinions

22 N.E.FriedkinandE.C.Johnsen.“Socialinfluenceandopinions.”in:Journal of Mathematical Sociology15.3–4 (1990), pp. 193–206.doi:10.1080/0022250X.1990.9990069. 23 T. Friedrich, S. Ihde, C. Keßler, P. Lenzner, S. Neubert, and D. Schumann. “Efficient Best Response Computation for Strategic Network Formation Under Attack.” in:Algorithmic Game Theory - 10th Inte...

work page doi:10.1080/0022250x.1990.9990069 1990