arxiv: 2605.00534 · v1 · submitted 2026-05-01 · 📊 stat.ME

Recognition: unknown

Estimating Treatment and Spillover Effects with the Ego-Cluster Experimental Design

Xiao Liu , Feifang Hu , Jingfei Zhang


Pith reviewed 2026-05-09 19:23 UTC · model grok-4.3

classification 📊 stat.ME

keywords: network interference, experimental design, spillover effects, ego-cluster randomization, causal inference, treatment effects, asymptotic normality, cluster randomization


The pith

The ego-cluster experimental design partitions networks into focal clusters to estimate both global treatment effects and spillover effects without bias from interference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

When a unit's outcome depends on treatments received by connected units, standard randomized experiments produce biased estimates of causal effects. The paper introduces an ego-cluster design that divides the network into clusters, each built around one focal unit and its direct neighbors, then assigns treatment at the cluster level. Model-based estimators are derived for the overall treatment effect and the spillover effect; these estimators are shown to be consistent and asymptotically normal, with variances that depend explicitly on how the clusters are formed. A sequential clustering algorithm is proposed to choose egos and assign alters so as to minimize those asymptotic variances. Sympathetic readers care because the method supplies both theoretical guarantees and a practical algorithm that can be applied to real networks such as social graphs or market platforms.

Core claim

Under the ego-cluster design the network is partitioned into clusters each consisting of an ego and its alters, treatment is randomized at the cluster level, and model-based estimators recover the global treatment effect and the spillover effect with consistency and asymptotic normality whose variance is governed by the ego-cluster structure; an ego-clustering algorithm then selects egos and assigns alters sequentially to minimize the relevant asymptotic variances.
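The summary does not spell out the outcome model. For concreteness, here is one common specification, assumed for illustration and not taken from the paper: outcomes are linear in a unit's own treatment $Z_i$ and the treated share $\rho_i$ of its neighbor set $N_i$, and both effects are read off the coefficients.

```latex
\begin{aligned}
Y_i &= \alpha + \beta Z_i + \gamma \rho_i + \varepsilon_i,
  \qquad \rho_i = \frac{1}{|N_i|} \sum_{j \in N_i} Z_j, \\
\tau &= \mathbb{E}[Y_i \mid \mathbf{Z} = \mathbf{1}] - \mathbb{E}[Y_i \mid \mathbf{Z} = \mathbf{0}] = \beta + \gamma,
  \qquad \text{spillover effect} = \gamma, \\
\hat{\theta} &= (X^\top X)^{-1} X^\top Y, \quad X_i = (1, Z_i, \rho_i), \quad
  \operatorname{Var}(\hat{\theta} \mid X) = \sigma^2 (X^\top X)^{-1}.
\end{aligned}
```

Under this assumed model, an ego whose cluster contains all of its neighbors has $\rho_i = Z_i$, while an alter whose neighbors fall in several clusters can have $\rho_i$ strictly between 0 and 1. That variation is what separates $\gamma$ from $\beta$, and the covariance term shows one way the cluster structure enters the variance through $X$.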

What carries the argument

Ego-cluster randomization, which partitions the network into focal units (egos) plus their immediate neighbors (alters) and performs treatment assignment at the cluster level, thereby separating direct effects from spillover effects.
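A minimal Python sketch of cluster-level assignment, assuming each ego-cluster is stored as a dict from ego to its list of alters, every member of a cluster receives the cluster's single Bernoulli(p) draw, and exposure is measured as the share of treated neighbors. The function names and the toy partition are hypothetical, not the paper's implementation.

```python
import random

def randomize_clusters(clusters, p=0.5, seed=None):
    """Draw one Bernoulli(p) treatment per ego-cluster; every member inherits it.

    clusters: dict mapping ego -> list of alters (hypothetical input format).
    Returns a dict unit -> 0/1.
    """
    rng = random.Random(seed)
    z = {}
    for ego, alters in clusters.items():
        zc = 1 if rng.random() < p else 0
        for unit in [ego, *alters]:
            z[unit] = zc
    return z

def treated_neighbor_fraction(adj, z):
    """Exposure rho_i: share of unit i's neighbors that are treated (0 if isolated)."""
    return {i: (sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0)
            for i, nbrs in adj.items()}

# Toy path graph 0-1-2-3-4 split into ego-clusters {1: [0, 2]} and {4: [3]}.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
clusters = {1: [0, 2], 4: [3]}
z = randomize_clusters(clusters, seed=1)
rho = treated_neighbor_fraction(adj, z)
```

In the toy graph, egos 1 and 4 receive all-or-nothing exposure (their exposure equals their own treatment), while alter 2, whose neighbors sit in two different clusters, can end up with partial exposure.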

If this is right

The estimators are consistent and asymptotically normal under the stated model-based framework.
Asymptotic variances are explicitly determined by the ego-cluster structure, enabling optimization through the proposed clustering algorithm.
The design produces more accurate inference for both global treatment and spillover effects than existing network experimental designs.
Simulation studies and empirical applications confirm efficiency gains over alternatives.
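To illustrate how such estimators and their variances depend on the design, here is a sketch that fits an assumed linear model Y = α + βZ + γρ + ε by ordinary least squares and reports τ̂ = β̂ + γ̂ and γ̂ with homoskedastic model-based standard errors. The model, the function name `fit_linear_interference`, and the synthetic check are assumptions, not the paper's estimator.

```python
import numpy as np

def fit_linear_interference(y, z, rho):
    """OLS fit of the assumed model y = alpha + beta*z + gamma*rho + noise.

    Returns (tau_hat, gamma_hat, se_tau, se_gamma), where tau = beta + gamma is the
    all-treated versus all-control contrast under this hypothetical model.
    """
    X = np.column_stack([np.ones(len(y)), z, rho])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)  # model-based covariance; depends on the design through X
    c_tau = np.array([0.0, 1.0, 1.0])      # picks out beta + gamma
    tau_hat = c_tau @ coef
    se_tau = np.sqrt(c_tau @ cov @ c_tau)
    return tau_hat, coef[2], se_tau, np.sqrt(cov[2, 2])

# Synthetic check with known parameters (tau = 2.5, gamma = 0.5).
rng = np.random.default_rng(0)
n = 5000
z = rng.integers(0, 2, n).astype(float)
rho = rng.uniform(0, 1, n)
y = 1.0 + 2.0 * z + 0.5 * rho + rng.normal(0, 1, n)
tau_hat, gamma_hat, se_tau, se_gamma = fit_linear_interference(y, z, rho)
```

If every unit had ρ_i = Z_i (for example, if each cluster contained the full neighborhood of every member), the design matrix would be rank-deficient and γ would not be identified. How many units see partial exposure, and how much, is one way the partition shapes the variance.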

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same clustering logic could be adapted to networks observed at multiple time points by updating clusters dynamically.
Extensions to heterogeneous treatment effects or multi-level treatments would require only modest changes to the variance-minimization step.
Field experiments on online social platforms could directly compare the ego-cluster design against complete randomization to measure realized precision gains.
If the interference model is misspecified, the reported asymptotic variances may understate true uncertainty, suggesting a diagnostic based on comparing design-based and model-based variance estimates.
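One way to operationalize this diagnostic is sketched below, with the cluster-robust (sandwich) variance over ego-clusters standing in for a design-based variance. That substitution, the function name `se_ratio`, and the synthetic data are assumptions. A ratio well above 1 suggests that the model-based variance understates uncertainty.

```python
import numpy as np

def se_ratio(y, X, cluster_ids, j):
    """Cluster-robust SE divided by model-based (iid-error) SE for coefficient j."""
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    coef = bread @ X.T @ y
    resid = y - X @ coef
    v_model = (resid @ resid / (n - k)) * bread
    meat = np.zeros((k, k))
    for g in np.unique(cluster_ids):
        mask = cluster_ids == g
        score = X[mask].T @ resid[mask]     # summed score within one ego-cluster
        meat += np.outer(score, score)
    v_robust = bread @ meat @ bread
    return np.sqrt(v_robust[j, j] / v_model[j, j])

# Synthetic example: 200 clusters of 10 units, treatment assigned per cluster.
rng = np.random.default_rng(1)
G, m = 200, 10
cluster_ids = np.repeat(np.arange(G), m)
z = np.repeat(rng.integers(0, 2, G), m).astype(float)
X = np.column_stack([np.ones(G * m), z])
shock = np.repeat(rng.normal(0, 1, G), m)     # error shared within a cluster
r_corr = se_ratio(1.0 + 2.0 * z + shock + rng.normal(0, 1, G * m), X, cluster_ids, j=1)
r_iid = se_ratio(1.0 + 2.0 * z + rng.normal(0, 1, G * m), X, cluster_ids, j=1)
```

With the shared within-cluster shock, `r_corr` should come out well above 1, while `r_iid` should sit near 1.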

Load-bearing premise

The network interference follows a model-based structure that lets ego-cluster partitioning separate global treatment effects from spillover effects, with the clustering algorithm correctly minimizing the resulting asymptotic variances.

What would settle it

Run the proposed ego-clustering algorithm on a network with a known interference structure and simulated outcomes. The claim would be undermined if the estimators fail to achieve consistency or show larger finite-sample variance than standard cluster randomization.
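A minimal version of this check, under stated assumptions: an n-cycle (each unit has two neighbors), a hand-built ego-cluster partition of consecutive triples with the middle unit as ego, one Bernoulli(1/2) draw per cluster, and a linear outcome model used only for illustration. It reports the Monte Carlo bias and empirical standard deviation of τ̂ and γ̂. All names are hypothetical; a full test would also run a comparator design, such as standard cluster randomization, under the same outcome model and compare standard deviations.

```python
import numpy as np

def cycle_ego_cluster_labels(n):
    """Cluster label per unit on an n-cycle: clusters {3c, 3c+1, 3c+2}, ego 3c+1 (n divisible by 3)."""
    return np.arange(n) // 3

def simulate_once(n, beta, gamma, rng):
    labels = cycle_ego_cluster_labels(n)
    z = rng.integers(0, 2, n // 3)[labels].astype(float)   # one Bernoulli(1/2) draw per cluster
    rho = (np.roll(z, 1) + np.roll(z, -1)) / 2              # treated share of the two cycle neighbors
    y = 1.0 + beta * z + gamma * rho + rng.normal(0, 1, n)  # assumed linear interference model
    X = np.column_stack([np.ones(n), z, rho])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1] + coef[2], coef[2]                       # (tau_hat, gamma_hat)

def monte_carlo(n=900, beta=1.0, gamma=0.5, reps=500, seed=0):
    rng = np.random.default_rng(seed)
    draws = np.array([simulate_once(n, beta, gamma, rng) for _ in range(reps)])
    truth = np.array([beta + gamma, gamma])
    bias = draws.mean(axis=0) - truth
    sd = draws.std(axis=0, ddof=1)
    return bias, sd

bias, sd = monte_carlo()
```

Small bias across replications is consistent with the consistency claim in this toy setting; it says nothing about misspecified interference, which would need a different data-generating model.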

Figures

Figures reproduced from arXiv: 2605.00534 by Feifang Hu, Jingfei Zhang, Xiao Liu.

**Figure 1.** Illustration of ego-clusters in a toy network with 15 units. The network is partitioned …

**Figure 2.** Illustration of the two-step ego-clustering algorithm. (Left) A small-world network.

**Figure 3.** RMSEs of τ̂ (top) and γ̂ (bottom) under different sample sizes and clustering methods over 2000 replications for each network setting. As shown in Table S1 and Figure S1 (Supplement S3.1), EgoCR has very small biases for both τ̂ and γ̂ across all settings considered. For estimating the global treatment effect, …

**Figure 4.** RMSEs of τ̂ (top) and γ̂ (bottom) under different sample sizes and clustering methods over 2000 replications for each network setting under correlated errors. … alternative methods except RGCR1hm. RGCR1hm, proposed by Ugander and Yin (2023), estimates γ̂ with performance comparable to EgoCR in …

**Figure 5.** The network of the five selected schools, with grades distinguished by four colors.

Original abstract

Network interference occurs when a unit's outcome depends not only on its own treatment but also on the treatments received by connected units in the network. Experimental designs and analysis methods that ignore such interference can yield biased estimators of causal effects. In this paper, we develop a new experimental design for the estimation and inference of global treatment effect and spillover effect under a model-based framework and ego-cluster randomization. Under this design, the network is partitioned into a collection of ego-clusters, each consisting of a focal unit (the ego) and its network neighbors (the alters), with randomization conducted at the cluster level. We propose model-based estimators for the global treatment effect and spillover effect and establish their consistency and asymptotic normality, with asymptotic variances determined by the ego-cluster structure. Building on these theoretical results, we introduce an ego-clustering algorithm that sequentially selects egos and assigns alters to minimize asymptotic variances. Simulation studies and two empirical applications demonstrate that the proposed procedure yields accurate inference and efficiency improvements over existing network experimental designs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The ego-cluster design and variance-minimizing algorithm are the real additions here for handling network interference in experiments.


The paper's core move is to split a network into ego-clusters (one focal unit plus its neighbors) and randomize treatment at the cluster level. This lets the authors build model-based estimators for the global treatment effect and the spillover effect, then prove consistency and asymptotic normality with variances determined by the cluster structure. They also give a sequential algorithm that picks egos and assigns alters to shrink those variances. That combination is what is new relative to earlier network randomization schemes. The simulations and two empirical examples are presented as evidence that the approach improves precision over standard designs. The design, the algorithm, and that evidence are what could actually be useful to someone running a field experiment with connected units.

The main limitation is the model-based assumption on interference. If the true spillover pattern does not match the model through which the clusters separate direct and indirect effects, the consistency result does not apply. The paper does not appear to explore robustness checks against that kind of misspecification. The clustering algorithm is also tied to the same model, so its claimed efficiency gains are conditional on it.

This work is aimed at researchers who already work on causal inference with network interference and want a concrete randomization scheme plus an algorithm for choosing clusters. A reader who needs practical tools for social or epidemiological experiments would find the design and the variance expressions worth a look. The theoretical claims and the empirical illustrations are developed enough that the paper should go to peer review rather than be desk-rejected.

Referee Report

2 major / 3 minor

Summary. The paper introduces a novel ego-cluster experimental design to estimate global treatment effects and spillover effects in the presence of network interference. The approach partitions the network into clusters, each consisting of an ego and its neighboring alters, performs randomization at the cluster level, and develops model-based estimators whose consistency and asymptotic normality are established, with asymptotic variances explicitly determined by the ego-cluster structure. An algorithm is proposed for sequentially selecting egos and assigning alters to minimize these asymptotic variances. The theoretical results are complemented by simulation studies and two empirical applications that demonstrate accurate inference and efficiency gains compared to existing designs.

Significance. This manuscript makes a meaningful contribution to the literature on causal inference under network interference by providing a design that balances theoretical guarantees with practical implementation via the variance-minimizing clustering algorithm. The model-based framework allows for clean separation of effects and derivation of asymptotic properties, which is a strength when the assumptions hold. The inclusion of reproducible simulation studies and real-data applications enhances the paper's impact. If the central claims are verified, it could influence how experiments are designed in social networks and other interconnected systems.

major comments (2)

[§3 (Theoretical Results)] The consistency and asymptotic normality of the estimators are derived assuming a fixed ego-cluster structure; however, because the clustering algorithm selects clusters based on the observed network to minimize variance, it is important to confirm that these asymptotic properties continue to hold when the partition is data-dependent. This could affect the validity of the inference procedures.
[§4 (Clustering Algorithm)] The sequential selection procedure is presented as minimizing the asymptotic variances, but without a proof of optimality or bounds on the approximation error relative to the global minimum, the claimed efficiency improvements may not be fully realized in all networks.

minor comments (3)

[Abstract] Consider specifying the types of networks or contexts in the two empirical applications to provide immediate context for the results.
[Simulation studies] Include details on the parameter values used in the data-generating process and the number of Monte Carlo replications to allow for better reproducibility.
[Notation and setup] Ensure that the definitions of the global treatment effect and spillover effect are clearly distinguished from standard average treatment effects early in the paper.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major comment below, indicating planned revisions where appropriate.


Referee: [§3 (Theoretical Results)] The consistency and asymptotic normality of the estimators are derived assuming a fixed ego-cluster structure; however, because the clustering algorithm selects clusters based on the observed network to minimize variance, it is important to confirm that these asymptotic properties continue to hold when the partition is data-dependent. This could affect the validity of the inference procedures.

Authors: We appreciate this observation. In the experimental setting, the network is observed prior to randomization and is regarded as fixed. The ego-clustering algorithm uses this fixed network to produce a deterministic partition, after which randomization occurs at the cluster level. The consistency and asymptotic normality results are derived conditional on the ego-cluster structure. We will add a clarifying statement in Section 3 to make this conditioning explicit and confirm that the asymptotic properties and inference procedures remain valid under the data-dependent but pre-randomization clustering. revision: yes
Referee: [§4 (Clustering Algorithm)] The sequential selection procedure is presented as minimizing the asymptotic variances, but without a proof of optimality or bounds on the approximation error relative to the global minimum, the claimed efficiency improvements may not be fully realized in all networks.

Authors: The algorithm is a greedy sequential procedure that iteratively selects egos and assigns alters to reduce the asymptotic variances in a computationally tractable way. We do not claim global optimality, as identifying the exact variance-minimizing partition is a combinatorial problem that is intractable for large networks. We will revise Section 4 to describe the procedure more precisely as a practical heuristic and will expand the discussion to note the absence of approximation bounds while emphasizing that the simulation studies demonstrate consistent efficiency gains relative to existing designs. revision: partial
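A sketch of a greedy two-step ego-clustering heuristic, for illustration only. It assumes integer unit labels and an undirected adjacency dict, and it uses degree as a placeholder score where the paper's algorithm would score candidates by their effect on the asymptotic variances; the function name and the `score` hook are hypothetical. Restricting egos to units whose closed neighborhoods are still unassigned guarantees that every ego keeps all of its neighbors in its own cluster.

```python
def greedy_ego_clustering(adj, score=None):
    """Two-step greedy ego-clustering heuristic (a sketch, not the paper's algorithm).

    Step 1: repeatedly pick, among units whose closed neighborhood is still fully
    unassigned, the one with the highest score; it becomes an ego and all its
    neighbors become its alters. Step 2: attach each leftover unit to the cluster
    of one of its already-assigned neighbors, leaving ego neighborhoods intact.

    adj: dict int unit -> set of neighbors. score: hypothetical callable
    (unit, adj) -> float standing in for a variance criterion; defaults to degree.
    """
    score = score or (lambda v, adj: len(adj[v]))
    owner = {}       # unit -> ego of its cluster
    clusters = {}    # ego -> list of alters
    while True:
        candidates = [v for v in adj if v not in owner and adj[v].isdisjoint(owner)]
        if not candidates:
            break
        ego = max(candidates, key=lambda v: (score(v, adj), -v))  # ties -> smallest label
        clusters[ego] = sorted(adj[ego])
        owner[ego] = ego
        for alter in adj[ego]:
            owner[alter] = ego
    for v in adj:    # step 2: every leftover unit has at least one assigned neighbor
        if v not in owner:
            anchor = min(u for u in adj[v] if u in owner)
            owner[v] = owner[anchor]
            clusters[owner[anchor]].append(v)
    return clusters

# Path graph 0-1-2-3-4-5-6.
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j < 7} for i in range(7)}
clusters = greedy_ego_clustering(adj)
```

Because step 1 only stops when every unassigned unit has an assigned neighbor, step 2 can always attach leftover units without splitting any ego's neighborhood. As the rebuttal notes, a greedy pass like this carries no optimality guarantee.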

Circularity Check

0 steps flagged

No significant circularity identified


The paper's derivation chain starts from the ego-cluster randomization design and a model-based interference structure, then derives model-based estimators whose consistency and asymptotic normality (with variances explicitly determined by cluster structure) follow from standard M-estimation or similar arguments under the stated assumptions. The ego-clustering algorithm is then defined to minimize those derived asymptotic variances. No equation reduces by construction to a fitted input, no prediction is statistically forced from a subset of the target data, and no load-bearing uniqueness theorem or ansatz is imported via self-citation. The central claims remain independent of the quantities they estimate.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on a model-based framework for interference that enables effect separation via clusters; no explicit free parameters or new entities are introduced in the abstract, but the approach depends on domain assumptions about network structure and randomization validity.

axioms (1)

Domain assumption: Network interference admits a model-based representation allowing consistent separation of global treatment effects from spillovers through ego-cluster randomization.
Invoked to justify the estimators and their asymptotic properties.

pith-pipeline@v0.9.0 · 5472 in / 1321 out tokens · 37947 ms · 2026-05-09T19:23:18.112717+00:00 · methodology


Reference graph

Works this paper leans on

47 extracted references · 6 canonical work pages

[1] Aronow, P. M. and Samii, C. (2017). Estimating average causal effects under general interference, with application to a social network experiment. The Annals of Applied Statistics, 11(4):1912–1947.

[2] Athey, S., Eckles, D., and Imbens, G. W. (2018). Exact p-values for network interference. Journal of the American Statistical Association, 113(521):230–240.

[3] Banerjee, A., Chandrasekhar, A. G., Duflo, E., and Jackson, M. O. (2013). The diffusion of microfinance. Science, 341(6144):1236498.
Barabási, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439):509–512.

[4] Basse, G. and Feller, A. (2018). Analyzing two-stage experiments in the presence of interference. Journal of the American Statistical Association, 113(521):41–55.

[5] Basse, G. W. and Airoldi, E. M. (2018). Model-assisted design of experiments in the presence of network-correlated outcomes. Biometrika, 105(4):849–858.

[6] Belloni, A., Fang, F., and Volfovsky, A. (2025). Neighborhood adaptive estimators for causal inference under network interference. arXiv preprint arXiv:2212.03683v2.

[7] Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008.

[8] Bugni, F. A., Canay, I. A., and Shaikh, A. M. (2018). Inference under covariate-adaptive randomization. Journal of the American Statistical Association, 113(524):1784–1796. PMID: 30906087.

[9] Cai, C., Zhang, X., and Airoldi, E. (2024). Independent-set design of experiments for estimating treatment and spillover effects under network interference. In The Twelfth International Conference on Learning Representations.

[10] Cai, J., De Janvry, A., and Sadoulet, E. (2015). Social networks and the decision to insure. American Economic Journal: Applied Economics, 7(2):81–108.

[11] Deng, L., Zhang, J., Wang, Y., and Chen, C. (2024). Ego group partition: a novel framework for improving ego experiments in social networks. arXiv preprint arXiv:2402.12655.

[12] Eckles, D., Karrer, B., and Ugander, J. (2017). Design and analysis of experiments in networks: Reducing bias from interference. Journal of Causal Inference, 5(1):1–23.
Erdős, P. and Rényi, A. (1959). On random graphs I. Publicationes Mathematicae Debrecen, 6:290–297.

[13] Forastiere, L., Airoldi, E. M., and Mealli, F. (2021). Identification and estimation of treatment and interference effects in observational studies on networks. Journal of the American Statistical Association, 116(534):901–918.

[14] Gao, M. and Ding, P. (2025). Causal inference in network experiments: regression-based analysis and design-based properties. Journal of Econometrics, 252:106119.

[15] Goodman, L. A. (1961). Snowball sampling. The Annals of Mathematical Statistics, 32(1):148–170.

[16] Hu, Y. and Hu, F. (2012). Asymptotic properties of covariate-adaptive randomization. The Annals of Statistics, 40(3):1794–1815.

[17] Hu, Y., Li, S., and Wager, S. (2022). Average direct and indirect causal effects under interference. Biometrika, 109(4):1165–1172.

[18] Hudgens, M. G. and Halloran, M. E. (2008). Toward causal inference with interference. Journal of the American Statistical Association, 103(482):832–842. PMID: 19081744.

[19] Imbens, G. W. and Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.

[20] Jagadeesan, R., Pillai, N. S., and Volfovsky, A. (2020). Designs for estimating the treatment effect in networks with interference. The Annals of Statistics, 48(2):679–712.

[21] Jia, C., Li, Y., Carson, M. B., Wang, X., and Yu, J. (2017). Node attribute-enhanced community detection in complex networks. Scientific Reports, 7(1):2626.

[22] Jiang, Z., Imai, K., and Malani, A. (2022). Statistical inference and power analysis for direct and spillover effects in two-stage randomized experiments. Biometrics, 79(3):2370–2381.

[23] Kandiros, V., Pipis, C., Daskalakis, C., and Harshaw, C. (2025). The conflict graph design: estimating causal effects under arbitrary neighborhood interference. arXiv preprint arXiv:2411.10908.

[24] Leung, M. P. (2020). Treatment and spillover effects under network interference. The Review of Economics and Statistics, 102(2):368–380.

[25] Leung, M. P. (2023). Network cluster-robust inference. Econometrica, 91(2):641–667.

[26] Li, S. and Wager, S. (2022). Random graph asymptotics for treatment effect estimation under network interference. The Annals of Statistics, 50(4):2334–2358.

[27] Liu, L., Hudgens, M. G., and Becker-Dreps, S. (2016). On inverse probability-weighted estimators in the presence of interference. Biometrika, 103(4):829–842.

[28] Liu, Y., Zhou, Y., Li, P., and Hu, F. (2022). Adaptive A/B test on networks with cluster structures. In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151, pages 10836–10851. PMLR.

[29] Liu, Y., Zhou, Y., Li, P., and Hu, F. (2024). Cluster-adaptive network A/B testing: from randomization to estimation. Journal of Machine Learning Research, 25(170):1–48.

[30] Ma, W., Li, P., Zhang, L.-X., and Hu, F. (2024). A new and unified family of covariate adaptive randomization procedures and their properties. Journal of the American Statistical Association, 119(545):151–162.

[31] Manski, C. F. (2000). Economic analysis of social interactions. Journal of Economic Perspectives, 14(3):115–136.

[32] Manski, C. F. (2013). Identification of treatment response with social interactions. The Econometrics Journal, 16(1):S1–S23.

[33] Ogburn, E. L., Sofrygin, O., Diaz, I., and Van der Laan, M. J. (2024). Causal inference for social network data. Journal of the American Statistical Association, 119(545):597–611.

[34] Paluck, E. L., Shepherd, H., and Aronow, P. M. (2016). Changing climates of conflict: a social network experiment in 56 schools. Proceedings of the National Academy of Sciences, 113(3):566–571.

[35] Parker, B. M., Gilmour, S. G., and Schormans, J. (2017). Optimal design of experiments on connected units with application to social networks. Journal of the Royal Statistical Society Series C: Applied Statistics, 66(3):455–480.

[36] Phan, T. Q. and Airoldi, E. M. (2015). A natural experiment of social network formation and dynamics. Proceedings of the National Academy of Sciences, 112(21):6595–6600.

[37] Ross, N. (2011). Fundamentals of Stein's method. Probability Surveys, 8:210–293.

[38] Saint-Jacques, G., Varshney, M., Simpson, J., and Xu, Y. (2019). Using ego-clusters to measure network effects at LinkedIn. arXiv preprint arXiv:1903.08755.
Sävje, F., Aronow, P., and Hudgens, M. (2021). Average treatment effects in the presence of unknown interference. The Annals of Statistics, 49(2):673–701.

[39] Shalizi, C. R. and Thomas, A. C. (2011). Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, 40(2):211–239. PMID: 22523436.

[40] Su, W. and Duan, W. (2024). Improving ego-cluster for network effect measurement. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 5713–5722. Association for Computing Machinery.

[41] Toulis, P. and Kao, E. (2013). Estimation of causal peer influence effects. In Proceedings of the 30th International Conference on Machine Learning, volume 28, pages 1489–1497. PMLR.

[42] Ugander, J., Karrer, B., Backstrom, L., and Kleinberg, J. (2013). Graph cluster randomization: network exposure to multiple universes. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 329–337. Association for Computing Machinery.

[43] Ugander, J. and Yin, H. (2023). Randomized graph cluster randomization. Journal of Causal Inference, 11(1).

[44] Viviano, D. (2020). Experimental design under network interference. arXiv preprint arXiv:2003.08421.

[45] Viviano, D., Lei, L., Imbens, G., Karrer, B., Schrijvers, O., and Shi, L. (2025). Causal clustering: design of cluster experiments under network interference. arXiv preprint arXiv:2310.14983v3.

[46] Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440–442.

[47] Zhou, Z., Li, P., and Hu, F. (2024). Adaptive randomization in network data. Electronic Journal of Statistics, 18(1):47–76.