Likelihood Inference for Latent Network Models under Snowball Sampling
Pith reviewed 2026-06-26 13:29 UTC · model grok-4.3
The pith
The exact likelihood for continuous latent space models under snowball sampling reduces to closed form via conditional edge independence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that conditional edge independence in continuous latent space (CLS) models reduces the marginal likelihood of a multi-wave snowball sample to a closed-form expression that is portable across the CLS class. They demonstrate this with a stochastic EM algorithm for the Euclidean latent distance model; on patent co-inventor data, naive inference severely underestimates latent space variance and degrades goodness-of-fit.
What carries the argument
The closed-form marginal likelihood for multi-wave snowball samples in CLS models, obtained by factoring the unobserved network configurations under conditional edge independence given latent vertex quantities.
If this is right
- Parameter estimates for latent space variance and covariate effects become unbiased rather than systematically distorted.
- The method extends without further derivation to any model inside the continuous latent space class.
- Spectral goodness-of-fit on real networks improves substantially once the sampling mechanism is incorporated.
- Multiple independent snowball samples can be drawn from the same large population and analyzed jointly.
Where Pith is reading between the lines
- The same marginalization trick may apply to other sampling designs that preserve conditional edge independence.
- Quantitative interpretations of how covariates shape network structure become reliable only after this correction.
- The framework suggests checking whether real networks approximately satisfy the conditional independence assumption before applying the closed form.
Load-bearing premise
The observed data come from multi-wave snowball sampling and the network edges form independently conditional on latent vertex-level quantities.
What would settle it
A simulation that generates networks from a CLS model, draws snowball samples, computes both the closed-form likelihood and a brute-force numerical integral over unobserved edges, then checks whether the two agree exactly.
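A minimal sketch of such a check follows, under simplifying assumptions that are not taken from the paper: latent positions are held fixed (so the comparison is conditional on the latents rather than marginal over them), the seed set is fixed, the link is a logistic function of an intercept minus Euclidean distance, and nodes in all but the final wave report every tie they have. The function names `edge_prob`, `snowball`, `graph_prob`, and `compare` are hypothetical. The brute-force side enumerates every graph on a small vertex set, keeps those that reproduce the same snowball sample, and sums their probabilities.

```python
import itertools
import math
import random


def edge_prob(zi, zj, alpha):
    """Edge probability: logistic link on alpha minus Euclidean distance (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(alpha - math.dist(zi, zj))))


def snowball(edges, n, seeds, n_waves):
    """Trace a deterministic snowball sample on the graph given by `edges`.

    Assumed design: nodes in waves 0..n_waves-1 report all their ties;
    nodes in the final wave are named but not interviewed.
    Returns (waves, observed), where observed maps each observed dyad to 0/1.
    """
    def tie(i, j):
        return edges[(min(i, j), max(i, j))]

    waves, seen = [set(seeds)], set(seeds)
    for _ in range(n_waves):
        nxt = {j for i in waves[-1] for j in range(n)
               if j not in seen and tie(i, j)}
        waves.append(nxt)
        seen |= nxt
    reporters = set().union(*waves[:-1])
    observed = {d: y for d, y in edges.items()
                if d[0] in reporters or d[1] in reporters}
    return waves, observed


def graph_prob(edges, p):
    """Product of independent Bernoulli terms over the dyads in `edges`."""
    return math.prod(p[d] if y else 1.0 - p[d] for d, y in edges.items())


def compare(n=6, alpha=1.0, n_waves=1, seed=0):
    """Return (closed_form, brute_force) likelihoods of one snowball sample."""
    rng = random.Random(seed)
    z = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    dyads = list(itertools.combinations(range(n), 2))
    p = {d: edge_prob(z[d[0]], z[d[1]], alpha) for d in dyads}

    # Draw one "true" network and snowball-sample it from node 0.
    truth = {d: int(rng.random() < p[d]) for d in dyads}
    target = snowball(truth, n, [0], n_waves)

    # Closed form: product over the observed dyads only.
    closed = graph_prob(target[1], p)

    # Brute force: sum the probability of every full graph that
    # would have produced exactly the same waves and observed dyads.
    brute = 0.0
    for bits in itertools.product((0, 1), repeat=len(dyads)):
        cand = dict(zip(dyads, bits))
        if snowball(cand, n, [0], n_waves) == target:
            brute += graph_prob(cand, p)
    return closed, brute


if __name__ == "__main__":
    closed, brute = compare()
    print(closed, brute, math.isclose(closed, brute, rel_tol=1e-9))
```

With six vertices the enumeration covers 2^15 graphs, so it runs quickly. The check only addresses the edge-marginalization step; a fuller test of the paper's claim would also integrate over the latent positions, which this sketch does not attempt.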
Original abstract
Snowball sampling is a widely used design for collecting network data from large or hard-to-reach populations, yet naive inference that ignores the sampling mechanism produces systematically biased parameter estimates. We derive the exact likelihood of a multi-wave snowball sample for the class of continuous latent space (CLS) models, in which edges form independently conditional on latent vertex-level quantities, and show that conditional edge independence reduces the marginalization over unobserved network configurations to a closed-form expression portable across the entire CLS class. We develop a stochastic Expectation-Maximization algorithm for the Euclidean latent distance model as a concrete implementation, and apply the framework to the large-scale co-inventor network of German semiconductor patent applicants by drawing multiple snowball samples. We find that the naive procedure severely underestimates latent space variance, produces networks with nearly twice the observed edge count, and achieves a spectral goodness-of-fit nine times worse than the corrected model, which directly affects the quantitative interpretation of covariate effects.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper derives the exact likelihood for multi-wave snowball samples drawn from continuous latent space (CLS) network models. It shows that conditional edge independence (given vertex latents) reduces the marginalization over unobserved edge configurations consistent with the sampling design to a closed-form product expression that holds across the CLS class. A stochastic EM algorithm is developed for the Euclidean latent distance model as a concrete case, and the method is applied to multiple snowball samples from the German semiconductor co-inventor network, where the naive (sampling-ignoring) estimator is shown to underestimate latent space variance, inflate edge counts, and produce substantially worse spectral goodness-of-fit.
Significance. If the derivation is exact, the result supplies a portable likelihood for an important sampling design in a broad model class, directly correcting a known source of bias in network data. The empirical demonstration quantifies the practical consequences for parameter interpretation and model assessment. Credit is due for the factorization argument that avoids post-hoc adjustments and for the reproducible application to real patent data.
minor comments (2)
- [Abstract] The abstract states that the marginalization reduces to a closed-form expression, but the provided text supplies no explicit steps or verification; the full manuscript should include the product-form derivation (referencing the conditional independence assumption) with a short proof sketch or reference to the relevant model equations.
- [Application] In the application section, the reported factor-of-nine improvement in spectral goodness-of-fit and the doubling of edge count under the naive procedure would be strengthened by stating the precise definition of the spectral metric and the number of Monte Carlo replications used for the stochastic EM.
Simulated Author's Rebuttal
We thank the referee for the positive summary, recognition of the factorization argument, and recommendation of minor revision. No major comments were raised in the report.
Circularity Check
Derivation is self-contained from stated model assumptions
full rationale
The paper states that edges form independently conditional on latent vertex quantities and derives the exact likelihood under multi-wave snowball sampling by showing that this independence reduces the sum over unobserved edge configurations to a closed-form product. This follows directly from the joint probability factoring as a product of individual edge probabilities, with the sampling design only constraining which configurations are consistent with observed recruitment waves; no parameters are fitted to a subset and then renamed as a prediction, no self-citation supplies a load-bearing uniqueness result, and no ansatz is smuggled in. The stochastic EM is presented only as a concrete implementation for one member of the CLS class, leaving the general marginalization claim independent of any fitted values or prior author work.
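The factorization step described in this rationale can be written out as a short worked equation. This is a sketch under stated assumptions, not the paper's own expression: the notation ($O$ for observed dyads, $M$ for unobserved dyads, $p_{ij}$ for the conditional edge probability given latents $Z$ and parameters $\theta$) is introduced here for illustration, and the sample is assumed to be determined entirely by the observed dyads once the seeds are fixed. The paper's full likelihood may carry further terms, for example for seed selection or for integrating over latent positions.

```latex
% Conditional edge independence given latents Z:
P(Y = y \mid Z, \theta)
  = \prod_{i<j} p_{ij}^{\,y_{ij}} \,(1 - p_{ij})^{1 - y_{ij}},
\qquad p_{ij} = P(Y_{ij} = 1 \mid z_i, z_j, \theta).

% Marginalizing over the unobserved dyads M, with observed dyads O:
P(Y_O = y_O \mid Z, \theta)
  = \sum_{y_M \in \{0,1\}^{|M|}}
      \prod_{(i,j) \in O} p_{ij}^{\,y_{ij}} (1 - p_{ij})^{1 - y_{ij}}
      \prod_{(i,j) \in M} p_{ij}^{\,y_{ij}} (1 - p_{ij})^{1 - y_{ij}}

  = \prod_{(i,j) \in O} p_{ij}^{\,y_{ij}} (1 - p_{ij})^{1 - y_{ij}}
      \prod_{(i,j) \in M}
      \underbrace{\sum_{y_{ij} \in \{0,1\}} p_{ij}^{\,y_{ij}} (1 - p_{ij})^{1 - y_{ij}}}_{=\,1}

  = \prod_{(i,j) \in O} p_{ij}^{\,y_{ij}} (1 - p_{ij})^{1 - y_{ij}}.
```

The sum over $2^{|M|}$ configurations collapses because each unobserved dyad contributes an independent factor that sums to one, which is why no model-specific derivation is needed beyond the form of $p_{ij}$.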
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Edges form independently conditional on latent vertex-level quantities.