Rethinking Sampling Strategy in Link Prediction

Tao Zhou; Xinshan Jiao; Yilin Bi; Zhenyu Deng

arxiv: 2606.19775 · v1 · pith:LXWB72FAnew · submitted 2026-06-18 · 💻 cs.SI · stat.AP· stat.OT

Rethinking Sampling Strategy in Link Prediction

Yilin Bi , Zhenyu Deng , Xinshan Jiao , Tao Zhou This is my paper

Pith reviewed 2026-06-26 15:30 UTC · model grok-4.3

classification 💻 cs.SI stat.APstat.OT

keywords link predictionsampling strategyprobe setvalidation setnetwork incompletenessdegree biasβ-sampling

0 comments

The pith

The structural features of missing links, set by how probe sets are sampled, control link prediction accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines two sampling steps when preparing data for link prediction: first choosing the probe set of missing links, then choosing a validation set from the rest. It introduces β-sampling, which sets the chance of picking a link proportional to the product of its endpoints' degrees raised to β. Tests on 45 networks show that probe sets biased toward high-degree connections yield higher prediction scores. With the probe set held fixed, the second sampling step for validation still changes results, and the best choice differs from both random selection and from matching the probe set's structure.

Core claim

Using β-sampling to vary the probe set shows that prediction accuracy rises when missing links preferentially connect high-degree nodes. Even with a fixed probe set, different second-stage sampling strategies for the validation set produce measurably different accuracies, and the strategy that works best is neither random sampling nor consistent sampling that matches structural traits.

What carries the argument

β-sampling, a scheme that draws each observed link into the probe set with probability proportional to (degree_u × degree_v)^β, which tunes the degree bias of the simulated missing links.

If this is right

Prediction performance improves when missing links tend to join high-degree nodes.
Even a fixed probe set leaves room for second-stage sampling to shift measured accuracy.
The best validation-set sampling rule is not the same as random selection or consistent structural matching.
Evaluation protocols must account for both sampling stages rather than treating them as neutral.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Real missing links in networks may follow particular degree patterns, so results could differ when those patterns are known.
Algorithms might be tuned or selected according to the expected β value of the missing data.
Benchmark suites could usefully include multiple probe sets generated at different β values.

Load-bearing premise

Changing the single parameter β cleanly alters only the structural properties of the probe set without introducing other confounding differences across the 45 networks.

What would settle it

If prediction accuracy shows no systematic change when β is varied across a fresh collection of networks, or if second-stage sampling effects disappear under controlled tests that hold all other factors fixed.

Figures

Figures reproduced from arXiv: 2606.19775 by Tao Zhou, Xinshan Jiao, Yilin Bi, Zhenyu Deng.

**Figure 1.** Figure 1: Illustration of the two-stage β-sampling processes. (a) The workflow of the two sampling processes. In the first-stage sampling, links in ET ∪ EV are sampled out from E and the remaining links constitute the probe set EP . In the second-stage sampling, links in ET are sampled out from ET ∪ EV and the remaining links belong to the validation set EV . (b)-(d) demonstrate the effect of β on link selection. Wi… view at source ↗

**Figure 2.** Figure 2: Best achievable AUC under different β1. Each curve reports AUC∗ (β1) for one algorithm. All results are averaged over 45 real-world networks, with 10 independent runs for each network. 3 Results To assess whether the structural characteristics of links in the probe set affect link prediction performance, we vary β1 in the first-stage sampling to generate different probe sets that can be considered as proxi… view at source ↗

**Figure 3.** Figure 3: Values of β ∗ 2 (β1) for parameterized algorithms. The blue solid line with circles and marked values represent β ∗ 2 (β1), while the green and magenta dash lines correspond to the random sampling (β2 = 0) and consistent sampling (β2 = β1). (a)-(d) report the results for Katz, LP, LRW, and TSAA, respectively. Having shown that β1 plays a critical role, we next investigate which strategy (i.e., which value … view at source ↗

**Figure 4.** Figure 4: Pairwise comparison of prediction performance among the three sampling strategies. Each node represents one sampling strategy, and the value associated with a directed edge from strategy A to strategy B indicates the winning rate that A outperforms B in total 225 pairwise comparisons (45 networks and 5 representative β1 values). Gray dashed bidirectional edges denote the proportion of cases in which two st… view at source ↗

**Figure 5.** Figure 5: Random-to-maximal Ratios (RMRs) of the eight link prediction algorithms based on AUC. Different colored boxes on the x-axis correspond to different link prediction algorithms, and the y-axis reports RMR values. For each algorithm, the distribution is obtained over the 45 real-world networks. The hollow diamond inside each box denotes the mean RMR, and the triangle denotes the minimum value. Two limitations… view at source ↗

read the original abstract

Many real-world networks are incomplete, making link prediction a fundamental challenge in network science. To train parameters and evaluate algorithms, observed links are usually divided into three subsets, namely training, validation, and probe sets. This division implicitly involves two sampling processes: first-stage sampling yields the probe set and second-stage sampling obtains the variation set. To date, our understanding of how these two sampling processes affect algorithm performance remains quite limited. To address this issue, we propose a sampling scheme called $\beta$-sampling, where the sampling probability of a link is proportional to the product of the degrees of its two endpoints raised to the power of $\beta$. Experiments on 45 real-world networks reveal that the structural characteristics of missing links, as simulated via varying probe sets, substantially impact prediction accuracy. When missing links tend to connect high-degree nodes, such links can be predicted accurately with ease. Furthermore, even with a fixed probe set, second-stage sampling still exerts a significant influence on prediction accuracy. Notably, the optimal second-stage sampling strategy differs from \textit{random sampling} (which randomly selects links to form the validation set) and \textit{consistent sampling} (which guarantees that links in the validation and probe sets share identical structural characteristics).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

β-sampling highlights that probe-set and validation-set choices affect link-prediction scores, but the experiments need tighter controls before the degree-product claim can be taken at face value.

read the letter

The paper's central observation is that the way you pick the probe set and then the validation set changes measured accuracy, and their β-sampling lets you tune the degree product of the held-out links. On 45 networks they report that probe sets biased toward high-degree-product edges are easier to predict and that the best second-stage sampling rule is neither random nor the consistent-sampling baseline. That is the actual new piece: a simple parametric knob and an empirical demonstration that the two sampling stages are not interchangeable.

The work is useful for anyone who runs link-prediction benchmarks, because it shows that standard random splits can hide or exaggerate performance differences. The 45-network scale is respectable and the distinction between random and consistent sampling is worth checking.

The soft spot is the lack of any reported controls or diagnostics for the obvious confounders. In real networks, degree-product correlates with clustering, community structure, and betweenness; nothing in the abstract indicates they matched or regressed those out. If the accuracy gaps shrink once those are held fixed, the claim that “structural characteristics of missing links” drive the result becomes harder to sustain. The abstract also gives no error bars, no multiple-testing correction, and no list of the networks, so the size and reliability of the reported effects cannot be judged from what is shown.

This is the kind of methodological note that belongs in a methods-focused venue or as a short note in a network-science journal. A serious referee should see it to verify the controls and the statistical reporting, but it is not a foundational result. I would bring it to a reading group for the sampling discussion and would not cite it in my own work unless the full experiments hold up after the confounder check.

Referee Report

3 major / 1 minor

Summary. The paper proposes a β-sampling scheme for dividing observed links into training, validation, and probe sets in link prediction, with sampling probability p ∝ (d_u d_v)^β. Experiments on 45 real-world networks are used to claim that structural characteristics of missing links (varied via probe sets) substantially affect prediction accuracy, that links between high-degree nodes are easier to predict, and that second-stage sampling for the validation set exerts significant influence even with fixed probe sets, with optimal strategies differing from random and consistent sampling.

Significance. If the empirical effects can be isolated from confounders, the work would demonstrate that standard evaluation practices in link prediction are sensitive to sampling choices and could motivate more careful benchmark design. The scale of 45 networks provides breadth, but the absence of statistical controls and the risk of entangled structural factors limit the strength of the conclusions.

major comments (3)

[Abstract] Abstract: the claim that 'structural characteristics of missing links, as simulated via varying probe sets, substantially impact prediction accuracy' lacks any reported statistical details, error bars, dataset list, p-values, or controls for multiple testing across 45 networks, making the 'substantial impact' assertion unverifiable from the given evidence.
[Method / Experimental Setup] β-sampling definition and experimental setup: varying the single parameter β in p ∝ (d_u d_v)^β does not demonstrably isolate the intended degree-product property, because high-(d_u d_v) links systematically co-vary with clustering, betweenness, and community structure in real networks; no matching, stratification, or regression controls are described to rule out these confounders.
[Results] Results on second-stage sampling: the claim that 'even with a fixed probe set, second-stage sampling still exerts a significant influence on prediction accuracy' inherits the same isolation problem once the probe set itself is generated by β-sampling, and no explicit test is reported showing independence from the first-stage structural variation.

minor comments (1)

[Abstract] Abstract: the definitions of 'random sampling' and 'consistent sampling' are given inline but would benefit from explicit mathematical notation when first introduced in the main text.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments, which help clarify the need for greater statistical transparency and discussion of potential confounders. We respond to each major comment below and indicate the revisions that will be incorporated.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that 'structural characteristics of missing links, as simulated via varying probe sets, substantially impact prediction accuracy' lacks any reported statistical details, error bars, dataset list, p-values, or controls for multiple testing across 45 networks, making the 'substantial impact' assertion unverifiable from the given evidence.

Authors: We agree that the abstract would be strengthened by supporting statistical information. In the revised manuscript we will add error bars to all reported accuracy metrics, include a table summarizing the 45 networks, and report mean accuracy differences with standard deviations across networks to quantify the impact. We will also note the consistency of trends rather than relying on per-network p-values, thereby addressing multiple-testing concerns while keeping the focus on the observed empirical patterns. revision: yes
Referee: [Method / Experimental Setup] β-sampling definition and experimental setup: varying the single parameter β in p ∝ (d_u d_v)^β does not demonstrably isolate the intended degree-product property, because high-(d_u d_v) links systematically co-vary with clustering, betweenness, and community structure in real networks; no matching, stratification, or regression controls are described to rule out these confounders.

Authors: We acknowledge that degree-product correlates with other structural features in real networks and that β-sampling does not fully isolate the degree-product effect. Our contribution is empirical: systematically varying the degree-product distribution of missing links via β produces consistent changes in prediction accuracy across 45 networks. We will add an explicit discussion of potential confounders and note that future work could employ matching or regression controls. The current design prioritizes breadth over synthetic isolation. revision: partial
Referee: [Results] Results on second-stage sampling: the claim that 'even with a fixed probe set, second-stage sampling still exerts a significant influence on prediction accuracy' inherits the same isolation problem once the probe set itself is generated by β-sampling, and no explicit test is reported showing independence from the first-stage structural variation.

Authors: The second-stage experiments already hold the probe set fixed while varying only the validation-set sampling strategy. To further demonstrate robustness, we will add results for the same second-stage variations applied to probe sets generated under multiple distinct β values and report that the influence of second-stage sampling persists. This clarification and additional figures will be included in the revised results section. revision: yes

Circularity Check

0 steps flagged

No circularity; purely empirical study with no derivation chain

full rationale

The paper proposes a β-sampling scheme and reports experimental results on 45 networks showing effects of probe-set structure and second-stage sampling on link-prediction accuracy. All central claims rest on direct empirical comparisons of accuracy metrics across varied β values; no equations, predictions, or uniqueness theorems are presented that could reduce by construction to fitted inputs, self-citations, or ansatzes. The methodology is self-contained as an observational study and exhibits none of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameters · 0 axioms · 0 invented entities

The central empirical claim rests on the assumption that β controls probe-set structure in a manner independent of other network properties, plus the choice of 45 networks and the definition of 'structural characteristics.'

free parameters (1)

β
Exponent controlling sampling bias toward high-degree links; value(s) used in the 45-network experiments are not stated in the abstract.

pith-pipeline@v0.9.1-grok · 5750 in / 1208 out tokens · 24747 ms · 2026-06-26T15:30:58.958620+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

72 extracted references

[1]

Vega-Redondo F. 2007. Complex social networks.Cambridge University Press

2007
[2]

Gosak M, Markoviˇ c R, Dolenˇ sek J, Slak Rupnik M, Marhl M, Stoˇ zer A, Perc M. 2018. Network science of biological systems at different scales: A review.Physics of Life Reviews. 24: 118-135

2018
[3]

Strogatz SH. 2001. Exploring complex networks.Nature. 410(6825): 268-276

2001
[4]

Mitchell M. 2009. Complexity: A guided tour.Oxford University Press

2009
[5]

Barab´ asi AL. 2013. Network science.Cambridge University Press

2013
[6]

Newman MEJ. 2018. Networks.Oxford University Press

2018
[7]

Kossinets G. 2006. Effects of missing data in social networks.Social Networks. 28(3): 247-268

2006
[8]

L¨ u L, Zhou T. 2011. Link prediction in complex networks: A survey.Physica A: Statistical Mechanics and its Applications. 390(6): 1150-1170

2011
[9]

Wang P, Xu BW, Wu YR, Zhou XY. 2015. Link prediction in social networks: The state-of-the-art. Science China Information Sciences. 58(1): 1-38

2015
[10]

Mart´ ınez V, Berzal F, Cubero JC. 2016. A survey of link prediction in complex networks.ACM Computing Surveys. 49(4): 69

2016
[11]

Kumar A, Singh SS, Singh K, Biswas B. 2020. Link prediction techniques, applications, and perfor- mance: A survey.Physica A: Statistical Mechanics and its Applications. 553: 124289. 14

2020
[12]

Zhou T. 2021. Progresses and challenges in link prediction.iScience. 24(11): 103217

2021
[13]

Wu H, Song C, Ge Y, Ge T. 2022. Link prediction on complex networks: An experimental survey. Data Science and Engineering. 7(3): 253-278

2022
[14]

Xu E, Bi Y, Hu H, Chen X, Yu Z, Li Y, Hu Y, Zhou T. 2026. Predictability of complex systems. Physics Reports. 1166: 1-107

2026
[15]

Wang WQ, Zhang QM, Zhou T. 2012. Evaluating network models: A likelihood analysis.EPL. 98: 28004

2012
[16]

Zhang QM, Xu XK, Zhu YX, Zhou T. 2015. Measuring multiple evolution mechanisms of complex networks.Scientific Reports. 5: 10350

2015
[17]

L¨ u L, Medo M, Yeung CH, Zhang YC, Zhang ZK, Zhou T. 2012. Recommender systems.Physics Reports. 519(1): 1-49

2012
[18]

Bobadilla J, Ortega F, Hernando A, Guti´ errez A. 2013. Recommender systems survey.Knowledge- Based Systems. 46: 109-132

2013
[19]

Bi Y, Wang P. 2022. Exploring drought-responsive crucial genes inSorghum.iScience. 25(11): 105347

2022
[20]

Al Musawi AF, Roy S, Ghosh P. 2025. A review of link prediction applications in network biology. IEEE Access. 13: 54997-55016

2025
[21]

Liben-Nowell D, Kleinberg J. 2007. The link-prediction problem for social networks.Journal of the American Society for Information Science and Technology. 58(7): 1019-1031

2007
[22]

Adamic LA, Adar E. 2003. Friends and neighbors on the Web.Social Networks. 25(3): 211-230

2003
[23]

Ou Q, Jin YD, Zhou T, Wang BH, Yin BQ. 2007. Power-law strength-degree correlation from resource- allocation dynamics on weighted networks.Physical Review E. 75(2): 021102

2007
[24]

Zhou T, L¨ u L, Zhang YC. 2009. Predicting missing links via local information.The European Physical Journal B. 71(4): 623-630

2009
[25]

Clauset A, Moore C, Newman MEJ. 2008. Hierarchical structure and the prediction of missing links in networks.Nature. 453(7191): 98-101

2008
[26]

Guimer` a R, Sales-Pardo M. 2009. Missing and spurious interactions and the reconstruction of complex networks.Proceedings of the National Academy of Sciences of the United States of America. 106(52): 22073-22078

2009
[27]

Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G. 2009. The graph neural network model. IEEE Transactions on Neural Networks. 20(1): 61-80. 15

2009
[28]

Hamilton WL, Ying R, Leskovec J. 2017. Inductive representation learning on large graphs. InPro- ceedings of the 31st International Conference on Neural Information Processing Systems. (Curran Associates, Inc.) pp. 1025-1035

2017
[29]

Veliˇ ckovi´ c P, Cucurull G, Casanova A, Romero A, Li` o P, Bengio Y. 2018. Graph attention networks. InProceedings of the International Conference on Learning Representations. (ICLR) pp. 1-12

2018
[30]

Kipf TN, Welling M. 2016. Variational graph auto-encoders. InProceedings of the NeurIPS Workshop on Bayesian Deep Learning. (PMLR) pp. 1-3

2016
[31]

Jiao X, Luo Y, Bi Y, Zhou T. 2026. Impacts of data splitting strategies on parameterized link prediction algorithms.Physica A: Statistical Mechanics and its Applications. 692: 131545

2026
[32]

He X, Ghasemian A, Lee E, Schwarze AC, Clauset A, Mucha PJ. 2024. Link prediction accuracy on real-world networks under non-uniform missing-edge patterns.PLoS ONE. 19(7): e0306883

2024
[33]

Zhu YX, L¨ u L, Zhang QM, Zhou T. 2012. Uncovering missing links with cold ends.Physica A: Statistical Mechanics and its Applications. 391(22): 5769-5778

2012
[34]

Ahmed N, Neville J, Kompella R. 2012. Network sampling designs for relational classification. In Proceedings of the International AAAI Conference on Web and Social Media. (AAAI) pp. 383-386

2012
[35]

Neville J, Gallagher B, Eliassi-Rad T. 2009. Evaluating statistical tests for within-network classifiers of relational data. Inthe Proceedings of the 9th IEEE International Conference on Data Mining. (IEEE) pp. 397-406

2009
[36]

Chebotarev P, Shamis E. 1997. The matrix-forest theorem and measuring relations in small social groups.Automation and Remote Control. 58(9): 1505-1514

1997
[37]

Stealing fire or stacking knowledge

Muscoloni A, Cannistraci CV. 2023. “Stealing fire or stacking knowledge” by machine intelligence to model link prediction in complex networks.iScience. 26(1): 105697

2023
[38]

L¨ u L, Jin CH, Zhou T. 2009. Similarity index based on local paths for link prediction of complex networks.Physical Review E. 80(4): 046122

2009
[39]

Liu W, L¨ u L. 2010. Link prediction based on local random walk.EPL. 89(5): 58007

2010
[40]

Katz L. 1953. A new status index derived from sociometric analysis.Psychometrika. 18(1): 39-43

1953
[41]

Sun D, Zhou T, Liu JG, Stanley HE, Zhang YC. 2009. Information filtering based on transferring similarity.Physical Review E. 80(1): 017101

2009
[42]

Zhou T. 2023. Discriminating abilities of threshold-free evaluation metrics in link prediction.Physica A: Statistical Mechanics and its Applications. 615: 128529. 16

2023
[43]

Jiao X, Wan S, Liu Q, Bi Y, Lee YL, Xu E, Hao D, Zhou T. 2024. Comparing discriminating abilities of evaluation metrics in link prediction.Journal of Physics: Complexity. 5(2): 025014

2024
[44]

Bi Y, Jiao X, Lee YL, Zhou T. 2024. Inconsistency among evaluation metrics in link prediction.PNAS Nexus. 3(11): pgae498

2024
[45]

Wan S, Bi Y, Jiao X, Zhou T. 2025. Quantifying discriminability of evaluation metrics in link prediction for real networks.Chaos, Solitons & Fractals. 199: 116864

2025
[46]

Hanley JA, McNeil BJ. 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve.Radiology. 143(1): 29-36

1982
[47]

Davis J, Goadrich M. 2006. The relationship between Precision-Recall and ROC curves. InProceedings of the 23rd International Conference on Machine Learning. (ACM) pp. 233-240

2006
[48]

J¨ arvelin K, Kek¨ al¨ ainen J. 2002. Cumulated gain-based evaluation of IR techniques.ACM Transactions on Information Systems. 20(4): 422-446

2002
[49]

Bi Y, Bian J, Wan S, Wang S, Zhou T. 2026. Domain matters: Towards domain-informed evaluation for link prediction.Physica A: Statistical Mechanics and its Applications. 693: 131551

2026
[50]

Watts DJ, Strogatz SH. 1998. Collective dynamics of ’small-world’ networks.Nature. 393(6684): 440- 442

1998
[51]

Newman MEJ. 2002. Assortative mixing in networks.Physical Review Letters. 89(20): 208701

2002
[52]

Schein AI, Popescul A, Ungar LH, Pennock DM. 2002. Methods and metrics for cold-start recommen- dations. InProceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. (ACM) pp. 253-260

2002
[53]

Lam XN, Vu T, Le TD, Duong AD. 2008. Addressing cold-start problem in recommendation sys- tems. InProceedings of the 2nd International Conference on Ubiquitous Information Management and Communication. (ACM) pp. 208-211

2008
[54]

Zhang ZK, Liu C, Zhang YC, Zhou T. 2010. Solving the cold-start problem in recommender systems with social tags.EPL. 92(2): 28002

2010
[55]

Lika B, Kolomvatsos K, Hadjiefthymiades S. 2014. Facing the cold start problem in recommender systems.Expert Systems with Applications. 41(4): 2065-2073

2014
[56]

Rafiei D, Curial S. 2005. Effectively visualizing large networks through sampling. InIEEE Visualiza- tion. (IEEE) pp. 375-382. 17

2005
[57]

2005 Subnets of scale-free networks are not scale-free: Sampling prop- erties of networks.Proceedings of the National Academy of Sciences of the United States of America

Stumpf MPH, Wiuf C, May RM. 2005 Subnets of scale-free networks are not scale-free: Sampling prop- erties of networks.Proceedings of the National Academy of Sciences of the United States of America. 102(12): 4221-4224

2005
[58]

Leskovec J, Faloutsos C. 2006. Sampling from large graphs. InProceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (ACM) pp. 631-636

2006
[59]

Gjoka M, Kurant M, Butts CT, Markopoulou A. 2010. Walking in Facebook: A case study of unbiased sampling of OSNs. In2010 Proceedings IEEE INFOCOM. (IEEE) pp. 1-9

2010
[60]

Maiya AS, Berger-Wolf TY. 2010. Sampling community structure. InProceedings of the 19th Inter- national Conference on World Wide Web. (ACM) pp. 701-710

2010
[61]

De Choudhury M, Lin YR, Sundaram H, Candan KS, Xie L, Kelliher A. 2010. How does the data sampling strategy impact the discovery of information diffusion in social media?. InProceedings of the International AAAI Conference on Web and Social Media. (AAAI) pp. 34-41

2010
[62]

Lichtenwalter RN, Chawla NV. 2012. Link prediction: Fair and effective evaluation. In2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. (IEEE) pp. 376-383

2012
[63]

Mitchell S, Potash E, Barocas S, D’Amour A, Lum K. 2021. Algorithmic fairness: Choices, assump- tions, and definitions.Annual Review of Statistics and Its Application. 8: 141-163

2021
[64]

Murata T, Moriyasu S. 2007. Link prediction of social networks based on weighted proximity measures. InProceedings of the 2007 IEEE/WIC/ACM International Conference on Web Intelligence. (IEEE) pp. 85-88

2007
[65]

L¨ u L, Zhou T. 2010. Link prediction in weighted networks: The role of weak ties.EPL. 89: 18001

2010
[66]

Zhao J, Miao L, Yang J, Fang H, Zhang QM, Nie M, Holme P, Zhou T. 2015. Prediction of links and weights in networks by reliable routes.Scientific Reports. 5: 12261

2015
[67]

Dunlavy DM, Kolda TG, Acar E. 2011. Temporal link prediction using matrix and tensor factoriza- tions.ACM Transactions on Knowledge Discovery from Data. 5(2): 10

2011
[68]

Holme P, Saram¨ aki J. 2012. Temporal networks.Physics Reports. 519(3): 97-125

2012
[69]

Tang D. Du W. Shekhtman L. Wang Y. Havlin S. Cao X. Yan G. 2020. Predictability of real temporal networks.National Science Review. 7(5): 929-937

2020
[70]

Pujari M, Kanawati R. 2015. Link prediction in multiplex networks.Networks and Heterogeneous Media. 10(1): 17-35. 18

2015
[71]

Benson AR, Abebe R, Schaub MT, Jadbabaie A, Kleinberg J. 2018. Simplicial closure and higher- order link prediction.Proceedings of the National Academy of Sciences of the United States of America. 115(48): E11221-E11230

2018
[72]

Bian J, Zhou T, Bi Y. 2025. Unveiling the role of higher-order interactions via stepwise reduction. Communications Physics. 8(1): 228. 19

2025

[1] [1]

Vega-Redondo F. 2007. Complex social networks.Cambridge University Press

2007

[2] [2]

Gosak M, Markoviˇ c R, Dolenˇ sek J, Slak Rupnik M, Marhl M, Stoˇ zer A, Perc M. 2018. Network science of biological systems at different scales: A review.Physics of Life Reviews. 24: 118-135

2018

[3] [3]

Strogatz SH. 2001. Exploring complex networks.Nature. 410(6825): 268-276

2001

[4] [4]

Mitchell M. 2009. Complexity: A guided tour.Oxford University Press

2009

[5] [5]

Barab´ asi AL. 2013. Network science.Cambridge University Press

2013

[6] [6]

Newman MEJ. 2018. Networks.Oxford University Press

2018

[7] [7]

Kossinets G. 2006. Effects of missing data in social networks.Social Networks. 28(3): 247-268

2006

[8] [8]

L¨ u L, Zhou T. 2011. Link prediction in complex networks: A survey.Physica A: Statistical Mechanics and its Applications. 390(6): 1150-1170

2011

[9] [9]

Wang P, Xu BW, Wu YR, Zhou XY. 2015. Link prediction in social networks: The state-of-the-art. Science China Information Sciences. 58(1): 1-38

2015

[10] [10]

Mart´ ınez V, Berzal F, Cubero JC. 2016. A survey of link prediction in complex networks.ACM Computing Surveys. 49(4): 69

2016

[11] [11]

Kumar A, Singh SS, Singh K, Biswas B. 2020. Link prediction techniques, applications, and perfor- mance: A survey.Physica A: Statistical Mechanics and its Applications. 553: 124289. 14

2020

[12] [12]

Zhou T. 2021. Progresses and challenges in link prediction.iScience. 24(11): 103217

2021

[13] [13]

Wu H, Song C, Ge Y, Ge T. 2022. Link prediction on complex networks: An experimental survey. Data Science and Engineering. 7(3): 253-278

2022

[14] [14]

Xu E, Bi Y, Hu H, Chen X, Yu Z, Li Y, Hu Y, Zhou T. 2026. Predictability of complex systems. Physics Reports. 1166: 1-107

2026

[15] [15]

Wang WQ, Zhang QM, Zhou T. 2012. Evaluating network models: A likelihood analysis.EPL. 98: 28004

2012

[16] [16]

Zhang QM, Xu XK, Zhu YX, Zhou T. 2015. Measuring multiple evolution mechanisms of complex networks.Scientific Reports. 5: 10350

2015

[17] [17]

L¨ u L, Medo M, Yeung CH, Zhang YC, Zhang ZK, Zhou T. 2012. Recommender systems.Physics Reports. 519(1): 1-49

2012

[18] [18]

Bobadilla J, Ortega F, Hernando A, Guti´ errez A. 2013. Recommender systems survey.Knowledge- Based Systems. 46: 109-132

2013

[19] [19]

Bi Y, Wang P. 2022. Exploring drought-responsive crucial genes inSorghum.iScience. 25(11): 105347

2022

[20] [20]

Al Musawi AF, Roy S, Ghosh P. 2025. A review of link prediction applications in network biology. IEEE Access. 13: 54997-55016

2025

[21] [21]

Liben-Nowell D, Kleinberg J. 2007. The link-prediction problem for social networks.Journal of the American Society for Information Science and Technology. 58(7): 1019-1031

2007

[22] [22]

Adamic LA, Adar E. 2003. Friends and neighbors on the Web.Social Networks. 25(3): 211-230

2003

[23] [23]

Ou Q, Jin YD, Zhou T, Wang BH, Yin BQ. 2007. Power-law strength-degree correlation from resource- allocation dynamics on weighted networks.Physical Review E. 75(2): 021102

2007

[24] [24]

Zhou T, L¨ u L, Zhang YC. 2009. Predicting missing links via local information.The European Physical Journal B. 71(4): 623-630

2009

[25] [25]

Clauset A, Moore C, Newman MEJ. 2008. Hierarchical structure and the prediction of missing links in networks.Nature. 453(7191): 98-101

2008

[26] [26]

Guimer` a R, Sales-Pardo M. 2009. Missing and spurious interactions and the reconstruction of complex networks.Proceedings of the National Academy of Sciences of the United States of America. 106(52): 22073-22078

2009

[27] [27]

Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G. 2009. The graph neural network model. IEEE Transactions on Neural Networks. 20(1): 61-80. 15

2009

[28] [28]

Hamilton WL, Ying R, Leskovec J. 2017. Inductive representation learning on large graphs. InPro- ceedings of the 31st International Conference on Neural Information Processing Systems. (Curran Associates, Inc.) pp. 1025-1035

2017

[29] [29]

Veliˇ ckovi´ c P, Cucurull G, Casanova A, Romero A, Li` o P, Bengio Y. 2018. Graph attention networks. InProceedings of the International Conference on Learning Representations. (ICLR) pp. 1-12

2018

[30] [30]

Kipf TN, Welling M. 2016. Variational graph auto-encoders. InProceedings of the NeurIPS Workshop on Bayesian Deep Learning. (PMLR) pp. 1-3

2016

[31] [31]

Jiao X, Luo Y, Bi Y, Zhou T. 2026. Impacts of data splitting strategies on parameterized link prediction algorithms.Physica A: Statistical Mechanics and its Applications. 692: 131545

2026

[32] [32]

He X, Ghasemian A, Lee E, Schwarze AC, Clauset A, Mucha PJ. 2024. Link prediction accuracy on real-world networks under non-uniform missing-edge patterns.PLoS ONE. 19(7): e0306883

2024

[33] [33]

Zhu YX, L¨ u L, Zhang QM, Zhou T. 2012. Uncovering missing links with cold ends.Physica A: Statistical Mechanics and its Applications. 391(22): 5769-5778

2012

[34] [34]

Ahmed N, Neville J, Kompella R. 2012. Network sampling designs for relational classification. In Proceedings of the International AAAI Conference on Web and Social Media. (AAAI) pp. 383-386

2012

[35] [35]

Neville J, Gallagher B, Eliassi-Rad T. 2009. Evaluating statistical tests for within-network classifiers of relational data. Inthe Proceedings of the 9th IEEE International Conference on Data Mining. (IEEE) pp. 397-406

2009

[36] [36]

Chebotarev P, Shamis E. 1997. The matrix-forest theorem and measuring relations in small social groups.Automation and Remote Control. 58(9): 1505-1514

1997

[37] [37]

Stealing fire or stacking knowledge

Muscoloni A, Cannistraci CV. 2023. “Stealing fire or stacking knowledge” by machine intelligence to model link prediction in complex networks.iScience. 26(1): 105697

2023

[38] [38]

L¨ u L, Jin CH, Zhou T. 2009. Similarity index based on local paths for link prediction of complex networks.Physical Review E. 80(4): 046122

2009

[39] [39]

Liu W, L¨ u L. 2010. Link prediction based on local random walk.EPL. 89(5): 58007

2010

[40] [40]

Katz L. 1953. A new status index derived from sociometric analysis.Psychometrika. 18(1): 39-43

1953

[41] [41]

Sun D, Zhou T, Liu JG, Stanley HE, Zhang YC. 2009. Information filtering based on transferring similarity.Physical Review E. 80(1): 017101

2009

[42] [42]

Zhou T. 2023. Discriminating abilities of threshold-free evaluation metrics in link prediction.Physica A: Statistical Mechanics and its Applications. 615: 128529. 16

2023

[43] [43]

Jiao X, Wan S, Liu Q, Bi Y, Lee YL, Xu E, Hao D, Zhou T. 2024. Comparing discriminating abilities of evaluation metrics in link prediction.Journal of Physics: Complexity. 5(2): 025014

2024

[44] [44]

Bi Y, Jiao X, Lee YL, Zhou T. 2024. Inconsistency among evaluation metrics in link prediction.PNAS Nexus. 3(11): pgae498

2024

[45] [45]

Wan S, Bi Y, Jiao X, Zhou T. 2025. Quantifying discriminability of evaluation metrics in link prediction for real networks.Chaos, Solitons & Fractals. 199: 116864

2025

[46] [46]

Hanley JA, McNeil BJ. 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve.Radiology. 143(1): 29-36

1982

[47] [47]

Davis J, Goadrich M. 2006. The relationship between Precision-Recall and ROC curves. InProceedings of the 23rd International Conference on Machine Learning. (ACM) pp. 233-240

2006

[48] [48]

J¨ arvelin K, Kek¨ al¨ ainen J. 2002. Cumulated gain-based evaluation of IR techniques.ACM Transactions on Information Systems. 20(4): 422-446

2002

[49] [49]

Bi Y, Bian J, Wan S, Wang S, Zhou T. 2026. Domain matters: Towards domain-informed evaluation for link prediction.Physica A: Statistical Mechanics and its Applications. 693: 131551

2026

[50] [50]

Watts DJ, Strogatz SH. 1998. Collective dynamics of ’small-world’ networks.Nature. 393(6684): 440- 442

1998

[51] [51]

Newman MEJ. 2002. Assortative mixing in networks.Physical Review Letters. 89(20): 208701

2002

[52] [52]

Schein AI, Popescul A, Ungar LH, Pennock DM. 2002. Methods and metrics for cold-start recommen- dations. InProceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. (ACM) pp. 253-260

2002

[53] [53]

Lam XN, Vu T, Le TD, Duong AD. 2008. Addressing cold-start problem in recommendation sys- tems. InProceedings of the 2nd International Conference on Ubiquitous Information Management and Communication. (ACM) pp. 208-211

2008

[54] [54]

Zhang ZK, Liu C, Zhang YC, Zhou T. 2010. Solving the cold-start problem in recommender systems with social tags.EPL. 92(2): 28002

2010

[55] [55]

Lika B, Kolomvatsos K, Hadjiefthymiades S. 2014. Facing the cold start problem in recommender systems.Expert Systems with Applications. 41(4): 2065-2073

2014

[56] [56]

Rafiei D, Curial S. 2005. Effectively visualizing large networks through sampling. InIEEE Visualiza- tion. (IEEE) pp. 375-382. 17

2005

[57] [57]

2005 Subnets of scale-free networks are not scale-free: Sampling prop- erties of networks.Proceedings of the National Academy of Sciences of the United States of America

Stumpf MPH, Wiuf C, May RM. 2005 Subnets of scale-free networks are not scale-free: Sampling prop- erties of networks.Proceedings of the National Academy of Sciences of the United States of America. 102(12): 4221-4224

2005

[58] [58]

Leskovec J, Faloutsos C. 2006. Sampling from large graphs. InProceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (ACM) pp. 631-636

2006

[59] [59]

Gjoka M, Kurant M, Butts CT, Markopoulou A. 2010. Walking in Facebook: A case study of unbiased sampling of OSNs. In2010 Proceedings IEEE INFOCOM. (IEEE) pp. 1-9

2010

[60] [60]

Maiya AS, Berger-Wolf TY. 2010. Sampling community structure. InProceedings of the 19th Inter- national Conference on World Wide Web. (ACM) pp. 701-710

2010

[61] [61]

De Choudhury M, Lin YR, Sundaram H, Candan KS, Xie L, Kelliher A. 2010. How does the data sampling strategy impact the discovery of information diffusion in social media?. InProceedings of the International AAAI Conference on Web and Social Media. (AAAI) pp. 34-41

2010

[62] [62]

Lichtenwalter RN, Chawla NV. 2012. Link prediction: Fair and effective evaluation. In2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. (IEEE) pp. 376-383

2012

[63] [63]

Mitchell S, Potash E, Barocas S, D’Amour A, Lum K. 2021. Algorithmic fairness: Choices, assump- tions, and definitions.Annual Review of Statistics and Its Application. 8: 141-163

2021

[64] [64]

Murata T, Moriyasu S. 2007. Link prediction of social networks based on weighted proximity measures. InProceedings of the 2007 IEEE/WIC/ACM International Conference on Web Intelligence. (IEEE) pp. 85-88

2007

[65] [65]

L¨ u L, Zhou T. 2010. Link prediction in weighted networks: The role of weak ties.EPL. 89: 18001

2010

[66] [66]

Zhao J, Miao L, Yang J, Fang H, Zhang QM, Nie M, Holme P, Zhou T. 2015. Prediction of links and weights in networks by reliable routes.Scientific Reports. 5: 12261

2015

[67] [67]

Dunlavy DM, Kolda TG, Acar E. 2011. Temporal link prediction using matrix and tensor factoriza- tions.ACM Transactions on Knowledge Discovery from Data. 5(2): 10

2011

[68] [68]

Holme P, Saram¨ aki J. 2012. Temporal networks.Physics Reports. 519(3): 97-125

2012

[69] [69]

Tang D. Du W. Shekhtman L. Wang Y. Havlin S. Cao X. Yan G. 2020. Predictability of real temporal networks.National Science Review. 7(5): 929-937

2020

[70] [70]

Pujari M, Kanawati R. 2015. Link prediction in multiplex networks.Networks and Heterogeneous Media. 10(1): 17-35. 18

2015

[71] [71]

Benson AR, Abebe R, Schaub MT, Jadbabaie A, Kleinberg J. 2018. Simplicial closure and higher- order link prediction.Proceedings of the National Academy of Sciences of the United States of America. 115(48): E11221-E11230

2018

[72] [72]

Bian J, Zhou T, Bi Y. 2025. Unveiling the role of higher-order interactions via stepwise reduction. Communications Physics. 8(1): 228. 19

2025