pith. machine review for the scientific record. sign in

arxiv: 2604.25597 · v1 · submitted 2026-04-28 · 💻 cs.SI · cs.DL· physics.soc-ph· stat.AP

Recognition: unknown

Generating Synthetic Citation Networks with Communities

Authors on Pith no claims yet

Pith reviewed 2026-05-07 14:13 UTC · model grok-4.3

classification 💻 cs.SI cs.DLphysics.soc-phstat.AP
keywords citation networkssynthetic graph generationcommunity detection benchmarksPrice-Pareto modeldirected acyclic graphsnetwork growth modelsstochastic block modelsmesoscopic structure
0
0 comments X

The pith

The Citation Seeder generates realistic synthetic citation networks using up to four orders of magnitude fewer parameters than leading alternatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper systematically compares twelve generators for directed nearly acyclic graphs that include planted community structure, testing them against seven real citation networks on twenty-six metrics. It shows that simply reversing edge directions in static models improves their fit to citation flow and that high-parameter models tend to overfit by memorizing specific community statistics. The authors introduce the Citation Seeder, an iterative process based on the Price-Pareto preferential attachment rule, which matches the performance of the strongest baselines while remaining interpretable and computationally linear in the size of the output graph.

Core claim

The Citation Seeder algorithm iteratively constructs directed graphs by adding nodes and directing edges according to a Price-Pareto citation process. When evaluated on seven real citation networks using twenty-six metrics, it produces results competitive with the best-performing baselines while using up to four orders of magnitude fewer parameters. The paper further establishes that reversing edges in static generators breaks cycles and induces realistic flow, and that exogenous mesoscopic similarities between generated and real networks matter more for realism than endogenous ones; high-parameter models fail because they memorize planted community statistics rather than capturing general,,

What carries the argument

The Citation Seeder algorithm, an iterative generator that adds nodes and edges according to the Price-Pareto preferential attachment model with a small number of interpretable parameters and linear runtime.

If this is right

  • Reversing edge directions in static community generators produces more citation-like acyclic structures and improves performance of the degree-corrected stochastic block model.
  • Exogenous mesoscopic similarities are more important than endogenous ones when judging how realistic a generated network is.
  • Models with many free parameters overfit by memorizing the statistics of planted communities and therefore fail to produce realistic overall network structure.
  • The Citation Seeder supplies an interpretable framework that can both explain observed citation growth and forecast future network evolution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The small number of interpretable parameters could be estimated from early citation data to forecast how a new paper or patent will accumulate citations over time.
  • The same iterative construction might be adapted to generate synthetic patent or software-dependency networks that also exhibit near-acyclic flow.
  • Improved low-parameter benchmarks could make community-detection methods more robust when tested on networks whose community structure evolves rather than remaining fixed.
  • The distinction between endogenous and exogenous similarities suggests that future generators should prioritize matching global growth statistics over exact reproduction of any single planted partition.

Load-bearing premise

The chosen twenty-six metrics on seven real networks are enough to judge whether a synthetic citation network is realistic, and that the ground-truth communities are fixed external features rather than arising from the network's own growth rules.

What would settle it

Fit the Citation Seeder parameters to the first half of a real citation network's history, generate forward trajectories, and check whether the predicted citation counts and community evolution match the actual second half of the same network.

Figures

Figures reproduced from arXiv: 2604.25597 by Grzegorz Siudem, {\L}ukasz Brzozowski, Marek Gagolewski.

Figure 1
Figure 1. Figure 1: An example network generated by our algorithm with parameters estimated from the Cora citation view at source ↗
Figure 2
Figure 2. Figure 2: CS-generated vs theoretical in-degree distributions (Eq. view at source ↗
Figure 3
Figure 3. Figure 3: Global mean ranks with 95% bootstrap confidence intervals, obtained by block-resampling view at source ↗
Figure 4
Figure 4. Figure 4: Pairwise comparison: number of CS wins / ties / losses against its six nearest competitors, view at source ↗
Figure 5
Figure 5. Figure 5: Category-level performance scores for selected methods. Score view at source ↗
Figure 6
Figure 6. Figure 6: Heatmap presenting impact of cycle breaking per category. Each cell represents a percent of view at source ↗
Figure 7
Figure 7. Figure 7: Ablation: effect of back-edge injection on category scores. Positive differences (annotated) indicate view at source ↗
Figure 8
Figure 8. Figure 8: Per-dataset performance scores for CS, CS (DAG), and DC-SBM-nD. Left: all 26 metrics; right: view at source ↗
Figure 9
Figure 9. Figure 9: Performance-parsimony trade-off. Horizontal axis: geometric mean of parameter counts across view at source ↗
read the original abstract

Generating realistic synthetic citation, patent, or component dependency networks is essential for benchmarking community detection, graph visualisation, and network data mining algorithms. We present the first systematic comparison of generators of directed graphs that are nearly acyclic and have a ground-truth community structure. We evaluate 12 methods across 7 real citation networks and 26 metrics. We propose the practice of reversing directions of edges in static generators to break cycles and induce a citation-like flow, which significantly improves the performance of a degree-corrected Stochastic Block Model. Our novel methodological approach to evaluating community detection benchmarks distinguishes between endogenous and exogenous mesoscopic similarities, with the latter proving more important. This distinction reveals that high-parameter models suffer from overfitting by memorising planted community statistics which lead to their failing to produce realistic networks. Finally, we introduce the Citation Seeder (CS) algorithm, an iterative generator grounded in the Price-Pareto model of citation networks, with interpretable parameters and O(N+E) runtime. CS achieves competitive results against the best-performing baselines while using up to four orders of magnitude fewer parameters and providing a clean framework for explaining and predicting a network's future growth.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents the first systematic comparison of 12 generators for directed, nearly acyclic graphs with ground-truth community structure. It evaluates them across 7 real citation networks using 26 metrics, proposes reversing edge directions in static generators (improving degree-corrected SBM performance), distinguishes endogenous from exogenous mesoscopic similarities (finding the latter more important), critiques high-parameter models for overfitting planted communities, and introduces the Citation Seeder (CS) algorithm based on the Price-Pareto model. CS is claimed to achieve competitive results with up to four orders of magnitude fewer parameters, O(N+E) runtime, and a framework for explaining and predicting network growth.

Significance. If the results hold, the work supplies a parsimonious, interpretable generative model for citation networks that avoids the overfitting issues identified in high-parameter baselines. The endogenous/exogenous distinction offers a useful methodological lens for designing community-detection benchmarks, and the low-parameter count plus linear runtime would make CS practical for large-scale synthetic data generation in network science.

major comments (2)
  1. [Abstract / Evaluation] Abstract and evaluation description: The central claim that CS supplies a framework for predicting a network's future growth is not load-bearingly supported by the described evaluation. The 26 metrics are applied to static snapshots of 7 networks; temporal validation (e.g., citation-age distributions, attachment-rate evolution across time slices, or hold-out prediction of future edges) is required to test whether the iterative Price-Pareto process reproduces observed growth dynamics rather than merely matching static statistics.
  2. [Abstract] Abstract: The assertion that exogenous mesoscopic similarities are more important than endogenous ones, and that this explains why high-parameter models fail, needs explicit quantitative linkage to the 26 metrics. Which specific metrics demonstrate the overfitting of planted community statistics, and how does CS avoid this while still producing realistic networks?
minor comments (1)
  1. The abstract refers to '26 metrics' without enumerating or categorizing them (structural, community, temporal, etc.). A concise table or appendix listing the metrics and their groupings would improve reproducibility and allow readers to assess coverage of growth-related properties.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed report. The comments highlight important aspects of our evaluation and claims that we address point by point below. We propose targeted revisions to strengthen the manuscript without altering its core contributions.

read point-by-point responses
  1. Referee: [Abstract / Evaluation] Abstract and evaluation description: The central claim that CS supplies a framework for predicting a network's future growth is not load-bearingly supported by the described evaluation. The 26 metrics are applied to static snapshots of 7 networks; temporal validation (e.g., citation-age distributions, attachment-rate evolution across time slices, or hold-out prediction of future edges) is required to test whether the iterative Price-Pareto process reproduces observed growth dynamics rather than merely matching static statistics.

    Authors: We agree that the predictive capability of the CS framework would benefit from explicit temporal validation to fully substantiate the claim in the abstract. The Price-Pareto model is iterative by design, with parameters that directly control attachment rates and growth, enabling forward simulation in principle. However, the current experiments evaluate static match to real networks. We will add a new subsection with hold-out experiments: training CS parameters on early time slices of the citation networks and evaluating prediction of later edges via metrics such as precision@K for future citations and evolution of in-degree distributions. This will be reported alongside the existing 26 metrics. revision: yes

  2. Referee: [Abstract] Abstract: The assertion that exogenous mesoscopic similarities are more important than endogenous ones, and that this explains why high-parameter models fail, needs explicit quantitative linkage to the 26 metrics. Which specific metrics demonstrate the overfitting of planted community statistics, and how does CS avoid this while still producing realistic networks?

    Authors: The distinction between endogenous and exogenous mesoscopic structure is operationalized by comparing models that directly optimize or plant community statistics (e.g., degree-corrected SBM variants) against those that generate communities as an emergent outcome of the growth process (CS and certain baselines). Overfitting is evidenced by high-parameter models achieving strong scores on community-aware metrics such as modularity, conductance, and normalized mutual information with the planted labels, yet performing poorly on global structural metrics including degree distribution, clustering coefficient, and acyclicity measures. CS avoids this by using only a handful of interpretable parameters that govern preferential attachment and community seeding without directly fitting mesoscopic statistics. We will insert a new paragraph and reference table in the evaluation section that explicitly maps subsets of the 26 metrics to the endogenous/exogenous distinction and the overfitting diagnosis. revision: partial

Circularity Check

0 steps flagged

No significant circularity: CS derivation grounded in external Price-Pareto model with independent benchmarks

full rationale

The paper grounds the Citation Seeder (CS) in the established Price-Pareto citation model and evaluates generated networks against 7 independent real citation networks using 26 metrics. This provides external validation rather than reducing claims to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The distinction between endogenous/exogenous structure is a methodological contribution evaluated on held-out real data, and the predictive growth framework follows directly from the iterative generative process without tautological reduction. No equations or steps in the provided text exhibit the enumerated circular patterns.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the paper relies on the Price-Pareto model as a domain assumption and introduces CS with its parameters.

free parameters (1)
  • parameters of CS
    Interpretable parameters but number not specified in abstract.
axioms (1)
  • domain assumption Citation networks follow the Price-Pareto model
    The CS is grounded in this model.

pith-pipeline@v0.9.0 · 5507 in / 1087 out tokens · 52771 ms · 2026-05-07T14:13:28.367327+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

51 extracted references · 35 canonical work pages · 1 internal anchor

  1. [1]

    Community detection and stochastic block models: Recent developments

    Abbe, E., 2018. Community detection and stochastic block models: Recent developments. Journal of Machine Learning Research 18, 1–86

  2. [2]

    Achieving the KS threshold in the general Stochastic Block Model with linearized acyclic belief propagation, in: Proc

    Abbé, E., Sandon, C., 2016. Achieving the KS threshold in the general Stochastic Block Model with linearized acyclic belief propagation, in: Proc. Neural Information Processing Systems (NIPS’16), pp. 1334–1342

  3. [3]

    Statistical mechanics of complex networks

    Albert, R., Barabási, A.L., 2002. Statistical mechanics of complex networks. Reviews of Modern Physics 74, 47–97. doi:10.1103/RevModPhys.74.47

  4. [4]

    Time -Related Outcome Following Palliative Spatially Fractionated Stereotactic Radiation Therapy (Lattice) of Large Tumors,

    Bertoli-Barsotti, L., Gagolewski, M., Siudem, G., Żogała Siudem, B., 2024. Equivalence of inequality indices in the three-dimensional model of informetric impact. Journal of Informetrics 18, 101566. doi:10.1016/j.joi.2024.101566

  5. [5]

    Competition and multiscaling in evolving networks

    Bianconi, G., Barabási, A.L., 2001. Competition and multiscaling in evolving networks. Europhysics Letters 54, 436. doi:10.1209/epl/i2001-00260-6

  6. [6]

    Deep Gaussian embedding of graphs: Unsupervised inductive learning via ranking, in: International Conference on Learning Representations (ICLR’18)

    Bojchevski, A., Günnemann, S., 2018. Deep Gaussian embedding of graphs: Unsupervised inductive learning via ranking, in: International Conference on Learning Representations (ICLR’18)

  7. [7]

    A probabilistic proof of an asymptotic formula for the number of labelled regular graphs

    Bollobás, B., 1980. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics 1, 311–316. doi:10.1016/S0195-6698(80)80030-8

  8. [8]

    Graph generators

    Bonifati, A., Holubová, I., Prat-Pèrez, A., Sakr, S., 2020. Graph generators. ACM Computing Surveys 53, 1–30. doi:10. 1145/3379445

  9. [9]

    DynBenchmark: Customizable ground truths to benchmark community detection and tracking in temporal networks, in: Proc

    Brisson, L., Bothorel, C., Duminy, N., 2025. DynBenchmark: Customizable ground truths to benchmark community detection and tracking in temporal networks, in: Proc. France’s International Conference on Complex Systems (FRCCS 2025), pp. 74–85. doi:10.1007/978-3-032-00206-8_8

  10. [10]

    UnPACD: Unified patent and citation dataset.https://github.com/lukaszbrzozowski/unpacd

    Brzozowski, L., 2026. UnPACD: Unified patent and citation dataset.https://github.com/lukaszbrzozowski/unpacd

  11. [11]

    The Price-Pareto growth model of networks with community structure

    Brzozowski, L., Gagolewski, M., Siudem, G., Żogała Siudem, B., 2026. The Price-Pareto growth model of networks with community structure doi:10.48550/arxiv.2510.13392. under review (preprint). Brzozowski, Gagolewski, Siudem:Preprint; Last updated on April 29, 2026Page 21 of 23 Generating Synthetic Citation Networks with Communities

  12. [12]

    Recovering asymmetric communities in the Stochastic Block Model

    Caltagirone, F., Lelarge, M., Miolane, L., 2017. Recovering asymmetric communities in the Stochastic Block Model. IEEE Transactions on Network Science and Engineering 5, 237–246. doi:10.1109/tnse.2017.2758201

  13. [13]

    Generating directed graphs with dual attention and asymmetric encoding, in: The Fourteenth International Conference on Learning Representations

    Carballo-Castro, A., Madeira, M., QIN, Y., Thanou, D., Frossard, P., 2026. Generating directed graphs with dual attention and asymmetric encoding, in: The Fourteenth International Conference on Learning Representations

  14. [14]

    Relational topic models for document networks, in: van Dyk, D., Welling, M

    Chang, J., Blei, D., 2009. Relational topic models for document networks, in: van Dyk, D., Welling, M. (Eds.), Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, pp. 81–88

  15. [15]

    Community detection in subspace of attribute

    Chen, H., Yu, Z., Yang, Q., Shao, J., 2022. Community detection in subspace of attribute. Information Sciences 602, 220–235. doi:10.1016/j.ins.2022.04.047

  16. [16]

    Asymptotic analysis of the Stochastic Block Model for modular networks and its algorithmic applications

    Decelle, A., Krząkała, F., Moore, C., Zdeborová, L., 2011. Asymptotic analysis of the Stochastic Block Model for modular networks and its algorithmic applications. Physical Review E 84. doi:10.1103/physreve.84.066106

  17. [17]

    Random graph modeling: A survey of the concepts

    Drobyshevskiy, M., Turdakov, D., 2019. Random graph modeling: A survey of the concepts. ACM Computing Surveys 52, 131. doi:10.1145/3369782

  18. [18]

    A fast and effective heuristic for the feedback arc set problem

    Eades, P., Lin, X., Smyth, W., 1993. A fast and effective heuristic for the feedback arc set problem. Information Processing Letters 47, 319–323. doi:10.1016/0020-0190(93)90079-O

  19. [19]

    The use of ranks to avoid the assumption of normality implicit in the analysis of variance

    Friedman, M., 1937. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association 32, 675–701

  20. [20]

    Mannhardt, A

    Gagolewski, M., 2022. A framework for benchmarking clustering algorithms. SoftwareX 20, 101270. doi:10.1016/j.softx. 2022.101270

  21. [21]

    Are cluster validity measures (in)valid? Information Sciences 581, 620–636

    Gagolewski, M., Bartoszuk, M., Cena, A., 2021. Are cluster validity measures (in)valid? Information Sciences 581, 620–636. doi:10.1016/j.ins.2021.10.004

  22. [22]

    doi: https://doi.org/10.1016/0378-8733(83)90021-7

    Holland, P.W., Laskey, K.B., Leinhardt, S., 1983. Stochastic blockmodels: First steps. Social Networks 5, 109–137. doi:10.1016/0378-8733(83)90021-7

  23. [23]

    The aging effect in evolving scientific citation networks

    Hu, F., Ma, L., Zhan, X.X., Zhou, Y., Liu, C., Zhao, H., Zhang, Z.K., 2021. The aging effect in evolving scientific citation networks. Scientometrics 126, 4297–4309. doi:10.1007/s11192-021-03929-8

  24. [24]

    Open Graph Benchmark: Datasets for machine learning on graphs

    Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M., Leskovec, J., 2020. Open Graph Benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems 33, 22118–22133

  25. [25]

    Cluster analysis: A modern statistical review

    Jaeger, A., Banks, D., 2023. Clusteranalysis: Amodernstatisticalreview. WileyInterdisciplinaryReviews: Computational Statistics 15, e1597. doi:10.1002/wics.1597

  26. [26]

    Random graph models for directed acyclic networks

    Karrer, B., Newman, M.E.J., 2009. Random graph models for directed acyclic networks. Physical Review E 80, 046110. doi:10.1103/PhysRevE.80.046110

  27. [27]

    Stochastic blockmodels with a growing number of classes

    Karrer, B., Newman, M.E.J., 2011. Stochastic blockmodels with a growing number of classes. Physical Review E 83, 016107. doi:10.1103/PhysRevE.83.016107

  28. [28]

    Benchmark graphs for testing community detection algorithms , volume =

    Lancichinetti, A., Fortunato, S., Radicchi, F., 2008. Benchmark graphs for testing community detection algorithms. Physical Review E 78. doi:10.1103/physreve.78.046110

  29. [29]

    A review of Stochastic Block Models and extensions for graph clustering

    Lee, C., Wilkinson, D., 2019. A review of Stochastic Block Models and extensions for graph clustering. Applied Network Science 4. doi:10.1007/s41109-019-0232-2

  30. [30]

    Bayesian testing for exogenous partition structures in Stochastic Block Models

    Legramanti, S., Rigon, T., Durante, D., 2022. Bayesian testing for exogenous partition structures in Stochastic Block Models. Sankhya A 84, 108–126. doi:10.1007/s13171-020-00231-2

  31. [31]

    SNAPDatasets: Stanfordlargenetworkdatasetcollection.http://snap.stanford.edu/data

    Leskovec, J., Krevl, A., 2014. SNAPDatasets: Stanfordlargenetworkdatasetcollection.http://snap.stanford.edu/data

  32. [32]

    Dirichlet graph variational autoencoder, in: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H

    Li, J., Yu, J., Li, J., Zhang, H., Zhao, K., Rong, Y., Cheng, H., Huang, J., 2020. Dirichlet graph variational autoencoder, in: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (Eds.), Advances in Neural Information Processing Systems, pp. 5274–5283

  33. [33]

    Hygen: Generating random graphs with hyperbolic communities

    Metzler, S., Miettinen, P., 2019. Hygen: Generating random graphs with hyperbolic communities. Applied Network Science 4. doi:10.1007/s41109-019-0166-8

  34. [34]

    The computer science and physics of community detection: Landscapes, phase transitions, and hardness

    Moore, C., 2017. The computer science and physics of community detection: Landscapes, phase transitions, and hardness. Bulletin of the EATCS 121

  35. [35]

    Multilayer network approach to modeling authorship influence on citation dynamics in physics journals

    Nanumyan, V., Gote, C., Schweitzer, F., 2020. Multilayer network approach to modeling authorship influence on citation dynamics in physics journals. Physical Review E 102. doi:10.1103/physreve.102.032303

  36. [36]

    ma-CODE: A multi-phase approach on community detection in evolving networks

    Nath, K., Shanmugam, R., Varadaranjan, V., 2021. ma-CODE: A multi-phase approach on community detection in evolving networks. Information Sciences 569, 326–343. doi:10.1016/j.ins.2021.02.068

  37. [37]

    Rezzolla and O

    Newman, M., 2018. Networks. Oxford University Press. doi:10.1093/oso/9780198805090.001.0001

  38. [38]

    The first-mover advantage in scientific publication

    Newman, M.E.J., 2009. The first-mover advantage in scientific publication. EPL (Europhysics Letters) 86, 68001–68001. doi:10.1209/0295-5075/86/68001

  39. [39]

    Networks of scientific papers

    Price, D., 1965. Networks of scientific papers. Science 149, 510–515. doi:10.1126/science.149.3683.510

  40. [40]

    The map equation

    Rosvall, M., Axelsson, D., Bergstrom, C.T., 2009. The map equation. The European Physical Journal Special Topics 178, 13–23. doi:10.1140/epjst/e2010-01179-1

  41. [41]

    Collective classification in network data

    Sen, P., Namata, G.M., Bilgic, M., Getoor, L., Gallagher, B., Eliassi-Rad, T., 2008. Collective classification in network data. AI Magazine 29, 93–106

  42. [42]

    Three dimensions of scientific impact

    Siudem, G., Żogała Siudem, B., Cena, A., Gagolewski, M., 2020. Three dimensions of scientific impact. Proceedings of the National Academy of Sciences of the United States of America (PNAS) 117, 13896–13900. doi:10.1073/pnas.2001064117

  43. [43]

    Community detection in directed acyclic graphs

    Speidel, L., Takaguchi, T., Masuda, N., 2015. Community detection in directed acyclic graphs. The European Physical Journal B 88. doi:10.1140/epjb/e2015-60226-y

  44. [44]

    Sun, J., Ajwani, D., Nicholson, P.K., Sala, A., Parthasarathy, S., 2017. Breaking cycles in noisy hierarchies, in: Proceedings Brzozowski, Gagolewski, Siudem:Preprint; Last updated on April 29, 2026Page 22 of 23 Generating Synthetic Citation Networks with Communities of the 2017 ACM on Web Science Conference, Association for Computing Machinery, New York,...

  45. [45]

    Arnetminer: Extraction and mining of academic social networks, in: KDD’08, pp

    Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z., 2008. Arnetminer: Extraction and mining of academic social networks, in: KDD’08, pp. 990–998

  46. [46]

    Scientific Reports9(1), 5233 (2019) https://doi

    Traag, V., Waltman, L., vanEck, N.J., 2019. FromLouvaintoLeiden: Guaranteeingwell-connectedcommunities. Scientific Reports 9, 5233. doi:10.1038/s41598-019-41695-z

  47. [47]

    Validation of cluster analysis results on validation data: A systematic framework

    Ullmann, T., Hennig, C., Boulesteix, A.L., 2022. Validation of cluster analysis results on validation data: A systematic framework. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12, e1444. doi:10.1002/widm.1444

  48. [48]

    A white paper on good research practices in benchmarking: The case of cluster analysis

    van Mechelen, I., Boulesteix, A.L., Dangl, R., et al., 2023. A white paper on good research practices in benchmarking: The case of cluster analysis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 13, e1511. doi:10. 1002/widm.1511

  49. [49]

    Quantifying long-term scientific impact

    Wang, D., Song, C., Barabási, A.L., 2013. Quantifying long-term scientific impact. Science 342, 127–132

  50. [50]

    Detectability of macroscopic structures in directed asymmetric Stochastic Block Model

    Wiliński, M., Mazzarisi, P., Tantari, D., Lillo, F., 2019. Detectability of macroscopic structures in directed asymmetric Stochastic Block Model. Physical Review E 99. doi:10.1103/physreve.99.042310

  51. [51]

    Community detection based on modularity and k-plexes

    Zhu, J., Chen, B., Zeng, Y., 2020. Community detection based on modularity and k-plexes. Information Sciences 513, 127–142. doi:10.1016/j.ins.2019.10.076. Brzozowski, Gagolewski, Siudem:Preprint; Last updated on April 29, 2026Page 23 of 23