pith. machine review for the scientific record. sign in

arxiv: 2605.15173 · v1 · submitted 2026-05-14 · 💻 cs.DS · cs.DB

Recognition: 1 theorem link

· Lean Theorem

Hybrid Sketching Methods for Dynamic Connectivity on Sparse Graphs

Authors on Pith no claims yet

Pith reviewed 2026-05-15 02:56 UTC · model grok-4.3

classification 💻 cs.DS cs.DB
keywords dynamic connectivitygraph sketchinghybrid algorithmsl0-samplingdynamic graphsspace efficiencyfully dynamicsemi-streaming
0
0 comments X

The pith

Hybrid sketching algorithms achieve dynamic connectivity space that matches the best of lossless and sketch-based methods on all graph densities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that sparse real-world graphs often have dense cores on few vertices holding most edges. Hybrid algorithms sketch only those cores while storing the sparse parts losslessly, yielding a space bound of O(min{V+E, V log V log(2+E/V)}) with high probability. This matches traditional lossless structures on sparse graphs and pure sketching on dense ones, while improving on both in between. A new BalloonSketch l0-sampler shrinks per-vertex sketch sizes by up to 8 times. The resulting system saves up to 15 percent space on sparse graphs, 92 percent on intermediate ones, and 97 percent on dense graphs compared to lossless baselines.

Core claim

The authors present hybrid algorithms for fully-dynamic and semi-streaming connectivity whose space is O(min{V+E, V log V log(2+E/V)}) w.h.p. By partitioning the graph into a sketched dense core and a lossless sparse periphery, and using a new l0-sampler called BalloonSketch that reduces sketch size by up to 8x, the methods achieve the minimum space of either pure approach and deliver practical savings on real-world graphs.

What carries the argument

The hybrid sketching partition that sketches only the dense core using BalloonSketch l0-samplers while maintaining the sparse periphery losslessly.

If this is right

  • Space usage matches the lossless bound O(V+E) on sparse graphs.
  • Space usage matches the sketching bound O(V log V log(2+E/V)) on dense graphs.
  • Space savings reach 15 percent on graphs with average degree under 100.
  • Space savings reach 92 percent on graphs with average degree 100 to 1000.
  • Space savings reach 97 percent on graphs with average degree over 1000.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Dynamic graph algorithms for other problems could benefit from similar core-periphery hybrids.
  • Real-world graph processing systems might adopt automatic detection of dense cores for space optimization.
  • Extensions to semi-streaming settings suggest broader applicability beyond fully dynamic updates.

Load-bearing premise

Real-world sparse graphs contain dense cores on a small subset of vertices that account for a large fraction of edges so that sketching only the core yields net space savings.

What would settle it

Finding a family of graphs where the dense parts still require sketching space comparable to the full graph without producing net savings from the hybrid approach.

Figures

Figures reproduced from arXiv: 2605.15173 by David Tench, Gilvir Gill, Laxman Dhulipala, Michael A. Bender, Quinten De Man.

Figure 1
Figure 1. Figure 1: A plot visualizing the space complexity of various [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: An example of the partitioning of a graph by our hybrid framework with density threshold [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Peak memory usage of HybridSCALE, Cluster Forest, and CUPCaKE on various input graph streams (normalized to HybridSCALE). A bar with diagonal hashes indicates that the machine ran out of memory (OOM) while processing that input. GOOG FRND TWIT ORKT EN WIKI YT ROAD USA ROAD GER SIFT RS SIFT KNN MSSP RS MSSP KNN KRON 13 KRON 15 KRON 16 10 0 10 1 Update Time HybridSCALE Cluster Forest CUPCaKE CUPCaKE OOM [PI… view at source ↗
Figure 4
Figure 4. Figure 4: The total time taken by HybridSCALE, Cluster Forest, and CUPCaKE to process all updates for various input graph streams (normalized to HybridSCALE). The blue × indicates that CUPCaKE ran out of memory (OOM) during initialization. 2 1 2 2 2 3 2 4 2 5 2 6 2 7 2 8 2 9 2 10 Average Degree (2|E|/|V|) 10 8 10 9 10 10 10 11 Memory (B) HybridSCALE Cluster Forest CUPCaKE [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Memory usage [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
read the original abstract

Dynamic connectivity is a fundamental dynamic graph problem, and recent algorithmic breakthroughs on dynamic graph sketching have reshaped what is theoretically possible: by encoding the graph as per-vertex linear sketches, these algorithms solve dynamic connectivity in only $\Theta(V \log^2 V)$ space, independent of the number of edges,outperforming lossless $\Theta(V+E)$-space structures that grow as the graph becomes denser. Prior to this work, no practical dynamic connectivity algorithm has been able to translate these theoretical breakthroughs into space savings on real-world graphs. The main obstacle is that per-vertex sketches cost thousands of bytes per vertex, so sketching only pays off once the graph becomes extremely dense. We observe that sparse real-world graphs are often not uniformly sparse, these graphs can contain dense cores on a small subset of vertices that account for a large fraction of edges. We exploit this structure via hybrid sketching: sketch only the dense core, and store the sparse periphery losslessly. We design new hybrid algorithms for fully-dynamic and semi-streaming connectivity with space $O(\min\{V+E, V \log V \log(2+E/V)\})$ w.h.p., simultaneously matching the lossless bound on sparse graphs, the sketching bound on dense graphs, and improving on both in an intermediate regime. A key component is BalloonSketch, a new l0-sampler reducing per-vertex sketch sizes by up to 8x. We implement HybridSCALE, a modular system treating the lossless and sketch-based components as subroutines. HybridSCALE is the first sketch-based dynamic connectivity system to save space on common real-world graphs. Compared to the state-of-the-art lossless baseline, HybridSCALE saves up to 15% space on sparse graphs (average degree < 100), up to 92% on intermediate density graphs (average degree ~ 100-1000), and up to 97% on dense graphs (average degree > 1000).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents hybrid sketching algorithms for fully-dynamic and semi-streaming dynamic connectivity. It claims a space bound of O(min{V+E, V log V log(2+E/V)}) w.h.p. by sketching only dense cores while storing sparse peripheries losslessly, introduces the BalloonSketch l0-sampler to reduce per-vertex sketch size by up to 8x, and implements the HybridSCALE system that achieves up to 15% space savings on sparse real-world graphs, 92% on intermediate-density graphs, and 97% on dense graphs relative to a lossless baseline.

Significance. If the space bound holds after accounting for all dynamic maintenance costs and the empirical savings are reproducible, the work bridges theoretical sketching results with practical utility on non-uniform sparse graphs. It is the first sketch-based dynamic connectivity implementation to demonstrate net space savings on common real-world graphs rather than only on artificially dense instances.

major comments (2)
  1. [§4] §4 (Space Analysis) and the statement of the main theorem: the claimed O(min{V+E, V log V log(2+E/V)}) bound must explicitly charge the space and sampling overhead of dynamic vertex reclassification between the lossless periphery and the sketched core. The abstract and high-level description leave open whether degree-based moves incur per-update sketch initialization or extra samples that are not folded into the min expression.
  2. [§5.1] §5.1 (Experimental Methodology): the reported space savings (15% on avg-degree <100, 92% on avg-degree 100-1000) are measured against a lossless baseline, but the paper does not state whether the hybrid implementation uses identical update handling, hash functions, or memory allocators as the baseline; without this, the savings cannot be attributed solely to the hybrid structure.
minor comments (2)
  1. [Abstract] The notation log(2+E/V) in the space bound should be clarified as log(2 + E/V) to avoid ambiguity with log-base-2 of the ratio.
  2. [§3] BalloonSketch is introduced as reducing sketch sizes by up to 8x; the precise reduction factor and the parameter settings used in the experiments should be stated explicitly in §3.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address the major comments point by point below and will revise the manuscript to incorporate clarifications where needed.

read point-by-point responses
  1. Referee: [§4] §4 (Space Analysis) and the statement of the main theorem: the claimed O(min{V+E, V log V log(2+E/V)}) bound must explicitly charge the space and sampling overhead of dynamic vertex reclassification between the lossless periphery and the sketched core. The abstract and high-level description leave open whether degree-based moves incur per-update sketch initialization or extra samples that are not folded into the min expression.

    Authors: We appreciate this observation. The analysis in §4 already amortizes reclassification costs: a vertex moves from the lossless periphery to the sketched core only after a sequence of degree-increasing updates, and the sketch initialization overhead (including any additional samples required by BalloonSketch) is charged to those updates. This charging is folded into the O(V log V log(2+E/V)) term, while the min expression ensures the bound reverts to O(V+E) on uniformly sparse graphs without extra factors. To make the argument fully explicit, we will add a dedicated paragraph in the proof of the main theorem detailing the amortization for dynamic moves. revision: yes

  2. Referee: [§5.1] §5.1 (Experimental Methodology): the reported space savings (15% on avg-degree <100, 92% on avg-degree 100-1000) are measured against a lossless baseline, but the paper does not state whether the hybrid implementation uses identical update handling, hash functions, or memory allocators as the baseline; without this, the savings cannot be attributed solely to the hybrid structure.

    Authors: We thank the referee for noting this omission. The HybridSCALE implementation shares identical update-handling routines, hash-function implementations, and memory allocators between the hybrid and baseline versions, ensuring that the reported savings are attributable solely to the hybrid structure. We will add an explicit clarifying sentence in §5.1 documenting this shared infrastructure for reproducibility. revision: yes

Circularity Check

0 steps flagged

No circularity: hybrid bound is min of independent prior regimes

full rationale

The claimed space bound O(min{V+E, V log V log(2+E/V)}) is obtained by designing a hybrid that stores the sparse periphery losslessly and sketches only the dense core. This is the explicit minimum of two previously established regimes (lossless Θ(V+E) and sketching Θ(V log² V)), with the hybrid improving only in the intermediate-density regime. No equations reduce the bound to itself by construction, no parameters are fitted then renamed as predictions, and no load-bearing self-citations or imported uniqueness theorems appear in the provided claims or abstract. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claim rests on standard high-probability analysis for sketching and the empirical observation that real graphs have dense cores; no free parameters are introduced in the abstract.

axioms (1)
  • standard math High-probability bounds from sketching primitives hold as stated
    Invoked for the O(min{V+E, V log V log(2+E/V)}) space guarantee.
invented entities (2)
  • BalloonSketch no independent evidence
    purpose: l0-sampler that reduces per-vertex sketch size by up to 8x
    New primitive introduced to make sketching practical on sparser graphs.
  • HybridSCALE no independent evidence
    purpose: Modular implementation treating lossless and sketch components as subroutines
    System that realizes the hybrid algorithms.

pith-pipeline@v0.9.0 · 5666 in / 1313 out tokens · 35305 ms · 2026-05-15T02:56:46.810947+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 3 internal anchors

  1. [1]

    Acar, Daniel Anderson, Guy E

    Umut A. Acar, Daniel Anderson, Guy E. Blelloch, and Laxman Dhulipala. 2019. Parallel Batch-Dynamic Graph Connectivity. InProceedings of the 31st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 381–392. https://doi.org/10.1145/3323165.3323196

  2. [2]

    Acar, Daniel Anderson, Guy E

    Umut A. Acar, Daniel Anderson, Guy E. Blelloch, Laxman Dhulipala, and Sam Westrick. 2020. Parallel Batch-Dynamic Trees via Change Propaga- tion. In28th Annual European Symposium on Algorithms (ESA). 2:1–2:23. https://doi.org/10.4230/LIPIcs.ESA.2020.2

  3. [3]

    Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. 2012. Analyzing graph structure via linear measurements. InProceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms. SIAM, 459–467

  4. [4]

    Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. 2012. Graph sketches: sparsification, spanners, and subgraphs. InPODS. ACM, 5–14

  5. [5]

    Noga Alon, Yossi Matias, and Mario Szegedy. 1999. The Space Complexity of Approximating the Frequency Moments.J. Comput. System Sci.58, 1 (1999), 137–147. https://doi.org/10.1006/jcss.1997.1545

  6. [6]

    Ignacio Alvarez-Hamelin, Luca Dall’Asta, Alain Barrat, and Alessandro Vespignani

    J. Ignacio Alvarez-Hamelin, Luca Dall’Asta, Alain Barrat, and Alessandro Vespignani. 2005. Large Scale Networks Fingerprinting and Visualization Using the 𝑘-Core Decomposition. InAdvances in Neural Information Processing Systems (NeurIPS). 41–50

  7. [7]

    Ang, Brian W

    James A. Ang, Brian W. Barrett, Kyle B. Wheeler, and Richard C. Murphy

  8. [8]

    InCray User Group (CUG) Proceedings

    Introducing the Graph 500. InCray User Group (CUG) Proceedings. https://cug.org/5-publications/proceedings_attendee_lists/CUG10CD/pages/1- program/final_program/CUG10_Proceedings/pages/authors/11-15Wednesday/ 14C-Murphy-paper.pdf

  9. [9]

    Vladimir Batagelj and Matjaž Zaveršnik. 2003. An𝑂(𝑚) Algorithm for Cores Decomposition of Networks.arXiv preprint cs/0310049(2003)

  10. [10]

    Djamal Belazzougui, Gregory Kucherov, and Stefan Walzer. 2024. Better Space-Time-Robustness Trade-Offs for Set Reconciliation. In51st Interna- tional Colloquium on Automata, Languages, and Programming (ICALP 2024) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 297), Karl Bring- mann, Martin Grohe, Gabriele Puppis, and Ola Svensson (Eds.)...

  11. [11]

    Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph Framework I: Compres- sion Techniques. InProceedings of the 13th International Conference on World Wide Web (WWW). ACM, 595–602. https://doi.org/10.1145/988672.988752

  12. [12]

    Borgatti and Martin G

    Stephen P. Borgatti and Martin G. Everett. 2000. Models of Core/Periphery Structures.Social Networks21, 4 (2000), 375–395. https://doi.org/10.1016/S0378- 8733(99)00019-2

  13. [13]

    Daniel G. Brown. 2011. How I Wasted Too Long Finding a Concentration Inequality for Sums of Geometric Variables. (2011)

  14. [14]

    CJ Carey, Jonathan Halcrow, Rajesh Jayaram, Vahab Mirrokni, Warren Schudy, and Peilin Zhong. 2022. Stars: Tera-Scale Graph Building for Clustering and Learning. InAdvances in Neural Information Processing Systems (NeurIPS)

  15. [15]

    Shai Carmi, Shlomo Havlin, Scott Kirkpatrick, Yuval Shavitt, and Eran Shir

  16. [16]

    https://doi.org/10.1073/pnas.0701175104

    A Model of Internet Topology Using𝑘-Shell Decomposition.Proceedings of the National Academy of Sciences (PNAS)104, 27 (2007), 11150–11154. https://doi.org/10.1073/pnas.0701175104

  17. [17]

    Moses Charikar, Kevin Chen, and Martin Farach-Colton. [n. d.]. Finding Frequent Items in Data Streams. ([n. d.])

  18. [18]

    Julia Chuzhoy, Yu Gao, Jason Li, Danupon Nanongkai, Richard Peng, and Thatchaphol Saranurak. 2020. A Deterministic Algorithm for Balanced Cut with Applications to Dynamic Connectivity, Flows, and Beyond. In61st Annual IEEE Symposium on Foundations of Computer Science (FOCS). 1158–1167. https://doi.org/10.1109/FOCS46700.2020.00111

  19. [19]

    Yann Collet. 2016. xxHash-Extremely fast non-cryptographic hash algorithm. URL https://github. com/Cyan4973/xxHash(2016)

  20. [20]

    Graham Cormode and Donatella Firmani. 2014. A unifying framework for l0-sampling algorithms.Distrib. Parallel Databases32, 3 (sep 2014), 315–335. https://doi.org/10.1007/s10619-013-7131-9

  21. [21]

    Graham Cormode and Shanmugavelayutham Muthukrishnan. 2005. An Improved Data Stream Summary: The Count-Min Sketch and Its Applications.Journal of Algorithms55, 1 (2005), 58–75. https://doi.org/10.1016/j.jalgor.2003.12.001

  22. [22]

    Davis and Yifan Hu

    Timothy A. Davis and Yifan Hu. 2011. The university of Florida sparse matrix collection.ACM Trans. Math. Softw.38, 1, Article 1 (dec 2011), 25 pages. https://doi.org/10.1145/2049662.2049663

  23. [23]

    Quinten De Man, Laxman Dhulipala, Adam Karczmarz, Jakub Łącki, Julian Shun, and Zhongqi Wang. 2024. Towards Scalable and Practical Batch- Dynamic Connectivity.Proc. VLDB Endow.18, 3 (Nov. 2024), 889–901. https://doi.org/10.14778/3712221.3712250

  24. [24]

    West, Michael A

    Quinten De Man, Qamber Jafri, Daniel Delayo, Evan T. West, Michael A. Bender, and David Tench. 2025. Fast and Compact Sketch-Based Dynamic Connectivity. arXiv:2509.14433 [cs.DS] https://arxiv.org/abs/2509.14433

  25. [25]

    Quinten De Man, Atharva Sharma, Kishen N Gowda, and Laxman Dhulipala

  26. [26]

    InProceedings of the 31st ACM SIGPLAN Annual Symposium on Prin- ciples and Practice of Parallel Programming(Sydney, NSW, Australia)(PPoPP ’26)

    UFO Trees: Practical and Provably-Efficient Parallel Batch-Dynamic Trees. InProceedings of the 31st ACM SIGPLAN Annual Symposium on Prin- ciples and Practice of Parallel Programming(Sydney, NSW, Australia)(PPoPP ’26). Association for Computing Machinery, New York, NY, USA, 109–122. https://doi.org/10.1145/3774934.3786431

  27. [27]

    Wei Dong, Moses Charikar, and Kai Li. 2011. Efficient k-nearest neighbor graph construction for generic similarity measures. InProceedings of the 20th International Conference on World Wide Web (WWW)

  28. [28]

    Efraimidis and Paul G

    Pavlos S. Efraimidis and Paul G. Spirakis. 2006. Weighted Random Sampling with a Reservoir.Inform. Process. Lett.97, 5 (2006), 181–185. https://doi.org/10.1016/j.ipl.2005.11.003

  29. [29]

    Italiano, and Amnon Nissenzweig

    David Eppstein, Zvi Galil, Giuseppe F. Italiano, and Amnon Nissenzweig. 1997. Sparsification—A Technique for Speeding Up Dynamic Graph Algorithms.J. ACM44, 5 (1997), 669–696. https://doi.org/10.1145/265910.265914

  30. [30]

    Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. 1999. On Power-Law Relationships of the Internet Topology.ACM SIGCOMM Computer Communication Review29, 4 (1999), 251–262. https://doi.org/10.1145/316194.316229

  31. [31]

    Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. 2005. On Graph Problems in a Semi-Streaming Model.Theor. Comput. Sci.348, 2 (Dec. 2005), 207–216. https://doi.org/10.1016/j.tcs.2005.09.013

  32. [32]

    Frederickson

    Greg N. Frederickson. 1985. Data Structures for On-Line Updating of Minimum Spanning Trees, with Applications.SIAM J. Comput.14, 4 (1985), 781–798. https://doi.org/10.1137/0214055

  33. [33]

    David Gibb, Bruce Kapron, Valerie King, and Nolan Thorn. 2015. Dynamic graph connectivity with improved worst case update time and sublinear space. arXiv:1509.06464 [cs.DS]

  34. [34]

    Michael T Goodrich and Michael Mitzenmacher. 2011. Invertible bloom lookup tables. In2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 792–799

  35. [35]

    Henzinger and Valerie King

    Monika R. Henzinger and Valerie King. 1999. Randomized Fully Dynamic Graph Algorithms with Polylogarithmic Time per Operation.J. ACM46, 4 (1999), 502–516. https://doi.org/10.1145/320211.320215

  36. [36]

    Henzinger and Mikkel Thorup

    Monika R. Henzinger and Mikkel Thorup. 1997. Sampling to Provide or to Bound: With Applications to Fully Dynamic Graph Algorithms.Random Structures & Algorithms11, 4 (1997), 369–379. https://doi.org/10.1002/(SICI)1098- 2418(199712)11:4<369::AID-RSA5>3.0.CO;2-V

  37. [37]

    Jacob Holm, Kristian de Lichtenberg, and Mikkel Thorup. 2001. Poly- Logarithmic Deterministic Fully-Dynamic Algorithms for Connectivity, Minimum Spanning Tree, 2-Edge, and Biconnectivity.J. ACM48, 4 (2001), 723–760. https://doi.org/10.1145/502090.502095

  38. [38]

    Jacob Holm, Eva Rotenberg, and Mikkel Thorup. 2018. Dynamic Bridge- Finding in ˜𝑂(log 2 𝑛) Amortized Time. InProceedings of the 29th An- nual ACM-SIAM Symposium on Discrete Algorithms (SODA). 35–52. https://doi.org/10.1137/1.9781611975031.3

  39. [39]

    Shang-En Huang, Dawei Huang, Tsvi Kopelowitz, Seth Pettie, and Mikkel Thorup

  40. [40]

    https://doi.org/10.46298/theoretics.23.6

    Fully Dynamic Connectivity in𝑂(log𝑛(log log𝑛) 2 ) Amortized Expected Time.Theoretics2 (2023), 6:1–6:56. https://doi.org/10.46298/theoretics.23.6

  41. [41]

    Kapron, Valerie King, and Ben Mountjoy

    Bruce M. Kapron, Valerie King, and Ben Mountjoy. 2013. Dynamic graph con- nectivity in polylogarithmic worst case time. InProceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). SIAM, 1131–1142

  42. [42]

    Karp, Scott Shenker, and Christos H

    Richard M. Karp, Scott Shenker, and Christos H. Papadimitriou. 2003. A Simple Algorithm for Finding Frequent Elements in Streams and Its Application to Load Balancing. InProceedings of the 15th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 1–10. https://doi.org/10.1145/777412.777415

  43. [43]

    Gallos, Shlomo Havlin, Fredrik Liljeros, Lev Muchnik, H

    Maksim Kitsak, Lazaros K. Gallos, Shlomo Havlin, Fredrik Liljeros, Lev Muchnik, H. Eugene Stanley, and Hernán A. Makse. 2010. Identification of Influen- tial Spreaders in Complex Networks.Nature Physics6, 11 (2010), 888–893. https://doi.org/10.1038/nphys1746

  44. [44]

    Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a Social Network or a News Media?. InProceedings of the 19th International Conference on World Wide Web (WWW)

  45. [45]

    Julian J McAuley and Jure Leskovec. 2012. Learning to discover social circles in ego networks.. InNIPS, Vol. 2012. Citeseer, 548–56

  46. [46]

    Andrew McGregor. 2014. Graph stream algorithms: a survey.ACM SIGMOD Record43, 1 (2014), 9–20

  47. [47]

    Gummadi, Peter Druschel, and Bobby Bhattacharjee

    Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and Analysis of Online Social Networks. InProceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (IMC). 29–42. https://doi.org/10.1145/1298306.1298311

  48. [48]

    Jayadev Misra and David Gries. 1982. Finding Repeated Elements.Science of Computer Programming2, 2 (1982), 143–152. https://doi.org/10.1016/0167- 6423(82)90012-0

  49. [49]

    Muthukrishnan

    S. Muthukrishnan. 2005. Data Streams: Algorithms and Applications.Foun- dations and Trends®in Theoretical Computer Science1, 2 (2005), 117–236. https://doi.org/10.1561/0400000002

  50. [50]

    Danupon Nanongkai, Thatchaphol Saranurak, and Christian Wulff-Nilsen. 2017. Dynamic Minimum Spanning Forest with Subpolynomial Worst-Case Update Time. InIEEE 58th Annual Symposium on Foundations of Computer Science (FOCS). 950–961. https://doi.org/10.1109/FOCS.2017.92

  51. [51]

    M. E. J. Newman. 2003. The Structure and Function of Complex Networks.SIAM Rev.45, 2 (2003), 167–256. https://doi.org/10.1137/S003614450342480

  52. [52]

    OpenStreetMap contributors. 2017. Planet dump retrieved from https://planet.osm.org . https://www.openstreetmap.org

  53. [53]

    Rossi and Nesreen K

    Ryan A. Rossi and Nesreen K. Ahmed. 2015. The Network Data Repos- itory with Interactive Graph Analytics and Visualization. InAAAI. https://networkrepository.com

  54. [54]

    Stephen B. Seidman. 1983. Network Structure and Minimum Degree.Social Networks5, 3 (1983), 269–287. https://doi.org/10.1016/0378-8733(83)90028-X

  55. [55]

    George Shanthikumar (Eds.)

    Moshe Shaked and J. George Shanthikumar (Eds.). 2007.Stochastic Orders. Springer, New York, NY. https://doi.org/10.1007/978-0-387-34675-5

  56. [56]

    Harsha Vardhan Simhadri, George Williams, Madlib-Abhimanyu, et al . 2021. Results of the NeurIPS’21 Challenge on Billion-Scale Approximate Nearest Neighbor Search. InNeurIPS 2021 Competitions and Demonstrations Track. PMLR

  57. [57]

    Daniel D Sleator and Robert Endre Tarjan. 1983. A data structure for dynamic trees.J. Comput. System Sci.26, 3 (1983)

  58. [58]

    Sleator and Robert E

    Daniel D. Sleator and Robert E. Tarjan. 1985. Self-Adjusting Binary Search Trees. J. ACM32, 3 (1985), 652–686. https://doi.org/10.1145/3828.3835

  59. [59]

    Bender, Abiyaz Chowdhury, J

    David Tench, Evan West, Victor Zhang, Michael A. Bender, Abiyaz Chowdhury, J. Ahmed Dellas, Martin Farach-Colton, Tyler Seip, and Kenny Zhang. 2022. GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph Streams. InProceedings of the 2022 International Conference on Man- agement of Data(Philadelphia, PA, USA)(SIGMOD ’22). Asso...

  60. [60]

    West, Kenny Zhang, Michael Bender, Daniel DeLayo, Martin Farach-Colton, Gilvir Gill, Tyler Seip, and Victor Zhang

    David Tench, Evan T. West, Kenny Zhang, Michael Bender, Daniel DeLayo, Martin Farach-Colton, Gilvir Gill, Tyler Seip, and Victor Zhang. 2024. Exploring the Landscape of Distributed Graph Sketching. arXiv:2410.07518 [cs.DC] https://arxiv.org/abs/2410.07518

  61. [61]

    Mikkel Thorup. 2000. Near-Optimal Fully-Dynamic Graph Connectivity. In Proceedings of the 32nd Annual ACM Symposium on Theory of Computing (STOC). 343–350. https://doi.org/10.1145/335305.335345

  62. [62]

    Sketching as a Tool for Numerical Linear Algebra

    David P. Woodruff. 2014. Sketching as a Tool for Numerical Linear Algebra. Foundations and Trends®in Theoretical Computer Science10, 1-2 (2014), 1–157. https://doi.org/10.1561/0400000060 arXiv:1411.4357 [cs]

  63. [63]

    Christian Wulff-Nilsen. 2013. Faster deterministic fully-dynamic graph connectiv- ity. InProceedings of the twenty-fourth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 1757–1769

  64. [64]

    Lantian Xu, Dong Wen, Lu Qin, Ronghua Li, Ying Zhang, and Xuemin Lin. 2024. Constant-time Connectivity Querying in Dynamic Graphs.Proc. ACM Manag. Data2, 6, Article 230 (Dec. 2024), 23 pages. https://doi.org/10.1145/3698805

  65. [65]

    Jaewon Yang and Jure Leskovec. 2012. Defining and Evaluating Network Communities based on Ground-truth. InIEEE International Conference on Data Mining (ICDM)

  66. [66]

    ∑︁ 𝑧∈𝑍 1{𝑧∈𝐵 𝑖 } ! ≥1 # ≤E

    Shangdi Yu, Yan Gu, and Julian Shun. 2023. ParlayANN: Scalable and Deterministic Algorithms for Approximate Nearest Neighbor Search. InProceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP). ABalloonSketchAnalysis A.1 Probability Preliminaries A.1.1 Stochastic Dominance. Definition A.1 (Stochastic D...