pith. sign in

arxiv: 1906.11518 · v1 · pith:R5BO3F4Enew · submitted 2019-06-27 · 💻 cs.DB

A Survey and Experimental Analysis of Distributed Subgraph Matching

Pith reviewed 2026-05-25 14:07 UTC · model grok-4.3

classification 💻 cs.DB
keywords distributed subgraph matchingstrategy comparisonTimely dataflowexperimental analysisunlabelled matchinglabelled matchingpractical guidegraph query processing
0
0 comments X

The pith

Reimplementing four distributed subgraph matching strategies inside one dataflow system isolates their performance differences and produces a practical selection guide.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extracts four core strategies and three shared optimizations from existing distributed subgraph matching algorithms. It then codes every strategy together with the optimizations on the single Timely dataflow platform so that differences in results can be attributed to strategy rather than platform. Large-scale experiments on both unlabelled and labelled graphs test the variants across many graph sizes, query patterns, and hardware settings. The measured trade-offs are condensed into concrete advice on which strategy to use for a given workload. A sympathetic reader cares because subgraph matching is a fundamental operation in large-scale graph analytics, yet earlier comparisons had mixed strategy effects with unrelated implementation choices.

Core claim

The authors identify four strategies and three general-purpose optimizations from representative state-of-the-art works. They implement the four strategies with the optimizations based on the common Timely dataflow system for systematic strategy-level comparison. Their implementation covers all representation algorithms. They conduct extensive experiments for both unlabelled matching and labelled matching to analyze the performance of distributed subgraph matching under various settings, which is finally summarized as a practical guide.

What carries the argument

Classification of algorithms into four strategies plus three optimizations, all realized inside the same Timely dataflow engine so that strategy effects can be measured in isolation.

If this is right

  • Performance rankings among the four strategies become directly comparable because platform differences are removed.
  • The practical guide indicates which strategy wins for particular combinations of graph scale, query shape, and label presence.
  • The three optimizations improve every strategy, yet their relative value changes with the underlying strategy.
  • Results apply separately to unlabelled and labelled matching workloads.
  • The experimental methodology covers enough settings to support workload-specific recommendations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same isolation technique could be applied to other distributed graph primitives such as triangle listing or reachability queries.
  • If a new strategy appears, it can be inserted into the Timely framework and ranked against the existing four without re-running all prior experiments.
  • Communication volume and load balance patterns exposed by the common runtime may explain why certain strategies scale better on particular graphs.
  • The guide could be validated by deploying its recommended strategy on a production workload and measuring end-to-end query latency.

Load-bearing premise

Re-implementing the original algorithms inside Timely dataflow faithfully reproduces their performance characteristics without introducing new biases from the shared runtime.

What would settle it

If the original published implementations of the four strategies, run on identical hardware and inputs, produce different relative performance orderings than the Timely reimplementations, the claim of faithful strategy-level comparison would be falsified.

Figures

Figures reproduced from arXiv: 1906.11518 by Jingren Zhou, Kongzhang Hao, Longbin Lai, Lu Qin, Ran Wang, Wenjie Zhang, Xin Jin, Xuemin Lin, Ying Zhang, Zhengmin Lai, Zhengping Qian, Zhengyi Yang, Zhu Qing.

Figure 1
Figure 1. Figure 1: Query Graph Q (Left) and Data Graph G (Right). By treating the query vertices as attributes and data edges as relational table, we can write subgraph matching query as a multiway-way join of the edge relations. For example, regardless of label and order constraints, the query of Example 2.1 can be written as the following join R(Q) = E(v1, v2) on E(v2, v3) on E(v3, v4) on E(v1, v4) on E(v2, v4). (1) This m… view at source ↗
Figure 2
Figure 2. Figure 2: Simple BINJOIN Join Plans. To improve the performance of BINJOIN, people devoted their efforts into: (1) using more complex base relations other than edge; (2) devising better join plan P. The base relations B[q] represent the matches of a set of sub-structures [q] of the query graph Q. Each p ∈ [q] is called a join unit, and it must satisfy VQ = S p∈[q] Vp and EQ = S p∈[q] Ep. With the data graph partitio… view at source ↗
Figure 3
Figure 3. Figure 3: The Core-Crystal Decomposition of the query graph. With core-crystal decomposition, the computation has accordingly split into three stages: 1. Core computation. Given that Q(V c Q) itself is a query graph, the algorithm can be recursively applied to compute Q(V c Q) according to [45]. 2. Crystal computation. A special case of crystal is Q(x, 1), which is indeed a (x + 1)-clique. Suppose an instance of the… view at source ↗
Figure 4
Figure 4. Figure 4: The unlabelled queries. q2 q5 10 0 10 1 10 2 10 3 10 4 >3h T (sec) (a) BINJOIN on US q2 q5 10 0 10 1 10 2 10 3 10 4 >3h T (sec) (b) BINJOIN on LJ q2 q5 10 0 10 1 10 2 10 3 10 4 >3h T (sec) (c) WOPTJOIN on US q2 q5 10 0 10 1 10 2 10 3 10 4 >3h T (sec) (d) WOPTJOIN on LJ [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Effectiveness of optimizations. on US. The reason is that US is a sparse dataset with few room for Compression, while Compression itself incurs extra cost. We also compare BINJOIN with BINJOIN(w.o.c.) on the other sparse graph EU, and the results are the same. For WOPTJOIN strategy, Batching has little impact to the performance. Surprisingly, after using TrIndexing to WOPTJOIN, the improvement by average i… view at source ↗
Figure 6
Figure 6. Figure 6: Challenging queries. FULLREP beats all the other strategies, while SHRCUBE fails q8 and q9 on GO because of OT. Although SHRCUBE uses the same local algorithm as FULLREP, it spends a lot of time on deduplication (Section 3.3). We focus on comparing BINJOIN and WOPTJOIN on GO dataset. On the one hand, WOPTJOIN outperforms BINJOIN for q7 and q8. Their join plans of q7 are nearly the same except that BINJOIN … view at source ↗
Figure 7
Figure 7. Figure 7: All-around comparisons. FULLREP typically outperforms the other strategies. Observe that WOPTJOIN’s performance is often very close to FULLREP. The reason is that the WOPTJOIN’s computing plans for these evaluated queries are similar to “DualSim” adopted by FULLREP. The extra communication cost of WOPTJOIN has been reduced to very low while adopting TrIndexing optimization. While comparing WOPTJOIN with BI… view at source ↗
Figure 8
Figure 8. Figure 8: Labelled queries. Queries. The queries, shown in [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: All-around comparisons of labelled matching. [PITH_FULL_IMAGE:figures/full_fig_p019_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: A practical guide of distributed subgraph matching. The unlabelled case is also known as subgraph listing/enumeration, and due to the gigantic (intermediate) results, people have been either seeking scalable algorithms in parallel, or devising techniques to compress the results. Other than the algorithms studied in this paper (Section 3), Kim et al. proposed the external-memory-based parallel algorithm DU… view at source ↗
Figure 11
Figure 11. Figure 11: Scalability experiment: querying q1 and q2 on LJ. All strategies demonstrate reasonable scaling regarding both queries. In terms of COST, note that FULLREP is slightly larger than 1, because “DualSim” is implemented in general for arbitrary query, while “SingleThread” uses a hand￾24 [PITH_FULL_IMAGE:figures/full_fig_p024_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Varying densities of labelled graph. Exp-8 Vary Labels for Labelled Matching. We generate the datasets with number of labels 0, 5, 10, 15 and 20 based on DG10. Note that there are 5 labels in labelled queries q4 and q7, which are called the target labels. The 10-label dataset is the original DG10. For the one with 5 labels, we will replace each label not in the target labels as one random target label. Fo… view at source ↗
Figure 13
Figure 13. Figure 13: Varying labels of labelled graph [PITH_FULL_IMAGE:figures/full_fig_p026_13.png] view at source ↗
read the original abstract

Recently there emerge many distributed algorithms that aim at solving subgraph matching at scale. Existing algorithm-level comparisons failed to provide a systematic view to the pros and cons of each algorithm mainly due to the intertwining of strategy and optimization. In this paper, we identify four strategies and three general-purpose optimizations from representative state-of-the-art works. We implement the four strategies with the optimizations based on the common Timely dataflow system for systematic strategy-level comparison. Our implementation covers all representation algorithms. We conduct extensive experiments for both unlabelled matching and labelled matching to analyze the performance of distributed subgraph matching under various settings, which is finally summarized as a practical guide.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript identifies four strategies and three general-purpose optimizations from representative state-of-the-art works on distributed subgraph matching. It re-implements the four strategies (with the optimizations) inside the common Timely dataflow system to enable systematic strategy-level comparison, states that the implementation covers all representation algorithms, conducts extensive experiments on both unlabelled and labelled matching under various settings, and distills the results into a practical guide.

Significance. If the re-implementations faithfully preserve the performance-relevant behavior of the original algorithms, the work supplies a valuable platform-independent comparison that existing algorithm-level studies have not provided. The shared Timely runtime is a methodological strength for isolating strategy differences. The empirical scope (unlabelled and labelled cases, multiple settings) and the resulting practical guide would be useful to the community.

major comments (2)
  1. [§4 (Implementation)] §4 (Implementation): the manuscript asserts that the Timely re-implementations cover all representation algorithms but supplies no external validation (e.g., runtime or scalability numbers on shared benchmarks compared against the original published results). This gap is load-bearing because the entire strategy comparison and the derived practical guide rest on the assumption that observed differences reflect the strategies rather than platform-specific artifacts.
  2. [§5 (Experiments)] §5 (Experiments): without fidelity checks on the re-implementations, it is impossible to determine whether performance differences under varying settings (graph size, label density, etc.) arise from the four strategies or from unstated choices in the common Timely runtime such as partitioning, batching, or indexing.
minor comments (1)
  1. [Abstract] The abstract would benefit from naming the four strategies and three optimizations explicitly rather than referring to them only by count.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of the paper's contributions and for the constructive major comments. We address each point below regarding the validation of the Timely re-implementations.

read point-by-point responses
  1. Referee: [§4 (Implementation)] the manuscript asserts that the Timely re-implementations cover all representation algorithms but supplies no external validation (e.g., runtime or scalability numbers on shared benchmarks compared against the original published results). This gap is load-bearing because the entire strategy comparison and the derived practical guide rest on the assumption that observed differences reflect the strategies rather than platform-specific artifacts.

    Authors: We agree that external validation against original published results would strengthen confidence in the re-implementations. However, the majority of the source papers do not provide open-source implementations, which precludes direct runtime or scalability comparisons on identical benchmarks and hardware. Our re-implementations are derived strictly from the algorithmic descriptions and pseudocode in the original works, preserving the four core strategies while standardizing the runtime. In the revision we will expand §4 with an explicit discussion of implementation fidelity, including any simplifications made and why they do not alter the strategy-level behavior being compared. revision: partial

  2. Referee: [§5 (Experiments)] without fidelity checks on the re-implementations, it is impossible to determine whether performance differences under varying settings (graph size, label density, etc.) arise from the four strategies or from unstated choices in the common Timely runtime such as partitioning, batching, or indexing.

    Authors: All four strategy implementations share the identical Timely dataflow configuration, including the same partitioning scheme, batch sizes, and indexing structures; only the query-processing logic differs according to the extracted strategy. We will revise §4 and §5 to make this shared configuration explicit and to state that any observed performance variation is therefore attributable to the strategies rather than to runtime parameters. We will also add a short paragraph confirming that the same Timely version and configuration flags were used throughout the experimental campaign. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical survey with re-implementations on shared runtime

full rationale

The paper is a survey and experimental analysis that extracts four strategies and three optimizations from prior literature, re-implements them inside a common Timely dataflow system, runs head-to-head experiments on labelled and unlabelled matching, and condenses results into a practical guide. No equations, derivations, or first-principles claims appear. No self-definitional loops, fitted parameters renamed as predictions, load-bearing self-citations, uniqueness theorems imported from the authors' prior work, or ansatzes smuggled via citation are present. The comparison rests on external experimental outcomes rather than reducing to its own inputs by construction. The methodological assumption that re-implementations preserve original performance characteristics is a standard empirical concern but does not instantiate any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a survey and experimental comparison; it introduces no new mathematical derivations, fitted parameters, axioms, or postulated entities.

pith-pipeline@v0.9.0 · 5663 in / 1071 out tokens · 23205 ms · 2026-05-25T14:07:59.831616+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages

  1. [1]

    http://www.dis.uniroma1.it/challenge9

    The challenge9 datasets. http://www.dis.uniroma1.it/challenge9

  2. [2]

    https://lemurproject.org/clueweb12

    The clubweb12 dataset. https://lemurproject.org/clueweb12

  3. [3]

    https://en.wikipedia.org/wiki/Sparse_matrix

    Compressed sparse row. https://en.wikipedia.org/wiki/Sparse_matrix

  4. [4]

    http://giraph.apache.org/

    Giraph. http://giraph.apache.org/

  5. [5]

    https://github.com/frankmcsherry/dataflow-join/

    The implementation of bigjoin. https://github.com/frankmcsherry/dataflow-join/

  6. [6]

    http://ldbcouncil.org/benchmarks

    Ldbc benchmarks. http://ldbcouncil.org/benchmarks

  7. [7]

    https://ldbc.github.io/ldbc_snb_docs/ ldbc-snb-specification.pdf

    The ldbc social network benchmark. https://ldbc.github.io/ldbc_snb_docs/ ldbc-snb-specification.pdf

  8. [8]

    https://github.com/frankmcsherry/timely-dataflow

    The open-sourced timely dataflow system. https://github.com/frankmcsherry/timely-dataflow

  9. [9]

    http://snap.stanford.edu/data/index.html

    The snap datasets. http://snap.stanford.edu/data/index.html

  10. [10]

    http://law.di.unimi.it/datasets.php

    The webgraph datasets. http://law.di.unimi.it/datasets.php

  11. [11]

    C. R. Aberger, S. Tu, K. Olukotun, and C. Ré. Emptyheaded: A relational engine for graph processing. SIGMOD ’16, pages 431–446

  12. [12]

    F. N. Afrati, D. Fotakis, and J. D. Ullman. Enumerating subgraph instances using map-reduce. In Proc. of ICDE’13, 2013

  13. [13]

    Ammar, F

    K. Ammar, F. McSherry, S. Salihoglu, and M. Joglekar. Distributed evaluation of subgraph queries using worst- case optimal low-memory dataflows. PVLDB, 11(6):691–704, 2018

  14. [14]

    Angles, M

    R. Angles, M. Arenas, P. Barceló, A. Hogan, J. Reutter, and D. Vrgo ˇc. Foundations of modern query languages for graph databases. ACM Comput. Surv., 50(5), 2017

  15. [15]

    Beame, P

    P. Beame, P. Koutris, and D. Suciu. Communication steps for parallel query processing. J. ACM, 64(6):40:1– 40:58, 2017

  16. [16]

    F. Bi, L. Chang, X. Lin, L. Qin, and W. Zhang. Efficient subgraph matching by postponing cartesian products. SIGMOD ’16, pages 1199–1214, 2016

  17. [17]

    Carbone, A

    P. Carbone, A. Katsifodimos, Kth, S. Sweden, S. Ewen, V . Markl, S. Haridi, and K. Tzoumas. Apache flink TM: Stream and batch processing in a single engine. 38, 01 2015

  18. [18]

    Chakrabarti, Y

    D. Chakrabarti, Y . Zhan, and C. Faloutsos. R-mat: A recursive model for graph mining. In SDM, 2004

  19. [19]

    Choudhury, L

    S. Choudhury, L. B. Holder, G. Chin, K. Agarwal, and J. Feo. A selectivity based approach to continuous pattern detection in streaming graphs. In EDBT, 2015

  20. [20]

    S. Chu, M. Balazinska, and D. Suciu. From theory to practice: Efficient join query evaluation in a parallel database system. SIGMOD ’15, pages 63–78

  21. [21]

    F. R. K. Chung, L. Lu, and V . H. Vu. The spectra of random graphs with given expected degrees. Internet Mathematics, 1(3), 2003

  22. [22]

    D. J. DeWitt and J. Gray. Parallel database systems: The future of database processing or a passing fad?SIGMOD Rec., 19(4):104–112

  23. [23]

    Erdos and A

    P. Erdos and A. Renyi. On the evolution of random graphs. In Publ. Math. Inst. Hungary. Acad. Sci., 1960

  24. [24]

    W. Fan, J. Li, J. Luo, Z. Tan, X. Wang, and Y . Wu. Incremental graph pattern matching. SIGMOD ’11, pages 925–936, 2011

  25. [25]

    Gorka and R

    S. Gorka and R. Philip. Improving first-party bank fraud detection with graph databases, 2016

  26. [26]

    J. A. Grochow and M. Kellis. Network motif discovery using subgraph enumeration and symmetry-breaking. In Proc. of RECOMB’07, 2007. 22 A PREPRINT - J UNE 28, 2019

  27. [27]

    W.-S. Han, J. Lee, and J.-H. Lee. Turboiso: Towards ultrafast and robust subgraph isomorphism search in large graph databases. In Proc. of SIGMOD’13, 2013

  28. [28]

    J. He, S. Zhang, and B. He. In-cache query co-processing on coupled cpu-gpu architectures. PVLDB, 8(4):329– 340, 2014

  29. [29]

    Herlihy and N

    M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. 2008

  30. [30]

    Inoue, M

    H. Inoue, M. Ohara, and K. Taura. Faster set intersection with simd instructions by reducing branch mispredic- tions. PVLDB, 8(3):293–304, 2014

  31. [31]

    Y . E. Ioannidis and Y . C. Kang. Left-deep vs. bushy trees: An analysis of strategy spaces and its implications for query optimization. In SIGMOD’91, pages 168–177, 1991

  32. [32]

    Junghanns, M

    M. Junghanns, M. Kiesling, A. Averbuch, A. Petermann, and E. Rahm. Cypher-based graph pattern matching in gradoop. GRADES’17, pages 3:1–3:8

  33. [33]

    Kankanamge, S

    C. Kankanamge, S. Sahu, A. Mhedbhi, J. Chen, and S. Salihoglu. Graphflow: An active graph database. SIG- MOD ’17, pages 1695–1698

  34. [34]

    H. Kim, J. Lee, S. S. Bhowmick, W.-S. Han, J. Lee, S. Ko, and M. H. Jarrah. Dualsim: Parallel subgraph enumeration in a massive graph on a single machine. SIGMOD ’16, pages 1231–1245, 2016

  35. [35]

    L. Lai, L. Qin, X. Lin, and L. Chang. Scalable subgraph enumeration in mapreduce. PVLDB, 8(10):974–985, 2015

  36. [36]

    L. Lai, L. Qin, X. Lin, and L. Chang. Scalable subgraph enumeration in mapreduce: A cost-oriented approach. The VLDB Journal, 26(3):421–446, June 2017

  37. [37]

    L. Lai, L. Qin, X. Lin, Y . Zhang, L. Chang, and S. Yang. Scalable distributed subgraph enumeration. PVLDB, 10(3):217–228, 2016

  38. [38]

    Y . Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed graphlab: A frame- work for machine learning and data mining in the cloud. PVLDB, 5(8):716–727, 2012

  39. [39]

    Y . Luo, W. Wang, and X. Lin. Spark: A keyword search engine on relational databases. In ICDE, pages 1552– 1555, April 2008

  40. [40]

    Y . Luo, W. Wang, X. Lin, X. Zhou, J. Wang, and K. Li. Spark2: Top-k keyword query in relational databases. IEEE Transactions on Knowledge and Data Engineering, 23(12):1763–1780, Dec 2011

  41. [41]

    Malewicz, M

    G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In Proc. of SIGMOD’10, 2010

  42. [42]

    McSherry, M

    F. McSherry, M. Isard, and D. G. Murray. Scalability! but at what cost? HOTOS’15, 2015

  43. [43]

    D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: A timely dataflow system. SOSP ’13, pages 439–455, 2013

  44. [44]

    H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms. J. ACM, 65(3), 2018

  45. [45]

    M. Qiao, H. Zhang, and H. Cheng. Subgraph matching: On compression and computation. PVLDB, 11(2):176– 188, 2017

  46. [46]

    Ren and J

    X. Ren and J. Wang. Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. PVLDB, 8(5):617–628, 2015

  47. [47]

    Shamir and D

    R. Shamir and D. Tsur. Faster subtree isomorphism. In Proceedings of the Fifth Israeli Symposium on Theory of Computing and Systems, pages 126–131, 1997

  48. [48]

    Shang, Y

    H. Shang, Y . Zhang, X. Lin, and J. X. Yu. Taming verification hardness: An efficient algorithm for testing subgraph isomorphism. PVLDB, 1(1):364–375, 2008

  49. [49]

    B. Shao, H. Wang, and Y . Li. Trinity: A distributed graph engine on a memory cloud. SIGMOD ’13, pages 505–516, 2013

  50. [50]

    Y . Shao, B. Cui, L. Chen, L. Ma, J. Yao, and N. Xu. Parallel subgraph listing in a large-scale graph. In SIGMOD’14, pages 625–636. ACM, 2014

  51. [51]

    Z. Sun, H. Wang, H. Wang, B. Shao, and J. Li. Efficient subgraph matching on billion node graphs. PVLDB, 5(9):788–799, 2012

  52. [52]

    C. H. C. Teixeira, A. J. Fonseca, M. Serafini, G. Siganos, M. J. Zaki, and A. Aboulnaga. Arabesque: A system for distributed graph mining. SOSP ’15, pages 425–440. 23 A PREPRINT - J UNE 28, 2019

  53. [53]

    H. Wei, J. X. Yu, C. Lu, and X. Lin. Speedup graph processing by graph ordering. SIGMOD ’16, pages 1813– 1828

  54. [54]

    Single Thread

    M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster computing with working sets. In HotCloud’10, pages 10–10. A Auxiliary Experiments Exp-6 Scalability of Unlabelled Matching. We vary the number of machines as 1, 2, 4, 6, 8, 10, and run the unlabelled queriesq1 andq2 to see how each strategy (B INJOIN, WO PTJOIN, S HRCUBE a...