arxiv: 2605.30470 · v1 · pith:COPHPMY5new · submitted 2026-05-28 · 💻 cs.LG

Can Subgraph Explanations Be Weaponized to Steal Graph Neural Networks?

Pith reviewed 2026-06-29 08:31 UTC · model grok-4.3

classification 💻 cs.LG
keywords model extraction attack · graph neural networks · explainability · black-box attack · graph classification · Monte Carlo sampling · subgraph explanations

The pith

Binary explanation masks returned by graph classifiers are enough to enable model extraction under strict black-box access, where the attacker otherwise sees only discrete class labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the first model extraction attack tailored to graph classification that operates when the attacker sees only discrete class labels and binary explanation masks, with no probabilities, gradients, or confidence scores. It guides Monte Carlo edge sensitivity estimation using the masks to focus on decision boundaries, supported by Hoeffding concentration guarantees, and uses explanation subgraphs to shrink the search space. The result matters because it shows how explainability mandated by regulation in graph machine learning services can itself become a direct route to stealing deployed models.

Core claim

The attack uses model-generated binary explanation masks to direct Monte Carlo sampling of edge sensitivities toward decision boundaries, with provable concentration bounds on estimation accuracy, while the masks themselves narrow the boundary search space for efficient extraction of graph classifiers.

What carries the argument

Explanation-guided Monte Carlo edge sensitivity estimation that leverages binary masks to prioritize edges near decision boundaries and reduce search space.
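
To make the idea concrete, here is a minimal sketch of mask-guided Monte Carlo edge sensitivity estimation under label-only access. It is not the authors' implementation (the abstract points to their repository for that). The function name `estimate_edge_sensitivity`, the `query_label` oracle, the edge-deletion-only perturbation, and the two drop probabilities are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[int, int]


def estimate_edge_sensitivity(
    edges: List[Edge],
    mask: Set[Edge],
    query_label: Callable[[FrozenSet[Edge]], int],
    n_samples: int = 500,
    drop_in_mask: float = 0.5,   # assumed: perturb explanation edges more often
    drop_outside: float = 0.1,   # assumed: perturb other edges rarely
    seed: int = 0,
) -> Dict[Edge, float]:
    """For each edge, estimate how often the label flips when the edge is dropped."""
    rng = random.Random(seed)
    base_label = query_label(frozenset(edges))
    removed = {e: 0 for e in edges}
    flipped = {e: 0 for e in edges}
    for _ in range(n_samples):
        # Sample a perturbed graph, biased toward deleting edges in the mask.
        dropped = {
            e for e in edges
            if rng.random() < (drop_in_mask if e in mask else drop_outside)
        }
        kept = frozenset(e for e in edges if e not in dropped)
        changed = query_label(kept) != base_label  # only a discrete label is observed
        for e in dropped:
            removed[e] += 1
            flipped[e] += int(changed)
    return {e: (flipped[e] / removed[e] if removed[e] else 0.0) for e in edges}


# Toy stand-in for the victim model: class 1 exactly when edge (0, 1) is present.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
toy_mask = {(0, 1), (1, 2)}
scores = estimate_edge_sensitivity(
    toy_edges, toy_mask, lambda g: int((0, 1) in g)
)
```

In the toy example the decisive edge receives the highest score, because every sampled graph that drops it changes the label. The other edges pick up non-zero scores only through co-occurring deletions, which is the kind of dependence between per-edge estimates that the referee report raises.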

If this is right

  • The method extracts surrogate models that match target accuracy on benchmark graph datasets from multiple domains.
  • It outperforms prior black-box extraction baselines that lack access to explanation outputs.
  • Explainability interfaces in GMLaaS platforms constitute exploitable attack surfaces even under label-only constraints.
  • Results inform both technical defenses and policy considerations around mandatory explainability requirements.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar attacks could target other structured data domains if explanations reveal decision-relevant substructures.
  • Adding controlled noise to explanation masks might serve as a lightweight defense without fully removing transparency.
  • Regulators mandating explanations may need to weigh extraction risks against transparency benefits in high-stakes graph applications.
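
The noise-based defense suggested in the second bullet could look like the following sketch, which flips each bit of a binary mask independently before it is returned to the client. The helper name `randomize_mask` and the parameter `flip_prob` are hypothetical, and nothing here is evaluated in the paper.

```python
import random
from typing import List, Optional


def randomize_mask(mask: List[int], flip_prob: float,
                   seed: Optional[int] = None) -> List[int]:
    """Flip each 0/1 entry of an explanation mask with probability flip_prob."""
    if not 0.0 <= flip_prob <= 1.0:
        raise ValueError("flip_prob must lie in [0, 1]")
    rng = random.Random(seed)
    # XOR with 1 toggles a bit; bits are flipped independently of each other.
    return [bit ^ 1 if rng.random() < flip_prob else bit for bit in mask]


clean_mask = [1, 0, 0, 1, 1, 0]
noisy_mask = randomize_mask(clean_mask, flip_prob=0.1, seed=7)
```

A larger `flip_prob` would hide more of the decision-relevant structure but also make the explanation less faithful, so the parameter expresses the transparency trade-off directly.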

Load-bearing premise

The binary explanation masks produced by the target model carry enough structural information about decision boundaries to meaningfully improve Monte Carlo sampling without any continuous model outputs.

What would settle it

An experiment replacing the model's binary explanation masks with random binary masks of the same density and measuring whether extraction accuracy drops to levels indistinguishable from uniform random querying.
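
The control condition in this experiment needs random masks that keep the number of selected edges fixed. A minimal sketch, assuming masks are flat 0/1 lists over a graph's edges (the helper name `random_mask_same_density` is hypothetical):

```python
import random
from typing import List, Optional


def random_mask_same_density(mask: List[int],
                             seed: Optional[int] = None) -> List[int]:
    """Return a random 0/1 mask with the same length and number of ones as mask."""
    rng = random.Random(seed)
    k = sum(mask)  # number of edges the real explanation selects
    chosen = set(rng.sample(range(len(mask)), k))  # k distinct positions, uniform
    return [1 if i in chosen else 0 for i in range(len(mask))]


real_mask = [1, 1, 0, 0, 0, 1, 0, 0]
control_mask = random_mask_same_density(real_mask, seed=3)
```

Matching density exactly, rather than sampling each bit independently, keeps the comparison from being confounded by how many edges the mask exposes.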

Figures

Figures reproduced from arXiv: 2605.30470 by Jiate Li, Ojas Nimase, Yue Zhao, Yushun Dong.

Figure 1: Multiple possible “midpoint graphs” at equal edit distance from …
Figure 2: Sampling strategies for model extraction: surfaces are decision boundaries and triangles are classifications with …
Figure 3: Fidelity (%) by sampling strategy on equal-sized …
Figure 4: A visual depicting the setup of MEA and the pipeline of our proposed attack.
Figure 5: Fidelity (%) across increasing training set size on NCI1.
Figure 6: Phase and Component Ablations averaged across …
Figure 7: Extraction fidelity (%) across all architectures and datasets, grouped by explainer (left: …
Figure 8: Fidelity (%) vs. training set size across binary datasets, victim architectures, and explainers.

Original abstract

Graph Machine Learning as a Service (GMLaaS) platforms increasingly implement explainability interfaces to meet regulatory transparency requirements. However, this transparency creates exploitable vulnerabilities for model extraction attacks. We present the first model extraction attack specifically designed for graph classification under strict black-box constraints where the attacker observes only discrete class labels and binary explanation masks (no probability scores, gradients, or confidence values). Our method (1) uses model explanation outputs to guide Monte Carlo edge sensitivity estimation toward decision boundaries, with Hoeffding concentration guarantees on estimation accuracy and (2) exploits explanation subgraphs to efficiently narrow the boundary search space. Extensive experiments on benchmark graph datasets across multiple domains demonstrate our method's superiority over comparable baselines. These findings demonstrate that such explainability interfaces create exploitable attack surfaces, informing both defensive mechanisms and policy frameworks for explainable AI mandates. The implementation code is provided in https://github.com/LabRAI/XSTEAL/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims to introduce the first model extraction attack on graph neural networks for classification under strict black-box access (only discrete labels and binary explanation masks). The attack guides Monte Carlo edge-sensitivity estimation toward decision boundaries using explanation outputs, invokes Hoeffding concentration bounds for accuracy guarantees, and narrows the search via explanation subgraphs. Experiments on benchmark graph datasets reportedly show superiority over baselines; code is released.

Significance. If the attack and its guarantees hold, the work identifies a concrete attack surface created by mandatory explainability interfaces in GMLaaS, with implications for defenses and policy. Releasing implementation code is a positive contribution to reproducibility. The result would be of interest to the graph-ML security community, though its impact is limited by the unresolved dependence issue in the concentration analysis.

major comments (1)
  1. [Abstract / Method (Hoeffding section)] The stated 'Hoeffding concentration guarantees on estimation accuracy' for Monte Carlo edge sensitivity estimation rest on an independence assumption that graph structure violates. Perturbing one edge changes neighborhoods and thus the sensitivity estimates of adjacent edges; the manuscript does not derive or substitute a dependence-aware bound (e.g., via McDiarmid or graph-specific concentration), so the claimed accuracy guarantees on the recovered decision boundary are not supported.
minor comments (1)
  1. [Abstract] The abstract refers to 'benchmark graph datasets across multiple domains' without naming them or providing basic statistics (node/edge counts, class balance); moving a concise table of dataset characteristics to §4 or the experimental setup would improve readability.
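
For reference, the two inequalities named in the major comment are given below in their standard textbook forms. The paper's own statement is not reproduced on this page, so the symbols here ($X_i$, $n$, $\varepsilon$, $f$, $c_i$, $t$) are assumptions rather than the authors' notation.

```latex
% Hoeffding: X_1, ..., X_n independent, each X_i \in [0, 1],
% with sample mean \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.
\Pr\left( \left| \bar{X} - \mathbb{E}[\bar{X}] \right| \ge \varepsilon \right)
  \le 2 \exp\left( -2 n \varepsilon^{2} \right)

% McDiarmid: X_1, ..., X_n independent, and changing only the i-th
% argument of f changes its value by at most c_i.
\Pr\left( \left| f(X_1, \dots, X_n) - \mathbb{E}[f(X_1, \dots, X_n)] \right| \ge t \right)
  \le 2 \exp\left( - \frac{2 t^{2}}{\sum_{i=1}^{n} c_i^{2}} \right)
```

McDiarmid's inequality also assumes independent inputs. What it relaxes is the form of the estimate, which may be any bounded-difference function of those inputs instead of a plain average.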

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful and constructive review. The identification of the dependence issue in the concentration analysis is a valid technical point that we address directly below.

  1. Referee: [Abstract / Method (Hoeffding section)] The stated 'Hoeffding concentration guarantees on estimation accuracy' for Monte Carlo edge sensitivity estimation rest on an independence assumption that graph structure violates. Perturbing one edge changes neighborhoods and thus the sensitivity estimates of adjacent edges; the manuscript does not derive or substitute a dependence-aware bound (e.g., via McDiarmid or graph-specific concentration), so the claimed accuracy guarantees on the recovered decision boundary are not supported.

    Authors: We agree that this is a substantive issue. The manuscript invokes the standard Hoeffding inequality for the Monte Carlo edge-sensitivity estimates, which formally requires independent samples. Because perturbing one edge alters node neighborhoods, the sensitivity estimates for adjacent edges are statistically dependent; no dependence-aware concentration inequality (such as McDiarmid) or graph-specific analysis is derived or substituted. Consequently, the claimed accuracy guarantees on the recovered decision boundary are not rigorously supported by the stated argument. We will revise the abstract and the Hoeffding subsection to remove the language of 'Hoeffding concentration guarantees.' The Monte Carlo procedure will instead be presented as an empirical estimator whose practical accuracy is demonstrated by the experimental results. A brief discussion noting the dependence limitation and the possibility of future work on bounded-difference or graph-concentration bounds will be added if space allows. These changes will appear in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper's core contribution is an attack method that applies standard Monte Carlo edge sensitivity estimation guided by binary explanation masks, combined with Hoeffding concentration bounds for accuracy guarantees. No load-bearing step reduces by the paper's own equations to a fitted parameter, self-citation chain, or definitional equivalence; the derivation relies on independent application of concentration inequalities to the proposed sampling procedure without smuggling ansatzes or renaming known results as novel derivations.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The attack construction rests on the applicability of Hoeffding's inequality to edge sensitivity estimates derived from explanation masks and on the premise that explanation subgraphs meaningfully reduce the search space; no new entities are postulated.

free parameters (1)
  • Monte Carlo sample count
    Number of edge samples drawn to estimate sensitivity; chosen to satisfy the Hoeffding bound for the desired accuracy.
axioms (1)
  • [standard math] Hoeffding's inequality provides valid concentration bounds on the Monte Carlo edge sensitivity estimates
    Invoked to guarantee estimation accuracy when guiding the search with explanation outputs.
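
As a worked illustration of how the sample count follows from the bound: for independent samples in [0, 1], requiring 2·exp(−2nε²) ≤ δ gives n ≥ ln(2/δ) / (2ε²). The sketch below computes that count. The function name and the tolerance values are hypothetical, the paper's actual settings are not stated on this page, and the calculation inherits the independence assumption questioned in the referee report.

```python
import math


def hoeffding_sample_count(epsilon: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta, for samples in [0, 1]."""
    if epsilon <= 0.0 or not 0.0 < delta < 1.0:
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


n_coarse = hoeffding_sample_count(epsilon=0.10, delta=0.05)  # 185
n_fine = hoeffding_sample_count(epsilon=0.05, delta=0.05)    # 738
```

Halving the tolerance roughly quadruples the number of samples, which is why the sample count is the main cost knob in this kind of estimator.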

pith-pipeline@v0.9.1-grok · 5695 in / 1316 out tokens · 26571 ms · 2026-06-29T08:31:06.140955+00:00 · methodology

