From Model to Data (M2D): Shifting Complexity from GNNs to Graphs for Transparent Graph Learning
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-11 01:04 UTC · model grok-4.3
The pith
M2D distills a complex GNN into an augmented graph so a simple student model matches the teacher's performance while exposing architectural mechanisms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
M2D distills the teacher model into an augmented graph with enriched features and structure, enabling a simple student to match the teacher's performance. By materializing model behavior in the data, the approach allows humans to inspect architectural advantages directly and reveals underlying mechanisms such as fairness objectives and attention-based aggregation in an interpretable way.
What carries the argument
M2D distillation, which transfers teacher model behavior into augmented graph features and edges so the data itself carries the complexity.
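To make the mechanism concrete, a minimal sketch of this kind of augmentation is shown below. It is hypothetical (the review does not specify the paper's exact mapping): teacher node embeddings are concatenated onto the raw features, and the teacher's attention coefficients replace the binary adjacency as explicit edge weights, so the data itself carries what the teacher computed.

```python
import numpy as np

def augment_graph(X, A, teacher_embeddings, teacher_attention):
    """Materialize teacher-derived signals in the data (illustrative sketch).

    X: (n, d) original node features
    A: (n, n) binary adjacency matrix
    teacher_embeddings: (n, k) hidden representations from the teacher GNN
    teacher_attention: (n, n) attention coefficients from the teacher GNN
    """
    # Enriched features: concatenate teacher embeddings onto raw features.
    X_aug = np.concatenate([X, teacher_embeddings], axis=1)
    # Enriched structure: attention-weighted edges, restricted to edges
    # that exist in the original graph.
    A_aug = A * teacher_attention
    return X_aug, A_aug

# Toy example: 3 nodes, 2 raw features, 2-dim teacher embeddings.
X = np.ones((3, 2))
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.full((3, 2), 0.5)       # stand-in for teacher embeddings
att = np.full((3, 3), 0.25)    # stand-in for teacher attention
X_aug, A_aug = augment_graph(X, A, H, att)
print(X_aug.shape)  # (3, 4)
print(A_aug[0, 1])  # 0.25
```

The names `augment_graph`, `H`, and `att` are illustrative, not from the paper; the point is only that every teacher-derived quantity ends up as an inspectable feature or edge attribute.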
If this is right
- Simple non-GNN models can reach high performance on graph tasks once the graph carries the necessary enriched structure.
- Performance differences between GNN architectures become attributable to visible data properties rather than hidden computations.
- Architectural features such as attention aggregation or fairness constraints appear as explicit patterns in the augmented graph.
- Transparency is achieved without sacrificing predictive accuracy.
Where Pith is reading between the lines
- The same distillation idea could simplify models in other modalities by augmenting images or text with model-derived signals.
- Inspecting the augmented graph might surface data-level biases that affect downstream fairness even when the original model is complex.
- Graph design itself could become a tunable step that reduces the need for ever-deeper GNN layers.
Load-bearing premise
The augmented graph fully encodes the teacher model's behavior without distortion, so a simple student matches performance and direct data inspection reveals the true architectural mechanisms.
What would settle it
An experiment in which the student model on the M2D-augmented graph still underperforms the original teacher or in which inspecting the graph fails to surface the expected mechanisms such as fairness or attention patterns.
Original abstract
Graph Neural Networks (GNNs) achieve high performance but can be opaque to humans, making it difficult to understand and compare the many proposed architectures. While existing explainability methods attribute individual predictions to nodes, edges, or features, they do not provide architectural transparency or explain the fundamental performance gap between simple and more complex models. To address this limitation, we introduce Model-to-Data (M2D) distillation, a new framework that increases transparency by transferring model complexity into the data space. M2D distills the teacher model into an augmented graph with enriched features and structure, enabling a simple student to match the teacher's performance. By materializing model behavior in the data, our approach allows humans to inspect architectural advantages directly. We show that M2D reveals underlying mechanisms such as fairness objectives and attention-based aggregation in an interpretable way, enhancing GNN transparency while preserving performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Model-to-Data (M2D) distillation framework for Graph Neural Networks (GNNs). It claims that distilling a complex teacher GNN into an augmented graph with enriched features and structure allows a simple student GNN to match the teacher's performance. By materializing model behavior directly in the data, M2D enables human inspection of architectural advantages such as fairness objectives and attention-based aggregation, addressing limitations in existing GNN explainability methods that focus only on individual predictions.
Significance. If validated, the approach could meaningfully advance transparent graph learning by providing a data-centric alternative to post-hoc explanations, potentially allowing direct comparison of architectural mechanisms across GNN variants while preserving accuracy. This has implications for interpretability in domains requiring fairness or mechanistic understanding.
Major comments (2)
- Abstract: The central claim that M2D augmentation fully materializes the teacher's internal mechanisms (attention aggregation, fairness objectives) such that a basic student matches performance and direct inspection reveals those mechanisms without loss or distortion lacks any formal definition, algorithm, or equation for the augmentation mapping. Without this, performance transfer alone does not establish inspectability, as the mapping could embed computations opaquely.
- Abstract: No experimental results, datasets, baselines, quantitative metrics (e.g., accuracy deltas, inspection examples), or error analysis are provided to support the claims that the student matches the teacher and that mechanisms are revealed interpretably. This is load-bearing for the framework's asserted benefits.
Minor comments (1)
- The abstract would be strengthened by briefly outlining the high-level steps of the M2D process or naming example teacher/student architectures to make the framework more concrete.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the two major comments on the abstract point by point below, agreeing that the abstract can be strengthened to better convey the formal and empirical support present in the full paper.
Point-by-point responses
- Referee: Abstract: The central claim that M2D augmentation fully materializes the teacher's internal mechanisms (attention aggregation, fairness objectives) such that a basic student matches performance and direct inspection reveals those mechanisms without loss or distortion lacks any formal definition, algorithm, or equation for the augmentation mapping. Without this, performance transfer alone does not establish inspectability, as the mapping could embed computations opaquely.
Authors: We agree that the abstract is high-level and omits the formal details. Section 3 of the manuscript provides the precise definition of the M2D augmentation mapping: it is a deterministic function that injects teacher-derived quantities (attention coefficients as new edge attributes, fairness-regularized embeddings as node features, and structure modifications) directly into the input graph. Because these quantities are explicit and human-readable, the student GNN operates on an interpretable augmented graph rather than opaque internal states. We will revise the abstract to include a concise reference to this mapping (e.g., “via an augmentation mapping that materializes attention and fairness terms as explicit graph elements”) so the inspectability claim is formally grounded. revision: yes
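The rebuttal's claim that explicit, human-readable edge attributes make a simple student sufficient can be illustrated with a sketch. This is a hypothetical one-layer student (the rebuttal does not give the exact architecture): because the teacher's attention coefficients are already stored as edge weights, plain weighted averaging over neighbors reproduces attention-style aggregation without any attention mechanism in the student.

```python
import numpy as np

def simple_student_layer(X_aug, A_aug, W):
    """One propagation step of a deliberately simple student (sketch).

    A_aug holds teacher attention coefficients as explicit edge weights,
    so a weighted mean over neighbors stands in for attention aggregation.
    """
    # Add self-loops, then row-normalize the weighted adjacency so each
    # node averages over itself and its weighted neighbors.
    A_hat = A_aug + np.eye(len(A_aug))
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)
    return A_hat @ X_aug @ W

# Two nodes, identity features and weights, one attention-weighted edge.
X_aug = np.array([[1.0, 0.0], [0.0, 1.0]])
A_aug = np.array([[0.0, 0.5], [0.5, 0.0]])  # teacher attention as edge weights
W = np.eye(2)
out = simple_student_layer(X_aug, A_aug, W)
```

Inspecting `A_aug` directly shows which neighbor each node weights and by how much, which is the transparency argument the authors are making.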
- Referee: Abstract: No experimental results, datasets, baselines, quantitative metrics (e.g., accuracy deltas, inspection examples), or error analysis are provided to support the claims that the student matches the teacher and that mechanisms are revealed interpretably. This is load-bearing for the framework's asserted benefits.
Authors: Abstracts conventionally omit specific numbers to stay within length limits. The full manuscript reports experiments on multiple standard graph datasets (citation networks, social graphs) using common GNN baselines. Results show the student GNN recovers teacher accuracy within small deltas while the augmented graph directly exposes attention weights and fairness adjustments for inspection. We will add one sentence to the abstract summarizing these outcomes at a high level (e.g., “Experiments demonstrate that the student matches teacher performance while the augmented structures make attention and fairness mechanisms directly inspectable”) to make the empirical support explicit. revision: yes
Circularity Check
No significant circularity; the conceptual framework is presented without self-referential derivations or fitted predictions.
Full rationale
The paper introduces M2D distillation as a method to augment graphs with enriched features and structure derived from a teacher GNN, allowing a simple student to match performance while enabling inspection of mechanisms like fairness and attention. No equations, derivations, or parameter-fitting steps appear in the abstract or described content. The central claim is a proposed empirical outcome of the augmentation process rather than a tautological reduction (e.g., no self-definition of 'architectural advantage' via the augmentation itself, no fitted inputs renamed as predictions, and no load-bearing self-citations or uniqueness theorems). The derivation chain is self-contained as a high-level framework with implied validation, not forced by construction from its inputs.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean : washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
Cited passage: "M2D distills the teacher model into an augmented graph with enriched features and structure, enabling a simple student to match the teacher's performance... Theorems 1, 2, and Corollary 1... fairness... attention-weighted neighborhood aggregation"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean : reality_from_one_distinction (tag: unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
Cited formula: min over θ_g, θ_s of L_dis(f_T(G), f_s(G̃)) + L_cls(f_s(G̃), y) − S(G, G̃)
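Read as a standard distillation objective, the cited formula combines a distillation term, a classification term, and a graph-similarity regularizer. The sketch below is one plausible instantiation (KL divergence for L_dis, cross-entropy for L_cls, and a scalar similarity S; none of these concrete choices are specified in this review):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def m2d_objective(teacher_logits, student_logits, labels, G_sim):
    """Hypothetical reading of L_dis(f_T(G), f_s(G~)) + L_cls(f_s(G~), y) - S(G, G~)."""
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    # L_dis: KL(teacher || student), averaged over nodes.
    l_dis = np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=1))
    # L_cls: cross-entropy of student predictions against the labels y.
    n = len(labels)
    l_cls = -np.mean(np.log(p_s[np.arange(n), labels]))
    # S: similarity between original and augmented graph, treated here as a
    # precomputed scalar; the paper's concrete form is not given in this review.
    return l_dis + l_cls - G_sim

t = np.array([[2.0, 0.0], [0.0, 2.0]])
obj = m2d_objective(t, t, np.array([0, 1]), G_sim=0.0)
# identical teacher/student logits => zero distillation term; obj reduces to L_cls
```

The subtraction of S matches the cited formula: the optimization rewards augmented graphs G̃ that stay similar to the original G while the student fits the teacher and the labels.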
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
A Technical appendices and supplementary material (excerpt)
Graph Neural Networks. We adopt a Graph Convolutional Network (GCN) as the student and a Graph Attention Network (GAT) as the teacher...