pith. machine review for the scientific record.

arxiv: 1810.00826 · v3 · submitted 2018-10-01 · 💻 cs.LG · cs.CV · stat.ML

Recognition: no theorem link

How Powerful are Graph Neural Networks?

Keyulu Xu, Weihua Hu, Jure Leskovec, Stefanie Jegelka

Pith reviewed 2026-05-12 08:29 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML
keywords graph neural networks · expressive power · Weisfeiler-Lehman test · graph isomorphism · GIN · neighborhood aggregation · graph classification

The pith

A simple graph neural network architecture is as powerful as the Weisfeiler-Lehman graph isomorphism test.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Graph neural networks build node representations by recursively aggregating and transforming features from neighboring nodes. The paper proves that popular models such as Graph Convolutional Networks and GraphSAGE cannot distinguish certain pairs of non-isomorphic graphs because their aggregation steps lose information about neighbor multisets. It introduces a new model, the Graph Isomorphism Network (GIN), whose update rule combines summation of neighbor features with multilayer perceptrons to recover every distinction possible under the neighborhood aggregation scheme. This makes GIN exactly as strong as the Weisfeiler-Lehman test, a standard procedure for checking graph isomorphism. Readers care because the result explains why some GNNs succeed on graph tasks while others fail and supplies a concrete recipe for designing stronger models.

Core claim

The paper shows that within the neighborhood aggregation framework, no GNN can exceed the discriminative power of the Weisfeiler-Lehman test, and that GIN achieves this upper bound by using an injective aggregation function realized through summation followed by a multilayer perceptron.
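The yardstick here, the 1-WL test, is simple to state: iteratively recolor each node by hashing its own color together with the multiset of its neighbors' colors, then compare the resulting color histograms. A minimal sketch (illustrative code, not from the paper; `wl_colors` and the toy graphs are hypothetical) shows both a pair the test separates and a pair it cannot:

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """1-WL color refinement: each node's color becomes a hash of its own
    color plus the sorted multiset of its neighbors' colors."""
    colors = {v: 0 for v in adj}  # uniform initial coloring
    for _ in range(rounds):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return Counter(colors.values())  # graph-level color histogram

# A 6-cycle vs. two disjoint triangles: both 2-regular, so 1-WL cannot
# separate them -- and hence, by the paper's upper bound, neither can GIN.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
             3: [4, 5], 4: [3, 5], 5: [3, 4]}
# A 6-cycle vs. a 6-path: different degree multisets, so 1-WL separates them.
path6 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

print(wl_colors(cycle6) == wl_colors(triangles))  # True: WL fails here
print(wl_colors(cycle6) == wl_colors(path6))      # False: WL distinguishes
```

The same histograms that decide "possibly isomorphic vs. definitely not" are what an injective-aggregation GNN implicitly computes.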

What carries the argument

The GIN update rule that feeds the sum of neighbor representations plus a scaled self-representation into a multilayer perceptron to produce the next node state.
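Written out, the update rule is (notation follows the paper: $\mathcal{N}(v)$ is the neighborhood of $v$, and $\epsilon^{(k)}$ is a learnable or fixed scalar):

```latex
h_v^{(k)} \;=\; \mathrm{MLP}^{(k)}\!\left( \bigl(1 + \epsilon^{(k)}\bigr)\, h_v^{(k-1)} \;+\; \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right)
```

The sum preserves the multiset of neighbor features, and the MLP supplies the injective transformation the theory requires.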

If this is right

  • Standard models such as GCN and GraphSAGE are provably weaker and cannot separate certain simple non-isomorphic graphs.
  • GIN can learn any distinction that the Weisfeiler-Lehman test can make.
  • The same analysis framework can rank the power of any other GNN variant by examining whether its aggregation is injective.
  • GIN achieves state-of-the-art accuracy on multiple graph classification benchmarks when the theoretical condition holds.
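The weakness of mean-style aggregation can be made concrete with a toy check (illustrative numbers, not from the paper): two different neighbor multisets collapse under the mean but stay apart under the sum.

```python
# Two neighbor-feature multisets with the same mean but different sums:
# mean aggregation (GCN/GraphSAGE-style) cannot tell them apart, sum can.
m1 = [1.0, 1.0, 2.0, 2.0]   # multiset {1, 1, 2, 2}
m2 = [1.0, 2.0]             # multiset {1, 2}

mean = lambda xs: sum(xs) / len(xs)

print(mean(m1) == mean(m2))  # True: mean aggregation collapses them
print(sum(m1) == sum(m2))    # False: sum keeps the multisets apart
```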

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • For graph problems where the Weisfeiler-Lehman test itself fails, such as certain strongly regular graphs, even GIN will be unable to separate them.
  • The result indicates that adding mechanisms like attention or gating inside the same aggregation framework cannot increase power beyond the Weisfeiler-Lehman level.
  • The framework could be applied to analyze GNN variants that operate on directed graphs or graphs with continuous edge attributes.

Load-bearing premise

Multilayer perceptrons can realize injective functions on the multisets of neighbor feature vectors.
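One intuition for why sum aggregation can be injective at all: over a finite alphabet of node colors, summing one-hot encodings recovers the exact multiplicity of every symbol. A sketch under that assumption (`ALPHABET` and `encode_multiset` are illustrative names, not from the paper):

```python
from collections import Counter

ALPHABET = 5  # hypothetical number of distinct node colors

def encode_multiset(symbols):
    """Sum of one-hot encodings: entry i is the multiplicity of symbol i,
    so the map from multiset to vector is injective."""
    counts = Counter(symbols)
    return [float(counts[i]) for i in range(ALPHABET)]

a = encode_multiset([0, 0, 3])
b = encode_multiset([0, 3])
print(a)       # [2.0, 0.0, 0.0, 1.0, 0.0]
print(a == b)  # False: different multisets give different sums
```

The load-bearing step is then that an MLP can compose such an injective encoding with a further injective map.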

What would settle it

A pair of non-isomorphic graphs that the Weisfeiler-Lehman test distinguishes but a trained GIN with sufficient capacity and injective multilayer perceptrons fails to distinguish.

read the original abstract

Graph Neural Networks (GNNs) are an effective framework for representation learning of graphs. GNNs follow a neighborhood aggregation scheme, where the representation vector of a node is computed by recursively aggregating and transforming representation vectors of its neighboring nodes. Many GNN variants have been proposed and have achieved state-of-the-art results on both node and graph classification tasks. However, despite GNNs revolutionizing graph representation learning, there is limited understanding of their representational properties and limitations. Here, we present a theoretical framework for analyzing the expressive power of GNNs to capture different graph structures. Our results characterize the discriminative power of popular GNN variants, such as Graph Convolutional Networks and GraphSAGE, and show that they cannot learn to distinguish certain simple graph structures. We then develop a simple architecture that is provably the most expressive among the class of GNNs and is as powerful as the Weisfeiler-Lehman graph isomorphism test. We empirically validate our theoretical findings on a number of graph classification benchmarks, and demonstrate that our model achieves state-of-the-art performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a theoretical framework for analyzing the expressive power of graph neural networks based on neighborhood aggregation. It shows that common GNNs such as GCN and GraphSAGE cannot distinguish certain simple graph structures. The authors then introduce the Graph Isomorphism Network (GIN), which is proven to be the most expressive GNN and equivalent in power to the Weisfeiler-Lehman graph isomorphism test when the update function is injective. Empirical results on standard graph classification benchmarks demonstrate that GIN achieves state-of-the-art performance.

Significance. If the central results hold, the work provides a valuable theoretical characterization of GNN limitations and capabilities by connecting them directly to the WL test. The GIN construction is derived cleanly from the requirement of injective multiset aggregation. Credit is due for the use of public benchmarks and consistent empirical gains that support the theory.

major comments (2)
  1. [§3] §3 (theoretical analysis of GIN): The claim that GIN is as powerful as the WL test rests on the update function distinguishing arbitrary multisets of neighbor features via sum aggregation plus an injective map. The manuscript asserts that MLPs can realize the required injectivity, but invokes this without an explicit construction or argument that finite-width MLPs with standard activations (e.g., ReLU) remain injective over the real-valued feature spaces and the countable multisets generated across WL iterations on finite graphs.
  2. [Theorem 3] Proof of Theorem 3 (or equivalent statement on WL equivalence): Universal approximation guarantees that MLPs can approximate continuous functions on compact sets, yet this does not automatically ensure the approximating network is injective on the discrete multiset inputs that matter for the isomorphism test. Because the equivalence is load-bearing for the 'most expressive among GNNs' claim, this gap requires clarification or a concrete fix (e.g., a note on practical injectivity or a different construction).
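The referee's distinction between approximation and injectivity can be seen in miniature (an illustrative toy, not a claim about trained GINs): a perfectly ordinary ReLU unit collapses every non-positive input to zero.

```python
# Universal approximation does not imply injectivity: a single ReLU unit
# (identity linear layer followed by the activation) is non-injective on
# half its domain.
def relu_unit(x):
    return max(0.0, x)

print(relu_unit(-1.0) == relu_unit(-2.0))  # True: two distinct inputs collide
print(relu_unit(1.0) == relu_unit(2.0))    # False: positive inputs stay apart
```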
minor comments (2)
  1. [§4] The empirical section reports consistent gains on public benchmarks but would benefit from explicit statements on the number of runs and variance to strengthen reproducibility claims.
  2. [Eq. (3)] Notation in the GIN update equation could be clarified to separate the learnable MLP parameters from the fixed aggregation operator.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive summary and constructive major comments. The feedback highlights a subtle but important point about ensuring injectivity when using MLPs as update functions. We address both comments below and will revise the manuscript accordingly to strengthen the presentation of the theoretical results without altering the core claims.

read point-by-point responses
  1. Referee: [§3] §3 (theoretical analysis of GIN): The claim that GIN is as powerful as the WL test rests on the update function distinguishing arbitrary multisets of neighbor features via sum aggregation plus an injective map. The manuscript asserts that MLPs can realize the required injectivity, but invokes this without an explicit construction or argument that finite-width MLPs with standard activations (e.g., ReLU) remain injective over the real-valued feature spaces and the countable multisets generated across WL iterations on finite graphs.

    Authors: We appreciate this observation. Theorem 3 states that GIN is as powerful as the WL test whenever the update function is injective; the manuscript notes that MLPs can serve as such functions via universal approximation. The referee is correct that density in the continuous-function space does not automatically yield an injective finite-width network on the specific (real-valued) multiset sums arising in WL iterations. Because any finite graph produces only finitely many distinct node features after a finite number of iterations, the relevant domain is a finite set of points in R^d. On any finite set there always exists an injective map, and a sufficiently wide MLP with ReLU (or LeakyReLU) activations can realize it by separating those points. We will add a short clarifying paragraph in §3 (and a footnote to Theorem 3) that (i) explicitly invokes the finiteness of the feature set on finite graphs and (ii) sketches a concrete construction: a linear layer of width at least equal to the number of distinct sums plus one, followed by a coordinate-wise strictly increasing activation on the extra dimension. This revision clarifies the argument while leaving the theorem statements unchanged. revision: yes

  2. Referee: [Theorem 3] Proof of Theorem 3 (or equivalent statement on WL equivalence): Universal approximation guarantees that MLPs can approximate continuous functions on compact sets, yet this does not automatically ensure the approximating network is injective on the discrete multiset inputs that matter for the isomorphism test. Because the equivalence is load-bearing for the 'most expressive among GNNs' claim, this gap requires clarification or a concrete fix (e.g., a note on practical injectivity or a different construction).

    Authors: We agree that the universal-approximation argument alone leaves a gap regarding injectivity on the discrete inputs relevant to the isomorphism test. The proof of Theorem 3 proceeds by showing that an injective update function together with sum aggregation recovers the WL multiset hash; it therefore assumes the existence of an injective MLP rather than proving that every MLP is injective. To close the gap we will revise the proof sketch and the surrounding text to include the finite-set argument mentioned above together with an explicit, practical construction: embed the aggregated vector v into (v, ε·||v||_2) for a small ε>0 and apply a coordinate-wise sigmoid; the resulting map is injective on any bounded set and can be realized by a two-layer MLP. We will also note that, in practice, the random initialization and width used in our experiments already produce injective behavior on the graphs we tested. These additions address the referee’s concern directly and will be incorporated in the revised manuscript. revision: yes
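The finite-set argument running through both responses can be sketched numerically (dimensions and seed are illustrative): with probability 1, a single random linear functional already separates any finite set of distinct points, so an injective finite-width layer exists.

```python
import random

random.seed(0)
# A finite graph yields only finitely many distinct aggregated feature
# vectors; a random linear unit separates them with probability 1.
points = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
w = [random.gauss(0, 1) for _ in range(4)]
projected = [sum(p_i * w_i for p_i, w_i in zip(p, w)) for p in points]
print(len(set(projected)) == len(points))  # True: all 8 outputs distinct
```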

Circularity Check

0 steps flagged

No circularity: GIN-WL equivalence derived from external isomorphism test

full rationale

The paper's central claim derives GIN expressivity by constructing an update rule (sum aggregation plus MLP) whose distinguishing power is shown to match the independently defined Weisfeiler-Lehman procedure. This is not a self-definition, fitted prediction, or self-citation chain; the WL test is an external benchmark whose definition and iteration rules predate the paper and are not constructed from GIN. The MLP injectivity assumption is invoked to bridge the construction to WL power but does not create a reduction of the result to its own inputs by construction. The derivation remains self-contained against the external reference.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The central claim rests on the neighborhood-aggregation framework, the universal approximation property of MLPs, and the established Weisfeiler-Lehman procedure as the expressivity reference.

free parameters (1)
  • epsilon
    Learnable scalar that scales the node's own feature in the GIN update; its value is fitted during training but does not affect the expressivity proof.
axioms (2)
  • standard math Multilayer perceptrons can approximate any continuous function on compact sets to arbitrary precision
    Invoked to guarantee that the update function can be made injective.
  • domain assumption The Weisfeiler-Lehman algorithm is a standard, externally defined measure of graph discriminative power
    Used as the benchmark against which GNN power is compared.
invented entities (1)
  • Graph Isomorphism Network (GIN) no independent evidence
    purpose: A neighborhood-aggregation GNN whose update rule is injective and therefore maximally expressive within the GNN class
    New architecture introduced and analyzed in the paper; its power is established by construction and proof rather than by external data.

pith-pipeline@v0.9.0 · 5491 in / 1469 out tokens · 74445 ms · 2026-05-12T08:29:11.374212+00:00 · methodology


Forward citations

Cited by 38 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Hyper-Dimensional Fingerprints as Molecular Representations

    cs.LG 2026-04 unverdicted novelty 8.0

    Hyperdimensional fingerprints use algebraic operations on high-dimensional vectors to create training-free molecular representations that preserve similarity better than Morgan fingerprints at low dimensions and impro...

  2. Gauge-Equivariant Graph Neural Networks for Lattice Gauge Theories

    cond-mat.str-el 2026-04 unverdicted novelty 8.0

    Gauge-equivariant graph neural networks embed non-Abelian local symmetries directly into message passing for lattice gauge theories, enabling learning of nonlocal observables from local operations.

  3. Topology-Preserving Neural Operator Learning via Hodge Decomposition

    cs.LG 2026-05 unverdicted novelty 7.0

    Hodge Spectral Duality provides a topology-preserving neural operator by isolating unlearnable topological components via Hodge orthogonality and operator splitting.

  4. scShapeBench: Discovering geometry from high dimensional scRNAseq data

    cs.LG 2026-05 unverdicted novelty 7.0

    scShapeBench supplies synthetic and real annotated single-cell datasets across four shape categories, with scReebTower outperforming PAGA and Mapper on topology-aware metrics.

  5. TopoU-Net: a U-Net architecture for topological domains

    cs.LG 2026-05 unverdicted novelty 7.0

    TopoU-Net is a rank-path U-Net for combinatorial complexes that encodes by lifting cochains upward along incidences, decodes by transporting downward, and merges via skip connections at matched ranks.

  6. CTQWformer: A CTQW-based Transformer for Graph Classification

    cs.LG 2026-05 unverdicted novelty 7.0

    CTQWformer fuses continuous-time quantum walks into a graph transformer and recurrent module to outperform standard GNNs and graph kernels on classification benchmarks.

  7. Structural Interpretations of Protein Language Model Representations via Differentiable Graph Partitioning

    cs.LG 2026-05 unverdicted novelty 7.0

    SoftBlobGIN combines ESM-2 representations with protein contact graphs via a lightweight GNN and differentiable substructure pooling to achieve 92.8% accuracy on enzyme classification, raise binding-site AUROC to 0.98...

  8. Graph Neural Networks in the Wilson Loop Representation of Abelian Lattice Gauge Theories

    cond-mat.str-el 2026-05 unverdicted novelty 7.0

    A gauge-invariant GNN using Wilson loops as inputs accurately predicts observables and simulates dynamics in Z2 and U(1) lattice gauge models.

  9. LUMINA: A Grid Foundation Model for Benchmarking AC Optimal Power Flow Surrogate Learning

    cs.LG 2026-05 unverdicted novelty 7.0

    LUMINA-Bench is a standardized evaluation framework for ACOPF surrogate models that tests generalization across multiple grid topologies using accuracy and physics-constraint metrics.

  10. PiGGO: Physics-Guided Learnable Graph Kalman Filters for Virtual Sensing of Nonlinear Dynamic Structures under Uncertainty

    cs.LG 2026-04 unverdicted novelty 7.0

    PiGGO integrates a learned graph neural ODE as the continuous-time dynamics model within an extended Kalman filter to enable online virtual sensing and uncertainty-aware state estimation for nonlinear dynamic systems ...

  11. Break the Optimization Barrier of LLM-Enhanced Recommenders: A Theoretical Analysis and Practical Framework

    cs.IR 2026-04 unverdicted novelty 7.0

    TF-LLMER resolves optimization barriers in LLM-enhanced recommenders through embedding normalization and Rec-PCA that aligns semantic representations with collaborative co-occurrence graphs.

  12. Concept Graph Convolutions: Message Passing in the Concept Space

    cs.LG 2026-04 unverdicted novelty 7.0

    Concept Graph Convolutions perform message passing on node concepts to increase interpretability of graph neural networks without losing task performance.

  13. Modeling Higher-Order Brain Interactions via a Multi-View Information Bottleneck Framework for fMRI-based Psychiatric Diagnosis

    cs.LG 2026-04 unverdicted novelty 7.0

    A tri-view information-bottleneck model that fuses pairwise, triadic and tetradic O-information outperforms eleven baselines on four fMRI psychiatric datasets while revealing region-level synergy-redundancy patterns.

  14. Beyond Nodes vs. Edges: A Multi-View Fusion Framework for Provenance-Based Intrusion Detection

    cs.CR 2026-04 unverdicted novelty 7.0

    PROVFUSION fuses three complementary views of provenance data with lightweight schemes and voting to achieve higher detection accuracy and lower false positives than node- or edge-only baselines on nine benchmarks.

  15. R2G: A Multi-View Circuit Graph Benchmark Suite from RTL to GDSII

    cs.CV 2026-04 accept novelty 7.0

    R2G is a multi-view circuit graph benchmark showing that representation choice affects GNN accuracy more than model architecture, with node-centric views and deeper decoders performing best.

  16. Graph Topology Information Enhanced Heterogeneous Graph Representation Learning

    cs.LG 2026-04 unverdicted novelty 7.0

    ToGRL learns high-quality graph structures from raw heterogeneous graphs via a two-stage topology extraction process and prompt tuning, outperforming prior methods on five datasets.

  17. DSBD: Dual-Aligned Structural Basis Distillation for Graph Domain Adaptation

    cs.LG 2026-04 unverdicted novelty 7.0

    DSBD distills a dual-aligned structural basis to adapt GNNs across graphs with structural distribution shifts, outperforming prior methods on benchmarks.

  18. Complex-Valued GNNs for Distributed Basis-Invariant Control of Planar Systems

    cs.LG 2026-04 unverdicted novelty 7.0

    Complex-valued GNNs using phase-equivariant activations achieve global basis invariance for distributed planar control, outperforming real-valued baselines in data efficiency, tracking, and generalization on flocking.

  19. Rethinking Efficient Graph Coarsening via a Non-Selfishness Principle

    cs.LG 2026-05 unverdicted novelty 6.0

    NOPE coarsens graphs via neighborhood interference rather than selfish pairwise matching to reach linear memory and near-linear time, with NOPE* variant delivering 1.8-10x speedups and comparable or better learning re...

  20. Learning the Interaction Prior for Protein-Protein Interaction Prediction: A Model-Agnostic Approach

    cs.AI 2026-05 unverdicted novelty 6.0

    L3-PPI reformulates PPI pair classification as graph classification over a prompt graph with controlled virtual L3 paths to inject the biological interaction prior and boost performance on existing models.

  21. Learning the Interaction Prior for Protein-Protein Interaction Prediction: A Model-Agnostic Approach

    cs.AI 2026-05 unverdicted novelty 6.0

    L3-PPI reformulates protein-protein interaction prediction as a graph classification task over a prompt graph containing virtual L3 paths to incorporate biological complementarity prior and improve performance.

  22. Fairness of Explanations in Artificial Intelligence (AI): A Unifying Framework, Axioms, and Future Direction toward Responsible AI

    cs.AI 2026-05 unverdicted novelty 6.0

    A conditional invariance framework defines explanation fairness as explanations being statistically independent of protected attributes given task-relevant features, unifying existing metrics and enabling procedural b...

  23. Quantum Injection Pathways for Implicit Graph Neural Networks

    quant-ph 2026-05 unverdicted novelty 6.0

    Independent quantum signal injection into graph DEQs yields higher test accuracy and fewer solver iterations than state-dependent or backbone-dependent injection and classical equilibrium models on NCI1, PROTEINS, and...

  24. GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model

    cs.AI 2026-05 unverdicted novelty 6.0

    GCCM prevents shortcut collapse in consistency models for graph prediction by using contrastive negative pairs and input feature perturbation, leading to better performance than deterministic baselines.

  25. H3: A Healthcare Three-Hop Index for Physician Referral Network Prediction

    cs.SI 2026-05 unverdicted novelty 6.0

    H3 is a new three-hop index that predicts physician referrals using normalized indirect pathways and outperforms heuristics and neural nets on Medicare shared-patient data in both within-period and cross-period settings.

  26. TransXion: A High-Fidelity Graph Benchmark for Realistic Anti-Money Laundering

    cs.LG 2026-04 unverdicted novelty 6.0

    TransXion supplies a 3-million-transaction graph benchmark with profile-aware normal activity and stochastic illicit subgraphs that produces lower detection scores than prior AML datasets.

  27. GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning

    cs.AI 2026-04 unverdicted novelty 6.0

    GraphDC applies divide-and-conquer multi-agent LLM reasoning to graph algorithms by decomposing graphs into subgraphs for local agents and integrating via a master agent, outperforming direct methods especially on lar...

  28. UniDetect: LLM-Driven Universal Fraud Detection across Heterogeneous Blockchains

    cs.CR 2026-04 unverdicted novelty 6.0

    UniDetect is an LLM-based system that generates universal transaction summary texts and uses two-stage multimodal training on text plus graphs to detect fraudulent accounts across heterogeneous blockchains, outperform...

  29. NOSE: Neural Olfactory-Semantic Embedding with Tri-Modal Orthogonal Contrastive Learning

    cs.CL 2026-04 unverdicted novelty 6.0

    NOSE aligns molecular, receptor, and linguistic modalities in a shared embedding space via tri-modal orthogonal contrastive learning and weak positive samples, achieving SOTA performance and zero-shot generalization o...

  30. ML for the hKLM at the 2nd Detector

    physics.ins-det 2026-04 unverdicted novelty 6.0

    Graph neural networks trained on simulated hits outperform classical methods for energy resolution, timing, and particle identification in an iron-scintillator sampling calorimeter, with an integrated multi-objective ...

  31. BiScale-GTR: Fragment-Aware Graph Transformers for Multi-Scale Molecular Representation Learning

    cs.LG 2026-04 unverdicted novelty 6.0

    BiScale-GTR achieves claimed state-of-the-art results on MoleculeNet, PharmaBench and LRGB by combining improved fragment tokenization with a parallel GNN-Transformer architecture that operates at both atom and fragme...

  32. MMP-Refer: Multimodal Path Retrieval-augmented LLMs For Explainable Recommendation

    cs.IR 2026-04 conditional novelty 6.0

    MMP-Refer augments LLMs with multimodal retrieval paths and a trainable collaborative adapter to produce more accurate and explainable recommendations.

  33. Layer Embedding Deep Fusion Graph Neural Network

    cs.LG 2026-04 unverdicted novelty 5.0

    LEDF-GNN fuses multi-layer embeddings nonlinearly and runs parallel processing on original and reconstructed topologies to capture long-range dependencies and mitigate heterophily-induced misaggregation in deep GNNs.

  34. Inductive Subgraphs as Shortcuts: Causal Disentanglement for Heterophilic Graph Learning

    cs.LG 2026-04 unverdicted novelty 5.0

    Inductive subgraphs serve as shortcuts in heterophilic graphs, and CD-GNN disentangles spurious from causal subgraphs by blocking non-causal paths to improve robustness and accuracy.

  35. Empowering Power Outage Prediction with Spatially Aware Hybrid Graph Neural Networks and Contrastive Learning

    cs.LG 2026-04 unverdicted novelty 5.0

    SA-HGNN with contrastive learning improves power outage prediction by modeling spatial effects of extreme weather on infrastructure across multiple utility territories.

  36. Extracting Money Laundering Transactions from Quasi-Temporal Graph Representation

    cs.LG 2026-04 unverdicted novelty 5.0

    ExSTraQt uses quasi-temporal graph representations and supervised learning to detect suspicious transactions, achieving F1 score uplifts of up to 1% on real data and over 8% on synthetic datasets compared to prior AML models.

  37. On Improving Graph Neural Networks for QSAR by Pre-training on Extended-Connectivity Fingerprints

    cs.LG 2026-05 unverdicted novelty 4.0

    Pre-training GNNs on ECFP prediction produces statistically significant QSAR gains on five of six Biogen benchmarks with OOD splits, but underperforms on heterogeneous datasets and complex endpoints like binding affinity.

  38. Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction

    cs.LG 2026-04 unverdicted novelty 4.0

    Large benchmark shows classical ML and GNNs outperform pretrained large models on most of 22 drug-discovery endpoints under strict cross-validation.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · cited by 37 Pith papers

  1. [1]

    Diffusion-convolutional neural networks

    James Atwood and Don Towsley. Diffusion-convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pp. 1993–2001,

  2. [2]

    Canonical labelling of graphs in linear average time

    László Babai and Luděk Kučera. Canonical labelling of graphs in linear average time. In 20th Annual Symposium on Foundations of Computer Science (FOCS), pp. 39–46. IEEE, 1979.

  3. [3]

    The Weisfeiler-Lehman method and graph isomorphism testing

    Brendan L Douglas. The Weisfeiler-Lehman method and graph isomorphism testing. arXiv preprint arXiv:1101.5211,

  4. [4]

    Multilayer feedforward networks are universal approximators

    Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural networks, 2(5):359–366,

  5. [5]

    Deriving neural architectures from sequence and graph kernels

    Tao Lei, Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Deriving neural architectures from sequence and graph kernels. pp. 2024–2033,

  6. [6]

    Janossy pooling: Learning deep permutation-invariant functions for variable-size inputs

    Ryan L Murphy, Balasubramaniam Srinivasan, Vinayak Rao, and Bruno Ribeiro. Janossy pooling: Learning deep permutation-invariant functions for variable-size inputs. arXiv preprint arXiv:1811.01900,

  7. [7]

    Learning convolutional neural networks for graphs

    Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. In International Conference on Machine Learning (ICML), pp. 2014–2023,

  8. [8]

    Dropout: a simple way to prevent neural networks from overfitting

    Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958,

  9. [9]

    Graph attention networks

    Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations (ICLR),

  10. [10]

    Graph capsule convolutional neural networks

    Saurabh Verma and Zhi-Li Zhang. Graph capsule convolutional neural networks. arXiv preprint arXiv:1805.08090,
