arxiv: 2007.08663 · v1 · pith:BO7S35T3new · submitted 2020-07-16 · 💻 cs.LG · cs.NE · stat.ML

TUDataset: A collection of benchmark datasets for learning with graphs

Pith reviewed 2026-05-25 07:49 UTC · model grok-4.3

classification 💻 cs.LG · cs.NE · stat.ML
keywords graph classification · graph regression · benchmark datasets · graph neural networks · TUDataset · machine learning on graphs · kernel methods

The pith

The TUDataset supplies over 120 benchmark datasets for graph classification and regression together with Python data loaders, kernel and graph neural network baselines, and evaluation tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the TUDataset collection to address the shortage of standardized benchmarks for supervised learning on graph data. It gathers more than 120 datasets spanning many applications and sizes. Accompanying resources include Python loaders for the data, reference implementations of kernel methods and graph neural networks, plus tools to run and compare experiments. The goal is to make it easier for researchers to perform consistent and reproducible work in graph classification and regression. All materials are released online.

Core claim

We introduce the TUDataset for graph classification and regression. The collection consists of over 120 datasets of varying sizes from a wide range of applications. We provide Python-based data loaders, kernel and graph neural network baseline implementations, and evaluation tools. Here, we give an overview of the datasets, standardized evaluation procedures, and provide baseline experiments.

What carries the argument

The TUDataset collection, which aggregates more than 120 benchmark datasets for graph tasks and supplies loaders plus baseline code for kernels and graph neural networks.
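As a rough illustration of what a data loader for such a collection has to do, the sketch below groups edges by graph from three plain-text inputs. The layout it assumes (an adjacency file of `row, col` pairs, a graph-indicator file mapping each node to a graph, and a graph-labels file, all with 1-based ids) is not described on this page and should be checked against the official documentation. The function name `parse_tu_format` is hypothetical; this is not the loader released with the paper.

```python
from collections import defaultdict


def parse_tu_format(adjacency_text, graph_indicator_text, graph_labels_text):
    """Group edges by graph from TU-style text inputs (assumed layout, 1-based ids)."""
    # Assumption: line i of the indicator file holds the graph id of node i.
    node_to_graph = {
        node_id: int(token)
        for node_id, token in enumerate(graph_indicator_text.split(), start=1)
    }
    # Assumption: line g of the labels file holds the class label of graph g.
    labels = {
        graph_id: int(token)
        for graph_id, token in enumerate(graph_labels_text.split(), start=1)
    }
    # Assumption: each adjacency line is "src, dst" and both nodes share a graph.
    edges = defaultdict(list)
    for line in adjacency_text.strip().splitlines():
        src, dst = (int(token) for token in line.split(","))
        edges[node_to_graph[src]].append((src, dst))
    return {g: {"label": labels[g], "edges": edges[g]} for g in labels}


# Toy input: graph 1 has nodes 1-2, graph 2 has nodes 3-5.
adjacency = "1, 2\n2, 1\n3, 4\n4, 3\n4, 5\n5, 4\n"
indicator = "1\n1\n2\n2\n2\n"
graph_labels = "1\n-1\n"

graphs = parse_tu_format(adjacency, indicator, graph_labels)
for graph_id, graph in graphs.items():
    print(graph_id, graph["label"], len(graph["edges"]))
```

In practice the released loaders or an established graph learning library would be the better choice; the sketch only shows why a shared on-disk layout makes a single loader usable across many datasets.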

If this is right

  • Standardized evaluation procedures enable direct comparisons of methods on the same graph classification tasks.
  • Baseline kernel and graph neural network results serve as reference points for new approaches.
  • Access to datasets from many application areas supports testing across domains.
  • Reproducible code allows verification of reported performance numbers.
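To make the idea of a standardized evaluation procedure concrete, here is a minimal sketch of stratified k-fold cross-validation with a majority-class reference baseline. The protocol (k = 10, a fixed seed, accuracy averaged over folds) is an assumption chosen for illustration; this page does not state which protocol the paper prescribes. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)  # fixed seed so every method sees the same splits
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class round-robin
    return folds


def cross_validate(labels, fit_predict, k=10, seed=0):
    """Return mean test accuracy of fit_predict(train_idx, test_idx) over k folds."""
    accuracies = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predictions = fit_predict(train_idx, test_idx)
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def majority_baseline(labels):
    """Reference method: always predict the most frequent training label."""
    def fit_predict(train_idx, test_idx):
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        return [majority] * len(test_idx)
    return fit_predict


# Toy labels standing in for a binary graph classification dataset.
labels = [0] * 60 + [1] * 40
mean_accuracy = cross_validate(labels, majority_baseline(labels), k=10)
print(round(mean_accuracy, 3))
```

Fixing the splits and the reporting metric is what lets two methods be compared on equal footing; any real kernel or graph neural network would plug in through the same `fit_predict` interface in this sketch.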

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread adoption might reduce the spread of incomparable results across different studies.
  • The collection could become a starting point for creating additional standardized test suites in related graph tasks.

Load-bearing premise

The main obstacle to progress in graph learning is the lack of meaningful benchmark datasets and standardized evaluation procedures, so releasing this collection will reduce that obstacle.

What would settle it

The premise would be undermined if papers in the area continued to rely on non-overlapping datasets and differing evaluation protocols without adopting the TUDataset resources.

Original abstract

Recently, there has been an increasing interest in (supervised) learning with graph data, especially using graph neural networks. However, the development of meaningful benchmark datasets and standardized evaluation procedures is lagging, consequently hindering advancements in this area. To address this, we introduce the TUDataset for graph classification and regression. The collection consists of over 120 datasets of varying sizes from a wide range of applications. We provide Python-based data loaders, kernel and graph neural network baseline implementations, and evaluation tools. Here, we give an overview of the datasets, standardized evaluation procedures, and provide baseline experiments. All datasets are available at www.graphlearning.io. The experiments are fully reproducible from the code available at www.github.com/chrsmrrs/tudataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper claims to introduce the TUDataset collection for graph classification and regression, consisting of over 120 datasets from various applications. It provides Python-based data loaders, kernel and GNN baseline implementations, and evaluation tools. The abstract states that an overview of the datasets, standardized evaluation procedures, and baseline experiments are given, with all datasets available at www.graphlearning.io and experiments reproducible from code at the provided GitHub repository.

Significance. If the collection is comprehensive and the tools effective, this resource could help standardize benchmarks in graph learning, facilitating advancements by addressing the lack of meaningful benchmarks. The explicit commitment to reproducibility through available code is a strength that enhances the potential impact.

major comments (1)
  1. [Abstract] The abstract asserts the existence and availability of the collection and tools but supplies no details on dataset selection criteria, validation, or baseline performance numbers; the soundness of the central claim cannot be verified beyond the statement of availability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and address the major comment below.

Point-by-point responses
  1. Referee: [Abstract] The abstract asserts the existence and availability of the collection and tools but supplies no details on dataset selection criteria, validation, or baseline performance numbers; the soundness of the central claim cannot be verified beyond the statement of availability.

    Authors: Abstracts are intentionally concise and serve to summarize the paper's contributions at a high level. The manuscript body provides the overview of the datasets (including selection criteria and characteristics from various applications), standardized evaluation procedures, and baseline experiments with performance numbers. The central claim of introducing a reproducible collection is substantiated by the public availability of all datasets at www.graphlearning.io and the code at the GitHub repository, enabling direct verification and use by the community. We maintain that the abstract appropriately highlights these elements without requiring the level of detail suggested.

    Revision: no

Circularity Check

0 steps flagged

No significant circularity; resource announcement only

Full rationale

The paper is a dataset collection announcement containing no derivations, equations, predictions, fitted parameters, or load-bearing technical claims. The abstract describes introducing TUDataset with loaders and baselines but presents no analytic chain that could reduce to self-definition, fitted inputs, or self-citations. This is a standard resource paper whose claims are self-contained and externally verifiable by dataset availability.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are required; the contribution is a curated resource rather than a derivation.

pith-pipeline@v0.9.0 · 5643 in / 1026 out tokens · 42756 ms · 2026-05-25T07:49:20.880476+00:00 · methodology

Forward citations

Cited by 24 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. GraphIP-Bench: How Hard Is It to Steal a Graph Neural Network, and Can We Stop It?

    cs.CR 2026-05 accept novelty 8.0

    GraphIP-Bench shows stealing GNNs is easy at moderate query budgets, most defenses fail to block or reliably trace extraction, and watermarks lose verification power on surrogates while heterophilic graphs are harder ...

  2. HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals

    cs.LG 2025-06 conditional novelty 8.0

    Authors release HSG-12M, a dataset of 16.7 million spatial multigraphs generated from non-Hermitian crystal energy spectra via the Poly2Graph pipeline, along with initial GNN benchmarks.

  3. Beyond Oversquashing: Understanding Signal Propagation in GNNs Via Observables

    cs.LG 2026-05 unverdicted novelty 7.0

    Quantum-inspired observables reveal poor signal routing in standard spectral GNNs and motivate Schrödinger GNNs with superior propagation capacity.

  4. Higher-order Persistence Diagrams

    cs.CG 2026-05 unverdicted novelty 7.0

    Higher-order persistence diagrams are defined recursively via interval containments, and their aggregations can be evaluated in nearly linear time using zeta transforms instead of explicit pair enumeration.

  5. CTQWformer: A CTQW-based Transformer for Graph Classification

    cs.LG 2026-05 unverdicted novelty 7.0

    CTQWformer fuses continuous-time quantum walks into a graph transformer and recurrent module to outperform standard GNNs and graph kernels on classification benchmarks.

  6. Concept Graph Convolutions: Message Passing in the Concept Space

    cs.LG 2026-04 unverdicted novelty 7.0

    Concept Graph Convolutions perform message passing on node concepts to increase interpretability of graph neural networks without losing task performance.

  7. R2G: A Multi-View Circuit Graph Benchmark Suite from RTL to GDSII

    cs.CV 2026-04 accept novelty 7.0

    R2G is a multi-view circuit graph benchmark showing that representation choice affects GNN accuracy more than model architecture, with node-centric views and deeper decoders performing best.

  8. Efficient and Accurate Graph Classification with Hyperdimensional Computing on FPGA

    cs.AR 2025-12 conditional novelty 7.0

    HyperX is the first end-to-end FPGA accelerator for Nyström-based HDC graph classification, delivering 6.85× speedup and 169× energy efficiency over CPU baselines plus 3.4% average accuracy gain on TUDataset benchmarks.

  9. Graph Learning via Logic-Based Weisfeiler-Leman Variants and Tabularization

    cs.LG 2025-08 unverdicted novelty 7.0

    Logic-based Weisfeiler-Leman variants enable graph-to-table conversion for classification that matches GNN and graph transformer accuracy while running 5-20x faster without GPUs.

  10. HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals

    cs.LG 2025-06 unverdicted novelty 7.0

    HSG-12M is a large dataset of spatial multigraphs derived from non-Hermitian crystal energy spectra via the Poly2Graph pipeline, positioned as the first large-scale benchmark of this graph type.

  11. A Benchmark Dataset for Graph Regression with Homogeneous and Multi-Relational Variants

    cs.LG 2025-05 unverdicted novelty 7.0

    RelSC is a new graph regression benchmark from program graphs with execution time labels, released in homogeneous (RelSC-H) and multi-relational (RelSC-M) variants to study representation effects.

  12. Estimating Subgraph Importance with Structural Prior Domain Knowledge

    cs.LG 2026-05 unverdicted novelty 6.0

    A label-free Group Lasso method estimates important subgraphs in pretrained GNNs by incorporating domain structural knowledge.

  13. Quantum Injection Pathways for Implicit Graph Neural Networks

    quant-ph 2026-05 unverdicted novelty 6.0

    Independent quantum signal injection into graph DEQs yields higher test accuracy and fewer solver iterations than state-dependent or backbone-dependent injection and classical equilibrium models on NCI1, PROTEINS, and...

  14. GraphNetz: Statistical Benchmarking of Graph Neural Networks with Paired Tests and Rank Aggregation

    cs.CE 2026-05 unverdicted novelty 6.0

    GraphNetz supplies an automated statistical pipeline for GNN benchmarking that includes per-cell confidence intervals, paired tests with multiple-comparison correction, and critical-difference diagrams across tasks an...

  15. Subgraph Concept Networks: Concept Levels in Graph Classification

    cs.LG 2026-04 unverdicted novelty 6.0

    Subgraph Concept Network is a new GNN architecture that distills meaningful concepts at node, subgraph, and graph levels via soft clustering to improve explainability while maintaining competitive accuracy.

  16. Learning from Historical Activations in Graph Neural Networks

    cs.LG 2026-01 unverdicted novelty 6.0

    HISTOGRAPH applies unified layer-wise attention followed by node-wise attention over historical GNN activations to improve graph classification, especially in deep models.

  17. Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks

    cs.LG 2025-09 unverdicted novelty 6.0

    Adaptive canonicalization selects input canonical forms by maximizing network predictive confidence to yield continuous symmetry-preserving models with universal approximation for equivariant geometric networks.

  18. How Embeddings Shape Graph Neural Networks: Classical vs Quantum-Oriented Node Representations

    cs.LG 2026-04 unverdicted novelty 5.0

    Quantum-oriented embeddings deliver consistent gains on structure-driven graph datasets while classical baselines perform adequately on attribute-limited social graphs, under identical training pipelines across five T...

  19. GP2F: Cross-Domain Graph Prompting with Adaptive Fusion of Pre-trained Graph Neural Networks

    cs.LG 2026-02 unverdicted novelty 5.0

    GP2F is a dual-branch graph prompting framework that fuses frozen pre-trained knowledge with task-specific adaptation to reduce estimation error and outperform baselines in cross-domain few-shot node and graph classification.

  20. OpenGLT: A Comprehensive Benchmark of Graph Neural Networks for Graph-Level Tasks

    cs.LG 2025-01 unverdicted novelty 5.0

    OpenGLT benchmark finds no single GNN architecture dominates graph-level tasks, with subgraph-based models strongest in expressiveness, graph learning and SSL models in robustness, node and pooling models in efficienc...

  21. Position: Graph Condensation Needs a Reset -- Move Beyond Full-dataset Training and Model-Dependence

    cs.LG 2026-05 conditional novelty 4.0

    The paper claims current graph condensation approaches are flawed due to full-dataset training requirements, high overhead, poor generalization, and misleading evaluation metrics, calling for a reset toward lightweigh...

  22. Fine-Grained Graph Generation through Latent Mixture Scheduling

    cs.AI 2026-05 unverdicted novelty 4.0

    A novel CVAE with mixture scheduling achieves fine-grained structural control in graph generation, showing high quality and controllability on five datasets.

  23. Position: Graph Condensation Needs a Reset -- Move Beyond Full-dataset Training and Model-Dependence

    cs.LG 2026-05 unverdicted novelty 3.0

    Graph condensation methods must move beyond full-dataset training and model dependence toward lightweight, architecture-agnostic designs to achieve practical efficiency in GNNs.

  24. Graph Rewiring in GNNs to Mitigate Over-Squashing and Over-Smoothing: A Survey

    cs.LG 2024-11 unverdicted novelty 2.0

    A survey compiling graph rewiring techniques for mitigating over-squashing and over-smoothing in GNNs.