GraphBench: Next-generation graph learning benchmarking

Timo Stoll, Chendi Qian, Ben Finkelshtein, Ali Parviz, Darius Weber, Fabrizio Frasca, Hadar Shavit, Antoine Siraudin, Arman Mielke, Marie Anastacio, Erik Müller, Maya Bechler-Speicher, Michael Bronstein, Mikhail Galkin, Holger Hoos, Mathias Niepert, Bryan Perozzi, Jan Tönshoff, Christopher Morris

Classification: cs.LG, cs.AI, cs.NE, stat.ML

Keywords: graphbench, evaluation, graph, across, benchmarking, domains, further, including

Abstract

Machine learning on graphs has made substantial progress across domains such as molecular property prediction and chip design. Yet benchmarking practices remain fragmented, often relying on narrow, task-specific datasets and inconsistent evaluation protocols, hindering reproducibility and broader progress. With the recent popularity of graph foundation models, these weaknesses have become apparent, as existing benchmarks are insufficient for thorough evaluation. To address these challenges, we introduce GraphBench, a comprehensive benchmark suite spanning diverse real-world domains and task settings, including node-level, edge-level, graph-level, and generative tasks. GraphBench provides standardized evaluation protocols, including consistent dataset splits and metrics for assessing out-of-distribution generalization across selected tasks, as well as a unified hyperparameter-tuning framework. We further evaluate GraphBench with recent message-passing neural networks and graph transformer models, establishing principled baselines for future research. See www.graphbench.io for further details.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Have Graph -- Will Lift? The Case for Higher-Order Benchmarks
cs.LG, 2026-05, unverdicted, novelty 3.0

The paper argues that the topological deep learning community should develop new benchmark datasets with native higher-order structure rather than continuing to lift graph datasets.