pith. machine review for the scientific record.

arXiv: 2512.04475 · v5 · submitted 2025-12-04 · 💻 cs.LG · cs.AI · cs.NE · stat.ML


GraphBench: Next-generation graph learning benchmarking

classification: cs.LG, cs.AI, cs.NE, stat.ML
keywords: graphbench, evaluation, graph, across, benchmarking, domains, further, including

Machine learning on graphs has made substantial progress across domains such as molecular property prediction and chip design. Yet benchmarking practices remain fragmented, often relying on narrow, task-specific datasets and inconsistent evaluation protocols, hindering reproducibility and broader progress. With the recent popularity of graph foundation models, these weaknesses have become apparent, as existing benchmarks are insufficient for thorough evaluation. To address these challenges, we introduce GraphBench, a comprehensive benchmark suite spanning diverse real-world domains and task settings, including node-level, edge-level, graph-level, and generative tasks. GraphBench provides standardized evaluation protocols, including consistent dataset splits and metrics for assessing out-of-distribution generalization across selected tasks, as well as a unified hyperparameter-tuning framework. We further evaluate GraphBench with recent message-passing neural networks and graph transformer models, establishing principled baselines for future research. See www.graphbench.io for further details.

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Have Graph -- Will Lift? The Case for Higher-Order Benchmarks

    cs.LG 2026-05 unverdicted novelty 3.0

    The paper argues that the topological deep learning community should develop new benchmark datasets with native higher-order structure rather than continuing to lift graph datasets.