The GAP Benchmark Suite

Scott Beamer, Krste Asanović, David Patterson


classification cs.DC cs.DS

keywords benchmark, graph, graphs, processing, suite, baseline, demonstrate, implementations


We present a graph processing benchmark suite with the goal of helping to standardize graph processing evaluations. Fewer differences between graph processing evaluations will make it easier to compare different research efforts and quantify improvements. The benchmark not only specifies graph kernels, input graphs, and evaluation methodologies, but it also provides optimized baseline implementations. These baseline implementations are representative of state-of-the-art performance, and thus new contributions should outperform them to demonstrate an improvement. The input graphs are sized appropriately for shared memory platforms, but any implementation on any platform that conforms to the benchmark's specifications could be compared. This benchmark suite can be used in a variety of settings. Graph framework developers can demonstrate the generality of their programming model by implementing all of the benchmark's kernels and delivering competitive performance on all of the benchmark's graphs. Algorithm designers can use the input graphs and the baseline implementations to demonstrate their contribution. Platform designers and performance analysts can use the suite as a workload representative of graph processing.



Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Fully GPU-Accelerated Framework for High-Performance Configuration Interaction Selection with Neural Network Quantum States
cs.DC 2026-04 unverdicted novelty 7.0

QiankunNet-cuSCI achieves up to 2.32x end-to-end speedup on 64 A100 GPUs for NNQS-SCI while preserving chemical accuracy by fully accelerating global de-duplication and coupled-configuration generation on the device.

Understanding Simulated Architecture via gem5 Call-Stack Profiling
cs.AR 2026-05 unverdicted novelty 6.0

A specialized profiling tool using Linux perf_event samples gem5 call-stacks to expose simulated architecture behaviors such as TimingSimpleCPU inefficiencies and cache coherence deadlocks not visible in conventional stats.

Proxics: an efficient programming model for far memory accelerators
cs.OS 2026-04 conditional novelty 6.0

Proxics introduces lightweight virtual processors and low-latency communication channels as portable OS abstractions for programming near-data processing accelerators, demonstrated on real hardware for memory-intensiv...

TierBPF: Page Migration Admission Control for Tiered Memory via eBPF
cs.OS 2026-04 unverdicted novelty 6.0

TierBPF uses lightweight eBPF hooks for custom page admission control in tiered memory, delivering up to 17.7% geomean and 75% peak throughput gains across 17 workloads on three systems.

Efficient Page Migration in Hybrid Memory Systems
cs.AR 2026-04 unverdicted novelty 5.0

Duon eliminates TLB shootdown and cache invalidation costs during page migration in flat-address hybrid memory systems by updating mappings in-place, delivering 3.87% IPC gains over prior methods.