Characterizes a reproducible post-GPU cache-displacement window on M4 Pro and quantifies a one-pass CPU recovery mechanism via Metal experiments and PMU data.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
OpenMP port of gPLUTO achieves comparable performance to OpenACC on NVIDIA but is 3x slower at application level and up to 10x at kernel level on AMD MI250X, driven by strided memory accesses, latency bounds, and C++ abstraction overheads.
citing papers explorer
-
Residual GPU Cache State on Apple M4 Pro
Characterizes a reproducible post-GPU cache-displacement window on M4 Pro and quantifies a one-pass CPU recovery mechanism via Metal experiments and PMU data.
-
On the Limits of Performance Portability in Directive-Based GPU Programming
OpenMP port of gPLUTO achieves comparable performance to OpenACC on NVIDIA but is 3x slower at application level and up to 10x at kernel level on AMD MI250X, driven by strided memory accesses, latency bounds, and C++ abstraction overheads.