An OpenMP port of gPLUTO achieves performance comparable to OpenACC on NVIDIA GPUs, but on AMD MI250X it is 3x slower at the application level and up to 10x slower at the kernel level, a gap driven by strided memory accesses, latency bounds, and C++ abstraction overheads.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
On the Limits of Performance Portability in Directive-Based GPU Programming