A method using shared-memory occupancy shaping and elevated communication priority achieves up to 25.5% faster multi-GPU ML execution on NVIDIA and AMD GPUs.
In: 2025 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Resource-aware Computation-Communication Overlap for multi-GPU ML Workloads