A Distributed Dynamic Load Balancer for Iterative Applications
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.DC (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Charm++ techniques enable efficient overdecomposition on multi-vendor GPGPU distributed systems.
citing papers explorer
- Analyzing Persistent Alltoallv RMA Implementations for High-Performance MPI Communication
  Persistent RMA Alltoallv reduces runtime by up to 44% for large messages by amortizing metadata costs, with fence-based designs showing the best gains on a real supercomputer.
- Efficient and Portable Support for Overdecomposition on Distributed Memory GPGPU Platforms
  Charm++ techniques enable efficient overdecomposition on multi-vendor GPGPU distributed systems.