Introduces a 2D cyclic decomposition triangle counting algorithm for distributed-memory systems achieving 3.24-7.22x relative speedup on 169 MPI ranks and 10.2x over prior distributed algorithms.
Highly Parallel Sparse Matrix-Matrix Multiplication
1 Pith paper cite this work. Polarity classification is still indexing.
abstract
Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an unbounded number of processors. Our algorithms are based on two-dimensional block distribution of sparse matrices where serial sections use a novel hypersparse kernel for scalability. We give a state-of-the-art MPI implementation of one of our algorithms. Our experiments show scaling up to thousands of processors on a variety of test scenarios.
fields
cs.DC 1years
2019 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
A 2D Parallel Triangle Counting Algorithm for Distributed-Memory Architectures
Introduces a 2D cyclic decomposition triangle counting algorithm for distributed-memory systems achieving 3.24-7.22x relative speedup on 169 MPI ranks and 10.2x over prior distributed algorithms.