NCCLZ decouples quantization and entropy coding across NCCL stack layers to enable overlapped compression, delivering up to 9.65x speedup over plain NCCL on scientific and training workloads.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.DC (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
GTaP delivers a GPU-resident fork-join task-parallel runtime with pragma support and EPAQ that outperforms CPU OpenMP on several irregular applications.
Citing papers explorer:
- NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding
- GTaP: A GPU-Resident Fork-Join Task-Parallel Runtime with a Pragma-Based Interface