cuSZ: An Efficient GPU-Based Error-Bounded Lossy Compression Framework for Scientific Data
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background (1)
citation-polarity summary
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
citing papers explorer
- SegFold: Accelerating Sparse GEMM with a Fine-Grained Dynamic Dataflow
  SegFold achieves 1.95× geometric-mean speedup over prior SpGEMM accelerators via fine-grained dynamic scheduling and remapping in its Segment dataflow.
- NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding
  NCCLZ decouples quantization and entropy coding across NCCL stack layers to enable overlapped compression, delivering up to 9.65× speedup over plain NCCL on scientific and training workloads.