Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
citing papers explorer
- NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding
  NCCLZ decouples quantization and entropy coding across NCCL stack layers to enable overlapped compression, delivering up to 9.65x speedup over plain NCCL on scientific and training workloads.
- Quantum Data Loading for Carleman Linearized Systems: Application to the Lattice-Boltzmann Equation
  A new LCNU-to-LCU decomposition enables a generalized quantum framework for Carleman-linearized polynomial systems like the lattice Boltzmann equation, with Ns scaling as O(α² Q²) independent of spatial and temporal discretization points.
- Qurator: Scheduling Hybrid Quantum-Classical Workflows Across Heterogeneous Cloud Providers
  Qurator jointly optimizes queue time and fidelity for hybrid quantum-classical workflows across providers using quantum-aware DAG scheduling and a unified logarithmic fidelity score, achieving 30-75% wait reduction at high load with bounded accuracy cost.
- Per-Shot Evaluation of QAOA on Max-Cut: A Black-Box Implementation Comparison with Goemans-Williamson
  QAOA with default parameters is compared per-shot to Goemans-Williamson on realistic Max-Cut instances, highlighting practical limitations under black-box use.