Accelerating MPI allreduce communication with efficient GPU-based compression schemes on modern GPU clusters

2024 · arXiv 2024.105289

4 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding

cs.DC · 2026-05-12 · unverdicted · novelty 7.0

NCCLZ decouples quantization and entropy coding across NCCL stack layers to enable overlapped compression, delivering up to 9.65x speedup over plain NCCL on scientific and training workloads.

Quantum Data Loading for Carleman Linearized Systems: Application to the Lattice-Boltzmann Equation

quant-ph · 2026-05-01 · unverdicted · novelty 7.0 · 2 refs

A new LCNU-to-LCU decomposition enables a generalized quantum framework for Carleman-linearized polynomial systems like the lattice Boltzmann equation, with Ns scaling as O(α² Q²) independent of spatial and temporal discretization points.

Qurator: Scheduling Hybrid Quantum-Classical Workflows Across Heterogeneous Cloud Providers

quant-ph · 2026-04-07 · unverdicted · novelty 7.0

Qurator jointly optimizes queue time and fidelity for hybrid quantum-classical workflows across providers using quantum-aware DAG scheduling and a unified logarithmic fidelity score, achieving 30-75% wait reduction at high load with bounded accuracy cost.

Per-Shot Evaluation of QAOA on Max-Cut: A Black-Box Implementation Comparison with Goemans-Williamson

quant-ph · 2026-04-09 · unverdicted · novelty 5.0

QAOA with default parameters is compared per-shot to Goemans-Williamson on realistic Max-Cut instances, highlighting practical limitations under black-box use.

citing papers explorer


NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding
cs.DC · 2026-05-12 · unverdicted · none · ref 39
Quantum Data Loading for Carleman Linearized Systems: Application to the Lattice-Boltzmann Equation
quant-ph · 2026-05-01 · unverdicted · none · ref 65 · 2 links
Qurator: Scheduling Hybrid Quantum-Classical Workflows Across Heterogeneous Cloud Providers
quant-ph · 2026-04-07 · unverdicted · none · ref 69
Per-Shot Evaluation of QAOA on Max-Cut: A Black-Box Implementation Comparison with Goemans-Williamson
quant-ph · 2026-04-09 · unverdicted · none · ref 19
