In: 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Akram, A · 2021 · arXiv 9936.2021

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding

cs.DC · 2026-05-12 · unverdicted · novelty 7.0

NCCLZ decouples quantization and entropy coding across NCCL stack layers to enable overlapped compression, delivering up to 9.65x speedup over plain NCCL on scientific and training workloads.

PackSELL: A Sparse Matrix Format for Precision-Agnostic High-Performance SpMV

cs.DC · 2026-04-15 · unverdicted · novelty 7.0

PackSELL packs delta-encoded indices and values into single words with tunable bit allocation, delivering up to 1.63x faster FP16 SpMV and FP32-accurate performance exceeding FP16 cuSPARSE while reducing memory traffic.

Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores

cs.DC · 2026-03-10 · unverdicted · novelty 7.0

FP64 tensor cores accelerate high-order finite-element kernels in MFEM by up to 2x with 83% energy gains and near-perfect weak scaling on exascale hardware.

Accelerating Locality-Driven Integration in Quantum Chemistry with Block-Structured Matrix Multiplication

physics.comp-ph · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

KerneLDI accelerates exchange-correlation integration in Kohn-Sham DFT by up to 10x through block-structured matrix multiplication that exploits spatial locality on GPUs while preserving accuracy.

UCCL-Zip: Lossless Compression Supercharged GPU Communication

cs.DC · 2026-04-19 · unverdicted · novelty 6.0

UCCL-Zip adds lossless compression to GPU communication to reduce LLM bottlenecks while preserving exact numerical correctness.

Analysis of Floating-Point Matrix Multiplication Computed via Integer Arithmetic

math.NA · 2025-06-12 · unverdicted · novelty 5.0

Error analysis and cost estimator for recasting floating-point matrix multiplication as accumulated integer products on mixed-precision hardware.

RAMSES: Secure high-performance computing for sensitive data

cs.DC · 2026-06-26 · unverdicted · novelty 3.0

RAMSES integrates commercial encryption technologies and OS hardening into an HPC platform to enable secure sensitive-data processing with limited performance overhead, as shown by biomedical-sector benchmarks.

citing papers explorer

NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding
cs.DC · 2026-05-12 · unverdicted · none · ref 28

PackSELL: A Sparse Matrix Format for Precision-Agnostic High-Performance SpMV
cs.DC · 2026-04-15 · unverdicted · none · ref 33

Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores
cs.DC · 2026-03-10 · unverdicted · none · ref 6

Accelerating Locality-Driven Integration in Quantum Chemistry with Block-Structured Matrix Multiplication
physics.comp-ph · 2026-05-11 · unverdicted · none · ref 31 · 2 links

UCCL-Zip: Lossless Compression Supercharged GPU Communication
cs.DC · 2026-04-19 · unverdicted · none · ref 60

Analysis of Floating-Point Matrix Multiplication Computed via Integer Arithmetic
math.NA · 2025-06-12 · unverdicted · none · ref 44

RAMSES: Secure high-performance computing for sensitive data
cs.DC · 2026-06-26 · unverdicted · none · ref 2
