Title resolution pending

2021 · arXiv 9936.2021

6 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding

cs.DC · 2026-05-12 · unverdicted · novelty 7.0

NCCLZ decouples quantization and entropy coding across NCCL stack layers to enable overlapped compression, delivering up to 9.65x speedup over plain NCCL on scientific and training workloads.

PackSELL: A Sparse Matrix Format for Precision-Agnostic High-Performance SpMV

cs.DC · 2026-04-15 · unverdicted · novelty 7.0

PackSELL packs delta-encoded indices and values into single words with tunable bit allocation, delivering up to 1.63x faster FP16 SpMV and FP32-accurate performance exceeding FP16 cuSPARSE while reducing memory traffic.

Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores

cs.DC · 2026-03-10 · unverdicted · novelty 7.0

FP64 tensor cores accelerate high-order finite-element kernels in MFEM by up to 2x with 83% energy gains and near-perfect weak scaling on exascale hardware.

Accelerating Locality-Driven Integration in Quantum Chemistry with Block-Structured Matrix Multiplication

physics.comp-ph · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

KerneLDI accelerates exchange-correlation integration in Kohn-Sham DFT by up to 10x through block-structured matrix multiplication that exploits spatial locality on GPUs while preserving accuracy.

UCCL-Zip: Lossless Compression Supercharged GPU Communication

cs.DC · 2026-04-19 · unverdicted · novelty 6.0

UCCL-Zip adds lossless compression to GPU communication to reduce LLM bottlenecks while preserving exact numerical correctness.

Analysis of Floating-Point Matrix Multiplication Computed via Integer Arithmetic

math.NA · 2025-06-12 · unverdicted · novelty 5.0

Error analysis and cost estimator for recasting floating-point matrix multiplication as accumulated integer products on mixed-precision hardware.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Accelerating Locality-Driven Integration in Quantum Chemistry with Block-Structured Matrix Multiplication

physics.comp-ph · 2026-05-11 · unverdicted · none · ref 31 · 2 links

KerneLDI accelerates exchange-correlation integration in Kohn-Sham DFT by up to 10x through block-structured matrix multiplication that exploits spatial locality on GPUs while preserving accuracy.
