The linpack benchmark: past, present and future
3 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Analysis of Floating-Point Matrix Multiplication Computed via Integer Arithmetic
  Error analysis and cost estimator for recasting floating-point matrix multiplication as accumulated integer products on mixed-precision hardware.
- Sustaining Exascale Performance: Lessons from HPL and HPL-MxP on Aurora
  Aurora reached 1.01 EF/s FP64 HPL and 11.64 EF/s HPL-MxP through locality-aware mapping, CPU-GPU pipelining, mixed-precision orchestration, and hybrid resilience on a large Intel GPU-based system.
- A New Broadcast Model for Several Network Topologies
  BBS is a broadcast algorithm that maximizes node utilization through balanced saturation cycles, outperforming standard methods in simulations across multiple network topologies.