FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators
3 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: other 1
Citation-polarity summary: unclear 1
citing papers explorer
- Bit-Accurate Modeling of GPU Matrix Multiply-Accumulate Units: Demystifying Numerical Discrepancy and Accuracy
  The authors derive the first bit-accurate arithmetic models for matrix multiply-accumulate operations on ten GPU architectures spanning NVIDIA Volta to Blackwell and AMD CDNA1 to CDNA3.
- Eidola: Modeling Multi-GPU Network Communication Traffic in Distributed AI Workloads
  Eidola is a gem5 extension that emulates cycle-level peer-to-peer GPU writes via real-application timing profiles to simulate traffic and synchronization in multi-GPU AI systems.
- KubePACS: Kubernetes Cluster Using Performant, Highly Available, and Cost Efficient Spot Instances