Mixed precision s-step Lanczos and conjugate gradient algorithms
2 Pith papers cite this work.
Representative citing papers:
- Mixed-Precision Communication-Avoiding SGD for Generalized Linear Models on GPUs
  Mixed-precision communication-avoiding SGD (CA-SGD) for generalized linear models (GLMs) on A100 GPUs matches FP32 loss to within 0.5% while delivering a 5.1-6.8x speedup via a nine-choice finite-precision error recipe.
- Communication-reduced Conjugate Gradient Variants for GPU-accelerated Clusters
  A GPU implementation of the 1989 preconditioned s-step CG method that aggregates operations and overlaps communication with computation, improving scalability on Poisson benchmarks.