arXiv preprint arXiv:2402.00025

Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition · arXiv 2402.00025

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

HASTE: Hardware-Aware Dynamic Sparse Training for Large Output Spaces

cs.LG · 2026-05-31 · unverdicted · novelty 7.0

HASTE proposes group-shared fixed fan-in sparsity and dense-sparse output decomposition to deliver up to 25x backward speedup and near-dense precision in large-scale XMC.

citing papers explorer

HASTE: Hardware-Aware Dynamic Sparse Training for Large Output Spaces cs.LG · 2026-05-31 · unverdicted · none · ref 38