A single fused int4 KV cache kernel on Apple Silicon outperforms fp16 in latency with 3x memory compression and near-zero quality loss on tested models.
FPTQuant: Function-preserving transforms for LLM quantization
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.PF 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
When Quantization Is Free: An int4 KV Cache That Outruns fp16 on Apple Silicon
A single fused int4 KV cache kernel on Apple Silicon outperforms fp16 in latency with 3x memory compression and near-zero quality loss on tested models.