A single fused int4 KV cache kernel on Apple Silicon outperforms fp16 in latency with 3x memory compression and near-zero quality loss on tested models.
ButterflyQuant: Ultra-low- bit LLM quantization through learnable orthogonal butterfly transforms
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
years
2026 3representative citing papers
A WHT rotation plus per-coordinate activation-energy rescaling before auto-round quantization lowers WikiText-2 perplexity 15-58% versus vanilla auto-round at W2A16 on models from 135M to 1.5B parameters.
citing papers explorer
-
When Quantization Is Free: An int4 KV Cache That Outruns fp16 on Apple Silicon
A single fused int4 KV cache kernel on Apple Silicon outperforms fp16 in latency with 3x memory compression and near-zero quality loss on tested models.
-
Influence-Inspired Spectral Rotations for Extreme Low-Bit LLM Quantization
A WHT rotation plus per-coordinate activation-energy rescaling before auto-round quantization lowers WikiText-2 perplexity 15-58% versus vanilla auto-round at W2A16 on models from 135M to 1.5B parameters.
- ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation