ButterflyQuant: Ultra-low-bit LLM quantization through learnable orthogonal butterfly transforms
2 Pith papers cite this work.
Citing papers
- When Quantization Is Free: An int4 KV Cache That Outruns fp16 on Apple Silicon
  A single fused int4 KV cache kernel on Apple Silicon outperforms fp16 in latency, with 3x memory compression and near-zero quality loss on tested models.
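To make the int4 KV cache idea concrete, here is a minimal NumPy sketch of symmetric per-group int4 quantization with two-values-per-byte packing. This is not the paper's fused Metal kernel; the group size of 32, the fp16 scales, and the packing layout are illustrative assumptions. With these choices a 32-element fp16 group (64 bytes) becomes 16 packed bytes plus a 2-byte scale, which is roughly the 3x compression the summary mentions.

```python
import numpy as np

def quantize_int4(x, group_size=32):
    """Symmetric per-group int4 quantization (generic sketch, not the
    paper's kernel). Quantized values lie in [-8, 7], packed two per byte."""
    x = x.reshape(-1, group_size)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0                      # avoid divide-by-zero groups
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    # pack two 4-bit codes per byte: even index in low nibble, odd in high
    packed = ((q[:, 0::2] & 0x0F) | ((q[:, 1::2] & 0x0F) << 4)).astype(np.uint8)
    return packed, scale.astype(np.float16)

def dequantize_int4(packed, scale, group_size=32):
    """Unpack nibbles, sign-extend to int8, and rescale to float32."""
    lo = (packed & 0x0F).astype(np.int8)
    hi = ((packed >> 4) & 0x0F).astype(np.int8)
    lo[lo > 7] -= 16                             # sign-extend 4-bit values
    hi[hi > 7] -= 16
    q = np.empty((packed.shape[0], group_size), dtype=np.int8)
    q[:, 0::2] = lo
    q[:, 1::2] = hi
    return q.astype(np.float32) * scale.astype(np.float32)
```

A real implementation would fuse the dequantization into the attention kernel so the int4 cache is never materialized in fp16; the sketch only shows the storage format and round-trip error behavior.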
- ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation
  ReSpinQuant achieves state-of-the-art accuracy in W4A4 and W3A3 LLM quantization by using efficient residual subspace rotation approximations that match layer-wise performance while retaining the inference speed of global rotation methods.
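The rotation methods that ReSpinQuant builds on (and that ButterflyQuant makes learnable) share one core idea: multiplying by an orthogonal matrix before quantizing spreads outlier energy across all coordinates, shrinking the quantization scale, and the rotation can be undone exactly afterward. Below is a toy NumPy sketch of that principle using a Sylvester-construction Hadamard rotation; it is not ReSpinQuant's residual subspace approximation or ButterflyQuant's butterfly factorization, just the baseline effect both papers refine.

```python
import numpy as np

def hadamard(n):
    """Normalized Hadamard matrix via Sylvester construction (n a power of 2).
    Orthogonal: H @ H.T == I, so the rotation is exactly invertible."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_sym(x, bits=4):
    """Symmetric per-tensor fake-quantization to `bits` bits."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

# A weight row with one large outlier forces a coarse quantization grid.
rng = np.random.default_rng(0)
w = rng.normal(size=64)
w[3] = 20.0                                   # synthetic outlier

H = hadamard(64)
err_plain = np.linalg.norm(quantize_sym(w) - w)
err_rot = np.linalg.norm(H.T @ quantize_sym(H @ w) - w)
```

After the rotation, the outlier's energy is diluted across 64 coordinates, so the max-abs scale drops and `err_rot` comes out well below `err_plain`; rotation methods differ mainly in how cheaply and how well they choose the orthogonal transform.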