ButterflyQuant: Ultra-low-bit LLM quantization through learnable orthogonal butterfly transforms

2025 · arXiv 2509.09679

3 Pith papers cite this work.

representative citing papers

When Quantization Is Free: An int4 KV Cache That Outruns fp16 on Apple Silicon

cs.PF · 2026-05-07 · unverdicted · novelty 7.0

A single fused int4 KV cache kernel on Apple Silicon outperforms fp16 in latency with 3x memory compression and near-zero quality loss on tested models.

Influence-Inspired Spectral Rotations for Extreme Low-Bit LLM Quantization

cs.LG · 2026-05-24 · unverdicted · novelty 4.0

A Walsh-Hadamard transform (WHT) rotation plus per-coordinate activation-energy rescaling applied before auto-round quantization lowers WikiText-2 perplexity by 15-58% relative to vanilla auto-round at W2A16, on models ranging from 135M to 1.5B parameters.

ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation

cs.CV · 2026-04-13 · unreviewed · ref 11