Provable Quantization with Randomized Hadamard Transform
Pith reviewed 2026-05-14 19:23 UTC · model grok-4.3
The pith
Dithered quantization after randomized Hadamard transform matches the error of dense random rotations at linearithmic cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Dithered quantization with a single randomized Hadamard transform is unbiased and provides mean squared error bounds that asymptotically match those achievable with truly random rotation matrices. In particular, a dithered version of TurboQuant achieves mean squared error (π√3/2 + o(1)) · 4^{-b} at b bits per coordinate, where the o(1) term vanishes uniformly over all unit vectors and all dimensions as the number of quantization levels grows.
What carries the argument
Dithered scalar quantization after one randomized Hadamard transform, where a random scalar offset is subtracted before quantization to add extra randomness at negligible cost.
Load-bearing premise
The analysis requires that input vectors have unit length and that the number of quantization levels grows large enough for the o(1) term to vanish, with uniformity over dimensions depending on properties of the dither distribution.
What would settle it
Run the dithered TurboQuant procedure on random unit vectors in dimensions 100 to 10000, increase the number of quantization levels, and check whether the measured mean squared error converges to within a small additive term of (π√3/2) · 4^{-b}.
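This check can be sketched in Python with a plain subtractively dithered uniform scalar quantizer standing in for TurboQuant, whose exact codebook construction is not reproduced here. The `fwht` routine, the range-based bin width, and the trial counts below are illustrative assumptions, so the unbiasedness and the 4^{-b} decay should reproduce, but the exact π√3/2 constant need not.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform in O(d log d).
    len(x) must be a power of two; H is its own inverse under this scaling."""
    x = np.asarray(x, dtype=float).copy()
    d, h = len(x), 1
    while h < d:
        x = x.reshape(-1, 2, h)
        a, b = x[:, 0, :].copy(), x[:, 1, :].copy()
        x[:, 0, :], x[:, 1, :] = a + b, a - b   # butterfly with stride h
        x = x.reshape(d)
        h *= 2
    return x / np.sqrt(d)

def dithered_hd_roundtrip(v, bits, rng):
    """Apply HD (random signs, then Hadamard), subtract a shared scalar dither,
    round, dequantize, and invert the rotation. The clip-free range handling
    is a simplification, not the paper's TurboQuant codebook."""
    signs = rng.choice([-1.0, 1.0], size=len(v))   # random diagonal D
    y = fwht(signs * v)                            # HDv, still unit norm
    delta = (y.max() - y.min()) / 2 ** bits        # bin width from 2^b levels
    u = rng.uniform(-delta / 2, delta / 2)         # scalar dither offset
    yhat = np.round((y - u) / delta) * delta + u   # subtractive dithering
    return signs * fwht(yhat)                      # undo rotation

def mse(bits, d=256, trials=200, seed=0):
    """Average squared reconstruction error over random unit vectors."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)
        total += np.sum((dithered_hd_roundtrip(v, bits, rng) - v) ** 2)
    return total / trials
```

Raising `bits` by one should cut `mse` by roughly a factor of four, and averaging many reconstructions of a fixed vector should recover it, consistent with the unbiasedness claim.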
Figures
Original abstract
Vector quantization via random projection followed by scalar quantization is a fundamental primitive in machine learning, with applications ranging from similarity search to federated learning and KV cache compression. While dense random rotations yield clean theoretical guarantees, they require $\Theta(d^2)$ time. The randomized Hadamard transform $HD$ reduces this cost to $O(d \log d)$, but its discrete structure complicates analysis and leads to weaker or purely empirical compression guarantees. In this work, we study a variant of this approach: dithered quantization with a single randomized Hadamard transform. Specifically, the quantizer applies $HD$ to the input vector and subtracts a random scalar offset before quantizing, injecting additional randomness at negligible cost. We prove that this approach is unbiased and provides mean squared error bounds that asymptotically match those achievable with truly random rotation matrices. In particular, we prove that a dithered version of TurboQuant achieves mean squared error $\bigl(\pi\sqrt{3}/2 + o(1)\bigr) \cdot 4^{-b}$ at $b$ bits per coordinate, where the $o(1)$ term vanishes uniformly over all unit vectors and all dimensions as the number of quantization levels grows.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies dithered scalar quantization after a single randomized Hadamard transform HD. It proves that the resulting estimator is unbiased and that a dithered variant of TurboQuant attains mean-squared error (π√3/2 + o(1)) · 4^{-b} per coordinate, where the o(1) term vanishes uniformly over all unit vectors and all dimensions d as the number of quantization levels 2^b tends to infinity.
Significance. If the uniformity claim holds, the result supplies the first rigorous guarantee that an O(d log d) Hadamard-based quantizer matches the asymptotic MSE of dense random rotations. This directly benefits high-dimensional compression primitives used in federated learning, similarity search, and KV-cache quantization. The first-principles derivation from the dithered transform, rather than data-fitting or self-citation, is a methodological strength.
Major comments (1)
- [§4.2, Theorem 4.3] Theorem 4.3 and the surrounding uniformity argument: the o(1) term is asserted to vanish uniformly in d and over all unit vectors, yet the proof relies on the dither distribution erasing residual coordinate-wise dependence induced by the Walsh structure of HD. The manuscript must state the precise support, variance, and independence conditions on the scalar dither that guarantee this cancellation rate is independent of d; without them the uniformity claim is not fully substantiated.
Minor comments (2)
- [§2.1] The definition of the randomized Hadamard matrix HD should explicitly note how non-dyadic dimensions are handled (padding or embedding) to avoid ambiguity for general d.
- [Figure 2] Axis labels and legend should indicate the exact bit-width b and the range of dimensions plotted so that the empirical convergence can be directly compared to the stated asymptotic rate.
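On the non-dyadic point, one standard convention (an illustrative assumption, not necessarily what the manuscript adopts) is zero-padding to the next power of two, which preserves the Euclidean norm and lets the reconstruction simply drop the trailing coordinates:

```python
import numpy as np

def pad_to_pow2(v):
    """Zero-pad v to the next power-of-two length so a Walsh-Hadamard
    transform applies; zero-padding leaves the Euclidean norm unchanged."""
    d = len(v)
    d2 = 1 << (d - 1).bit_length()   # smallest power of two >= d
    return np.concatenate([v, np.zeros(d2 - d)])
```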
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the need to make the dither conditions explicit. We address the major comment below.
Point-by-point responses
- Referee [§4.2, Theorem 4.3]: Theorem 4.3 and the surrounding uniformity argument: the o(1) term is asserted to vanish uniformly in d and over all unit vectors, yet the proof relies on the dither distribution erasing residual coordinate-wise dependence induced by the Walsh structure of HD. The manuscript must state the precise support, variance, and independence conditions on the scalar dither that guarantee this cancellation rate is independent of d; without them the uniformity claim is not fully substantiated.
Authors: We agree that the uniformity argument benefits from an explicit statement of the dither assumptions. In the revised manuscript we will insert, immediately before Theorem 4.3, a short paragraph and a supporting lemma that specify: the scalar dither is drawn i.i.d. uniformly from [-Δ/2, Δ/2] (where Δ denotes the quantization bin width), is independent of the input vector and of the random diagonal sign matrix in HD, and has zero mean and variance Δ²/12. Under these conditions the dither renders the per-coordinate quantization errors uncorrelated after the Hadamard transform, which is the key step that makes the o(1) term vanish uniformly in d and over the unit sphere. The existing proof already relies on these properties; we will only make them visible and add the short lemma that quantifies the rate of dependence erasure.
Revision: yes
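The dither conditions the response lists can be checked numerically for a scalar subtractive-dither quantizer (the bin width and sample size below are arbitrary choices): for any fixed input x, the error e = Q(x − u) + u − x should be uniform on [−Δ/2, Δ/2], with zero mean and variance Δ²/12 regardless of x.

```python
import numpy as np

def dither_errors(x, delta, rng, n=200_000):
    """Errors of a uniform quantizer with subtractive dither:
    e = Q(x - u) + u - x, with u ~ Uniform[-delta/2, delta/2)."""
    u = rng.uniform(-delta / 2, delta / 2, size=n)
    q = np.round((x - u) / delta) * delta      # Q with bin width delta
    return q + u - x

rng = np.random.default_rng(0)
delta = 0.25
for x in (0.0, 0.1, 1.2345):                   # stats should not depend on x
    e = dither_errors(x, delta, rng)
    assert abs(e.mean()) < 1e-3                # zero mean: unbiased
    assert abs(e.var() - delta**2 / 12) < 2e-4 # variance delta^2 / 12
```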
Circularity Check
No circularity: first-principles asymptotic analysis of dithered Hadamard quantizer
Full rationale
The paper states a direct proof that dithered TurboQuant with one randomized Hadamard transform yields MSE (π√3/2 + o(1))·4^{-b} with the o(1) term vanishing uniformly over unit vectors and dimensions. No equation or claim in the abstract reduces the bound to a fitted parameter, self-citation chain, or definitional equivalence; the result is presented as derived from the dither injection and Hadamard properties rather than imported from prior author work or renamed empirical patterns. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: input vectors have unit length.
- Domain assumption: the dither offset is drawn from a continuous distribution, independent of the transform.