pith. machine review for the scientific record.

arxiv: 2605.13810 · v1 · submitted 2026-05-13 · 💻 cs.LG · cs.DS

Recognition: unknown

Provable Quantization with Randomized Hadamard Transform

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:23 UTC · model grok-4.3

classification 💻 cs.LG cs.DS
keywords: vector quantization · randomized Hadamard transform · dithered quantization · mean squared error · machine learning compression · similarity search · federated learning

The pith

Dithered quantization after randomized Hadamard transform matches the error of dense random rotations at linearithmic cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that applying one randomized Hadamard transform to a unit vector, subtracting a random scalar dither offset, and then performing scalar quantization produces an unbiased estimate whose mean squared error approaches (π√3/2) · 4^{-b} at b bits per coordinate. The extra o(1) error term vanishes uniformly over every unit vector and every dimension as the number of quantization levels grows. Readers care because this replaces the Θ(d²) cost of dense random rotations with an O(d log d) structured transform while keeping the same asymptotic compression guarantees for tasks such as similarity search and federated learning.

Core claim

Dithered quantization with a single randomized Hadamard transform is unbiased and provides mean squared error bounds that asymptotically match those achievable with truly random rotation matrices. In particular, a dithered version of TurboQuant achieves mean squared error (π√3/2 + o(1)) · 4^{-b} at b bits per coordinate, where the o(1) term vanishes uniformly over all unit vectors and all dimensions as the number of quantization levels grows.

What carries the argument

Dithered scalar quantization after one randomized Hadamard transform, where a random scalar offset is subtracted before quantization to add extra randomness at negligible cost.
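
A minimal sketch of that pipeline may help fix ideas. This is an illustrative stand-in rather than the paper's TurboQuant construction: the per-vector grid range, the mid-rise uniform codebook, and all function names below are assumptions made for the demo.

```python
# Illustrative stand-in for the pipeline described above (not the paper's
# TurboQuant codebook): randomized Hadamard transform HD, subtractive scalar
# dither, then uniform scalar quantization on an assumed per-vector grid.
import numpy as np


def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform in O(d log d); d must be a power of two."""
    y = np.asarray(v, dtype=float).copy()
    n, h = y.size, 1
    while h < n:
        y = y.reshape(n // (2 * h), 2, h)
        y = np.stack([y[:, 0, :] + y[:, 1, :], y[:, 0, :] - y[:, 1, :]], axis=1).reshape(n)
        h *= 2
    return y / np.sqrt(n)  # orthonormal, so fwht is its own inverse


def encode(x, b, rng):
    """Quantize x at b bits per coordinate; returns codes plus decoding side information."""
    signs = rng.choice([-1.0, 1.0], size=x.size)         # random diagonal sign matrix D
    y = fwht(signs * x)                                   # rotated vector HDx
    scale = np.abs(y).max()                               # assumed per-vector grid range
    delta = 2.0 * scale / 2 ** b                          # uniform bin width
    u = rng.uniform(-delta / 2, delta / 2, size=x.size)   # subtractive dither
    q = np.floor((y - u + scale) / delta).clip(0, 2 ** b - 1)
    return q, u, signs, scale, delta


def decode(q, u, signs, scale, delta):
    """Dequantize to bin centres, add the dither back, and undo the transform."""
    y_hat = (q + 0.5) * delta - scale + u
    return signs * fwht(y_hat)                            # D^{-1} H^{-1} = D H for +/-1 signs


rng = np.random.default_rng(0)
x = rng.normal(size=1024)
x /= np.linalg.norm(x)                                    # the analysis assumes unit vectors
x_hat = decode(*encode(x, b=4, rng=rng))
print("squared error:", float(np.sum((x_hat - x) ** 2)))
```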

Load-bearing premise

The analysis requires that input vectors have unit length and that the number of quantization levels be large enough for the o(1) term to vanish; uniformity over dimensions depends on properties of the dither distribution.

What would settle it

Run the dithered TurboQuant procedure on random unit vectors in dimensions 100 to 10000, increase the number of quantization levels, and check whether the measured mean squared error converges to within a small additive term of (π√3/2) · 4^{-b}.
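
A sketch of that check, reusing the encode/decode helpers from the sketch above. Because the stand-in uniform-grid quantizer is not the paper's TurboQuant codebook, the ratio below is only expected to settle near some dimension-stable constant as b grows, not to hit π√3/2 ≈ 2.72 exactly.

```python
# Convergence check described above (reuses encode/decode from the previous
# sketch). Sweeps dimensions and bit-widths and compares the measured squared
# error on unit vectors against the stated (pi*sqrt(3)/2) * 4^{-b} target.
import numpy as np

rng = np.random.default_rng(1)
target = np.pi * np.sqrt(3) / 2                       # constant in the claimed bound

for d in (128, 1024, 8192):                           # dimensions (powers of two for the demo)
    for b in (2, 4, 6, 8):                            # bits per coordinate
        errs = []
        for _ in range(200):
            x = rng.normal(size=d)
            x /= np.linalg.norm(x)                    # unit vector, as the analysis assumes
            x_hat = decode(*encode(x, b=b, rng=rng))
            errs.append(np.sum((x_hat - x) ** 2))
        ratio = np.mean(errs) / 4.0 ** (-b)
        print(f"d={d:5d}  b={b}  MSE / 4^-b = {ratio:7.2f}   (claimed constant {target:.2f})")
```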

Figures

Figures reproduced from arXiv: 2605.13810 by Boris Prokhorov, Dmitry Krachun, Michael Kapralov, Piotr Indyk, Ying Feng.

Figure 1. Illustration of the scalar quantizer mapping. [PITH_FULL_IMAGE:figures/full_fig_p004_1.png]
original abstract

Vector quantization via random projection followed by scalar quantization is a fundamental primitive in machine learning, with applications ranging from similarity search to federated learning and KV cache compression. While dense random rotations yield clean theoretical guarantees, they require $\Theta(d^2)$ time. The randomized Hadamard transform $HD$ reduces this cost to $O(d \log d)$, but its discrete structure complicates analysis and leads to weaker or purely empirical compression guarantees. In this work, we study a variant of this approach: dithered quantization with a single randomized Hadamard transform. Specifically, the quantizer applies $HD$ to the input vector and subtracts a random scalar offset before quantizing, injecting additional randomness at negligible cost. We prove that this approach is unbiased and provides mean squared error bounds that asymptotically match those achievable with truly random rotation matrices. In particular, we prove that a dithered version of TurboQuant achieves mean squared error $\bigl(\pi\sqrt{3}/2 + o(1)\bigr) \cdot 4^{-b}$ at $b$ bits per coordinate, where the $o(1)$ term vanishes uniformly over all unit vectors and all dimensions as the number of quantization levels grows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript studies dithered scalar quantization after a single randomized Hadamard transform HD. It proves that the resulting estimator is unbiased and that a dithered variant of TurboQuant attains mean squared error (π√3/2 + o(1)) · 4^{-b} at b bits per coordinate, where the o(1) term vanishes uniformly over all unit vectors and all dimensions d as the number of quantization levels 2^b tends to infinity.

Significance. If the uniformity claim holds, the result supplies the first rigorous guarantee that an O(d log d) Hadamard-based quantizer matches the asymptotic MSE of dense random rotations. This directly benefits high-dimensional compression primitives used in federated learning, similarity search, and KV-cache quantization. The first-principles derivation from the dithered transform, rather than data-fitting or self-citation, is a methodological strength.

major comments (1)
  1. [§4.2, Theorem 4.3] In the theorem and the surrounding uniformity argument, the o(1) term is asserted to vanish uniformly in d and over all unit vectors, yet the proof relies on the dither distribution erasing residual coordinate-wise dependence induced by the Walsh structure of HD. The manuscript must state the precise support, variance, and independence conditions on the scalar dither that guarantee this cancellation rate is independent of d; without them the uniformity claim is not fully substantiated.
minor comments (2)
  1. [§2.1] The definition of the randomized Hadamard matrix HD should explicitly note how non-dyadic dimensions are handled (padding or embedding) to avoid ambiguity for general d; one common convention is sketched after this list.
  2. [Figure 2] Axis labels and legend should indicate the exact bit-width b and the range of dimensions plotted so that the empirical convergence can be directly compared to the stated asymptotic rate.
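
On the non-dyadic point raised in the first minor comment: one common convention, not necessarily the one the authors intend, is to zero-pad the input to the next power of two before applying the transform. Padding preserves the Euclidean norm, so unit vectors stay unit vectors.

```python
# Zero-padding to the next power of two, a common (assumed, not confirmed)
# way to handle non-dyadic dimensions before a fast Walsh-Hadamard transform.
import numpy as np


def pad_to_pow2(x):
    """Zero-pad x to length 2^ceil(log2(d)); a no-op when d is already a power of two."""
    d_pad = 1 << (x.size - 1).bit_length()
    return np.concatenate([x, np.zeros(d_pad - x.size)])


x = np.random.default_rng(2).normal(size=1000)        # d = 1000 is not a power of two
x /= np.linalg.norm(x)
x_padded = pad_to_pow2(x)                             # length 1024, same Euclidean norm
assert x_padded.size == 1024 and np.isclose(np.linalg.norm(x_padded), 1.0)
```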

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for highlighting the need to make the dither conditions explicit. We address the major comment below.

point-by-point responses
  1. Referee: [§4.2, Theorem 4.3] In the theorem and the surrounding uniformity argument, the o(1) term is asserted to vanish uniformly in d and over all unit vectors, yet the proof relies on the dither distribution erasing residual coordinate-wise dependence induced by the Walsh structure of HD. The manuscript must state the precise support, variance, and independence conditions on the scalar dither that guarantee this cancellation rate is independent of d; without them the uniformity claim is not fully substantiated.

    Authors: We agree that the uniformity argument benefits from an explicit statement of the dither assumptions. In the revised manuscript we will insert, immediately before Theorem 4.3, a short paragraph and a supporting lemma that specify: the scalar dither is drawn i.i.d. uniformly from [-Δ/2, Δ/2] (where Δ denotes the quantization bin width), is independent of the input vector and of the random diagonal sign matrix in HD, and has zero mean and variance Δ²/12. Under these conditions the dither renders the per-coordinate quantization errors uncorrelated after the Hadamard transform, which is the key step that makes the o(1) term vanish uniformly in d and over the unit sphere. The existing proof already relies on these properties; we will only make them visible and add the short lemma that quantifies the rate of dependence erasure.

    revision: yes
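
The dither properties listed in the response are the classical subtractive-dithering facts, and they are easy to confirm numerically on a plain scalar quantizer, independently of the paper's construction; the snippet below is such a check.

```python
# Numerical check of the dither conditions stated in the response: with
# U ~ Unif[-delta/2, delta/2] independent of x, the error of the subtractively
# dithered uniform quantizer is zero-mean, has variance delta^2/12, and is
# essentially uncorrelated with the input.
import numpy as np

rng = np.random.default_rng(3)
delta = 0.25                                             # quantization bin width
x = rng.uniform(-1.0, 1.0, size=200_000)                 # arbitrary inputs
u = rng.uniform(-delta / 2, delta / 2, size=x.size)      # subtractive dither
x_hat = np.round((x - u) / delta) * delta + u            # quantize shifted value, add dither back
err = x_hat - x

print("mean error       :", err.mean())                  # ~ 0 (unbiased)
print("error variance   :", err.var(), "expected", delta ** 2 / 12)
print("corr(err, input) :", np.corrcoef(err, x)[0, 1])   # ~ 0
```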

Circularity Check

0 steps flagged

No circularity: first-principles asymptotic analysis of dithered Hadamard quantizer

full rationale

The paper states a direct proof that dithered TurboQuant with one randomized Hadamard transform yields MSE (π√3/2 + o(1))·4^{-b} with the o(1) term vanishing uniformly over unit vectors and dimensions. No equation or claim in the abstract reduces the bound to a fitted parameter, self-citation chain, or definitional equivalence; the result is presented as derived from the dither injection and Hadamard properties rather than imported from prior author work or renamed empirical patterns. The derivation chain therefore stands on its own rather than leaning on external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard properties of the Hadamard transform, uniform randomness of the dither offset, and the assumption that inputs are unit vectors. No new entities are postulated and no parameters are fitted to data.

axioms (2)
  • domain assumption: Input vectors are unit vectors
    The MSE bound is stated uniformly over all unit vectors; this is invoked in the uniformity claim.
  • domain assumption: Dither offset is drawn from a continuous distribution independent of the transform
    Dithering is the source of additional randomness that restores unbiasedness and concentration.

pith-pipeline@v0.9.0 · 5525 in / 1396 out tokens · 62497 ms · 2026-05-14T19:23:05.736386+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 3 internal anchors
