Block-Sphere Vector Quantization

Heesang Ann; Joongkyu Lee; Min-hwan Oh

arxiv: 2605.19972 · v1 · pith:E43ZAXHTnew · submitted 2026-05-19 · 💻 cs.LG · cs.AI· cs.DB· cs.DS

Block-Sphere Vector Quantization

Heesang Ann , Joongkyu Lee , Min-hwan Oh This is my paper

Pith reviewed 2026-05-20 07:24 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.DBcs.DS

keywords vector quantizationrotation-based quantizersblock quantizationspherical geometryMSE distortioninner-product distortionembedding compressionLLM inference

0 comments

The pith

Block-Sphere Quantization improves reconstruction MSE and expected inner-product distortion by quantizing blocks on the sphere after random rotation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper first unifies the theoretical comparison of recent rotation-based quantizers EDEN, RabitQ, and TurboQuant, showing that their relative strengths depend on the chosen distortion criterion rather than being absolute. It then presents Block-Sphere Quantization, or BlockQuant, which applies quantization to blocks of randomly rotated vectors treated as points on the sphere instead of coordinate by coordinate. The authors prove this block-spherical approach yields lower reconstruction mean squared error and better expected inner-product preservation than the baselines. Experiments on embedding datasets and long-context LLM inference tasks demonstrate matching practical improvements.

Core claim

Block-Sphere Quantization (BlockQuant) is a rotation-based vector quantizer that quantizes blocks of randomly rotated vectors directly on the sphere. Unlike coordinate-wise methods, this design preserves the geometry of the rotated embeddings more faithfully. The paper proves that BlockQuant improves over EDEN, RabitQ, and TurboQuant on both reconstruction MSE and expected inner-product distortion, with experiments confirming consistent gains in storage and inference settings.

What carries the argument

The block-spherical quantization step, which maps blocks of randomly rotated vectors onto the sphere before applying quantization.

If this is right

BlockQuant supplies stronger guarantees than the baselines for MSE distortion.
It also delivers lower expected inner-product distortion, aiding similarity search.
The unified comparison shows prior methods trade off strengths by distortion type.
Practical gains appear in real embedding datasets and long-context LLM inference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The spherical block approach might extend to product quantization or other structured compressors.
Performance differences could be largest in very high-dimensional or anisotropic data.
Integration into vector databases could trade modest overhead for reduced retrieval distortion.

Load-bearing premise

That quantizing blocks on the sphere after random rotation preserves vector geometry more faithfully than coordinate-wise quantization.

What would settle it

A head-to-head evaluation on a standard embedding dataset in which BlockQuant produces higher average MSE or expected inner-product error than EDEN or TurboQuant.

Figures

Figures reproduced from arXiv: 2605.19972 by Heesang Ann, Joongkyu Lee, Min-hwan Oh.

**Figure 1.** Figure 1: Conceptual comparison of codebooks used by rotation-based quantizers for b = 2 in a two-coordinate projection when d > 2. The shaded disk indicates the feasible region of the two displayed coordinates of a rotated unit vector. Left: EDEN and TurboQuant use a Cartesian-product codebook formed by coordinate-wise MSE-optimized scalar centroids. Middle: RabitQ uses spherical codewords obtained by projecting a … view at source ↗

**Figure 2.** Figure 2: Distribution of MSE. Inner product error. We next evaluate inner-product estimation accuracy on DBpedia Entities (Thakur et al., 2021) using 1, 536-dimensional embeddings, with 100, 000 database vectors and 1, 000 query vectors. All vectors are normalized, and only the database vectors are quantized. For each pair (xi , yj ), we measure the inner-product estimation error eij = ⟨xbi , yj ⟩ − ⟨xi , yj ⟩, whe… view at source ↗

**Figure 3.** Figure 3: Distribution of inner product error 6.2 Nearest-Neighbor Search. We evaluate retrieval quality using Recall@1@k. For each query q, let g(q) denote the exact top-1 neighbor computed using full-precision inner products, and let Ak(q) denote the set of top-k candidates returned by a method using quantized inner-product estimates. We define Recall@1@k = 1 |Q| X q∈Q 1{g(q) ∈ Ak(q)} , where Q is the query set. T… view at source ↗

**Figure 4.** Figure 4: Recall comparison at 4 bits across different datasets. 1 2 4 8 16 32 64 Top-k 0.5 0.6 0.7 0.8 0.9 1.0 Recall@1@k GloVe, d=200, 2bits 1 2 4 8 16 32 64 Top-k 0.85 0.90 0.95 1.00 OpenAI3, d=1536, 2bits 1 2 4 8 16 32 64 Top-k 0.875 0.900 0.925 0.950 0.975 1.000 OpenAI3, d=3072, 2bits TurboQuantPROD RabitQUB EDENUB BlockQuantUB, approx [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗

**Figure 5.** Figure 5: Recall comparison at 2 bits across different datasets. 6.3 KV Cache Quantization. We further evaluate whether the improved distortion of BlockQuant translates into end-toend LLM performance under KV-cache quantization. We apply each quantizer to the KV cache of Llama-3.1-8B-Instruct while keeping the model weights unchanged. In the attention computation, the query states are kept in full precision and are… view at source ↗

**Figure 6.** Figure 6: Evaluation of Llama-3.1-8B-Instruct on the “Needle-In-A-Haystack” benchmark over five random seeds. Results are reported as mean, with standard deviations shown in parentheses [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗

read the original abstract

Vector quantization is a fundamental primitive for scalable machine learning systems, enabling memory-efficient storage, fast retrieval, and compressed inference. Recent rotation-based quantizers such as EDEN, RabitQ, and TurboQuant have introduced strong guarantees and empirical performance, but the surrounding comparisons have been difficult to interpret because they rely on different distortion criteria, probability regimes, and implementation assumptions. As our first contribution, we provide a unified theoretical comparison of these methods and show that their relative advantages are criterion-dependent rather than absolute: EDEN and TurboQuant are favorable for MSE distortion, EDEN is also effective for expected inner-product distortion, and RabitQ provides strong high-probability control. This comparison further clarifies that EDEN provides particularly strong guarantees for expected distortion measures. As our second contribution, we introduce Block-Sphere Quantization (BlockQuant), a new rotation-based block quantization algorithm designed around the spherical geometry of randomly rotated vectors. Unlike coordinate-wise quantizers, BlockQuant quantizes blocks on the sphere, preserving the geometry of rotated embeddings more faithfully. We prove that this block-spherical design theoretically improves over the baselines considered in this paper for both reconstruction MSE and expected inner-product distortion. Our experiments on real embedding datasets and long-context LLM inference tasks show practical gains that are consistent with our theoretical improvements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

BlockQuant adds a block-spherical quantizer after rotation and unifies comparisons across recent methods, but the claimed strict gains rest on an isotropy assumption that needs checking for real embeddings.

read the letter

The paper's main contribution is BlockQuant, which rotates the vectors and then quantizes them in blocks while respecting the spherical geometry. It also sorts out the comparisons between EDEN, RabitQ, and TurboQuant by looking at them under the same distortion criteria instead of letting each paper pick its own. The unified comparison is the part that lands well. It shows that no single method dominates across the board: some do better on mean squared error, others on inner-product preservation or high-probability bounds. That clears up why the literature has been hard to read. The new algorithm is positioned as improving on both MSE and expected inner-product distortion, with experiments on real embeddings and LLM tasks that line up with the theory. One place that needs a closer look is the assumption behind the theoretical gains. The argument treats the rotated blocks as independent spherical vectors whose quantization errors combine without leftover terms from the shared rotation or the partitioning. For embeddings that keep some low-rank structure or when the dimension isn't a clean multiple of the block size, that independence might not be exact. The abstract claims strict improvement, so the proofs will need to show how they handle any residual correlations. Readers who care about practical vector quantization for storage and fast lookup in large models will find this relevant. The experiments on long-context inference give it a concrete angle. It is worth sending to peer review because the ideas build directly on recent work and the claims are testable with the right derivations and data checks. I'd recommend putting it through review. The potential payoff in scalable systems makes the verification effort worthwhile.

Referee Report

2 major / 2 minor

Summary. The manuscript provides a unified theoretical comparison of rotation-based vector quantizers (EDEN, RabitQ, TurboQuant), showing that their relative advantages are criterion-dependent (e.g., EDEN and TurboQuant favorable for MSE, EDEN for expected inner-product distortion, RabitQ for high-probability control). It introduces Block-Sphere Quantization (BlockQuant), which performs quantization on blocks lying on the sphere after a random rotation, and proves that this design yields strict improvements over the considered baselines in both reconstruction MSE and expected inner-product distortion. Experiments on embedding datasets and long-context LLM inference tasks report consistent practical gains aligned with the theory.

Significance. If the proofs hold under the stated assumptions, the work would be significant for scalable ML systems by offering a quantization primitive with simultaneous theoretical guarantees on multiple distortion measures, which is valuable for compressed storage and inference. The unified comparison clarifies trade-offs among recent methods and is a useful service to the community. The manuscript ships explicit theoretical proofs of improvement and reproducible experimental validation on real tasks, which are strengths.

major comments (2)

[§4] §4 (BlockQuant Theoretical Analysis): The proof that the block-spherical design after random rotation strictly dominates coordinate-wise baselines in both MSE and expected inner-product distortion invokes per-block independence and isotropy on the sphere. The manuscript does not bound or analyze residual cross-block correlations induced by the shared rotation matrix or cases where block size does not evenly divide the embedding dimension; without this, the claimed strict improvement does not necessarily follow for structured or low-rank embeddings.
[Theorem 1] Theorem 1 and surrounding derivation: The unified comparison correctly notes criterion dependence, but the dominance claims for BlockQuant reduce to per-block spherical quantization error bounds only under the additional premise that overall inner-product distortion is exactly the sum of block distortions; an explicit expansion showing the cross terms vanish would be required to make the argument load-bearing.

minor comments (2)

[Algorithm 1] The definition of the random rotation matrix and its interaction with block partitioning could be stated more explicitly in the algorithm description to aid reproducibility.
[Figure 3] Figure 3 (distortion vs. bit-rate curves) would benefit from error bars or multiple random seeds to visually support the consistency of gains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable suggestions. We address the major comments below and plan to incorporate revisions to strengthen the theoretical analysis.

read point-by-point responses

Referee: [§4] §4 (BlockQuant Theoretical Analysis): The proof that the block-spherical design after random rotation strictly dominates coordinate-wise baselines in both MSE and expected inner-product distortion invokes per-block independence and isotropy on the sphere. The manuscript does not bound or analyze residual cross-block correlations induced by the shared rotation matrix or cases where block size does not evenly divide the embedding dimension; without this, the claimed strict improvement does not necessarily follow for structured or low-rank embeddings.

Authors: We appreciate this observation. While the random orthogonal rotation ensures that each block is isotropically distributed on the sphere, we acknowledge that residual correlations between blocks due to the shared rotation are not explicitly bounded in the current manuscript. For the expected distortion measures, these correlations average to zero over the randomness of the rotation. To address concerns for structured embeddings, we will add an analysis bounding the cross-block terms and discuss handling of dimensions that are not multiples of the block size (e.g., via padding). This will be included in a revised §4. revision: yes
Referee: [Theorem 1] Theorem 1 and surrounding derivation: The unified comparison correctly notes criterion dependence, but the dominance claims for BlockQuant reduce to per-block spherical quantization error bounds only under the additional premise that overall inner-product distortion is exactly the sum of block distortions; an explicit expansion showing the cross terms vanish would be required to make the argument load-bearing.

Authors: We agree that making the cross terms explicit would improve clarity. The squared inner-product distortion for the full vector expands to the sum of per-block distortions plus twice the sum of cross-block inner products. Under the random rotation, the expectation of these cross terms is zero due to the isotropy and orthogonality. We will provide this explicit expansion in the text surrounding Theorem 1 to confirm that the overall expected distortion is the sum of block distortions. revision: yes

Circularity Check

0 steps flagged

Derivation chain is self-contained with no circular reductions identified.

full rationale

The abstract describes a unified theoretical comparison of existing methods (EDEN, RabitQ, TurboQuant) against external baselines and introduces BlockQuant with a claimed proof of improvement for MSE and expected inner-product distortion. The new design is motivated by spherical geometry after random rotation, with comparisons framed as criterion-dependent rather than absolute. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes are visible that would reduce the claimed strict improvements to inputs by construction. The derivation appears independent and self-contained against the stated external benchmarks and probability regimes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that spherical geometry after random rotation is the right structure to preserve and that block quantization respects it better than per-coordinate methods; no free parameters or invented entities are identifiable from the abstract.

axioms (1)

domain assumption Randomly rotated vectors exhibit spherical geometry that block quantization can preserve more faithfully than coordinate-wise approaches.
Invoked as the design principle for BlockQuant and the source of its theoretical advantage.

pith-pipeline@v0.9.0 · 5764 in / 1165 out tokens · 28371 ms · 2026-05-20T07:24:42.294368+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Lemma 1 (Block marginal distribution... fp,d(zj)=Γ(d/2)/[π^{p/2}Γ((d-p)/2)](1-∥zj∥₂²)^{(d-p-2)/2}
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 3 ... DMSE(Q(p=2)) ≤ 2.015·4^{-b}(1+o(1))

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 5 internal anchors

[1]

Advances in Neural Information Processing Systems , volume=

Combinatorial multi-armed bandit with general reward functions , author=. Advances in Neural Information Processing Systems , volume=

work page
[2]

International conference on machine learning , pages=

Contextual combinatorial cascading bandits , author=. International conference on machine learning , pages=. 2016 , organization=

work page 2016
[3]

2025 , eprint=

Combinatorial Logistic Bandits , author=. 2025 , eprint=

work page 2025
[4]

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

Turboquant: Online vector quantization with near-optimal distortion rate , author=. arXiv preprint arXiv:2504.19874 , year=

work page internal anchor Pith review arXiv
[5]

Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments

Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments , author=. arXiv preprint arXiv:2604.19528 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[6]

2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) , pages=

Optimal compression of approximate inner products and dimension reduction , author=. 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) , pages=. 2017 , organization=

work page 2017
[7]

Proceedings of the ACM on Management of Data , volume=

Practical and asymptotically optimal quantization of high-dimensional vectors in euclidean space for approximate nearest neighbor search , author=. Proceedings of the ACM on Management of Data , volume=. 2025 , publisher=

work page 2025
[8]

Proceedings of the ACM on Management of Data , volume=

Rabitq: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search , author=. Proceedings of the ACM on Management of Data , volume=. 2024 , publisher=

work page 2024
[9]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Qjl: 1-bit quantized jl transform for kv cache quantization with zero overhead , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[11]

Advances in Neural Information Processing Systems , volume=

Drive: One-bit distributed mean estimation , author=. Advances in Neural Information Processing Systems , volume=

work page
[12]

International Conference on Machine Learning , pages=

Eden: Communication-efficient and robust distributed mean estimation for federated learning , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022
[13]

IRE Transactions on Information Theory , volume =

Max, Joel , title =. IRE Transactions on Information Theory , volume =. 1960 , doi =

work page 1960
[14]

, title =

Lloyd, Stuart P. , title =. IEEE Transactions on Information Theory , volume =. 1982 , doi =

work page 1982
[15]

, title =

Linde, Yoseph and Buzo, Andres and Gray, Robert M. , title =. IEEE Transactions on Communications , volume =. 1980 , doi =

work page 1980
[16]

IEEE Transactions on Information Theory , volume =

Gersho, Allen , title =. IEEE Transactions on Information Theory , volume =. 1979 , doi =

work page 1979
[17]

, title =

Zador, Paul L. , title =. IEEE Transactions on Information Theory , volume =. 1982 , doi =

work page 1982
[18]

, title =

Gersho, Allen and Gray, Robert M. , title =

work page
[19]

Product Quantization for Nearest Neighbor Search , journal =

J. Product Quantization for Nearest Neighbor Search , journal =. 2011 , doi =

work page 2011
[20]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

Ge, Tiezheng and He, Kaiming and Ke, Qifa and Sun, Jian , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages =. 2013 , doi =

work page 2013
[21]

, title =

Norouzi, Mohammad and Fleet, David J. , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages =. 2013 , doi =

work page 2013
[22]

IEEE Transactions on Knowledge and Data Engineering , volume =

Wang, Jianfeng and Wang, Jingdong and Song, Jingkuan and Xu, Xin-Shun and Shen, Heng Tao and Li, Shipeng , title =. IEEE Transactions on Knowledge and Data Engineering , volume =. 2015 , doi =

work page 2015
[23]

, title =

Fischer, Thomas R. , title =. IEEE Transactions on Information Theory , volume =

work page
[24]

and Sayood, Khalid , title =

Gibson, Jerry D. and Sayood, Khalid , title =. Advances in Electronics and Electron Physics , volume =

work page
[25]

and Sloane, Neil J

Conway, John H. and Sloane, Neil J. A. , title =. 1999 , doi =

work page 1999
[26]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

Eghbali, Sepehr and Tahvildari, Ladan , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

work page
[27]

van der Ouderaa, Tycho F. A. and Croci, Maximilian L. and Hilmkil, Agrin and Hensman, James , title =. arXiv preprint arXiv:2410.16926 , year =

work page arXiv
[28]

Image and Video Tokenization with Binary Spherical Quantization , booktitle =

Zhao, Yue and Xiong, Yuanjun and Kr. Image and Video Tokenization with Binary Spherical Quantization , booktitle =. 2025 , url =

work page 2025
[29]

van der Ouderaa, Tycho F. A. and van Baalen, Mart and Whatmough, Paul and Nagel, Markus , title =. arXiv preprint arXiv:2603.11021 , year =

work page arXiv
[30]

A Note on TurboQuant and the Earlier DRIVE/EDEN Line of Work

Ben-Basat, Ran and Ben-Itzhak, Yaniv and Mendelson, Gal and Mitzenmacher, Michael and Portnoy, Amit and Vargaftik, Shay , title =. arXiv preprint arXiv:2604.18555 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[31]

arXiv preprint arXiv:1909.10766 , year =

Pagh, Rasmus and Sivertsen, Johan , title =. arXiv preprint arXiv:1909.10766 , year =

work page arXiv 1909
[32]

IEEE transactions on big data , volume=

Billion-scale similarity search with GPUs , author=. IEEE transactions on big data , volume=. 2019 , publisher=

work page 2019
[33]

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Kivi: A tuning-free asymmetric 2bit quantization for kv cache , author=. arXiv preprint arXiv:2402.02750 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[34]

arXiv preprint arXiv:2504.03717 , year=

Raana: A fast, flexible, and data-efficient post-training quantization algorithm , author=. arXiv preprint arXiv:2504.03717 , year=

work page arXiv
[35]

Proceedings of the IRE , volume=

Quantization distortion in pulse-count modulation with nonuniform spacing of levels , author=. Proceedings of the IRE , volume=. 2006 , publisher=

work page 2006
[36]

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models , author=. arXiv preprint arXiv:2104.08663 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[37]

URL https://github

Needle in a haystack-pressure testing llms, 2023 , author=. URL https://github. com/gkamradt/LLMTest\_NeedleInAHaystack , year=

work page 2023
[38]

Proceedings of the 62nd annual meeting of the association for computational linguistics (volume 1: Long papers) , pages=

Longbench: A bilingual, multitask benchmark for long context understanding , author=. Proceedings of the 62nd annual meeting of the association for computational linguistics (volume 1: Long papers) , pages=

work page

[1] [1]

Advances in Neural Information Processing Systems , volume=

Combinatorial multi-armed bandit with general reward functions , author=. Advances in Neural Information Processing Systems , volume=

work page

[2] [2]

International conference on machine learning , pages=

Contextual combinatorial cascading bandits , author=. International conference on machine learning , pages=. 2016 , organization=

work page 2016

[3] [3]

2025 , eprint=

Combinatorial Logistic Bandits , author=. 2025 , eprint=

work page 2025

[4] [4]

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

Turboquant: Online vector quantization with near-optimal distortion rate , author=. arXiv preprint arXiv:2504.19874 , year=

work page internal anchor Pith review arXiv

[5] [5]

Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments

Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments , author=. arXiv preprint arXiv:2604.19528 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) , pages=

Optimal compression of approximate inner products and dimension reduction , author=. 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) , pages=. 2017 , organization=

work page 2017

[7] [7]

Proceedings of the ACM on Management of Data , volume=

Practical and asymptotically optimal quantization of high-dimensional vectors in euclidean space for approximate nearest neighbor search , author=. Proceedings of the ACM on Management of Data , volume=. 2025 , publisher=

work page 2025

[8] [8]

Proceedings of the ACM on Management of Data , volume=

Rabitq: Quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search , author=. Proceedings of the ACM on Management of Data , volume=. 2024 , publisher=

work page 2024

[9] [9]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Qjl: 1-bit quantized jl transform for kv cache quantization with zero overhead , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page

[10] [11]

Advances in Neural Information Processing Systems , volume=

Drive: One-bit distributed mean estimation , author=. Advances in Neural Information Processing Systems , volume=

work page

[11] [12]

International Conference on Machine Learning , pages=

Eden: Communication-efficient and robust distributed mean estimation for federated learning , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022

[12] [13]

IRE Transactions on Information Theory , volume =

Max, Joel , title =. IRE Transactions on Information Theory , volume =. 1960 , doi =

work page 1960

[13] [14]

, title =

Lloyd, Stuart P. , title =. IEEE Transactions on Information Theory , volume =. 1982 , doi =

work page 1982

[14] [15]

, title =

Linde, Yoseph and Buzo, Andres and Gray, Robert M. , title =. IEEE Transactions on Communications , volume =. 1980 , doi =

work page 1980

[15] [16]

IEEE Transactions on Information Theory , volume =

Gersho, Allen , title =. IEEE Transactions on Information Theory , volume =. 1979 , doi =

work page 1979

[16] [17]

, title =

Zador, Paul L. , title =. IEEE Transactions on Information Theory , volume =. 1982 , doi =

work page 1982

[17] [18]

, title =

Gersho, Allen and Gray, Robert M. , title =

work page

[18] [19]

Product Quantization for Nearest Neighbor Search , journal =

J. Product Quantization for Nearest Neighbor Search , journal =. 2011 , doi =

work page 2011

[19] [20]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

Ge, Tiezheng and He, Kaiming and Ke, Qifa and Sun, Jian , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages =. 2013 , doi =

work page 2013

[20] [21]

, title =

Norouzi, Mohammad and Fleet, David J. , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages =. 2013 , doi =

work page 2013

[21] [22]

IEEE Transactions on Knowledge and Data Engineering , volume =

Wang, Jianfeng and Wang, Jingdong and Song, Jingkuan and Xu, Xin-Shun and Shen, Heng Tao and Li, Shipeng , title =. IEEE Transactions on Knowledge and Data Engineering , volume =. 2015 , doi =

work page 2015

[22] [23]

, title =

Fischer, Thomas R. , title =. IEEE Transactions on Information Theory , volume =

work page

[23] [24]

and Sayood, Khalid , title =

Gibson, Jerry D. and Sayood, Khalid , title =. Advances in Electronics and Electron Physics , volume =

work page

[24] [25]

and Sloane, Neil J

Conway, John H. and Sloane, Neil J. A. , title =. 1999 , doi =

work page 1999

[25] [26]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

Eghbali, Sepehr and Tahvildari, Ladan , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

work page

[26] [27]

van der Ouderaa, Tycho F. A. and Croci, Maximilian L. and Hilmkil, Agrin and Hensman, James , title =. arXiv preprint arXiv:2410.16926 , year =

work page arXiv

[27] [28]

Image and Video Tokenization with Binary Spherical Quantization , booktitle =

Zhao, Yue and Xiong, Yuanjun and Kr. Image and Video Tokenization with Binary Spherical Quantization , booktitle =. 2025 , url =

work page 2025

[28] [29]

van der Ouderaa, Tycho F. A. and van Baalen, Mart and Whatmough, Paul and Nagel, Markus , title =. arXiv preprint arXiv:2603.11021 , year =

work page arXiv

[29] [30]

A Note on TurboQuant and the Earlier DRIVE/EDEN Line of Work

Ben-Basat, Ran and Ben-Itzhak, Yaniv and Mendelson, Gal and Mitzenmacher, Michael and Portnoy, Amit and Vargaftik, Shay , title =. arXiv preprint arXiv:2604.18555 , year =

work page internal anchor Pith review Pith/arXiv arXiv

[30] [31]

arXiv preprint arXiv:1909.10766 , year =

Pagh, Rasmus and Sivertsen, Johan , title =. arXiv preprint arXiv:1909.10766 , year =

work page arXiv 1909

[31] [32]

IEEE transactions on big data , volume=

Billion-scale similarity search with GPUs , author=. IEEE transactions on big data , volume=. 2019 , publisher=

work page 2019

[32] [33]

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Kivi: A tuning-free asymmetric 2bit quantization for kv cache , author=. arXiv preprint arXiv:2402.02750 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[33] [34]

arXiv preprint arXiv:2504.03717 , year=

Raana: A fast, flexible, and data-efficient post-training quantization algorithm , author=. arXiv preprint arXiv:2504.03717 , year=

work page arXiv

[34] [35]

Proceedings of the IRE , volume=

Quantization distortion in pulse-count modulation with nonuniform spacing of levels , author=. Proceedings of the IRE , volume=. 2006 , publisher=

work page 2006

[35] [36]

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models , author=. arXiv preprint arXiv:2104.08663 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[36] [37]

URL https://github

Needle in a haystack-pressure testing llms, 2023 , author=. URL https://github. com/gkamradt/LLMTest\_NeedleInAHaystack , year=

work page 2023

[37] [38]

Proceedings of the 62nd annual meeting of the association for computational linguistics (volume 1: Long papers) , pages=

Longbench: A bilingual, multitask benchmark for long context understanding , author=. Proceedings of the 62nd annual meeting of the association for computational linguistics (volume 1: Long papers) , pages=

work page