Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments
Pith reviewed 2026-05-10 02:39 UTC · model grok-4.3
The pith
RaBitQ outperforms TurboQuant in most tested settings for inner-product estimation, nearest-neighbor search, and KV-cache quantization, with several original TurboQuant results failing to reproduce.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under a single, symmetric evaluation framework the authors find that TurboQuant delivers lower accuracy or higher cost than RaBitQ across most configurations of inner-product estimation, nearest-neighbor search, and KV-cache quantization; in addition, several performance figures originally reported for TurboQuant cannot be reproduced from its publicly released implementation.
What carries the argument
A unified comparison framework that applies identical datasets, metrics, and configuration choices to both RaBitQ and TurboQuant so that differences in empirical behavior and reproducibility can be isolated.
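To make the claimed symmetry concrete, such a framework can be reduced to a single loop in which both methods receive byte-identical inputs, the same bit budget, and the same random seed. The sketch below is a minimal illustration under those assumptions; the quantizer callables it expects are hypothetical stand-ins, not the note's released code.

```python
import numpy as np

def symmetric_eval(quantizers, data, queries, num_bits, seed=0):
    """Run every quantizer on identical data, queries, bit budget, and
    seed, and report the mean relative error of estimated inner products."""
    exact = queries @ data.T  # ground-truth inner products
    results = {}
    for name, quantize in quantizers.items():
        # Each callable maps (data, num_bits, rng) to decoded vectors of
        # the same shape as `data`; no method-specific tuning happens here.
        decoded = quantize(data, num_bits, np.random.default_rng(seed))
        approx = queries @ decoded.T
        rel_err = np.abs(approx - exact) / (np.abs(exact) + 1e-12)
        results[name] = float(rel_err.mean())
    return results
```

Because both entries in `quantizers` pass through the same code path, any measured gap can only come from the methods themselves, which is exactly the isolation the framework is meant to provide.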
If this is right
- RaBitQ becomes the stronger default choice for practitioners who need reliable inner-product estimation or nearest-neighbor search under tight memory constraints.
- KV-cache quantization pipelines that currently rely on TurboQuant should be re-tested against RaBitQ to measure any accuracy or latency gains.
- Future papers that introduce new quantization schemes must include direct comparisons against both RaBitQ and TurboQuant using the same released code and settings.
- Reproducibility checks should become standard when one quantization method claims superiority over another.
Where Pith is reading between the lines
- Quantization research would benefit from a shared public benchmark suite that locks both code versions and evaluation scripts so later claims can be verified without re-implementing setups.
- The fact that two independently developed methods share substantial internal structure suggests that theoretical improvements discovered for one may transfer to the other with modest adaptation.
- Practitioners deploying these techniques in production should run small-scale A/B tests on their own data rather than relying solely on published numbers.
Load-bearing premise
The chosen datasets, metrics, and implementation settings introduce no hidden bias favoring RaBitQ, and the released TurboQuant code faithfully reflects the conditions described in its original paper.
What would settle it
The reproducibility finding would be falsified by re-running the released TurboQuant implementation on the exact dataset sizes, bit widths, and hardware reported in the original TurboQuant paper and obtaining the same runtime and recall numbers claimed there.
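A minimal sketch of how that settling experiment could be scored, assuming a fixed relative tolerance for declaring a number reproduced; the figures below are placeholders, not values taken from either paper.

```python
def check_reproduction(reported, reproduced, rel_tol=0.05):
    """Return metrics whose reproduced value deviates from the reported
    one by more than `rel_tol` (relative). An empty result on the original
    configuration would falsify the reproducibility finding."""
    return {
        metric: (value, reproduced[metric])
        for metric, value in reported.items()
        if abs(reproduced[metric] - value) > rel_tol * abs(value)
    }

# Placeholder numbers for illustration only.
reported = {"recall@10": 0.95, "runtime_ms": 1.8}
reproduced = {"recall@10": 0.91, "runtime_ms": 3.2}
print(check_reproduction(reported, reproduced))
# -> {'runtime_ms': (1.8, 3.2)}; recall@10 stays within the 5% tolerance.
```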
Original abstract
This technical note revisits the relationship between RaBitQ and TurboQuant under a unified comparison framework. We compare the two methods in terms of methodology, theoretical guarantees, and empirical performance, using a reproducible, transparent, and symmetric setup. Our results show that, despite the claimed advantage of TurboQuant, TurboQuant performs worse than RaBitQ in most tested settings of inner-product estimation, nearest-neighbor search and KV cache quantization. We further find that several reported runtime and recall results in the TurboQuant paper could not be reproduced from the released implementation under the stated configuration. Overall, this note clarifies the shared structure and genuine differences between the two lines of work, while documenting reproducibility issues in the experimental results reported by the TurboQuant paper.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This technical note revisits RaBitQ and TurboQuant under a unified comparison framework. It compares the methods on methodology, theoretical guarantees, and empirical performance, concluding that TurboQuant performs worse than RaBitQ in most tested settings for inner-product estimation, nearest-neighbor search, and KV cache quantization. The note also reports that several runtime and recall results from the original TurboQuant paper could not be reproduced from the released implementation under the stated configurations.
Significance. If the unified setup is verifiably symmetric and free of implementation artifacts, the work would help clarify the practical differences between these quantization approaches for efficient vector operations and model serving. It also draws attention to reproducibility challenges in empirical evaluations of quantization methods, which is valuable for the broader literature on approximate nearest-neighbor search and KV-cache compression.
major comments (2)
- §4 (Experimental Setup): The description of the 'symmetric' and 'reproducible' framework does not include explicit equivalence checks (e.g., identical vectorization, memory layouts, BLAS versions, or compiler flags) between the RaBitQ re-implementation and the released TurboQuant code. This detail is load-bearing for the central claim that observed performance gaps and non-reproducibility are intrinsic rather than environmental or optimization artifacts. (See the environment-fingerprint sketch after this list.)
- §5.2 (Reproducibility subsection): The statement that 'several reported runtime and recall results could not be reproduced' lacks the exact configuration files, seed values, or command-line invocations used in the reproduction attempt. Without these, readers cannot independently assess the discrepancy or rule out post-hoc configuration differences.
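One concrete form such an equivalence check could take is an environment fingerprint recorded alongside every run, so both methods can be shown to have executed under identical conditions. The sketch below uses only standard Python tooling and is an illustration of the idea, not the note's actual protocol.

```python
import json
import platform
import sysconfig

import numpy as np

def environment_fingerprint():
    """Collect the environment details the comparison depends on; storing
    this next to each result makes asymmetric setups detectable."""
    return {
        "machine": platform.machine(),
        "processor": platform.processor(),
        "os": platform.platform(),
        "python": platform.python_version(),
        "numpy": np.__version__,
        # Compiler flags of the Python build; per-library flags would be
        # recorded analogously from each project's build logs.
        "cflags": sysconfig.get_config_var("CFLAGS"),
    }

print(json.dumps(environment_fingerprint(), indent=2))
# np.show_config() additionally reports the BLAS/LAPACK backend in use.
```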
minor comments (2)
- [Abstract] The abstract and introduction could more precisely quantify 'most tested settings' (e.g., number of datasets, bit-widths, and recall@K values) to allow quick assessment of the scope of the comparison.
- [§2] Notation for the shared mathematical structure (e.g., the common quantization operator) should be introduced once with a clear table or equation reference rather than repeated inline descriptions.
Simulated Author's Rebuttal
We thank the referee for the careful review and constructive comments on our technical note. We address each major comment below with point-by-point responses, indicating where revisions have been made to strengthen the manuscript's transparency and rigor.
Point-by-point responses
- Referee: §4 (Experimental Setup): The description of the 'symmetric' and 'reproducible' framework does not include explicit equivalence checks (e.g., identical vectorization, memory layouts, BLAS versions, or compiler flags) between the RaBitQ re-implementation and the released TurboQuant code. This detail is load-bearing for the central claim that observed performance gaps and non-reproducibility are intrinsic rather than environmental or optimization artifacts.
  Authors: We appreciate the referee's emphasis on this critical detail for validating the symmetry of the comparison. In the revised manuscript, Section 4 has been expanded with a new subsection on the experimental environment. It now explicitly documents the shared hardware (specific CPU/GPU models), OS, Python and library versions (including BLAS/MKL/OpenBLAS), compiler flags (-O3 -march=native), vectorization settings, and memory layouts. Both the RaBitQ re-implementation and the unmodified released TurboQuant code were executed under these identical conditions on the same machine. No custom optimizations were applied to either method beyond what is in the public releases. This addition directly supports that performance differences arise from methodological distinctions rather than setup artifacts. Revision: yes.
- Referee: §5.2 (Reproducibility subsection): The statement that 'several reported runtime and recall results could not be reproduced' lacks the exact configuration files, seed values, or command-line invocations used in the reproduction attempt. Without these, readers cannot independently assess the discrepancy or rule out post-hoc configuration differences.
  Authors: We agree that including these specifics is essential for full transparency and independent verification. The revised manuscript now includes a dedicated appendix (Appendix C) that provides the exact command-line invocations, configuration files (as YAML snippets), random seeds, and environment variables used in our reproduction attempts. These match the configurations described in the original TurboQuant paper. We also tabulate the reproduced versus originally reported runtime and recall values for the affected experiments. This allows readers to directly assess and replicate the noted discrepancies. Revision: yes. (See the configuration-pinning sketch after these responses.)
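As an illustration of what one pinned run record in such an appendix might contain, the sketch below fixes the relevant seeds and persists a hypothetical configuration; every key, value, and file name is a placeholder rather than material from the actual Appendix C.

```python
import json
import random

import numpy as np

# Hypothetical pinned configuration for a single reproduction run.
config = {
    "dataset": "example-dataset",
    "num_bits": 4,
    "recall_k": 10,
    "seed": 1234,
    "command": "python run_turboquant.py --bits 4 --k 10 --seed 1234",
}

# Fix every RNG the run touches so a re-execution is repeatable.
random.seed(config["seed"])
np.random.seed(config["seed"])

# Persist the exact invocation next to the results it produced.
with open("run_config.json", "w") as f:
    json.dump(config, f, indent=2)
```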
Circularity Check
No circularity: empirical comparison without derivations or fitted predictions
Full rationale
This technical note performs a symmetric empirical comparison of RaBitQ and TurboQuant using unified setups for methodology, theory, and experiments. It reports performance differences and reproducibility issues based on direct testing of released implementations. The note contains no mathematical derivations, first-principles results, predictions, or fitted models that could reduce to their own inputs by construction. All claims rest on prior, externally released code and transparent experimental configurations rather than on self-definitional steps, self-citation chains, or renamed known results. This is the expected non-finding for a pure comparison paper.
Forward citations
Cited by 1 Pith paper
- FibQuant: Universal Vector Quantization for Random-Access KV-Cache Compression
  FibQuant is a universal fixed-rate vector quantizer for KV-cache compression that uses a radial-angular codebook matched to the spherical-Beta source after Haar rotation and strictly outperforms scalar quantization at...