Recognition: no theorem link
GRAB-ANNS: High-Throughput Indexing and Hybrid Search via GPU-Native Bucketing
Pith reviewed 2026-05-13 23:44 UTC · model grok-4.3
The pith
GRAB-ANNS maps range predicates to GPU bucket selections and maintains navigability with a hybrid graph of dense local edges plus sparse remote edges.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GRAB-ANNS is a GPU-native graph index for dynamic hybrid search that introduces a bucket-based memory layout transforming range predicates into lightweight bucket selection for coalesced accesses and efficient SIMT execution, a hybrid graph topology combining dense intra-bucket local edges with sparse inter-bucket remote edges to preserve global navigability under arbitrary filters, and an append-only update pipeline supporting batched insertions and parallel graph maintenance on GPUs.
What carries the argument
The bucket-based memory layout that converts predicates into bucket selections, paired with the hybrid graph topology of dense intra-bucket local edges and sparse inter-bucket remote edges.
Load-bearing premise
The hybrid graph of dense intra-bucket edges and sparse inter-bucket edges preserves global navigability and high recall when arbitrary predicate filters activate only a few buckets.
What would settle it
A measured recall drop below acceptable thresholds on queries with highly selective predicates that touch only a small fraction of buckets, using a dataset where removing inter-bucket edges disconnects the graph.
Figures
read the original abstract
Hybrid search, which jointly optimizes vector similarity and structured predicate filtering, has become a fundamental building block for modern AI-driven systems. While recent predicate-aware ANN indices improve filtering efficiency on CPUs, their performance is increasingly constrained by limited memory bandwidth and parallelism. Although GPUs offer massive parallelism and superior memory bandwidth, directly porting CPU-centric hybrid search algorithms to GPUs leads to severe performance degradation due to architectural mismatches, including irregular memory access, branch divergence, and excessive CPU-GPU synchronization. In this paper, we present GRAB-ANNS, a high-throughput, GPU-native graph index for dynamic hybrid search. Our key insight is to rethink hybrid indexing from a hardware-first perspective. We introduce a bucket-based memory layout that transforms range predicates into lightweight bucket selection, enabling coalesced memory accesses and efficient SIMT execution. To preserve global navigability under arbitrary filters, we design a hybrid graph topology that combines dense intra-bucket local edges with sparse inter-bucket remote edges. We further develop an append-only update pipeline that supports efficient batched insertions and parallel graph maintenance on GPUs. Extensive experiments on large-scale datasets show that GRAB-ANNS achieves up to 240.1 times higher query throughput and 12.6 times faster index construction than state-of-the-art CPU-based systems, and up to 10 times higher throughput compared to optimized GPU-native reimplementations, while maintaining high recall.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents GRAB-ANNS, a GPU-native graph index for dynamic hybrid search combining vector similarity and structured predicate filtering. It introduces a bucket-based memory layout that converts range predicates into lightweight bucket selection for coalesced GPU accesses and SIMT execution. A hybrid graph topology with dense intra-bucket local edges and sparse inter-bucket remote edges is designed to preserve global navigability under arbitrary filters. An append-only update pipeline supports batched insertions and parallel maintenance. Experiments on large-scale datasets claim up to 240.1× higher query throughput and 12.6× faster index construction versus state-of-the-art CPU systems, plus up to 10× throughput over optimized GPU reimplementations, while maintaining high recall.
Significance. If the performance claims and recall guarantees are substantiated, the work would be significant for high-throughput hybrid search in AI-driven database systems. The hardware-first redesign addressing irregular access, divergence, and synchronization issues on GPUs offers a concrete path beyond CPU-centric predicate-aware indices, with potential impact on scalable vector search workloads.
major comments (2)
- [§4.2] §4.2 (Hybrid Graph Topology): The central claim that the hybrid topology (dense intra-bucket local edges + sparse inter-bucket remote edges) preserves global navigability and high recall for arbitrary (non-range) structured predicates lacks a connectivity argument or analysis of worst-case bucket sparsity. When a predicate selects a sparse or non-contiguous bucket set, the remote edges may fail to provide sufficient traversal paths, directly undermining both the recall guarantee and the reported 240.1×/10× throughput gains.
- [§5] §5 (Experimental Evaluation): The reported speedups (240.1× query throughput, 12.6× construction, 10× vs. GPU baselines) are load-bearing for the contribution but rest on unspecified datasets, predicate workloads (especially arbitrary filters), baseline reimplementations, recall thresholds, and query distributions. Without these details and ablations isolating the hybrid topology's effect, the claims cannot be verified as general.
minor comments (2)
- [Abstract] The abstract refers to 'large-scale datasets' without naming them, their cardinalities, or dimensionalities, which would aid immediate assessment of the scale at which the speedups apply.
- [§3.1] Clarify the selection and sensitivity of bucket granularity (listed as a free parameter) and its impact on the trade-off between intra-bucket density and inter-bucket sparsity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. The comments highlight important areas for strengthening the presentation of the hybrid topology and experimental details. We address each point below and will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [§4.2] §4.2 (Hybrid Graph Topology): The central claim that the hybrid topology (dense intra-bucket local edges + sparse inter-bucket remote edges) preserves global navigability and high recall for arbitrary (non-range) structured predicates lacks a connectivity argument or analysis of worst-case bucket sparsity. When a predicate selects a sparse or non-contiguous bucket set, the remote edges may fail to provide sufficient traversal paths, directly undermining both the recall guarantee and the reported 240.1×/10× throughput gains.
Authors: We acknowledge that §4.2 would benefit from an explicit connectivity argument. The hybrid design ensures navigability by maintaining a minimum density of inter-bucket edges to representative nodes across buckets, preserving the small-world property of the underlying graph. While the current manuscript relies on empirical recall results across filter selectivities to support this, we agree a probabilistic analysis of connectivity under varying bucket sparsity is valuable. We will add a new subsection in §4.2 providing a connectivity argument based on expected edge coverage and a worst-case sparsity analysis assuming uniform bucket distributions. This will be included in the revision. revision: yes
-
Referee: [§5] §5 (Experimental Evaluation): The reported speedups (240.1× query throughput, 12.6× construction, 10× vs. GPU baselines) are load-bearing for the contribution but rest on unspecified datasets, predicate workloads (especially arbitrary filters), baseline reimplementations, recall thresholds, and query distributions. Without these details and ablations isolating the hybrid topology's effect, the claims cannot be verified as general.
Authors: We agree that §5 requires expanded details for full reproducibility and verification. The manuscript specifies the datasets, predicate types (including arbitrary filters), recall targets, and query workloads, but these can be presented more explicitly. We will revise §5 to include: (1) a detailed table of all datasets and sizes, (2) explicit descriptions of predicate workloads with selectivity ranges for arbitrary filters, (3) precise specifications of baseline reimplementations and optimizations, (4) recall thresholds used (e.g., 0.9+), (5) query distribution parameters, and (6) new ablations comparing the full hybrid topology against intra-bucket-only variants to isolate its contribution. These changes will be made in the revised version. revision: yes
Circularity Check
No circularity: performance claims rest on new implementation and experiments
full rationale
The paper presents GRAB-ANNS as a hardware-first GPU-native design with bucket-based memory layout for range predicates and a hybrid graph topology (dense intra-bucket + sparse inter-bucket edges) for arbitrary filters. All central claims—240.1x query throughput, 12.6x faster construction, 10x vs GPU baselines—are tied to empirical results on large-scale datasets rather than any derivation that reduces to its own inputs by construction. No equations, fitted parameters renamed as predictions, self-citation load-bearing premises, or ansatzes smuggled via prior work appear in the abstract or design description. The navigability assumption is treated as an engineering choice validated experimentally, not a self-referential definition.
Axiom & Free-Parameter Ledger
free parameters (1)
- bucket granularity
axioms (1)
- domain assumption GPU SIMT execution benefits from coalesced memory accesses enabled by bucket layouts
Reference graph
Works this paper leans on
-
[1]
Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya Razenshteyn, and Ludwig Schmidt. 2015. Practical and optimal LSH for angular distance.Advances in neural information processing systems28 (2015)
work page 2015
-
[2]
Muhammad Arslan, Hussam Ghanem, Saba Munawar, and Christophe Cruz. 2024. A Survey on RAG with LLMs.Procedia computer science246 (2024), 3781–3790
work page 2024
-
[3]
Dmitry Baranchuk, Artem Babenko, and Yury Malkov. 2018. Revisiting the inverted indices for billion-scale approximate nearest neighbors. InProceedings of the European Conference on Computer Vision (ECCV). 202–216. 14
work page 2018
-
[4]
Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Ya- dav. 2025. Mem0: Building production-ready ai agents with scalable long-term memory.arXiv preprint arXiv:2504.19413(2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[5]
Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, and Qing Li. 2024. A survey on rag meeting llms: Towards retrieval-augmented large language models. InProceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining. 6491–6501
work page 2024
-
[6]
Siddharth Gollapudi, Neel Karia, Varun Sivashankar, Ravishankar Krishnaswamy, Nikit Begwani, Swapnil Raz, Yiyong Lin, Yin Zhang, Neelam Mahapatro, Premku- mar Srinivasan, Amit Singh, and Harsha Vardhan Simhadri. 2023. Filtered- DiskANN: Graph Algorithms for Approximate Nearest Neighbor Search with Filters. InProceedings of the ACM Web Conference 2023(Aust...
-
[7]
Long Gong, Huayi Wang, Mitsunori Ogihara, and Jun Xu. 2020. iDEC: indexable distance estimating codes for approximate nearest neighbor search.Proceedings of the VLDB Endowment13, 9 (2020)
work page 2020
-
[8]
Fabian Groh, Lukas Ruppert, Patrick Wieschollek, and Hendrik P. A. Lensch
-
[9]
https://doi.org/10.1109/TBDATA.2022.3161156
GGNN: Graph-Based GPU Nearest Neighbor Search.IEEE Transactions on Big Data9, 1 (2023), 267–279. https://doi.org/10.1109/TBDATA.2022.3161156
-
[10]
Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating large-scale inference with anisotropic vector quantization. InInternational Conference on Machine Learning. PMLR, 3887–3896
work page 2020
-
[11]
Michael E Houle and Michael Nett. 2014. Rank-based similarity search: Reducing the dimensional dependence.IEEE transactions on pattern analysis and machine intelligence37, 1 (2014), 136–150
work page 2014
-
[12]
Omid Jafari, Parth Nagarkar, and Jonathan Montaño. 2020. mmlsh: A practical and efficient technique for processing approximate nearest neighbor queries on multimedia data. InInternational Conference on Similarity Search and Applications. Springer, 47–61
work page 2020
-
[13]
Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. 2019. Diskann: Fast accurate billion-point nearest neighbor search on a single node.Advances in neural information pro- cessing Systems32 (2019)
work page 2019
-
[14]
Hervé Jégou, Matthijs Douze, and Cordelia Schmid. 2011. Product Quantization for Nearest Neighbor Search.IEEE Transactions on Pattern Analysis and Machine Intelligence33, 1 (2011), 117–128
work page 2011
-
[15]
Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with GPUs.IEEE transactions on big data7, 3 (2019), 535–547
work page 2019
-
[16]
Mingjie Li, Ying Zhang, Yifang Sun, Wei Wang, Ivor W Tsang, and Xuemin Lin. 2020. I/O efficient approximate nearest neighbour search based on learned functions. In2020 IEEE 36th international conference on data engineering (ICDE). IEEE, 289–300
work page 2020
- [17]
-
[18]
Kejing Lu and Mineichi Kudo. 2020. R2LSH: A nearest neighbor search scheme based on two-dimensional projected spaces. In2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 1045–1056
work page 2020
-
[19]
Kejing Lu, Hongya Wang, Wei Wang, and Mineichi Kudo. 2020. VHP: approxi- mate nearest neighbor search via virtual hypersphere partitioning.Proceedings of the VLDB Endowment13, 9 (2020), 1443–1455
work page 2020
-
[20]
Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov
-
[21]
Approximate nearest neighbor algorithm based on navigable small world graphs.Information Systems45 (2014), 61–68
work page 2014
-
[22]
Yu A Malkov and Dmitry A Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE transactions on pattern analysis and machine intelligence42, 4 (2018), 824–836
work page 2018
-
[23]
Yu A. Malkov and D. A. Yashunin. 2020. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence42, 4 (2020), 824–
work page 2020
-
[24]
https://doi.org/10.1109/TPAMI.2018.2889473
-
[25]
Marius Muja and David G Lowe. 2014. Scalable nearest neighbor algorithms for high dimensional data.IEEE transactions on pattern analysis and machine intelligence36, 11 (2014), 2227–2240
work page 2014
- [26]
-
[27]
Hiroyuki Ootomo, Akira Naruse, Corey Nolet, Ray Wang, Tamas Feher, and Yong Wang. 2024. CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs. In2024 IEEE 40th International Conference on Data Engineering (ICDE). 4236–4247. https://doi.org/10.1109/ICDE60146.2024. 00323
-
[28]
James Jie Pan, Jianguo Wang, and Guoliang Li. 2024. Vector database management techniques and systems. InCompanion of the 2024 International Conference on Management of Data. 597–604
work page 2024
-
[29]
Liana Patel, Peter Kraft, Carlos Guestrin, and Matei Zaharia. 2024. ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Struc- tured Data.Proc. ACM Manag. Data2, 3, Article 120 (May 2024), 27 pages. https://doi.org/10.1145/3654923
-
[30]
Guru Pramod Rusum and Sunil Anasuri. 2024. Vector Databases in Modern Applications: Real-Time Search, Recommendations, and Retrieval-Augmented Generation (RAG).International Journal of AI, BigData, Computational and Management Studies5, 4 (2024), 124–136
work page 2024
-
[31]
Mohamed Sarwat, Raha Moraffah, Mohamed F Mokbel, and James L Avery. 2017. Database system support for personalized recommendation applications. In2017 IEEE 33rd International Conference on Data Engineering (ICDE). IEEE, 1320–1331
work page 2017
-
[32]
Krishna Srinivasan, Karthik Raman, Jianshu Chen, Ankur Bapna, Yuan Cao, and Orhan Firat. 2021. WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning. InProceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2443– 2449
work page 2021
-
[33]
OceanBase Team. 2025. OceanBase seekdb: AI-Native Hybrid Search Database. https://github.com/oceanbase/seekdb. Accessed: [Replace with actual access date, e.g., 2024-10-01]
work page 2025
-
[34]
Tencent AI Infra Team, WeChat Infrastructure. 2026. Exploring GPU-Accelerated Vector Search: Engineering Practice of NVIDIA Cagra in WeChat’s Large-Scale Recommendation System. Tencent News (Tencent Technical Engineering Official Account). https://news.qq.com/rain/a/20260320A07E2900 Online; accessed March 27, 2026
-
[35]
Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xi- angyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, et al. 2021. Milvus: A purpose-built vector data management system. InProceedings of the 2021 international conference on management of data. 2614–2627
work page 2021
-
[36]
Yuxiang Wang, Ziyuan He, Yongxin Tong, Zimu Zhou, and Yiman Zhong. 2025. Timestamp Approximate Nearest Neighbor Search over High-Dimensional Vec- tor Data. In2025 IEEE 41st International Conference on Data Engineering (ICDE). IEEE Computer Society, 3043–3055
work page 2025
-
[37]
Chuangxian Wei, Bin Wu, Sheng Wang, Renjie Lou, Chaoqun Zhan, Feifei Li, and Yuanzhe Cai. 2020. AnalyticDB-V: A Hybrid Analytical Engine Towards Query Fusion for Structured and Unstructured Data.Proc. VLDB Endow.13, 12 (2020), 3152–3165
work page 2020
-
[38]
Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang
-
[39]
A-mem: Agentic memory for llm agents.arXiv preprint arXiv:2502.12110 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
- [40]
-
[41]
Yuanhang Yu, Dong Wen, Ying Zhang, Lu Qin, Wenjie Zhang, and Xuemin Lin
-
[42]
In2022 IEEE 38th International Conference on Data Engineering (ICDE)
GPU-accelerated Proximity Graph Approximate Nearest Neighbor Search and Construction. In2022 IEEE 38th International Conference on Data Engineering (ICDE). 552–564. https://doi.org/10.1109/ICDE53745.2022.00046
-
[43]
Chen Zhao, Wu Gao, Feiping Nie, and Huiyang Zhou. 2021. A survey of GPU multitasking methods supported by hardware architecture.IEEE Transactions on Parallel and Distributed Systems33, 6 (2021), 1451–1463
work page 2021
-
[44]
Weijie Zhao, Shulong Tan, and Ping Li. 2020. SONG: Approximate Nearest Neighbor Search on GPU. In2020 IEEE 36th International Conference on Data Engineering (ICDE). 1033–1044. https://doi.org/10.1109/ICDE48307.2020.00094
- [45]
- [46]
- [47]
-
[48]
Chaoji Zuo, Miao Qiao, Wenchao Zhou, Feifei Li, and Dong Deng. 2024. SeRF: seg- ment graph for range-filtering approximate nearest neighbor search.Proceedings of the ACM on Management of Data2, 1 (2024), 1–26
work page 2024
-
[49]
Chaoji Zuo, Miao Qiao, Wenchao Zhou, Feifei Li, and Dong Deng. 2024. SeRF: Segment Graph for Range-Filtering Approximate Nearest Neighbor Search.Proc. ACM Manag. Data2, 1, Article 69 (March 2024), 26 pages. https://doi.org/10. 1145/3639324 15
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.