DGAI: Decoupled On-Disk Graph-Based ANN Index for Efficient Updates and Queries

Ge Yu; Hao Guo; Jiahao Lou; Quan Yu; Shufeng Gong; Song Yu; Tiezheng Nie; Yanfeng Zhang; Youyou Lu

arxiv: 2510.25401 · v5 · submitted 2025-10-29 · 💻 cs.DB · cs.IR

Pith reviewed 2026-05-18 03:39 UTC · model grok-4.3

classification 💻 cs.DB cs.IR

keywords graph-based ANN · on-disk index · decoupled storage · approximate nearest neighbor · dynamic updates · product quantization · index maintenance

The pith

Decoupling vectors from graph topology in on-disk ANN indexes speeds up updates over 8x while maintaining fast queries via dynamic layout and hierarchical refinement.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a decoupled storage design for graph-based approximate nearest neighbor indexes physically separates large vectors from the compact graph structure. This separation cuts redundant input-output operations that normally occur when updating the index, directly improving insertion and deletion speeds. To prevent the separation from harming query performance, the authors add a similarity-aware dynamic layout that reuses fetched data across search steps and a two-stage query process that uses hierarchical product quantization to quickly narrow candidates before exact checks on only a few vectors. A reader would care because real-world ANN systems must handle frequent updates alongside low-latency searches, yet current coupled designs force a tradeoff that limits practicality at billion-scale. If the approach holds, dynamic workloads become feasible without constant full-index rebuilds.

Core claim

DGAI proposes a decoupled storage architecture that physically separates heavy vectors from lightweight graph topology to reduce redundant I/O during updates. This is paired with a similarity-aware dynamic layout that converts read amplification into useful prefetching and a two-stage query mechanism that employs hierarchical PQ to identify promising candidates quickly before exact refinement on raw vectors for only a small subset. The result is an index that supports resource-efficient updates and low-latency queries at the same time.
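The two-stage idea can be illustrated with a minimal sketch. It is not the paper's algorithm: uniform scalar quantization at two resolutions stands in for hierarchical PQ, a full scan stands in for graph traversal, and every name and parameter (`quantize`, `two_stage_search`, `n_fine`, `n_exact`) is hypothetical. The point it shows is only that raw vectors are read for the final `n_exact` candidates and nothing else.

```python
import random

def quantize(vec, step):
    """Snap each coordinate to a grid of width `step` (stand-in for a PQ code)."""
    return [round(x / step) * step for x in vec]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_stage_search(query, raw_vectors, coarse, fine, k, n_fine, n_exact):
    """Return the ids of the k best candidates and the number of raw-vector reads."""
    ids = range(len(raw_vectors))
    # Stage 1a: rank all ids by the cheap coarse approximation, keep n_fine.
    by_coarse = sorted(ids, key=lambda i: sq_dist(query, coarse[i]))[:n_fine]
    # Stage 1b: re-rank the survivors with the finer approximation, keep n_exact.
    by_fine = sorted(by_coarse, key=lambda i: sq_dist(query, fine[i]))[:n_exact]
    # Stage 2: exact distances on raw vectors, for n_exact candidates only.
    exact = sorted(by_fine, key=lambda i: sq_dist(query, raw_vectors[i]))
    return exact[:k], len(by_fine)

# Synthetic data, for illustration only.
random.seed(0)
raw = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(500)]
coarse = [quantize(v, 0.5) for v in raw]   # low-resolution approximation
fine = [quantize(v, 0.1) for v in raw]     # higher-resolution approximation
query = [random.uniform(-1, 1) for _ in range(8)]

top, raw_reads = two_stage_search(query, raw, coarse, fine, k=5, n_fine=100, n_exact=20)
print(top, raw_reads)
```

In this sketch the refinement stage reads 20 raw vectors out of 500; how small that subset can be without hurting recall is an empirical question the paper's experiments would have to answer.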

What carries the argument

Decoupled storage architecture that separates vectors from graph topology, supported by similarity-aware dynamic layout for data reuse and hierarchical PQ two-stage query for candidate refinement.
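A back-of-envelope model may help show why the separation matters for updates. The sketch below is an assumption-laden illustration, not the paper's accounting: it supposes float32 vectors, 4-byte neighbor ids, and a maximum degree of 64 (all hypothetical), and asks what fraction of a coupled node record is vector payload that gets moved but not changed when an update rewrites only the neighbor list. The dimensions 128 and 960 are the usual sizes of SIFT and GIST benchmark vectors.

```python
def coupled_record_bytes(dim, degree, bytes_per_dim=4, id_bytes=4):
    """Split one coupled node record into vector bytes and topology bytes."""
    vector = dim * bytes_per_dim
    topology = id_bytes * (degree + 1)  # neighbor count plus neighbor ids
    return vector, topology

def redundant_share(dim, degree):
    """Fraction of a coupled record that is unchanged vector payload
    when only the neighbor list is rewritten."""
    vector, topology = coupled_record_bytes(dim, degree)
    return vector / (vector + topology)

for dim in (128, 960):
    print(dim, round(redundant_share(dim, degree=64), 3))
```

Under these assumptions the vector dominates the record, and more so at higher dimensionality, which is consistent with the direction of the claim but says nothing about the measured speedups.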

If this is right

Insertions and deletions become roughly 8x faster (8.17x and 8.16x in the reported experiments) by eliminating redundant vector I/O.
Peak query latency under mixed workloads drops by roughly two-thirds compared with prior on-disk graph indexes.
Systems can support more frequent updates without rebuilding the entire index from scratch.
The design maintains query efficiency by turning potential amplification into prefetch benefits and limiting exact checks to few candidates.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The separation idea could apply to other disk-resident indexes where topology is lighter than data payloads.
In environments with expensive random I/O, the dynamic layout may deliver even larger relative gains than on uniform hardware.
Workloads with strong locality might need less aggressive dynamic reorganization, suggesting tunable parameters for different access patterns.

Load-bearing premise

The extra I/O from separating vectors and graph topology can be offset by the dynamic layout and hierarchical refinement across varied access patterns and hardware without large overhead.

What would settle it

Run mixed insert-delete-query workloads on multiple disk types and datasets; if peak query latency rises above the coupled baseline or update gains fall below 2x, the central tradeoff claim is refuted.
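The criterion above can be written down as a small check, shown here as a sketch. The function names and the tuple layout of a run are hypothetical, and the numbers in the usage lines are placeholders rather than measurements.

```python
def tradeoff_claim_refuted(dgai_peak_ms, coupled_peak_ms, update_speedup, min_speedup=2.0):
    """Apply the two refutation conditions to a single measured configuration."""
    latency_regressed = dgai_peak_ms > coupled_peak_ms
    update_gain_too_small = update_speedup < min_speedup
    return latency_regressed or update_gain_too_small

def any_run_refutes(runs):
    """runs: iterable of (dgai_peak_ms, coupled_peak_ms, update_speedup) tuples,
    one per disk type and dataset combination."""
    return any(tradeoff_claim_refuted(*run) for run in runs)

# Placeholder values, for illustration only.
print(tradeoff_claim_refuted(5.0, 10.0, 8.0))
print(any_run_refutes([(5.0, 10.0, 8.0), (12.0, 10.0, 8.0)]))
```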

Figures

Figures reproduced from arXiv: 2510.25401 by Ge Yu, Hao Guo, Jiahao Lou, Quan Yu, Shufeng Gong, Song Yu, Tiezheng Nie, Yanfeng Zhang, Youyou Lu.

**Figure 2.** I/O volume breakdown during index update. Recoverable plot labels: x-axis Dataset (SIFT, GIST); y-axis Throughput (x1K); series Base (FreshDiskANN), Base + Decoupled, Base + Decoupled + Two-stage.

Original abstract

On-disk graph-based indexes are favored for billion-scale Approximate Nearest Neighbor Search (ANNS) due to their high performance and cost-efficiency. However, existing systems typically rely on a coupled storage architecture that co-locates vectors and graph topology, which introduces substantial redundant I/O during index updates, thereby degrading usability in dynamic workloads. In this paper, we propose a decoupled storage architecture that physically separates heavy vectors from the lightweight graph topology. This design substantially improves update performance by reducing redundant I/O during updates. However, it introduces I/O amplification during ANNS, leading to degraded query efficiency. To improve query performance within the update-friendly architecture, we propose two techniques co-designed with the decoupled storage. We develop a similarity-aware dynamic layout that optimizes data placement online so that redundantly fetched data can be reused in subsequent search steps, effectively turning read amplification into useful prefetching. In addition, we propose a two-stage query mechanism enhanced by hierarchical PQ, which uses hierarchical PQ to rapidly and accurately identify promising candidates and performs exact refinement on raw vectors for only a small number of candidates. This design significantly reduces both the I/O and computational cost of the refinement stage. Overall, DGAI achieves resource-efficient updates and low-latency queries simultaneously. Experimental results demonstrate that DGAI improves update speed by 8.17x for insertions and 8.16x for deletions, while reducing peak query latency under mixed workloads by 67% compared to state-of-the-art baselines.
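The phrase "turning read amplification into useful prefetching" can be made concrete with a toy page-count model. This is only a sketch under stated assumptions: pages hold a fixed number of vectors, a fetched page stays buffered for the rest of the query, and the grouping of similar vectors is given up front. It does not model the online reorganization the abstract describes, and all names are hypothetical.

```python
def pages_read(visit_order, page_of):
    """Distinct pages fetched for one search path, assuming a fetched page
    stays in a per-query buffer."""
    return len({page_of[v] for v in visit_order})

def layout_by_id(num_vectors, per_page):
    """Baseline placement: vectors stored in id order."""
    return {v: v // per_page for v in range(num_vectors)}

def layout_by_groups(groups, per_page):
    """Similarity-aware placement: vectors stored group by group,
    so members of one group land on the same pages."""
    page_of, slot = {}, 0
    for group in groups:
        for v in group:
            page_of[v] = slot // per_page
            slot += 1
    return page_of

# Hypothetical example: 16 vectors, 4 per page, and a query whose search path
# visits four mutually similar vectors with scattered ids.
groups = [[0, 5, 10, 15], [1, 2, 3, 4], [6, 7, 8, 9], [11, 12, 13, 14]]
path = [0, 5, 10, 15]
print(pages_read(path, layout_by_id(16, 4)))           # 4
print(pages_read(path, layout_by_groups(groups, 4)))   # 1
```

With id-order placement every step of this path costs a new page; with grouped placement the first fetch already brings in the vectors needed by the later steps, which is the reuse effect the abstract refers to.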

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DGAI's decoupling of vectors from graph topology delivers clear update gains, but the query mitigations hinge on unproven adaptation speed of the online layout under shifting patterns.

DGAI separates heavy vectors from the lightweight graph topology on disk to cut redundant I/O during inserts and deletes. This produces the reported 8x update speedups and addresses a practical bottleneck in dynamic billion-scale ANNS workloads. The authors then add a similarity-aware dynamic layout that tries to reuse extra reads as prefetching and a two-stage query with hierarchical PQ to limit exact vector checks to few candidates. These co-designed pieces are the actual new elements; prior on-disk graph indexes keep vectors and topology coupled, so the physical split plus the layout and refinement tricks stand out as distinct choices. The measured 67% drop in peak mixed-workload latency shows the techniques can work in the tested cases. The experiments report concrete numbers against state-of-the-art baselines, which is useful for systems readers who need to judge real tradeoffs.

The central soft spot is the assumption that the online layout will adapt fast enough when query distributions move. The stress-test note correctly flags that no data on adaptation latency or sensitivity to non-stationary skew appears in the abstract, and if the layout lags the reported query wins could shrink. Minor additional gaps are limited detail on error bars and exact workload definitions, but these are common in early systems papers rather than load-bearing flaws.

The work is aimed at engineers and researchers building production vector search or recommendation systems that must handle frequent updates without high tail latency. A reader focused on storage architectures for ANN indexes will extract concrete design ideas and performance deltas worth discussing. The thinking is straightforward and engages the existing literature on graph indexes without circular claims. I would send this to peer review so referees can check the full experimental setup and ask for targeted tests on distribution shifts.

Referee Report

2 major / 2 minor

Summary. The paper proposes DGAI, a decoupled on-disk graph-based approximate nearest neighbor search (ANNS) index that physically separates heavy vectors from lightweight graph topology to reduce redundant I/O during updates. To counteract resulting query I/O amplification, it introduces a similarity-aware dynamic layout for online data placement optimization (turning redundant fetches into prefetching) and a two-stage query mechanism with hierarchical product quantization (PQ) for rapid candidate identification followed by exact refinement on few candidates. Experiments claim 8.17x faster insertions, 8.16x faster deletions, and 67% lower peak query latency under mixed workloads versus state-of-the-art baselines.

Significance. If the empirical results hold under diverse access patterns, this would represent a meaningful advance for dynamic billion-scale on-disk ANNS by breaking the typical update-query tradeoff in coupled storage architectures. The co-design of layout and hierarchical PQ with the decoupled structure is a concrete systems contribution, and the reported speedups on real workloads provide falsifiable performance claims that could influence future index designs.

major comments (2)

[§4 (Dynamic Layout) and §5 (Experiments)] The central claim that I/O amplification from decoupling is reliably offset by the similarity-aware dynamic layout and hierarchical PQ (yielding the reported 67% peak-latency reduction) is load-bearing, yet the manuscript provides no quantification of layout adaptation latency or sensitivity to non-stationary query distributions. If query neighborhoods shift faster than the online optimization can track, the prefetching benefit would not materialize and the mixed-workload results would not generalize.
[§5 (Experimental Evaluation)] Table 3 and the mixed-workload latency plots report the 67% peak-latency improvement and update speedups, but lack error bars, explicit baseline configurations (e.g., exact versions of DiskANN or HNSW on-disk variants), workload definitions (insert/delete/query ratios and distribution skew), and hardware details. This weakens the strength of evidence for the headline numbers.

minor comments (2)

[§3.2] Notation for the hierarchical PQ levels and candidate counts in the two-stage query mechanism could be clarified with a small diagram or pseudocode to make the refinement cost reduction easier to follow.
[§1] The abstract and introduction would benefit from a brief comparison table of I/O costs (coupled vs. decoupled) to motivate the design before describing the mitigations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and positive assessment of the work's potential significance. We address the two major comments point by point below, agreeing that additional quantification and reporting details are needed to strengthen the evidence.

Referee: [§4 (Dynamic Layout) and §5 (Experiments)] The central claim that I/O amplification from decoupling is reliably offset by the similarity-aware dynamic layout and hierarchical PQ (yielding the reported 67% peak-latency reduction) is load-bearing, yet the manuscript provides no quantification of layout adaptation latency or sensitivity to non-stationary query distributions. If query neighborhoods shift faster than the online optimization can track, the prefetching benefit would not materialize and the mixed-workload results would not generalize.

Authors: We agree that the manuscript does not currently quantify layout adaptation latency or include explicit sensitivity analysis for non-stationary query distributions. Section 4 describes the online similarity-aware placement mechanism and its periodic optimization, while Section 5 reports mixed-workload results that incorporate varying query patterns. However, these do not directly measure adaptation overhead or test rapid distribution shifts. In the revision we will add a dedicated subsection with measurements of adaptation latency across different update/query ratios and a sensitivity study that simulates non-stationary shifts, to test whether the prefetching benefit holds under those conditions. revision: yes
Referee: [§5 (Experimental Evaluation)] Table 3 and the mixed-workload latency plots report the 67% peak-latency improvement and update speedups, but lack error bars, explicit baseline configurations (e.g., exact versions of DiskANN or HNSW on-disk variants), workload definitions (insert/delete/query ratios and distribution skew), and hardware details. This weakens the strength of evidence for the headline numbers.

Authors: We concur that the experimental reporting can be improved for reproducibility. While Section 5 provides workload and hardware information, it does not include error bars, precise baseline version details, exact insert/delete/query ratios with skew parameters, or expanded hardware specifications. In the revised manuscript we will augment Table 3 and the latency plots with error bars from repeated runs, add explicit baseline configuration details (including on-disk variants of DiskANN and HNSW), specify all workload parameters, and expand the hardware description section. revision: yes
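The "error bars from repeated runs" mentioned in the response could be computed along the following lines. This is a minimal sketch using Python's standard `statistics` module; the function name is hypothetical and the sample values are placeholders, not results from the paper.

```python
import statistics

def summarize_runs(samples):
    """Mean and sample standard deviation over repeated runs of one configuration."""
    mean = statistics.mean(samples)
    # Sample standard deviation needs at least two runs.
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, stdev

# Placeholder peak-latency values (ms) from three hypothetical runs.
mean, stdev = summarize_runs([10.0, 12.0, 14.0])
print(mean, stdev)
```

Reporting the number of runs alongside the mean and deviation lets a reader judge whether a headline difference exceeds run-to-run noise.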

Circularity Check

0 steps flagged

No circularity: empirical performance measurements on proposed storage and query techniques

The paper describes a decoupled on-disk graph index architecture, a similarity-aware dynamic layout for prefetching, and a hierarchical PQ two-stage query mechanism. All reported results (8.17x insertion speedup, 8.16x deletion speedup, 67% peak latency reduction) are presented as outcomes of experimental evaluation against baselines rather than any mathematical derivation, fitted parameter renamed as prediction, or self-referential definition. No equations, uniqueness theorems, or ansatzes are invoked that reduce to the paper's own inputs by construction. The work is self-contained as a systems contribution whose claims rest on reproducible measurements, not on a derivation chain that collapses into its assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The design rests on standard assumptions about I/O costs in disk-based systems and typical access patterns in dynamic ANNS workloads; no new entities or fitted constants are introduced in the abstract.

axioms (1)

domain assumption: Dynamic workloads with frequent insertions, deletions, and queries on billion-scale vector datasets are common and performance-critical.
The motivation and evaluation target this class of workloads.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

NAVIS: Concurrent Search and Update with Low Position-Seeking Overhead in On-SSD Graph-Based Vector Search
cs.DC · 2026-05 · unverdicted · novelty 5.0

NAVIS improves concurrent search and update throughput in on-SSD graph vector search by up to 2.74x for insertions and 1.37x for searches through reduced position-seeking overhead.


[1] [1]

Amazon.https://www.amazon.com, 2025

work page 2025

[2] [2]

https://github.com/facebookresearch/ faiss, 2025

Faiss. https://github.com/facebookresearch/ faiss, 2025

work page 2025

[3] [3]

Mysql.https://dev.mysql.com/, 2025

work page 2025

[4] [4]

https://github.com/thustorage/ PipeANN/blob/main/README-OdinANN.md, 2025

Odinann. https://github.com/thustorage/ PipeANN/blob/main/README-OdinANN.md, 2025

work page 2025

[5] [5]

https://github.com/pgvector/ pgvector, 2025

pgvector. https://github.com/pgvector/ pgvector, 2025

work page 2025

[6] [6]

Pinecone.https://www.pinecone.io, 2025

work page 2025

[7] [7]

Zilliz.https://zilliz.com/, 2025

work page 2025

[8] [8]

Retrieval-based language models and applications

Akari Asai, Sewon Min, Zexuan Zhong, and Danqi Chen. Retrieval-based language models and applications. In Proceedings of the 61st Annual Meeting of the Associa- tion for Computational Linguistics: Tutorial Abstracts, ACL 2023, Toronto, Canada, July 9-14, 2023, pages 41–

work page 2023

[9] [9]

Association for Computational Linguistics, 2023

work page 2023

[10] Ilias Azizi, Karima Echihabi, and Themis Palpanas. Graph-based vector search: An experimental evaluation of the state-of-the-art. Proceedings of the ACM on Management of Data, 3(1):1–31, 2025

[11] Fu Bang. GPTCache: An open-source semantic cache for LLM applications enabling faster answers and cost savings. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 212–218, 2023

[12] Cheng Chen, Chenzhe Jin, Yunan Zhang, Sasha Podolsky, Chun Wu, Szu-Po Wang, Eric Hanson, Zhou Sun, Robert Walzer, and Jianguo Wang. SingleStore-V: An integrated vector database system in SingleStore. Proc. VLDB Endow., 17(12):3772–3785, 2024

[13] Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. SPANN: highly-efficient billion-scale approximate nearest neighborhood search. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages ...

[14] Paul Covington, Jay Adams, and Emre Sargin. Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, September 15-19, 2016, pages 191–198. ACM, 2016

[15] Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. Fast approximate nearest neighbor search with the navigating spreading-out graph. Proc. VLDB Endow., 12(5):461–474, 2019

[16] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey. CoRR, abs/2312.10997, 2023

[17] Hao Guo and Youyou Lu. Achieving low-latency graph-based vector search via aligning best-first search algorithm with SSD. In 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25), pages 171–186, Boston, MA, 2025. USENIX Association

[18] Hao Guo and Youyou Lu. OdinANN: Direct insert for consistently stable performance in billion-scale graph-based vector search. In 24th USENIX Conference on File and Storage Technologies (FAST 26), Santa Clara, CA, 2026. USENIX Association

[19] Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnaswamy, and Rohan Kadekodi. DiskANN: Fast accurate billion-point nearest neighbor search on a single node. Advances in Neural Information Processing Systems, 32, 2019

[20] Zhi Jing, Yongye Su, Yikun Han, Bo Yuan, Haiyun Xu, Chunjiang Liu, Kehai Chen, and Min Zhang. When large language models meet vector databases: A survey. arXiv preprint arXiv:2402.01763, 2024

[21] Patrick S. H. Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Inform...

[22] Huayang Li, Yixuan Su, Deng Cai, Yan Wang, and Lemao Liu. A survey on retrieval-augmented text generation. CoRR, abs/2202.01110, 2022

[23] Jie Li, Haifeng Liu, Chuanghua Gui, Jianyu Chen, Zhenyuan Ni, Ning Wang, and Yuan Chen. The design and implementation of a real time visual search system on JD e-commerce platform. In Proceedings of the 19th International Middleware Conference, Middleware Industrial Track 2018, Rennes, France, December 10-14, 2018, pages 9–16. ACM, 2018

[24] Nan Li, Bo Kang, and Tijl De Bie. SkillGPT: a RESTful API service for skill extraction and standardization using a large language model. CoRR, abs/2304.11060, 2023

[25] Dawei Liu, Bolong Zheng, Ziyang Yue, Fuhao Ruan, Xiaofang Zhou, and Christian S. Jensen. Wolverine: Highly efficient monotonic search path repair for graph-based ANN index updates

[26] Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov. Approximate nearest neighbor algorithm based on navigable small world graphs. Inf. Syst., 45:61–68, 2014

[27] Yury A. Malkov and Dmitry A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Trans. Pattern Anal. Mach. Intell., 42(4):824–836, 2020

[28] Yitong Meng, Xinyan Dai, Xiao Yan, James Cheng, Weiwen Liu, Jun Guo, Benben Liao, and Guangyong Chen. PMD: an optimal transportation-based user distance for recommender systems. In Advances in Information Retrieval - 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14-17, 2020, Proceedings, Part II, volume 12036 of Lecture N...

[29] Jason Mohoney, Anil Pacaci, Shihabur Rahman Chowdhury, Ali Mousavi, Ihab F. Ilyas, Umar Farooq Minhas, Jeffrey Pound, and Theodoros Rekatsinas. High-throughput vector similarity search in knowledge graphs. Proc. ACM Manag. Data, 1(2):197:1–197:25, 2023

[30] Shumpei Okura, Yukihiro Tagami, Shingo Ono, and Akira Tajima. Embedding-based news recommendation for millions of users. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13-17, 2017, pages 1933–1942. ACM, 2017

[31] Sajal Regmi and Chetan Phakami Pun. GPT semantic cache: Reducing LLM costs and latency via semantic embedding caching. CoRR, abs/2411.05276, 2024

[32] Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, and Matei Zaharia. ColBERTv2: Effective and efficient retrieval via lightweight late interaction. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle, WA, United States,...

[33] Badrul Munir Sarwar, George Karypis, Joseph A. Konstan, and John Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the Tenth International World Wide Web Conference, WWW 10, Hong Kong, China, May 1-5, 2001, pages 285–295. ACM, 2001

[34] Luis Gaspar Schroeder, Shu Liu, Alejandro Cuadron, Mark Zhao, Stephan Krusche, Alfons Kemper, Matei Zaharia, and Joseph E. Gonzalez. Adaptive semantic prompt caching with VectorQ. CoRR, abs/2502.03771, 2025

[35] Aditi Singh, Suhas Jayaram Subramanya, Ravishankar Krishnaswamy, and Harsha Vardhan Simhadri. FreshDiskANN: A fast and accurate graph-based ANN index for streaming similarity search. CoRR, abs/2105.09613, 2021

[36] Yongye Su, Yinqi Sun, Minjia Zhang, and Jianguo Wang. Vexless: A serverless vector data management system using cloud functions. Proc. ACM Manag. Data, 2(3):187, 2024

[37] Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xiangyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, Kun Yu, Yuxing Yuan, Yinghao Zou, Jiquan Long, Yudong Cai, Zhenxiang Li, Zhifeng Zhang, Yihua Mo, Jun Gu, Ruiyi Jiang, Yi Wei, and Charles Xie. Milvus: A purpose-built vector data management system. In SIGMOD ’21: International Co...

[38] Mengzhao Wang, Weizhi Xu, Xiaomeng Yi, Songlin Wu, Zhangyang Peng, Xiangyu Ke, Yunjun Gao, Xiaoliang Xu, Rentong Guo, and Charles Xie. Starling: An I/O-efficient disk-resident graph index framework for high-dimensional vector similarity search on data segment. CoRR, abs/2401.02116, 2024

[39] Chuangxian Wei, Bin Wu, Sheng Wang, Renjie Lou, Chaoqun Zhan, Feifei Li, and Yuanzhe Cai. AnalyticDB-V: A hybrid analytical engine towards query fusion for structured and unstructured data. Proc. VLDB Endow., 13(12):3152–3165, 2020

[40] Kyle Williams, Lichi Li, Madian Khabsa, Jian Wu, Patrick C. Shih, and C. Lee Giles. A web service for scholarly big data information extraction. In 2014 IEEE International Conference on Web Services, ICWS, 2014, Anchorage, AK, USA, June 27 - July 2, 2014, pages 105–. IEEE Computer Society, 2014

[42] Haike Xu, Magdalen Dobson Manohar, Philip A. Bernstein, Badrish Chandramouli, Richard Wen, and Harsha Vardhan Simhadri. In-place updates of a graph index for streaming approximate nearest neighbor search. CoRR, abs/2502.13826, 2025

[43] Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, Peng Cheng, and Mao Yang. SPFresh: Incremental in-place update for billion-scale vector search. In Proceedings of the 29th Symposium on Operating Systems Principles, SOSP 2023, Koblenz, Germany, October 23-26, 2023, pages 545–561. ACM, 2023

[44] Wen Yang, Tao Li, Gai Fang, and Hong Wei. PASE: PostgreSQL ultra-high-dimensional approximate nearest neighbor search extension. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020, pages 2241–2253. ACM, 2020

[45] Zhi Yao, Zhiqing Tang, Jiong Lou, Ping Shen, and Weijia Jia. VELO: A vector database-assisted cloud-edge collaborative LLM QoS optimization framework. In IEEE International Conference on Web Services, ICWS 2024, Shenzhen, China, July 7-13, 2024, pages 865–876. IEEE, 2024

[46] Yuanhang Yu, Dong Wen, Ying Zhang, Lu Qin, Wenjie Zhang, and Xuemin Lin. GPU-accelerated proximity graph approximate nearest neighbor search and construction. In 2022 IEEE 38th International Conference on Data Engineering (ICDE), pages 552–564. IEEE, 2022

[47] Qianxi Zhang, Shuotao Xu, Qi Chen, Guoxin Sui, Jiadong Xie, Zhizhen Cai, Yaoqi Chen, Yinxuan He, Yuqing Yang, Fan Yang, Mao Yang, and Lidong Zhou. VBASE: unifying online vector similarity search and relational queries via relaxed monotonicity. In 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2023, Boston, MA, USA, July 10-12...

[48] Weijie Zhao, Shulong Tan, and Ping Li. Song: Approximate nearest neighbor search on GPU. In 2020 IEEE 36th International Conference on Data Engineering (ICDE), pages 1033–1044. IEEE, 2020