DGAI: Decoupled On-Disk Graph-Based ANN Index for Efficient Updates and Queries
Pith reviewed 2026-05-18 03:39 UTC · model grok-4.3
The pith
Decoupling vectors from graph topology in on-disk ANN indexes speeds up updates over 8x while maintaining fast queries via dynamic layout and hierarchical refinement.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DGAI proposes a decoupled storage architecture that physically separates heavy vectors from lightweight graph topology to reduce redundant I/O during updates. This is paired with a similarity-aware dynamic layout that converts read amplification into useful prefetching and a two-stage query mechanism that employs hierarchical PQ to identify promising candidates quickly before exact refinement on raw vectors for only a small subset. The result is an index that supports resource-efficient updates and low-latency queries at the same time.
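A rough way to see why separating vectors from topology reduces update I/O is to count how many disk pages hold the adjacency lists of a given number of nodes under each layout. The sketch below is a back-of-envelope illustration only: the page size, vector dimension, graph degree, and record format are assumptions chosen for the example, not values taken from the paper, and the helper names are hypothetical.

```python
import math

PAGE_BYTES = 4096  # assumed disk page size


def record_bytes(dim, degree, with_vector):
    """Size of one node record under an assumed on-disk format."""
    topology = 4 + 4 * degree               # neighbor count + 4-byte neighbor IDs
    vector = 4 * dim if with_vector else 0  # float32 components
    return topology + vector


def pages_touched(num_nodes, rec_bytes, page_bytes=PAGE_BYTES):
    """Pages needed to hold num_nodes densely packed records (best case)."""
    per_page = page_bytes // rec_bytes
    return math.ceil(num_nodes / per_page)


# Rewriting the adjacency lists of 1000 nodes (assumed: 128-d vectors, degree 64).
coupled = pages_touched(1000, record_bytes(128, 64, with_vector=True))
decoupled = pages_touched(1000, record_bytes(128, 64, with_vector=False))
print(coupled, decoupled)
```

Under these assumptions the coupled layout spreads the same topology over roughly three times as many pages, because every page also carries vector bytes that an edge update never changes. Real gains depend on how updates are batched and where the affected nodes sit on disk.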
What carries the argument
Decoupled storage architecture that separates vectors from graph topology, supported by similarity-aware dynamic layout for data reuse and hierarchical PQ two-stage query for candidate refinement.
If this is right
- Insertions and deletions become up to 8 times faster by eliminating redundant vector I/O.
- Peak query latency under mixed workloads drops by roughly two-thirds compared with prior on-disk graph indexes.
- Systems can support more frequent updates without rebuilding the entire index from scratch.
- The design maintains query efficiency by turning potential amplification into prefetch benefits and limiting exact checks to few candidates.
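The prefetching effect named in the last bullet can be illustrated with a toy page-read counter. This is a minimal sketch under stated assumptions, not the paper's layout algorithm: it assumes a per-query cache that keeps every page already fetched, and the vector IDs, page assignments, and the function name `page_reads` are made up for the example.

```python
def page_reads(access_order, page_of):
    """Count page fetches for a sequence of vector accesses.

    Assumes every page read during the query stays cached, so a second
    access to the same page costs no additional I/O.
    """
    cached = set()
    reads = 0
    for vid in access_order:
        page = page_of[vid]
        if page not in cached:
            cached.add(page)
            reads += 1
    return reads


# Hypothetical search that visits six vectors; {0, 1, 2} and {3, 4, 5}
# are assumed to be mutually similar.
access = [0, 1, 2, 3, 4, 5]
scattered = {0: 0, 1: 3, 2: 5, 3: 1, 4: 4, 5: 2}  # every vector on its own page
clustered = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # similar vectors share a page

print(page_reads(access, scattered), page_reads(access, clustered))
```

When similar vectors share a page, the bytes fetched alongside the first vector are the ones the search needs next, so the same amount of data read per page serves more steps of the query.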
Where Pith is reading between the lines
- The separation idea could apply to other disk-resident indexes where topology is lighter than data payloads.
- In environments with expensive random I/O, the dynamic layout may deliver even larger relative gains than on uniform hardware.
- Workloads with strong locality might need less aggressive dynamic reorganization, suggesting tunable parameters for different access patterns.
Load-bearing premise
The extra I/O from separating vectors and graph topology can be offset by the dynamic layout and hierarchical refinement across varied access patterns and hardware without large overhead.
What would settle it
Run mixed insert-delete-query workloads on multiple disk types and datasets; if peak query latency rises above the coupled baseline or update gains fall below 2x, the central tradeoff claim is refuted.
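The test above can be written down as a small check over a set of benchmark runs. The sketch assumes a hypothetical result format (one dict per disk type and dataset) and uses made-up measurements; only the two thresholds, peak latency above the coupled baseline and update speedup below 2x, come from the text.

```python
def refuting_runs(runs, min_update_speedup=2.0):
    """Return the names of runs that contradict the claimed tradeoff.

    Each run is a dict with hypothetical keys:
    'name', 'dgai_peak_ms', 'coupled_peak_ms', 'update_speedup'.
    """
    failures = []
    for run in runs:
        slower_queries = run["dgai_peak_ms"] > run["coupled_peak_ms"]
        weak_updates = run["update_speedup"] < min_update_speedup
        if slower_queries or weak_updates:
            failures.append(run["name"])
    return failures


# Made-up measurements, for illustration only.
runs = [
    {"name": "nvme/dataset-a", "dgai_peak_ms": 33.0,
     "coupled_peak_ms": 100.0, "update_speedup": 8.0},
    {"name": "sata/dataset-a", "dgai_peak_ms": 120.0,
     "coupled_peak_ms": 100.0, "update_speedup": 6.0},
    {"name": "nvme/dataset-b", "dgai_peak_ms": 40.0,
     "coupled_peak_ms": 90.0, "update_speedup": 1.5},
]
failed = refuting_runs(runs)
print(failed)
```

A single failing configuration would not necessarily sink the design, but it would show that the tradeoff does not hold uniformly across hardware and data.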
Original abstract
On-disk graph-based indexes are favored for billion-scale Approximate Nearest Neighbor Search (ANNS) due to their high performance and cost-efficiency. However, existing systems typically rely on a coupled storage architecture that co-locates vectors and graph topology, which introduces substantial redundant I/O during index updates, thereby degrading usability in dynamic workloads. In this paper, we propose a decoupled storage architecture that physically separates heavy vectors from the lightweight graph topology. This design substantially improves update performance by reducing redundant I/O during updates. However, it introduces I/O amplification during ANNS, leading to degraded query efficiency. To improve query performance within the update-friendly architecture, we propose two techniques co-designed with the decoupled storage. We develop a similarity-aware dynamic layout that optimizes data placement online so that redundantly fetched data can be reused in subsequent search steps, effectively turning read amplification into useful prefetching. In addition, we propose a two-stage query mechanism enhanced by hierarchical PQ, which uses hierarchical PQ to rapidly and accurately identify promising candidates and performs exact refinement on raw vectors for only a small number of candidates. This design significantly reduces both the I/O and computational cost of the refinement stage. Overall, DGAI achieves resource-efficient updates and low-latency queries simultaneously. Experimental results demonstrate that DGAI improves update speed by 8.17x for insertions and 8.16x for deletions, while reducing peak query latency under mixed workloads by 67% compared to state-of-the-art baselines.
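The two-stage query idea in the abstract can be sketched in a few lines. The example below is an assumption-laden illustration, not the paper's method: it uses plain single-level product quantization as a stand-in for hierarchical PQ, scans all codes instead of traversing a graph, and every name (`encode`, `two_stage_search`, the tiny codebooks and vectors) is hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vec, codebooks):
    """Map each sub-vector to the index of its nearest centroid."""
    sub = len(vec) // len(codebooks)
    return [
        min(range(len(book)),
            key=lambda c: sq_dist(vec[m * sub:(m + 1) * sub], book[c]))
        for m, book in enumerate(codebooks)
    ]


def two_stage_search(query, codes, raw_vectors, codebooks, refine, k):
    sub = len(query) // len(codebooks)
    # Stage 1: approximate distances from per-subspace lookup tables.
    tables = [[sq_dist(query[m * sub:(m + 1) * sub], c) for c in book]
              for m, book in enumerate(codebooks)]
    approx = {vid: sum(tables[m][c] for m, c in enumerate(code))
              for vid, code in codes.items()}
    candidates = sorted(approx, key=approx.get)[:refine]
    # Stage 2: exact distances, touching raw vectors only for the candidates.
    exact = {vid: sq_dist(query, raw_vectors[vid]) for vid in candidates}
    return sorted(exact, key=exact.get)[:k]


# Toy data (assumed): 4-d vectors, 2 subspaces, 2 centroids per subspace.
codebooks = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 1.0)]]
raw = {0: [0.1, 0.0, 0.9, 1.0], 1: [0.9, 1.0, 0.1, 0.0],
       2: [0.0, 0.2, 1.0, 0.8], 3: [1.0, 1.0, 1.0, 1.0]}
codes = {vid: encode(vec, codebooks) for vid, vec in raw.items()}
query = [0.0, 0.1, 1.0, 0.9]
result = two_stage_search(query, codes, raw, codebooks, refine=2, k=1)
print(result)
```

In this toy case vectors 0 and 2 share the same code, so the compressed stage cannot tell them apart; the exact stage reads only those two raw vectors and picks the closer one. The cost of refinement scales with `refine`, not with the number of vectors visited in stage 1.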
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DGAI, a decoupled on-disk graph-based approximate nearest neighbor search (ANNS) index that physically separates heavy vectors from lightweight graph topology to reduce redundant I/O during updates. To counteract resulting query I/O amplification, it introduces a similarity-aware dynamic layout for online data placement optimization (turning redundant fetches into prefetching) and a two-stage query mechanism with hierarchical product quantization (PQ) for rapid candidate identification followed by exact refinement on few candidates. Experiments claim 8.17x faster insertions, 8.16x faster deletions, and 67% lower peak query latency under mixed workloads versus state-of-the-art baselines.
Significance. If the empirical results hold under diverse access patterns, this would represent a meaningful advance for dynamic billion-scale on-disk ANNS by breaking the typical update-query tradeoff in coupled storage architectures. The co-design of layout and hierarchical PQ with the decoupled structure is a concrete systems contribution, and the reported speedups on real workloads provide falsifiable performance claims that could influence future index designs.
major comments (2)
- [§4 (Dynamic Layout) and §5 (Experiments)] The central claim that I/O amplification from decoupling is reliably offset by the similarity-aware dynamic layout and hierarchical PQ (yielding the reported 67% peak-latency reduction) is load-bearing, yet the manuscript provides no quantification of layout adaptation latency or sensitivity to non-stationary query distributions. If query neighborhoods shift faster than the online optimization can track, the prefetching benefit would not materialize and the mixed-workload results would not generalize.
- [§5 (Experimental Evaluation)] Table 3 and the mixed-workload latency plots report the 67% peak-latency improvement and update speedups, but lack error bars, explicit baseline configurations (e.g., exact versions of DiskANN or HNSW on-disk variants), workload definitions (insert/delete/query ratios and distribution skew), and hardware details. This weakens the strength of evidence for the headline numbers.
minor comments (2)
- [§3.2] Notation for the hierarchical PQ levels and candidate counts in the two-stage query mechanism could be clarified with a small diagram or pseudocode to make the refinement cost reduction easier to follow.
- [§1] The abstract and introduction would benefit from a brief comparison table of I/O costs (coupled vs. decoupled) to motivate the design before describing the mitigations.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and positive assessment of the work's potential significance. We address the two major comments point by point below, agreeing that additional quantification and reporting details are needed to strengthen the evidence.
-
Referee: [§4 (Dynamic Layout) and §5 (Experiments)] The central claim that I/O amplification from decoupling is reliably offset by the similarity-aware dynamic layout and hierarchical PQ (yielding the reported 67% peak-latency reduction) is load-bearing, yet the manuscript provides no quantification of layout adaptation latency or sensitivity to non-stationary query distributions. If query neighborhoods shift faster than the online optimization can track, the prefetching benefit would not materialize and the mixed-workload results would not generalize.
Authors: We agree that the manuscript does not currently quantify layout adaptation latency or include explicit sensitivity analysis for non-stationary query distributions. Section 4 describes the online similarity-aware placement mechanism and its periodic optimization, while Section 5 reports mixed-workload results that incorporate varying query patterns. However, these do not directly measure adaptation overhead or test rapid distribution shifts. In the revision we will add a dedicated subsection with measurements of adaptation latency across different update/query ratios and a sensitivity study that simulates non-stationary shifts, confirming that the prefetching benefit holds under the evaluated conditions. revision: yes
-
Referee: [§5 (Experimental Evaluation)] Table 3 and the mixed-workload latency plots report the 67% peak-latency improvement and update speedups, but lack error bars, explicit baseline configurations (e.g., exact versions of DiskANN or HNSW on-disk variants), workload definitions (insert/delete/query ratios and distribution skew), and hardware details. This weakens the strength of evidence for the headline numbers.
Authors: We concur that the experimental reporting can be improved for reproducibility. While Section 5 provides workload and hardware information, it does not include error bars, precise baseline version details, exact insert/delete/query ratios with skew parameters, or expanded hardware specifications. In the revised manuscript we will augment Table 3 and the latency plots with error bars from repeated runs, add explicit baseline configuration details (including on-disk variants of DiskANN and HNSW), specify all workload parameters, and expand the hardware description section. revision: yes
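The error bars promised in this response amount to reporting a mean and a spread over repeated runs. A minimal sketch, assuming the spread is reported as the sample standard deviation and using made-up latency samples (the rebuttal does not say which statistic or how many runs will be used):

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) for repeated measurements."""
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical peak latencies (ms) from three repeated runs of one workload.
peak_latency_ms = [10.0, 12.0, 14.0]
mean, stdev = summarize(peak_latency_ms)
print(f"{mean:.1f} +/- {stdev:.1f} ms")
```

With only a handful of runs the standard deviation is itself noisy, so the number of repetitions should be reported alongside it.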
Circularity Check
No circularity: empirical performance measurements on proposed storage and query techniques
The paper describes a decoupled on-disk graph index architecture, a similarity-aware dynamic layout for prefetching, and a hierarchical PQ two-stage query mechanism. All reported results (8.17x insertion speedup, 8.16x deletion speedup, 67% peak latency reduction) are presented as outcomes of experimental evaluation against baselines rather than any mathematical derivation, fitted parameter renamed as prediction, or self-referential definition. No equations, uniqueness theorems, or ansatzes are invoked that reduce to the paper's own inputs by construction. The work is self-contained as a systems contribution whose claims rest on reproducible measurements, not on a derivation chain that collapses into its assumptions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Dynamic workloads with frequent insertions, deletions, and queries on billion-scale vector datasets are common and performance-critical.
Forward citations
Cited by 1 Pith paper
- NAVIS: Concurrent Search and Update with Low Position-Seeking Overhead in On-SSD Graph-Based Vector Search
NAVIS improves concurrent search and update throughput in on-SSD graph vector search by up to 2.74x for insertions and 1.37x for searches through reduced position-seeking overhead.