EMA: Approximate Nearest Neighbor Search with General Attribute Filtering and Dynamic Updates

Baotong Lu; Chenhao Ma; James Cheng; Mocheng Li

arxiv: 2606.00734 · v1 · pith:HGCOASDDnew · submitted 2026-05-30 · 💻 cs.DB

EMA: Approximate Nearest Neighbor Search with General Attribute Filtering and Dynamic Updates

Mocheng Li , Baotong Lu , James Cheng , Chenhao Ma This is my paper

Pith reviewed 2026-06-28 17:56 UTC · model grok-4.3

classification 💻 cs.DB

keywords filtering approximate nearest neighborvector databaseattribute filteringdynamic updatesgraph indexpredicate queriesmulti-predicate search

0 comments

The pith

EMA attaches compact markers to graph edges to enable fast general attribute filtering in approximate nearest neighbor search with dynamic updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EMA, a filtering ANN algorithm for multi-predicate queries over mixed numerical and categorical attributes in vector databases. It attaches markers as compact summaries to graph edges to give conservative predicate- and geometric-aware guidance that guarantees zero false negatives at the marker level. Query processing uses marker-augmented joint search plus bounded edge recovery to filter results while keeping the graph navigable, and the design also supports efficient dynamic updates. A sympathetic reader would care because the method targets practical workloads in recommendation systems, retrieval-augmented generation, and agent memory where prior approaches either limit predicate types or incur high overhead.

Core claim

EMA achieves 1.68x--12.25x speedup over state-of-the-art general filtering ANN methods across diverse workloads by introducing Markers as compact summaries attached to graph edges that provide conservative predicate- and geometric-aware guidance with zero false negatives at the Marker level, then performing Marker-augmented joint search with a bounded edge recovery mechanism that enables efficient filtering while preserving graph navigability and supporting dynamic updates.

What carries the argument

Markers attached to graph edges, compact summaries that supply conservative predicate- and geometric-aware guidance with zero false negatives at the marker level.

If this is right

EMA supports multi-predicate queries over mixed numerical and categorical attributes.
It enables efficient dynamic updates to the index.
It delivers 1.68x--12.25x speedup over prior general filtering ANN methods on diverse workloads.
Marker-augmented joint search with bounded recovery preserves graph navigability during filtering.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Vector databases could support richer real-time filtering for recommendation and retrieval-augmented generation without separate index structures.
The marker technique might apply to other proximity-graph indexes that currently lack general predicate support.
Lower construction and memory costs could make filtered ANN viable at larger scales than current predicate-agnostic methods allow.

Load-bearing premise

Markers attached to graph edges provide conservative predicate- and geometric-aware guidance with zero false negatives at the Marker level, enabling efficient filtering while preserving graph navigability.

What would settle it

An experiment on a workload where marker-guided search returns a different result set from exhaustive filtered search, or where measured speedups fall below the claimed range across the tested workloads.

Figures

Figures reproduced from arXiv: 2606.00734 by Baotong Lu, Chenhao Ma, James Cheng, Mocheng Li.

**Figure 2.** Figure 2: EMA workflow. Codebook Marker [0,3) [3,6) [6,9) {a,d} {b,e} {c,f} 1 0 0 1 0 1 Vector w 2 {a,c} Vector x 5 {d} Vector v 6 {c} 1 0 1 1 0 1 Gather Gather [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Illustration of Codebook-based Marker encoding [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: QPS for label+range multi-predicate query at 10%-100% selectivity with 95% recall@10. [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

**Figure 5.** Figure 5: QPS for label+range multi-predicate query at 1%-10% selectivity with 95% recall@10. [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 6.** Figure 6: QPS for composed multi-predicate queries at 1%–10% selectivity with 95% recall@10. [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: QPS for range query at 95% recall@10 [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗

**Figure 9.** Figure 9: QPS of EMA under different update operations at 95% recall@10. [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗

**Figure 10.** Figure 10: QPS in OCQ at 95% recall@10 [PITH_FULL_IMAGE:figures/full_fig_p013_10.png] view at source ↗

read the original abstract

Filtering Approximate Nearest Neighbor (FANN) search is a critical and emerging task for strengthening the query capability of vector databases, supporting applications such as recommendation systems, retrieval-augmented generation (RAG), and agent memory. However, most existing methods are limited to range or label filtering, often incurring unacceptable index construction time and memory overhead. Predicate-agnostic approaches further struggle to handle a wide range of predicate selectivities effectively. In this paper, we propose EMA, a filtering ANN algorithm that supports multi-predicate queries over mixed numerical and categorical attributes, and efficient dynamic updates. EMA introduces Markers as compact summaries attached to graph edges, providing conservative predicate- and geometric-aware guidance with zero false negatives at the Marker level. During query processing, EMA performs Marker-augmented joint search with a bounded edge recovery mechanism, enabling efficient filtering while preserving graph navigability. Extensive experiments demonstrate that EMA achieves 1.68x--12.25x speedup over state-of-the-art general filtering ANN methods across diverse workloads.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

EMA introduces Markers on graph edges for general multi-predicate filtering ANN plus dynamic updates, and the speedups look worth verifying in the full experiments.

read the letter

EMA's main advance is a graph-based ANN method that handles multi-predicate queries over mixed numerical and categorical attributes while also supporting dynamic updates. It attaches compact Markers to edges as conservative summaries that are both predicate-aware and geometry-aware, guaranteeing zero false negatives at the Marker level. Queries then run a joint search with bounded edge recovery to filter results without losing the graph's ability to navigate.

This directly targets gaps in earlier work that was stuck on range or label filters or that handled predicates poorly across different selectivities. The bounded recovery step is a sensible way to keep the index usable after filtering.

The paper does a clean job laying out those prior limitations and showing how Markers plus recovery address them. The reported 1.68x–12.25x speedups over existing general filtering methods, if the workloads and baselines are representative, would be a practical gain for vector database workloads like RAG and recommendations.

The soft spots are in the experimental presentation. The abstract gives the speedup range but no information on the exact baselines, workload characteristics, error bars, or how the Markers are constructed and what overhead they add. The zero-false-negative property at the Marker level and the precise bound on edge recovery also need the full construction algorithm and any supporting arguments to be checked. Those details matter because the whole claim rests on the Markers being cheap to build and the recovery keeping navigability intact.

No load-bearing contradictions jump out from the description. The approach stays within standard graph ANN territory but adds a targeted mechanism.

This is for engineers and researchers working on filtered vector search inside databases. Anyone building or tuning ANN indexes for mixed-attribute queries would get concrete value from the Marker idea and the update support.

It deserves a serious referee to examine the full experiments, Marker construction, and any proofs around the false-negative guarantee.

Referee Report

1 major / 0 minor

Summary. The paper proposes EMA, a filtering ANN algorithm supporting multi-predicate queries over mixed numerical and categorical attributes together with dynamic updates. It introduces Markers as compact summaries attached to graph edges that provide conservative predicate- and geometric-aware guidance with zero false negatives at the Marker level, combined with Marker-augmented joint search and bounded edge recovery to preserve navigability. The central claim is that EMA delivers 1.68x--12.25x speedup over state-of-the-art general filtering ANN methods across diverse workloads.

Significance. If the Marker construction and bounded-recovery mechanism can be shown to deliver the claimed zero-false-negative property while maintaining graph navigability, and if the reported speedups are reproducible, the work would meaningfully advance predicate-aware ANN search in vector databases for RAG, recommendation, and agent-memory workloads.

major comments (1)

Abstract: the claim of 1.68x--12.25x speedup is presented without any description of baselines, workloads, error bars, or exclusion criteria, preventing assessment of whether the data supports the central performance claim.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive comment. We address it point-by-point below and will revise the manuscript accordingly.

read point-by-point responses

Referee: Abstract: the claim of 1.68x--12.25x speedup is presented without any description of baselines, workloads, error bars, or exclusion criteria, preventing assessment of whether the data supports the central performance claim.

Authors: We agree that the abstract, as a concise summary, omits the specific experimental details needed for immediate assessment. The baselines are the state-of-the-art general filtering ANN methods (detailed in Section 2 and compared in Section 5), the workloads cover diverse predicate selectivities and attribute mixes on standard vector datasets, and results include error bars from repeated runs with no exclusion of outliers. To improve clarity and allow readers to evaluate the claim without reading the full paper, we will revise the abstract to briefly reference the evaluation setup (e.g., 'over state-of-the-art baselines on diverse workloads with reported error bars'). revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided abstract and description contain no equations, derivations, fitted parameters, or mathematical claims that could reduce to inputs by construction. The work describes an algorithmic construction (Markers attached to graph edges for conservative guidance) and reports empirical speedups, with no self-definitional steps, fitted-input predictions, or load-bearing self-citations visible in the text. The central claims rest on experimental validation rather than any closed-form reduction, making the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Abstract only; no explicit free parameters, axioms, or invented entities beyond the high-level description of Markers can be identified.

invented entities (1)

Markers no independent evidence
purpose: compact summaries attached to graph edges for predicate- and geometric-aware guidance
New component introduced to enable filtering without false negatives at the Marker level

pith-pipeline@v0.9.1-grok · 5708 in / 1063 out tokens · 32753 ms · 2026-06-28T17:56:01.574016+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

58 extracted references · 13 canonical work pages · 5 internal anchors

[1]

Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN- Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Information Systems87 (2020), 101374

2020
[2]

Jon Louis Bentley. 1975. Multidimensional binary search trees used for associative searching.Commun. ACM18, 9 (1975), 509–517

1975
[3]

Angela Bonifati, Wim Martens, and Thomas Timm. 2017. An analytical study of large SPARQL query logs.arXiv preprint arXiv:1708.00363(2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[4]

Lawrence Cayton. 2008. Fast nearest neighbor retrieval for bregman divergences. InProceedings of the 25th international conference on Machine learning. 112–119

2008
[5]

Karan Desai, Gaurav Kaul, Zubin Aysola, and Justin Johnson. 2021. Redcaps: Web-curated image-text data created by the people, for the people.arXiv preprint arXiv:2111.11431(2021)

work page arXiv 2021
[6]

Wei Dong. 2011. Kgraph: A library for approximate nearest neighbor search. Retrieved July12 (2011), 2020

2011
[7]

Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2024. The faiss library.arXiv preprint arXiv:2401.08281(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[8]

Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. 2024. From local to global: A graph rag approach to query-focused summarization.arXiv preprint arXiv:2404.16130(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[9]

Joshua Engels, Benjamin Landrum, Shangdi Yu, Laxman Dhulipala, and Julian Shun. 2024. Approximate Nearest Neighbor Search with Window Filters.arXiv preprint arXiv:2402.00943(2024)

work page arXiv 2024
[10]

Mocheng Li et al. 2026. EMA: Appendix. https://github.com/lmccccc/EMA/blob/ main/EMA_appendix.pdf. Supplementary material

2026
[11]

Xiyang Feng, Guodong Jin, Ziyi Chen, Chang Liu, and Semih Salihoğlu. 2023. Kùzu graph database management system. InThe Conference on Innovative Data Systems Research, Vol. 7. 25–35

2023
[12]

Jerome H Friedman, Jon Louis Bentley, and Raphael Ari Finkel. 1977. An algo- rithm for finding best matches in logarithmic expected time.ACM Transactions on Mathematical Software (TOMS)3, 3 (1977), 209–226

1977
[13]

Junhao Gan, Jianlin Feng, Qiong Fang, and Wilfred Ng. 2012. Locality-sensitive hashing scheme based on dynamic collision counting. InProceedings of the 2012 ACM SIGMOD international conference on management of data. 541–552

2012
[14]

Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. 2013. Optimized product quantization.IEEE transactions on pattern analysis and machine intelligence36, 4 (2013), 744–755

2013
[15]

Siddharth Gollapudi, Neel Karia, Varun Sivashankar, Ravishankar Krishnaswamy, Nikit Begwani, Swapnil Raz, Yiyong Lin, Yin Zhang, Neelam Mahapatro, Premku- mar Srinivasan, et al. 2023. Filtered-diskann: Graph algorithms for approximate nearest neighbor search with filters. InProceedings of the ACM Web Conference

2023
[16]

Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating large-scale inference with anisotropic vector quantization. InInternational Conference on Machine Learning. PMLR, 3887–3896

2020
[17]

Antonin Guttman. 1984. R-trees: A dynamic index structure for spatial searching. InProceedings of the 1984 ACM SIGMOD international conference on Management of data. 47–57

1984
[18]

Ben Harwood and Tom Drummond. 2016. Fanng: Fast approximate nearest neighbour graphs. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5713–5722

2016
[19]

Qiang Huang, Jianlin Feng, Yikai Zhang, Qiong Fang, and Wilfred Ng. 2015. Query-aware locality-sensitive hashing for approximate nearest neighbor search. Proceedings of the VLDB Endowment9, 1 (2015), 1–12

2015
[20]

Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. InProceedings of the thirtieth annual ACM symposium on Theory of computing. 604–613

1998
[21]

Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. 2019. Diskann: Fast accurate billion-point nearest neighbor search on a single node.Advances in neural information pro- cessing Systems32 (2019)

2019
[22]

Hervé Jégou, Romain Tavenard, Matthijs Douze, and Laurent Amsaleg. 2011. Searching in one billion vectors: re-rank with source coding. In2011 IEEE Inter- national Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 861–864

2011
[23]

Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with GPUs.IEEE Transactions on Big Data7, 3 (2019), 535–547

2019
[24]

Andrew Kane. n.d.. PGVector. https://github.com/pgvector/pgvector. 13
[25]

1968.Regionalization, Theory and alternative algorithms

Philip M Lankford. 1968.Regionalization, Theory and alternative algorithms. Ph.D. Dissertation. The University of Chicago

1968
[26]

Anqi Liang, Pengcheng Zhang, Bin Yao, Zhongpu Chen, Yitong Song, and Guangxu Cheng. 2024. UNIFY: Unified Index for Range Filtered Approximate Nearest Neighbors Search.arXiv preprint arXiv:2412.02448(2024)

work page arXiv 2024
[27]

Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov
[28]

Approximate nearest neighbor algorithm based on navigable small world graphs.Information Systems45 (2014), 61–68

2014
[29]

Yu A Malkov and Dmitry A Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE transactions on pattern analysis and machine intelligence42, 4 (2018), 824–836

2018
[30]

Yusuke Matsui, Ryota Hinami, and Shin’ichi Satoh. 2018. Reconfigurable Inverted Index. InProceedings of the 26th ACM international conference on Multimedia. 1715–1723

2018
[31]

Yitong Meng, Xinyan Dai, Xiao Yan, James Cheng, Weiwen Liu, Jun Guo, Benben Liao, and Guangyong Chen. 2020. Pmd: An optimal transportation-based user distance for recommender systems. InAdvances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part II 42. Springer, 272–280

2020
[32]

Thomas Neumann and Guido Moerkotte. 2011. Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins. In2011 IEEE 27th International Conference on Data Engineering. IEEE, 984–994

2011
[33]

Shumpei Okura, Yukihiro Tagami, Shingo Ono, and Akira Tajima. 2017. Embedding-based news recommendation for millions of users. InProceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1933–1942

2017
[34]

Liana Patel, Siddharth Jha, Carlos Guestrin, and Matei Zaharia. 2024. Lotus: Enabling semantic queries with llms over tables of unstructured and structured data.arXiv e-prints(2024), arXiv–2407

2024
[35]

Liana Patel, Peter Kraft, Carlos Guestrin, and Matei Zaharia. 2024. Acorn: Per- formant and predicate-agnostic search over vector embeddings and structured data.Proceedings of the ACM on Management of Data2, 3 (2024), 1–27

2024
[36]

Arkadiusz Paterek. 2007. Improving regularized singular value decomposition for collaborative filtering. InProceedings of KDD cup and workshop, Vol. 2007. 5–8

2007
[37]

Zhencan Peng, Miao Qiao, Wenchao Zhou, Feifei Li, and Dong Deng. [n.d.]. Dynamic Range-Filtering Approximate Nearest Neighbor Search. ([n. d.])
[38]

Zhencan Peng, Miao Qiao, Wenchao Zhou, Feifei Li, and Dong Deng. 2025. Dynamic Range-Filtering Approximate Nearest Neighbor Search.Proceedings of the VLDB Endowment18, 10 (2025), 3256–3268

2025
[39]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sand- hini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al
[40]

In International conference on machine learning

Learning transferable visual models from natural language supervision. In International conference on machine learning. PmLR, 8748–8763
[41]

Gaurav Sehgal and Semih Salihoglu. 2025. NaviX: A Native Vector Index Design for Graph DBMSs With Robust Predicate-Agnostic Search Performance.arXiv preprint arXiv:2506.23397(2025)

work page arXiv 2025
[42]

Aditi Singh, Suhas Jayaram Subramanya, Ravishankar Krishnaswamy, and Har- sha Vardhan Simhadri. 2021. Freshdiskann: A fast and accurate graph-based ann index for streaming similarity search.arXiv preprint arXiv:2105.09613(2021)

work page arXiv 2021
[43]

Godfried T Toussaint. 1980. The relative neighbourhood graph of a finite planar set.Pattern recognition12, 4 (1980), 261–268

1980
[44]

Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xi- angyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, et al. 2021. Milvus: A purpose-built vector data management system. InProceedings of the 2021 international conference on management of data. 2614–2627

2021
[45]

Mengzhao Wang, Lingwei Lv, Xiaoliang Xu, Yuxiang Wang, Qiang Yue, and Jiongkang Ni. 2024. An efficient and robust framework for approximate near- est neighbor search with attribute constraint.Advances in Neural Information Processing Systems36 (2024)

2024
[46]

Mengzhao Wang, Xiaoliang Xu, Qiang Yue, and Yuxiang Wang. 2021. A com- prehensive survey and experimental comparison of graph-based approximate nearest neighbor search.arXiv preprint arXiv:2101.12631(2021)

work page arXiv 2021
[47]

Shu Wang, Yixiang Fang, Yingli Zhou, Xilin Liu, and Yuchi Ma. 2025. ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation. arXiv preprint arXiv:2502.09891(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[48]

Weaviate. 2024. Weaviate. https://weaviate.io/

2024
[49]

Chuangxian Wei, Bin Wu, Sheng Wang, Renjie Lou, Chaoqun Zhan, Feifei Li, and Yuanzhe Cai. 2020. AnalyticDB-V: a hybrid analytical engine towards query fusion for structured and unstructured data.Proceedings of the VLDB Endowment 13, 12 (2020), 3152–3165

2020
[50]

Yuexuan Xu, Jianyang Gao, Yutong Gou, Cheng Long, and Christian S Jensen
[51]

irangegraph: Improvising range-dedicated graphs for range-filtering near- est neighbor search.Proceedings of the ACM on Management of Data2, 6 (2024), 1–26

2024
[52]

Mingyu Yang, Wentao Li, Zhitao Shen, Chuan Xiao, and Wei Wang. 2025. ESG: Elastic Graphs for Range-Filtering Approximate k-Nearest Neighbor Search. arXiv preprint arXiv:2504.04018(2025)

work page arXiv 2025
[53]

Wen Yang, Tao Li, Gai Fang, and Hong Wei. 2020. Pase: Postgresql ultra-high- dimensional approximate nearest neighbor search extension. InProceedings of the 2020 ACM SIGMOD international conference on management of data. 2241–2253

2020
[54]

Peter Yianilos. 1993. Data structures and algorithms for nearest neighbor search in general metric spaces. (1993)

1993
[55]

Qianxi Zhang, Shuotao Xu, Qi Chen, Guoxin Sui, Jiadong Xie, Zhizhen Cai, Yaoqi Chen, Yinxuan He, Yuqing Yang, Fan Yang, et al . 2023. {VBASE}: Unifying online vector similarity search and relational queries via relaxed monotonicity. In17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23). 377–395

2023
[56]

Weijie Zhao, Shulong Tan, and Ping Li. 2022. Constrained approximate similarity search on proximity graph.arXiv preprint arXiv:2210.14958(2022)

work page arXiv 2022
[57]

Yingli Zhou, Yaodong Su, Youran Sun, Shu Wang, Taotao Wang, Runyuan He, Yongwei Zhang, Sicong Liang, Xilin Liu, Yuchi Ma, et al. 2025. In-depth Analysis of Graph-based RAG in a Unified Framework.arXiv preprint arXiv:2503.04338 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[58]

Chaoji Zuo, Miao Qiao, Wenchao Zhou, Feifei Li, and Dong Deng. 2024. SeRF: seg- ment graph for range-filtering approximate nearest neighbor search.Proceedings of the ACM on Management of Data2, 1 (2024), 1–26. 14

2024

[1] [1]

Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN- Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Information Systems87 (2020), 101374

2020

[2] [2]

Jon Louis Bentley. 1975. Multidimensional binary search trees used for associative searching.Commun. ACM18, 9 (1975), 509–517

1975

[3] [3]

Angela Bonifati, Wim Martens, and Thomas Timm. 2017. An analytical study of large SPARQL query logs.arXiv preprint arXiv:1708.00363(2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[4] [4]

Lawrence Cayton. 2008. Fast nearest neighbor retrieval for bregman divergences. InProceedings of the 25th international conference on Machine learning. 112–119

2008

[5] [5]

Karan Desai, Gaurav Kaul, Zubin Aysola, and Justin Johnson. 2021. Redcaps: Web-curated image-text data created by the people, for the people.arXiv preprint arXiv:2111.11431(2021)

work page arXiv 2021

[6] [6]

Wei Dong. 2011. Kgraph: A library for approximate nearest neighbor search. Retrieved July12 (2011), 2020

2011

[7] [7]

Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2024. The faiss library.arXiv preprint arXiv:2401.08281(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[8] [8]

Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. 2024. From local to global: A graph rag approach to query-focused summarization.arXiv preprint arXiv:2404.16130(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[9] [9]

Joshua Engels, Benjamin Landrum, Shangdi Yu, Laxman Dhulipala, and Julian Shun. 2024. Approximate Nearest Neighbor Search with Window Filters.arXiv preprint arXiv:2402.00943(2024)

work page arXiv 2024

[10] [10]

Mocheng Li et al. 2026. EMA: Appendix. https://github.com/lmccccc/EMA/blob/ main/EMA_appendix.pdf. Supplementary material

2026

[11] [11]

Xiyang Feng, Guodong Jin, Ziyi Chen, Chang Liu, and Semih Salihoğlu. 2023. Kùzu graph database management system. InThe Conference on Innovative Data Systems Research, Vol. 7. 25–35

2023

[12] [12]

Jerome H Friedman, Jon Louis Bentley, and Raphael Ari Finkel. 1977. An algo- rithm for finding best matches in logarithmic expected time.ACM Transactions on Mathematical Software (TOMS)3, 3 (1977), 209–226

1977

[13] [13]

Junhao Gan, Jianlin Feng, Qiong Fang, and Wilfred Ng. 2012. Locality-sensitive hashing scheme based on dynamic collision counting. InProceedings of the 2012 ACM SIGMOD international conference on management of data. 541–552

2012

[14] [14]

Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. 2013. Optimized product quantization.IEEE transactions on pattern analysis and machine intelligence36, 4 (2013), 744–755

2013

[15] [15]

Siddharth Gollapudi, Neel Karia, Varun Sivashankar, Ravishankar Krishnaswamy, Nikit Begwani, Swapnil Raz, Yiyong Lin, Yin Zhang, Neelam Mahapatro, Premku- mar Srinivasan, et al. 2023. Filtered-diskann: Graph algorithms for approximate nearest neighbor search with filters. InProceedings of the ACM Web Conference

2023

[16] [16]

Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating large-scale inference with anisotropic vector quantization. InInternational Conference on Machine Learning. PMLR, 3887–3896

2020

[17] [17]

Antonin Guttman. 1984. R-trees: A dynamic index structure for spatial searching. InProceedings of the 1984 ACM SIGMOD international conference on Management of data. 47–57

1984

[18] [18]

Ben Harwood and Tom Drummond. 2016. Fanng: Fast approximate nearest neighbour graphs. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5713–5722

2016

[19] [19]

Qiang Huang, Jianlin Feng, Yikai Zhang, Qiong Fang, and Wilfred Ng. 2015. Query-aware locality-sensitive hashing for approximate nearest neighbor search. Proceedings of the VLDB Endowment9, 1 (2015), 1–12

2015

[20] [20]

Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. InProceedings of the thirtieth annual ACM symposium on Theory of computing. 604–613

1998

[21] [21]

Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. 2019. Diskann: Fast accurate billion-point nearest neighbor search on a single node.Advances in neural information pro- cessing Systems32 (2019)

2019

[22] [22]

Hervé Jégou, Romain Tavenard, Matthijs Douze, and Laurent Amsaleg. 2011. Searching in one billion vectors: re-rank with source coding. In2011 IEEE Inter- national Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 861–864

2011

[23] [23]

Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with GPUs.IEEE Transactions on Big Data7, 3 (2019), 535–547

2019

[24] [24]

Andrew Kane. n.d.. PGVector. https://github.com/pgvector/pgvector. 13

[25] [25]

1968.Regionalization, Theory and alternative algorithms

Philip M Lankford. 1968.Regionalization, Theory and alternative algorithms. Ph.D. Dissertation. The University of Chicago

1968

[26] [26]

Anqi Liang, Pengcheng Zhang, Bin Yao, Zhongpu Chen, Yitong Song, and Guangxu Cheng. 2024. UNIFY: Unified Index for Range Filtered Approximate Nearest Neighbors Search.arXiv preprint arXiv:2412.02448(2024)

work page arXiv 2024

[27] [27]

Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov

[28] [28]

Approximate nearest neighbor algorithm based on navigable small world graphs.Information Systems45 (2014), 61–68

2014

[29] [29]

Yu A Malkov and Dmitry A Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE transactions on pattern analysis and machine intelligence42, 4 (2018), 824–836

2018

[30] [30]

Yusuke Matsui, Ryota Hinami, and Shin’ichi Satoh. 2018. Reconfigurable Inverted Index. InProceedings of the 26th ACM international conference on Multimedia. 1715–1723

2018

[31] [31]

Yitong Meng, Xinyan Dai, Xiao Yan, James Cheng, Weiwen Liu, Jun Guo, Benben Liao, and Guangyong Chen. 2020. Pmd: An optimal transportation-based user distance for recommender systems. InAdvances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part II 42. Springer, 272–280

2020

[32] [32]

Thomas Neumann and Guido Moerkotte. 2011. Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins. In2011 IEEE 27th International Conference on Data Engineering. IEEE, 984–994

2011

[33] [33]

Shumpei Okura, Yukihiro Tagami, Shingo Ono, and Akira Tajima. 2017. Embedding-based news recommendation for millions of users. InProceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 1933–1942

2017

[34] [34]

Liana Patel, Siddharth Jha, Carlos Guestrin, and Matei Zaharia. 2024. Lotus: Enabling semantic queries with llms over tables of unstructured and structured data.arXiv e-prints(2024), arXiv–2407

2024

[35] [35]

Liana Patel, Peter Kraft, Carlos Guestrin, and Matei Zaharia. 2024. Acorn: Per- formant and predicate-agnostic search over vector embeddings and structured data.Proceedings of the ACM on Management of Data2, 3 (2024), 1–27

2024

[36] [36]

Arkadiusz Paterek. 2007. Improving regularized singular value decomposition for collaborative filtering. InProceedings of KDD cup and workshop, Vol. 2007. 5–8

2007

[37] [37]

Zhencan Peng, Miao Qiao, Wenchao Zhou, Feifei Li, and Dong Deng. [n.d.]. Dynamic Range-Filtering Approximate Nearest Neighbor Search. ([n. d.])

[38] [38]

Zhencan Peng, Miao Qiao, Wenchao Zhou, Feifei Li, and Dong Deng. 2025. Dynamic Range-Filtering Approximate Nearest Neighbor Search.Proceedings of the VLDB Endowment18, 10 (2025), 3256–3268

2025

[39] [39]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sand- hini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al

[40] [40]

In International conference on machine learning

Learning transferable visual models from natural language supervision. In International conference on machine learning. PmLR, 8748–8763

[41] [41]

Gaurav Sehgal and Semih Salihoglu. 2025. NaviX: A Native Vector Index Design for Graph DBMSs With Robust Predicate-Agnostic Search Performance.arXiv preprint arXiv:2506.23397(2025)

work page arXiv 2025

[42] [42]

Aditi Singh, Suhas Jayaram Subramanya, Ravishankar Krishnaswamy, and Har- sha Vardhan Simhadri. 2021. Freshdiskann: A fast and accurate graph-based ann index for streaming similarity search.arXiv preprint arXiv:2105.09613(2021)

work page arXiv 2021

[43] [43]

Godfried T Toussaint. 1980. The relative neighbourhood graph of a finite planar set.Pattern recognition12, 4 (1980), 261–268

1980

[44] [44]

Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xi- angyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, et al. 2021. Milvus: A purpose-built vector data management system. InProceedings of the 2021 international conference on management of data. 2614–2627

2021

[45] [45]

Mengzhao Wang, Lingwei Lv, Xiaoliang Xu, Yuxiang Wang, Qiang Yue, and Jiongkang Ni. 2024. An efficient and robust framework for approximate near- est neighbor search with attribute constraint.Advances in Neural Information Processing Systems36 (2024)

2024

[46] [46]

Mengzhao Wang, Xiaoliang Xu, Qiang Yue, and Yuxiang Wang. 2021. A com- prehensive survey and experimental comparison of graph-based approximate nearest neighbor search.arXiv preprint arXiv:2101.12631(2021)

work page arXiv 2021

[47] [47]

Shu Wang, Yixiang Fang, Yingli Zhou, Xilin Liu, and Yuchi Ma. 2025. ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation. arXiv preprint arXiv:2502.09891(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[48] [48]

Weaviate. 2024. Weaviate. https://weaviate.io/

2024

[49] [49]

Chuangxian Wei, Bin Wu, Sheng Wang, Renjie Lou, Chaoqun Zhan, Feifei Li, and Yuanzhe Cai. 2020. AnalyticDB-V: a hybrid analytical engine towards query fusion for structured and unstructured data.Proceedings of the VLDB Endowment 13, 12 (2020), 3152–3165

2020

[50] [50]

Yuexuan Xu, Jianyang Gao, Yutong Gou, Cheng Long, and Christian S Jensen

[51] [51]

irangegraph: Improvising range-dedicated graphs for range-filtering near- est neighbor search.Proceedings of the ACM on Management of Data2, 6 (2024), 1–26

2024

[52] [52]

Mingyu Yang, Wentao Li, Zhitao Shen, Chuan Xiao, and Wei Wang. 2025. ESG: Elastic Graphs for Range-Filtering Approximate k-Nearest Neighbor Search. arXiv preprint arXiv:2504.04018(2025)

work page arXiv 2025

[53] [53]

Wen Yang, Tao Li, Gai Fang, and Hong Wei. 2020. Pase: Postgresql ultra-high- dimensional approximate nearest neighbor search extension. InProceedings of the 2020 ACM SIGMOD international conference on management of data. 2241–2253

2020

[54] [54]

Peter Yianilos. 1993. Data structures and algorithms for nearest neighbor search in general metric spaces. (1993)

1993

[55] [55]

Qianxi Zhang, Shuotao Xu, Qi Chen, Guoxin Sui, Jiadong Xie, Zhizhen Cai, Yaoqi Chen, Yinxuan He, Yuqing Yang, Fan Yang, et al . 2023. {VBASE}: Unifying online vector similarity search and relational queries via relaxed monotonicity. In17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23). 377–395

2023

[56] [56]

Weijie Zhao, Shulong Tan, and Ping Li. 2022. Constrained approximate similarity search on proximity graph.arXiv preprint arXiv:2210.14958(2022)

work page arXiv 2022

[57] [57]

Yingli Zhou, Yaodong Su, Youran Sun, Shu Wang, Taotao Wang, Runyuan He, Yongwei Zhang, Sicong Liang, Xilin Liu, Yuchi Ma, et al. 2025. In-depth Analysis of Graph-based RAG in a Unified Framework.arXiv preprint arXiv:2503.04338 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[58] [58]

Chaoji Zuo, Miao Qiao, Wenchao Zhou, Feifei Li, and Dong Deng. 2024. SeRF: seg- ment graph for range-filtering approximate nearest neighbor search.Proceedings of the ACM on Management of Data2, 1 (2024), 1–26. 14

2024