pith. machine review for the scientific record.

arxiv: 2604.22171 · v1 · submitted 2026-04-24 · 💻 cs.DB

Recognition: unknown

MCI: A Maximal Clique Index for Efficient Arbitrary-Filtered Approximate Nearest Neighbor Search

Daiyin Wang, Guoren Wang, Kaiwen Xue, Rong-Hua Li, Xiaowei Ye, Xubin Li

Authors on Pith no claims yet

Pith reviewed 2026-05-08 09:20 UTC · model grok-4.3

classification 💻 cs.DB
keywords approximate nearest neighbor search · maximal clique · graph-based index · arbitrary filtering · high-dimensional data · clique cover · filtered ANNS · nearest neighbor graph

The pith

The Maximal Clique Index uses clique covers to compress neighbor graphs for efficient arbitrary-filtered approximate nearest neighbor search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish a new graph index called the Maximal Clique Index for approximate nearest neighbor search when arbitrary filters are applied. Current methods for this task use too much storage and time because they do not compress the underlying neighbor connections well. By turning dense neighborhoods into maximal cliques that take advantage of how points relate geometrically in high dimensions, the index becomes much smaller yet still connects the right points for search. Queries use a multi-seed approach to handle how filters break the graph into pieces. If successful, this would let applications perform fast similarity searches combined with any kind of condition, like price ranges or text keywords, on large datasets.

Core claim

The central claim is that approximating a dense Nearest Neighbor Graph with a compact representation based on maximal cliques allows for robust and efficient AFANNS. The Maximal Clique Cover exploits geometric transitivity to encode dense neighborhoods, while Local Neighborhood Graph Geometric Densification builds the structure from a sparse starting point by increasing distance thresholds locally. The index is constructed in parallel and queried with a multi-seed strategy to manage fragmented subgraphs from predicates, resulting in superior query performance and reduced space.
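The Maximal Clique Cover is only named here, but the compression intuition is easy to sketch: cover the nodes of a neighbor graph with maximal cliques, then store cliques instead of per-node adjacency lists. A minimal greedy sketch — the function name, the seed order, and the toy graph are illustrative assumptions, not the paper's construction:

```python
def greedy_maximal_clique_cover(adj):
    """Cover every node of an undirected graph with at least one maximal clique.

    adj: dict mapping node -> set of neighbours (symmetric).
    Returns a list of node sets, each a maximal clique.
    """
    uncovered = set(adj)
    cliques = []
    while uncovered:
        seed = min(uncovered)  # deterministic, illustrative seed choice
        clique = {seed}
        # Grow greedily: admit a candidate only if it is adjacent to
        # every node already in the clique. A single sorted pass yields
        # a maximal clique, since rejected candidates stay rejected.
        for cand in sorted(set(adj) - {seed}):
            if all(cand in adj[m] for m in clique):
                clique.add(cand)
        cliques.append(clique)
        uncovered -= clique
    return cliques

# A 4-clique plus a pendant node: two cliques (5 stored memberships)
# stand in for 10 directed adjacency entries.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
cover = greedy_maximal_clique_cover(adj)
```

The denser the neighborhoods, the larger the cliques and the better the compression ratio — which is exactly where the geometric-transitivity assumption does its work.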

What carries the argument

Maximal Clique Cover, which groups dense neighborhoods into maximal cliques to achieve high compression and connectivity in the index for filtered searches.

If this is right

  • The index supports arbitrary filtering predicates in a general way without needing separate indexes for different filter types.
  • Query throughput increases by up to an order of magnitude at high recall levels while memory footprint decreases.
  • The construction scales well due to its lock-free parallel design.
  • The multi-seed query strategy effectively deals with predicate-induced fragmentation of the graph.
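The multi-seed strategy in the last point can be read as launching best-first search from several predicate-satisfying entry points, so that each fragment the filter leaves behind remains reachable. A hedged sketch under that reading — the names, signature, and exhaustive exploration are illustrative, not the paper's query algorithm:

```python
import heapq

def multi_seed_filtered_search(vectors, adj, query, passes, seeds, k):
    """Best-first graph search restricted to nodes satisfying a predicate.

    vectors: list of points; adj: node -> neighbours; passes: predicate on
    node ids; seeds: entry points, ideally spread across the dataset.
    Returns the ids of the k closest predicate-passing nodes reached.
    """
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(vectors[i], query))

    heap = [(dist(s), s) for s in seeds if passes(s)]
    heapq.heapify(heap)
    visited = {s for _, s in heap}
    found = []
    while heap:  # a real index would bound this exploration
        d, node = heapq.heappop(heap)
        found.append((d, node))
        for nb in adj[node]:
            if nb not in visited and passes(nb):
                visited.add(nb)
                heapq.heappush(heap, (dist(nb), nb))
    return [n for _, n in sorted(found)[:k]]

# Two components, as a selective filter might leave behind.
vectors = [(0.0,), (1.0,), (10.0,), (11.0,)]
adj = {0: [1], 1: [0], 2: [3], 3: [2]}
query = (10.5,)
# A single seed in the wrong component never reaches nodes 2 and 3;
# seeding both components recovers the true nearest neighbours.
near = multi_seed_filtered_search(vectors, adj, query, lambda i: True, [0, 2], 2)
```

With one seed in the wrong component the search returns only that component's nodes; with a seed per fragment it finds the globally nearest passing points.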

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This clique-based compression could be tested on other high-dimensional similarity search tasks beyond nearest neighbors.
  • Adapting the densification technique might help in scenarios with evolving data where graphs need periodic rebuilding.
  • The assumption of geometric transitivity suggests potential use in domains like image retrieval where point distributions follow similar patterns.

Load-bearing premise

The geometric transitivity property of high-dimensional spaces allows dense neighborhoods to be encoded as maximal cliques without major loss in approximation quality for filtered searches.
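This premise is cheap to probe empirically: sample clustered high-dimensional points and measure how often two points that share a near neighbor are themselves near. An illustrative check — the cluster shapes, dimension, and radius are arbitrary choices, not the paper's setup:

```python
import math
import random

def transitivity_rate(points, r):
    """Among pairs (a, c) that share a neighbour b within radius r,
    the fraction that are themselves within r of each other."""
    def d(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))
    hits = total = 0
    for b, pb in enumerate(points):
        near = [a for a, pa in enumerate(points) if a != b and d(pa, pb) <= r]
        for i in range(len(near)):
            for j in range(i + 1, len(near)):
                total += 1
                hits += d(points[near[i]], points[near[j]]) <= r
    return hits / total if total else 0.0

random.seed(0)
dim = 32
# Two tight, well-separated clusters: within-cluster neighbourhoods should
# behave transitively, and cross-cluster pairs never share a neighbour.
centers = [[0.0] * dim, [10.0] * dim]
points = [[c + random.gauss(0, 0.1) for c in ctr] for ctr in centers for _ in range(20)]
rate = transitivity_rate(points, 2.0)
```

On clustered data like this the rate is essentially 1; rerunning the same check on uniform random vectors, where distance concentration bites, is precisely the stress test the premise needs.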

What would settle it

Compare the recall achieved by the MCI index versus a standard nearest neighbor graph index on a dataset of randomly generated high-dimensional vectors that lack clustered structure, to see if the clique method still maintains its advantages.
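Whatever index variant is under test, the yardstick is the same: recall@k against brute-force ground truth. A minimal harness for such a comparison, with synthetic data standing in for the real index's output:

```python
import random

def brute_force_knn(points, query, k):
    """Exact k nearest neighbours by exhaustive scan (the ground truth)."""
    d = lambda i: sum((a - b) ** 2 for a, b in zip(points[i], query))
    return sorted(range(len(points)), key=d)[:k]

def recall_at_k(approx_ids, true_ids):
    """Fraction of the true k nearest neighbours that the index returned."""
    return len(set(approx_ids) & set(true_ids)) / len(true_ids)

random.seed(1)
pts = [[random.random() for _ in range(16)] for _ in range(200)]
q = [0.5] * 16
truth = brute_force_knn(pts, q, 10)
# A perfect index scores 1.0; returning only half the true
# neighbours scores 0.5, regardless of which index produced them.
```

Sweeping this over clustered versus unclustered inputs, for MCI and a plain NNG index, is exactly the experiment proposed above.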

Figures

Figures reproduced from arXiv: 2604.22171 by Daiyin Wang, Guoren Wang, Kaiwen Xue, Rong-Hua Li, Xiaowei Ye, Xubin Li.

Figure 1
Figure 1: Illustration of k-NNG, the ACORN index with two-hop pruning (Mβ = 1), and our maximal clique index.
Figure 2
Figure 2: Illustration of handling two special cases.
Figure 3
Figure 3: Illustration of the MCI construction algorithm with n = 9, k′ = 4, τ = 3.
Figure 4
Figure 4: Recall@10 vs. QPS of different methods under mixed selectivity.
Figure 5
Figure 5: Index size of different methods.
Figure 6
Figure 6: Index construction time of different methods.
Figure 7
Figure 7: Recall@10 vs. QPS of different methods under various selectivity.
Figure 8
Figure 8: Results of different methods on range filtering.
Figure 12
Figure 12: Recall vs. QPS of various methods for k = 50.
Figure 13
Figure 13: Results of our method with varying ϵ.
Figure 14
Figure 14: Results of our method with varying (k′, τ).
Figure 15
Figure 15: Results of α vs. the percentage of uncovered nodes.
Figure 16
Figure 16: Parallel performance of index building algorithms.
Figure 17
Figure 17: Additional experiments on deep1M.
Original abstract

Approximate Nearest Neighbor Search with arbitrary filtering predicates (AFANNS) is essential for modern data applications, yet existing methods often incur substantial storage and computational costs. In this work, we introduce the Maximal Clique Index (MCI), a novel graph-based index designed for robust and efficient AFANNS. The core idea of MCI is to approximate a dense Nearest Neighbor Graph (NNG) through a compact, clique-based representation. We propose two key techniques: (1) Maximal Clique Cover (MCC), which exploits the geometric transitivity of high-dimensional spaces to encode dense neighborhoods as maximal cliques, achieving an index with high compression and connectivity; and (2) Local Neighborhood Graph Geometric Densification, a strategy that constructs an index approximating a large NNG from a sparse initial NNG and recovers global connectivity by progressively increasing distance thresholds to locally densify the structure. The index is built in a lock-free parallel manner for scalability and queried via a carefully designed multi-seed strategy to handle fragmented predicate-induced subgraphs. Extensive experiments on 10 datasets show that MCI significantly outperforms state-of-the-art methods by up to one order of magnitude in QPS at high recall while using substantially smaller space, and remains competitive even on range/keyword filtering tasks, demonstrating robust general-purpose performance.
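The abstract's "progressively increasing distance thresholds" can be pictured as an outer loop that relaxes a radius until no node is left isolated. A toy rendering, where the multiplicative factor, the isolation test, and every name are assumptions rather than the paper's actual procedure:

```python
def densify(points, dist, base_r, step=1.2, max_rounds=16):
    """Grow a distance threshold multiplicatively until no point is isolated.

    Returns (r, adj): the first tried radius at which every point has at
    least one neighbour, and the adjacency built at that radius.
    """
    n = len(points)
    r = base_r
    for _ in range(max_rounds):
        adj = {i: {j for j in range(n) if j != i and dist(points[i], points[j]) <= r}
               for i in range(n)}
        if all(adj[i] for i in range(n)):  # crude stand-in for "globally connected"
            return r, adj
        r *= step
    return r, adj

# One outlier at 5.0 forces eight threshold increases before it joins the graph.
points = [0.0, 1.0, 5.0]
r, adj = densify(points, lambda a, b: abs(a - b), base_r=1.0)
```

The reproducibility objection below turns on exactly these knobs: the starting radius, the growth factor, and the stopping rule all shape both index size and recall.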

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Maximal Clique Index (MCI) for approximate nearest neighbor search under arbitrary filtering predicates (AFANNS). It approximates a dense nearest-neighbor graph via Maximal Clique Cover (MCC), which encodes neighborhoods as maximal cliques by exploiting geometric transitivity, combined with Local Neighborhood Graph Geometric Densification to recover connectivity from a sparse initial graph. The index supports lock-free parallel construction and a multi-seed query strategy for handling predicate-induced fragmentation. Experiments on 10 datasets report up to 10x higher QPS at high recall versus state-of-the-art methods, with substantially smaller space and competitive results on range/keyword filtering.

Significance. If the performance claims and underlying connectivity assumptions hold under verification, MCI would constitute a meaningful advance for filtered high-dimensional search by delivering a compact index with strong query throughput. The lock-free parallel build and multi-seed strategy for fragmented subgraphs address practical scalability needs. Credit is due for the scale of the evaluation (10 datasets) and the attempt to handle arbitrary predicates rather than narrow filter types.

major comments (2)
  1. [Abstract] The central claim that MCC 'exploits the geometric transitivity of high-dimensional spaces to encode dense neighborhoods as maximal cliques, achieving an index with high compression and connectivity' is load-bearing for the reported QPS gains and space savings, yet no formal bound, approximation-error analysis, or quantification of omitted edges is supplied. High-dimensional distance concentration can weaken transitivity, risking loss of cross-neighborhood edges that become critical once predicates fragment the graph.
  2. [Abstract / §3] On MCC and densification: the manuscript invokes 'progressively increasing distance thresholds' but provides neither the exact schedule nor a parameter count for these thresholds, nor any empirical sensitivity analysis. This directly affects reproducibility of the 'substantially smaller space' and 'high recall' results under filtering.
minor comments (1)
  1. [Abstract] The phrase 'arbitrary-filtered' is used without a concise definition or taxonomy of supported predicate classes beyond the range/keyword examples mentioned later.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the careful and constructive review. We appreciate the recognition of the work's significance, the lock-free construction, multi-seed query strategy, and the scale of the 10-dataset evaluation. We address each major comment below, indicating planned revisions where appropriate.

Point-by-point responses
  1. Referee: [Abstract] The central claim that MCC 'exploits the geometric transitivity of high-dimensional spaces to encode dense neighborhoods as maximal cliques, achieving an index with high compression and connectivity' is load-bearing for the reported QPS gains and space savings, yet no formal bound, approximation-error analysis, or quantification of omitted edges is supplied. High-dimensional distance concentration can weaken transitivity, risking loss of cross-neighborhood edges that become critical once predicates fragment the graph.

    Authors: We agree that the claim is central and that the absence of a formal bound or approximation-error analysis is a limitation. Deriving a tight theoretical bound on omitted edges under high-dimensional distance concentration is challenging and would require substantial additional theoretical development beyond the scope of this systems-oriented paper. The manuscript instead supports the claim through empirical results across 10 datasets, including connectivity preservation under filtering. We will revise the abstract to qualify the claim as empirically supported and add a new paragraph in Section 3 quantifying the fraction of edges retained by the maximal clique cover relative to a dense NNG, along with a short discussion of distance concentration and how densification addresses fragmentation. revision: partial

  2. Referee: [Abstract / §3] On MCC and densification: the manuscript invokes 'progressively increasing distance thresholds' but provides neither the exact schedule nor a parameter count for these thresholds, nor any empirical sensitivity analysis. This directly affects reproducibility of the 'substantially smaller space' and 'high recall' results under filtering.

    Authors: This observation is correct; the current manuscript describes the progressive densification at a conceptual level without specifying the exact threshold schedule, the number of parameters, or sensitivity results. We will revise Section 3 to document the precise schedule and parameter count used in the experiments, and we will add an empirical sensitivity analysis (in the main text or appendix) showing the impact on space and recall. These additions will directly improve reproducibility. revision: yes

standing simulated objections not resolved
  • A formal bound or approximation-error analysis for the geometric transitivity assumption in MCC under high-dimensional distance concentration, as this lies outside the current scope of the applied paper.

Circularity Check

0 steps flagged

No significant circularity; algorithmic construction is self-contained

full rationale

The paper presents MCI as an algorithmic index built via Maximal Clique Cover (exploiting geometric transitivity) and Local Neighborhood Graph Geometric Densification to approximate a dense NNG from a sparse one, followed by multi-seed querying. These steps are constructive procedures grounded in graph properties and stated assumptions about high-dimensional spaces, with performance claims backed by experiments across 10 datasets. No equations, fitted parameters, or derivations reduce by construction to their own inputs. No self-citation chains or ansatz smuggling are load-bearing in the provided description. The derivation chain does not collapse into tautology or statistical forcing.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The approach rests on domain assumptions about high-dimensional geometry and introduces a new index structure whose effectiveness is validated empirically rather than derived from first principles.

free parameters (1)
  • distance thresholds
    Progressively increased during local densification to recover global connectivity from sparse initial graph.
axioms (1)
  • domain assumption High-dimensional spaces exhibit geometric transitivity that permits dense neighborhoods to be represented as maximal cliques
    Invoked as the basis for the Maximal Clique Cover technique to achieve high compression.
invented entities (1)
  • Maximal Clique Index (MCI) no independent evidence
    purpose: Compact clique-based approximation of dense nearest neighbor graphs for filtered search
    Newly proposed index structure

pith-pipeline@v0.9.0 · 5549 in / 1259 out tokens · 89521 ms · 2026-05-08T09:20:07.035573+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

79 extracted references · 5 canonical work pages · 1 internal anchor

  1. [1]

    Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya P. Razenshteyn, and Ludwig Schmidt. 2015. Practical and Optimal LSH for Angular Distance. In NIPS. 1225–1233

  2. [2]

    Fabien André, Anne-Marie Kermarrec, and Nicolas Le Scouarnec. 2017. Accelerated Nearest Neighbor Search with Quick ADC. In ICMR. ACM, 159–166

  3. [3]

    Akhil Arora, Sakshi Sinha, Piyush Kumar, and Arnab Bhattacharya. 2018. HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces. Proc. VLDB Endow. 11, 8 (2018), 906–919

  4. [4]

    Fedor Borisyuk, Siddarth Malreddy, Jun Mei, Yiqun Liu, Xiaoyi Liu, Piyush Maheshwari, Anthony Bell, and Kaushik Rangadurai. 2021. VisRel: Media Search at Scale. In SIGKDD. ACM, 2584–2592

  5. [5]

    Yuzheng Cai, Jiayang Shi, Yizhuo Chen, and Weiguo Zheng. 2024. Navigating Labels and Vectors: A Unified Approach to Filtered Approximate Nearest Neighbor Search. Proc. ACM Manag. Data 2, 6 (2024), 246:1–246:27

  6. [6]

    Manos Chatzakis, Panagiota Fatourou, Eleftherios Kosmas, Themis Palpanas, and Botao Peng. 2023. Odyssey: A Journey in the Land of Distributed Data Series Similarity Search. Proc. VLDB Endow. 16, 5 (2023), 1140–1153

  7. [7]

    Meng Chen, Kai Zhang, Zhenying He, Yinan Jing, and X. Sean Wang. 2024. RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search. Proc. VLDB Endow. 17, 11 (2024), 2735–2749

  8. [8]

    Patrick H. Chen, Wei-Cheng Chang, Jyun-Yu Jiang, Hsiang-Fu Yu, Inderjit S. Dhillon, and Cho-Jui Hsieh. 2023. FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search. In WWW. ACM, 3225–3235

  9. [9]

    Abhinandan Das, Mayur Datar, Ashutosh Garg, and Shyamsundar Rajaram. 2007. Google news personalization: scalable online collaborative filtering. In WWW. ACM, 271–280

  10. [10]

    Liwei Deng, Penghao Chen, Ximu Zeng, Tianfu Wang, Yan Zhao, and Kai Zheng. 2024. Efficient Data-aware Distance Comparison Operations for High-Dimensional Approximate Nearest Neighbor Search. Proc. VLDB Endow. 18, 3 (2024), 812–821

  11. [11]

    Wei Dong, Moses Charikar, and Kai Li. 2011. Efficient k-nearest neighbor graph construction for generic similarity measures. In WWW 2011. ACM, 577–586

  12. [12]

    Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2024. The Faiss library. (2024). arXiv:2401.08281 [cs.LG]

  13. [13]

    Joshua Engels, Benjamin Landrum, Shangdi Yu, Laxman Dhulipala, and Julian Shun. 2024. Approximate Nearest Neighbor Search with Window Filters. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. OpenReview.net

  14. [14]

    Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, and Qing Li. 2024. A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models. In SIGKDD. ACM, 6491–6501

  15. [15]

    Cong Fu and Deng Cai. 2016. EFANNA: An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph. CoRR abs/1609.07228 (2016)

  16. [16]

    Cong Fu, Changxu Wang, and Deng Cai. 2022. High Dimensional Similarity Search With Satellite System Graph: Efficiency, Scalability, and Unindexed Query Compatibility. IEEE Trans. Pattern Anal. Mach. Intell. 44, 8 (2022), 4139–4150

  17. [17]

    Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2019. Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph. Proc. VLDB Endow. 12, 5 (2019), 461–474

  18. [18]

    Yujian Fu, Cheng Chen, Yao Chen, Weng-Fai Wong, and Bingsheng He. 2025. Vista: Vector Indexing and Search for Large-Scale Imbalanced Datasets. In ICDE. IEEE, 543–556

  19. [19]

    Jianyang Gao, Yutong Gou, Yuexuan Xu, Yongyi Yang, Cheng Long, and Raymond Chi-Wing Wong. 2025. Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search. Proc. ACM Manag. Data 3, 3 (2025), 202:1–202:26

  20. [20]

    Jianyang Gao and Cheng Long. 2023. High-Dimensional Approximate Nearest Neighbor Search: with Reliable and Efficient Distance Comparison Operations. Proc. ACM Manag. Data 1, 2 (2023), 137:1–137:27

  21. [21]

    Jianyang Gao and Cheng Long. 2024. RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search. Proc. ACM Manag. Data 2, 3 (2024), 167

  22. [22]

    Siddharth Gollapudi, Neel Karia, Varun Sivashankar, Ravishankar Krishnaswamy, Nikit Begwani, Swapnil Raz, Yiyong Lin, Yin Zhang, Neelam Mahapatro, Premkumar Srinivasan, Amit Singh, and Harsha Vardhan Simhadri. 2023. Filtered-DiskANN: Graph Algorithms for Approximate Nearest Neighbor Search with Filters. In WWW. ACM, 3406–3416

  23. [23]

    Yutong Gou, Jianyang Gao, Yuexuan Xu, and Cheng Long. 2025. SymphonyQG: Towards Symphonious Integration of Quantization and Graph for Approximate Nearest Neighbor Search. Proc. ACM Manag. Data 3, 1 (2025), 80:1–80:26

  24. [24]

    Ben Harwood and Tom Drummond. 2016. FANNG: Fast Approximate Nearest Neighbour Graphs. In CVPR. 5713–5722

  25. [25]

    Ville Hyvönen, Elias Jääsaari, and Teemu Roos. 2024. A Multilabel Classification Framework for Approximate Nearest Neighbor Search. J. Mach. Learn. Res. 25 (2024), 46:1–46:51

  26. [26]

    Elias Jääsaari, Ville Hyvönen, and Teemu Roos. 2024. LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search. In NIPS

  27. [27]

    Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. 2019. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc

  28. [28]

    Hervé Jégou, Matthijs Douze, and Cordelia Schmid. 2011. Product Quantization for Nearest Neighbor Search. IEEE Trans. Pattern Anal. Mach. Intell. 33, 1 (2011), 117–128

  29. [29]

    Mengxu Jiang, Zhi Yang, Fangyuan Zhang, Guanhao Hou, Jieming Shi, Wenchao Zhou, Feifei Li, and Sibo Wang. 2025. DIGRA: A Dynamic Graph Indexing for Approximate Nearest Neighbor Search with Range Filter. Proc. ACM Manag. Data 3, 3 (2025), 148:1–148:26. doi:10.1145/3725399

  30. [30]

    Zhongming Jin, Debing Zhang, Yao Hu, Shiding Lin, Deng Cai, and Xiaofei He. 2014. Fast and Accurate Hashing Via Iterative Nearest Neighbors Expansion. IEEE Trans. Cybern. 44, 11 (2014), 2167–2177

  32. [32]

    Sungjun Jung, Yongsang Park, Haeun Lee, Young H. Oh, and Jae W. Lee. 2025. Angular Distance-Guided Neighbor Selection for Graph-Based Approximate Nearest Neighbor Search. In WWW. ACM, 4014–4023

  33. [33]

    Mocheng Li, Xiao Yan, Baotong Lu, Yue Zhang, James Cheng, and Chenhao Ma. 2025. Attribute Filtering in Approximate Nearest Neighbor Search: An In-depth Experimental Study. Proc. ACM Manag. Data 3, 6 (2025), 1–26

  35. [35]

    Wen Li, Ying Zhang, Yifang Sun, Wei Wang, Mingjie Li, Wenjie Zhang, and Xuemin Lin. 2020. Approximate Nearest Neighbor Search on High Dimensional Data - Experiments, Analyses, and Improvement. TKDE 32, 8 (2020), 1475–1488

  36. [36]

    Zhaoheng Li, Silu Huang, Wei Ding, Yongjoo Park, and Jianjun Chen. 2025. SIEVE: Effective Filtered Vector Search with Collection of Indexes. Proc. VLDB Endow. 18, 11 (2025), 4723–4736

  37. [37]

    Anqi Liang, Pengcheng Zhang, Bin Yao, Zhongpu Chen, Yitong Song, and Guangxu Cheng. 2024. UNIFY: Unified Index for Range Filtered Approximate Nearest Neighbors Search. Proc. VLDB Endow. 18, 4 (2024), 1118–1130

  38. [38]

    Zihan Liu, Wentao Ni, Jingwen Leng, Yu Feng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, and Yuhao Zhu. 2024. JUNO: Optimizing High-Dimensional Approximate Nearest Neighbour Search with Sparsity-Aware Algorithm and Ray-Tracing Core Mapping. In ASPLOS. ACM, 549–565

  39. [39]

    Kejing Lu, Chuan Xiao, and Yoshiharu Ishikawa. 2024. Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search. In ICML. OpenReview.net

  40. [40]

    Jiarui Luo, Miao Qiao, Chaoji Zuo, and Dong Deng. 2025. Tag-Filtered Approximate Nearest Neighbor Search. In 41st IEEE International Conference on Data Engineering, ICDE

  41. [41]

    Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov. 2014. Approximate nearest neighbor algorithm based on navigable small world graphs. Inf. Syst. 45 (2014), 61–68

  43. [43]

    Yury A. Malkov and Dmitry A. Yashunin. 2020. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Trans. Pattern Anal. Mach. Intell. 42, 4 (2020), 824–836

  44. [44]

    Magdalen Dobson Manohar, Zheqi Shen, Guy E. Blelloch, Laxman Dhulipala, Yan Gu, Harsha Vardhan Simhadri, and Yihan Sun. 2024. ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search Algorithms. In PPoPP. ACM, 270–285

  45. [45]

    Jason Mohoney, Anil Pacaci, Shihabur Rahman Chowdhury, Ali Mousavi, Ihab F. Ilyas, Umar Farooq Minhas, Jeffrey Pound, and Theodoros Rekatsinas. 2023. High-Throughput Vector Similarity Search in Knowledge Graphs. Proc. ACM Manag. Data 1, 2 (2023), 197:1–197:25

  46. [46]

    Javier Alvaro Vargas Muñoz, Marcos André Gonçalves, Zanoni Dias, and Ricardo da Silva Torres. 2019. Hierarchical Clustering-Based Graphs for Large Scale Approximate Nearest Neighbor Search. Pattern Recognit. 96 (2019)

  47. [47]

    Hiroyuki Ootomo, Akira Naruse, Corey Nolet, Ray Wang, Tamas Feher, and Yong Wang. 2024. CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs. In ICDE. IEEE, 4236–4247

  48. [48]

    Liana Patel, Peter Kraft, Carlos Guestrin, and Matei Zaharia. 2024. ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data. Proc. ACM Manag. Data 2, 3 (2024), 120

  49. [49]

    Yun Peng, Byron Choi, Tsz Nam Chan, Jianye Yang, and Jianliang Xu. 2023. Efficient Approximate Nearest Neighbor Search in Multi-dimensional Databases. Proc. ACM Manag. Data 1, 1 (2023), 54:1–54:27

  50. [50]

    Zhencan Peng, Miao Qiao, Wenchao Zhou, Feifei Li, and Dong Deng. 2025. Dynamic Range-Filtering Approximate Nearest Neighbor Search. Proc. VLDB Endow. 18, 10 (2025), 3256–3268

  51. [51]

    Chanop Silpa-Anan and Richard I. Hartley. 2008. Optimised KD-trees for fast image descriptor matching. In CVPR. IEEE Computer Society

  52. [52]

    Bing Tian, Haikun Liu, Zhuohui Duan, Xiaofei Liao, Hai Jin, and Yu Zhang. 2024. Scalable Billion-point Approximate Nearest Neighbor Search Using SmartSSDs. In USENIX ATC. USENIX Association, 1135–1150

  53. [53]

    Yao Tian, Xi Zhao, and Xiaofang Zhou. 2024. DB-LSH 2.0: Locality-Sensitive Hashing With Query-Based Dynamic Bucketing. IEEE Trans. Knowl. Data Eng. 36, 3 (2024), 1000–1015

  54. [54]

    Etsuji Tomita, Akira Tanaka, and Haruhisa Takahashi. 2006. The worst-case time complexity for generating all maximal cliques and computational experiments. Theor. Comput. Sci. 363, 1 (2006), 28–42

  55. [56]

    Godfried T. Toussaint. 1980. The relative neighbourhood graph of a finite planar set. Pattern Recognit. 12, 4 (1980), 261–268

  56. [57]

    R. v. Mises. 1942. On the Correct Use of Bayes' Formula. The Annals of Mathematical Statistics 13, 2 (1942), 156–165

  57. [58]

    Thomas Vecchiato, Claudio Lucchese, Franco Maria Nardini, and Sebastian Bruch. A Learning-to-Rank Formulation of Clustering-Based Approximate Nearest Neighbor Search. In SIGIR. ACM, 2261–2265

  59. [60]

    Roman Vershynin. 2018. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press

  60. [61]

    Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xiangyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, et al. 2021. Milvus: A Purpose-Built Vector Data Management System. In Proceedings of the 2021 International Conference on Management of Data. 2614–2627

  61. [62]

    Mengzhao Wang, Lingwei Lv, Xiaoliang Xu, Yuxiang Wang, Qiang Yue, and Jiongkang Ni. 2023. An Efficient and Robust Framework for Approximate Nearest Neighbor Search with Attribute Constraint. In NIPS

  62. [63]

    Mengzhao Wang, Xiaoliang Xu, Qiang Yue, and Yuxiang Wang. 2021. A Comprehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search. Proc. VLDB Endow. 14, 11 (2021), 1964–1978

  63. [64]

    Yitu Wang, Shiyu Li, Qilin Zheng, Linghao Song, Zongwang Li, Andrew Chang, Hai Li, and Yiran Chen. 2024. NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing. In ISCA. IEEE, 368–381

  64. [65]

    Yi Wang, Huan Liu, Jianan Yuan, Jiaxian Chen, Tianyu Wang, Chenlin Ma, and Rui Mao. 2024. Leanor: A Learning-Based Accelerator for Efficient Approximate Nearest Neighbor Search via Reduced Memory Access. In DAC. ACM, 61:1–61:6

  65. [66]

    Zeyu Wang, Qitong Wang, Xiaoxing Cheng, Peng Wang, Themis Palpanas, and Wei Wang. 2024. Steiner-Hardness: A Query Hardness Measure for Graph-Based ANN Indexes. Proc. VLDB Endow. 17, 13 (2024), 4668–4682

  66. [67]

    Zeyu Wang, Haoran Xiong, Qitong Wang, Zhenying He, Peng Wang, Themis Palpanas, and Wei Wang. 2024. Dimensionality-Reduction Techniques for Approximate Nearest Neighbor Search: A Survey and Evaluation. IEEE Data Eng. Bull. 48, 3 (2024), 63–80

  67. [68]

    Jiuqi Wei, Xiaodong Lee, Zhenyu Liao, Themis Palpanas, and Botao Peng. 2025. Subspace Collision: An Efficient and Accurate Framework for High-dimensional Approximate Nearest Neighbor Search. Proc. ACM Manag. Data 3, 1 (2025), 79:1–79:29

  68. [69]

    Jiuqi Wei, Botao Peng, Xiaodong Lee, and Themis Palpanas. 2024. DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search. Proc. VLDB Endow. 17, 9 (2024), 2241–2254

  69. [70]

    Jiadong Xie, Jeffrey Xu Yu, Siyi Teng, and Yingfan Liu. 2025. Beyond Vector Search: Querying With and Without Predicates. Proc. ACM Manag. Data 3, 6 (2025), 1–26. doi:10.1145/3769765

  70. [71]

    Yuexuan Xu, Jianyang Gao, Yutong Gou, Cheng Long, and Christian S. Jensen. 2024. iRangeGraph: Improvising Range-dedicated Graphs for Range-filtering Nearest Neighbor Search. Proc. ACM Manag. Data 2, 6 (2024), 239:1–239:26

  72. [73]

    Ming Yang, Yuzheng Cai, and Weiguo Zheng. 2024. CSPG: Crossing Sparse Proximity Graphs for Approximate Nearest Neighbor Search. In NIPS

  73. [74]

    Xiaowei Ye, Rong-Hua Li, and Guoren Wang. 2025. The Power of Core Clique Removal for Exact Clique Enumeration. Proc. ACM Manag. Data 3, 4 (2025), 269:1–269:27

  74. [75]

    Ziqi Yin, Jianyang Gao, Pasquale Balsebre, Gao Cong, and Cheng Long. 2025. DEG: Efficient Hybrid Vector Search Using the Dynamic Edge Navigation Graph. Proc. ACM Manag. Data 3, 1 (2025), 29:1–29:28

  75. [76]

    Qiang Yue, Xiaoliang Xu, Yuxiang Wang, Yikun Tao, and Xuliyuan Luo. 2024. Routing-Guided Learned Product Quantization for Graph-Based Approximate Nearest Neighbor Search. In ICDE. IEEE, 4870–4883

  76. [77]

    Fangyuan Zhang, Mengxu Jiang, Guanhao Hou, Jieming Shi, Hua Fan, Wenchao Zhou, Feifei Li, and Sibo Wang. 2025. Efficient Dynamic Indexing for Range Filtered Approximate Nearest Neighbor Search. Proc. ACM Manag. Data 3, 3 (2025), 152:1–152:26. doi:10.1145/3725401

  77. [78]

    Xi Zhao, Yao Tian, Kai Huang, Bolong Zheng, and Xiaofang Zhou. 2023. Towards Efficient Index Construction and Approximate Nearest Neighbor Search in High-Dimensional Spaces. Proc. VLDB Endow. 16, 8 (2023), 1979–1991

  78. [79]

    Bolong Zheng, Xi Zhao, Lianggui Weng, Quoc Viet Hung Nguyen, Hang Liu, and Christian S. Jensen. 2022. PM-LSH: a fast and accurate in-memory framework for high-dimensional approximate NN and closest pair search. VLDB J. 31, 6 (2022), 1339–1363

  79. [80]

    Chaoji Zuo, Miao Qiao, Wenchao Zhou, Feifei Li, and Dong Deng. 2024. SeRF: Segment Graph for Range-Filtering Approximate Nearest Neighbor Search. Proc. ACM Manag. Data 2, 1 (2024), 69:1–69:26