arxiv: 2604.20401 · v1 · submitted 2026-04-22 · 💻 cs.CR · cs.AI

Onyx: Cost-Efficient Disk-Oblivious ANN Search

Deevashwer Rathee, G. Edward Suh, Jean-Luc Watson, Raluca Ada Popa, Zirui Neil Zhao

Pith reviewed 2026-05-10 00:18 UTC · model grok-4.3

keywords approximate nearest neighbor search · oblivious RAM · trusted execution environments · disk access patterns · cost-efficient search · ANN pruning · ORAM tree design

The pith

Reversing optimization priorities lets ANN prune bandwidth and ORAM cut accesses, yielding up to 9.9x lower cost for disk-oblivious search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that pairing ANN search with ORAM for SSDs becomes inefficient when ANN minimizes access count and ORAM minimizes bandwidth, because each layer is poorly matched to that task. It instead assigns bandwidth minimization to the ANN layer and access-count minimization to the ORAM layer. The ANN component uses a compact intermediate representation to drop most high-bandwidth accesses early while keeping recall intact. The ORAM component uses a locality-aware shallow tree to lower the number of accesses while still working with bandwidth-efficient ORAM schemes. If correct, this inversion makes private similarity search on untrusted external storage practical for large AI workloads that must protect query patterns.

Core claim

Onyx inverts prior ORAM-ANN designs by minimizing bandwidth consumption in the ANN layer and access count in the ORAM layer. Onyx-ANNS achieves the first goal with a compact intermediate representation that proactively prunes the majority of bandwidth-intensive accesses without meaningfully reducing recall. Onyx-ORAM achieves the second goal with a locality-aware shallow tree that reduces access count while remaining compatible with bandwidth-efficient ORAM primitives. The resulting system delivers 1.7-9.9x lower cost and 2.3-12.3x lower latency than the state-of-the-art oblivious ANN search system.
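
The pruning idea can be illustrated with a rough sketch. This summary does not specify the paper's compact intermediate representation, so the code substitutes a generic stand-in: a coarsely quantized copy of each vector held in memory, used to rank candidates before any full vector is fetched. The names `quantize`, `pruned_search`, `hints`, and `keep` are hypothetical, the data is random, and the disk fetch is simulated by a dictionary lookup.

```python
import random

def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(vec, step=0.25):
    """Stand-in compact representation (an assumption): snap coordinates to a coarse grid."""
    return [round(x / step) * step for x in vec]

def pruned_search(query, candidates, full_vectors, hints, k, keep):
    """Rank candidates by cheap in-memory hints, then fetch only `keep` full vectors."""
    ranked = sorted(candidates, key=lambda i: l2(query, hints[i]))
    survivors = ranked[:keep]                           # only these cost a disk read
    fetched = {i: full_vectors[i] for i in survivors}   # simulated bandwidth-heavy fetches
    top = sorted(fetched, key=lambda i: l2(query, fetched[i]))[:k]
    return top, len(fetched)

random.seed(0)
dim, n = 16, 200
full_vectors = [[random.random() for _ in range(dim)] for _ in range(n)]
hints = [quantize(v) for v in full_vectors]
query = [random.random() for _ in range(dim)]

exact = sorted(range(n), key=lambda i: l2(query, full_vectors[i]))[:5]
approx, reads = pruned_search(query, range(n), full_vectors, hints, k=5, keep=40)
overlap = len(set(exact) & set(approx))
print(f"full-vector reads: {reads} of {n}, overlap with exact top-5: {overlap}")
```

The point of the sketch is the shape of the trade-off: `keep` bounds how many full vectors are read, and the overlap with the exact result shows what that bound costs in recall. Whether the paper's representation behaves like this stand-in is exactly what its evaluation has to show.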

What carries the argument

The inverted assignment of goals: bandwidth minimization performed by a compact pruning representation inside the ANN layer, paired with access-count minimization performed by a locality-aware shallow tree inside the ORAM layer.
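
To see why a shallower tree can lower access count, consider a minimal sketch that counts the levels on a root-to-leaf path for two fanouts. It assumes a complete tree and one bucket touched per level; the leaf count and both fanout values are illustrative choices, not parameters confirmed by the paper, and the sketch ignores the bucket-size and eviction changes a wider tree would likely require.

```python
def levels(num_leaves, fanout):
    """Number of levels on a root-to-leaf path in a complete tree with the given fanout."""
    depth = 1   # the root level
    reach = 1   # leaves reachable at the current depth
    while reach < num_leaves:
        reach *= fanout
        depth += 1
    return depth

num_leaves = 2 ** 21            # hypothetical leaf count, a power of both 2 and 8
binary = levels(num_leaves, 2)  # binary tree
shallow = levels(num_leaves, 8) # wider, shallower tree
print(binary, shallow)          # prints "22 8"
```

Under these assumptions a path shrinks from 22 buckets to 8, so each logical operation touches far fewer buckets, which is the resource the ORAM layer is asked to minimize.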

If this is right

Secure ANN search over external SSDs inside TEEs becomes viable at commercial cost and latency levels.
Cloud providers can host privacy-sensitive similarity search without exposing query patterns to the host OS.
ANN indexes can be restructured to trade early filtering accuracy for lower bandwidth while still meeting recall targets.
ORAM constructions gain headroom when paired with data layouts that already reduce logical access count.
Deployment of large-scale private vector databases on commodity hardware no longer requires prohibitive SSD over-provisioning.
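
The cost claims are expressed as cost-normalized throughput (Queries/$, as in Figure 5). A small sketch of that arithmetic follows; the function name `queries_per_dollar` and every number in it are made up for illustration and are not measurements from the paper.

```python
def queries_per_dollar(qps, hourly_cost_usd):
    """Queries served per dollar, given sustained throughput and hourly resource cost."""
    return qps * 3600 / hourly_cost_usd

# Hypothetical configurations: throughput in queries/second, cost in USD/hour.
baseline = queries_per_dollar(qps=1000, hourly_cost_usd=2.0)
candidate = queries_per_dollar(qps=1500, hourly_cost_usd=1.0)
print(baseline, candidate, candidate / baseline)
```

A system can improve this metric either by raising throughput on the same hardware or by sustaining the same throughput with fewer vCPUs and SSDs, which is why over-provisioning matters for the comparison.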

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same priority swap could be tested on other oblivious data structures such as range trees or graph indices where one layer tolerates approximation and the other benefits from locality.
Integration with specific TEEs like Intel SGX or AMD SEV might require only minor adjustments to the shallow-tree layout to preserve ORAM compatibility.
Real hardware traces from production SSDs could be used to measure whether the access-count savings persist when page sizes and queue depths vary.
If the pruning representation generalizes, it might reduce bandwidth in non-oblivious ANN systems as well, improving baseline performance before ORAM is added.

Load-bearing premise

The compact intermediate representation can discard most bandwidth-heavy accesses without lowering final recall, and the shallow locality-aware tree stays compatible with efficient ORAM under realistic SSD workloads.

What would settle it

An experiment on standard ANN benchmarks that shows recall falling below the target threshold when the pruning representation is applied, or a workload trace demonstrating that the reduced access count fails to lower end-to-end latency once ORAM protocol costs are included.
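
The recall side of such an experiment reduces to a simple measurement. The sketch below computes recall@k against exact ground truth; the function name `recall_at_k`, the sample ID lists, and the 0.95 target are illustrative assumptions rather than values taken from the paper.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k neighbors that appear in the retrieved top-k."""
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    return len(set(retrieved[:k]) & truth) / len(truth)

# Hypothetical result for one query: IDs returned with pruning vs. exact neighbors.
retrieved = [3, 7, 1, 9, 4]
ground_truth = [3, 1, 4, 8, 2]
score = recall_at_k(retrieved, ground_truth, k=5)

target = 0.95   # assumed recall target
print(score, score >= target)
```

In practice the score would be averaged over a full query set and compared with and without pruning at the same search parameters.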

Figures

Figures reproduced from arXiv: 2604.20401 by Deevashwer Rathee, G. Edward Suh, Jean-Luc Watson, Raluca Ada Popa, Zirui Neil Zhao.

**Figure 1.** Prior designs for oblivious ANNS and Onyx. Compass [131] suffers from poor performance due to high network overhead, and mirroring its design into a TEE yields a system that needs significant CPU and SSD resources to sustain reasonable performance. Onyx proposes co-designed primitives for ANNS and ORAM that jointly improve resource utilization, yielding high performance and low cost.

**Figure 2.** I/O overhead design space for ORAM schemes at 𝐵 = 512 Bytes. Gray iso-throughput contours show the maximum I/O multipliers that our evaluation SSD (§ 6.1) can theoretically sustain at a given throughput. Arrows trace Onyx-ORAM’s step-by-step improvements from RingORAM, annotated with the factor change in each multiplier.

**Figure 3.** Onyx-ORAM vs prior work for 20M blocks (1-SSD). LBM (local bucket metadata), FBR (full bucket reads), ST (shallow tree) refer to steps in § 4.3; Onyx: Ring+LBM+FBR+ST-8; Ring: RingORAM; Path: locality-optimized PathORAM.

**Figure 4.** (Top) Pareto frontier of throughput vs. latency of Onyx and baselines (1-SSD; top-left is better). Dots below the curves represent sub-optimal parameterizations; single markers are the configuration for a scheme that maximizes both latency and throughput. (Bottom) Throughput vs. recall of Onyx and baselines (1-SSD).

**Figure 5.** Pareto frontier of cost-normalized throughput (Queries/$) vs. latency for Onyx and baselines, highlighting the most cost-effective configurations (vCPU and SSD resources allocated). Better configurations are towards the top-left.

**Figure 6.** Importance of ORAM-ANN co-design: Pareto frontier of throughput vs. latency for Onyx and single-component (Onyx-ORAM/ANN) constructions. Better configurations are towards the top-left; single markers indicate a scheme’s latency- and throughput-optimal configuration.

**Figure 7.** complements …

**Figure 8.** (Top) Pareto frontier of throughput vs. latency of Onyx and baselines (4-SSD; top-left is better). Dots below the curves represent sub-optimal parameterizations; single markers are the configuration for a scheme that maximizes both latency and throughput. (Bottom) Throughput vs. recall of Onyx and baselines (4-SSD). Panels: (a) SIFT, (b) MS-MARCO, (c) WIKI, (d) DEEP; axes: Throughput (QPS), Latency (ms).

**Figure 9.** ORAM-ANN co-design ablation (4-SSD). Pareto frontier of throughput vs. latency for Onyx and single-component (Onyx-ORAM and Onyx-ANN) constructions. Better configurations are towards the top-left.

**Figure 10.** Onyx-ANNS ablation (1-SSD). Each curve varies the search queue depth (smallest QD value); other labeled values are pruning hint sizes. Pareto frontiers highlight the best-performing configurations for each approach.

**Figure 11.** Security game for disk-access privacy for a disk-resident ANN search system ANNS.

Original abstract

Approximate nearest neighbor (ANN) search in AI systems increasingly handles sensitive data on third-party infrastructure. Trusted execution environments (TEEs) offer protection, but cost-efficient deployments must rely on external SSDs, which leaks user queries through disk access patterns to the host. Oblivious RAM (ORAM) can hide these access patterns but at a high cost; when paired with existing disk-based ANN search techniques, it makes poor use of SSD resources, yielding high latency and poor cost-efficiency. The core challenge for efficient oblivious ANN search over SSDs is balancing both bandwidth and access count. The state-of-the-art ORAM-ANN design minimizes access count at the ANN level and bandwidth at the ORAM level, each trading-off the other, leaving the combined system with both resources overutilized. We propose inverting this design, minimizing bandwidth consumption in the ANN layer and access count in the ORAM layer, since each component is better suited for its new role: ANN's inherent approximation allows for more bandwidth efficiency, while ORAM has no fundamental lower bounds on access count (as opposed to bandwidth). To this end, we propose a cost-efficient approach, Onyx, with two new co-designed components: Onyx-ANNS introduces a compact intermediate representation that proactively prunes the majority of bandwidth-intensive accesses without hurting recall, and Onyx-ORAM proposes a locality-aware shallow tree design that reduces access count while remaining compatible with bandwidth-efficient ORAM techniques. Compared to the state-of-the-art oblivious ANN search system, Onyx achieves $1.7-9.9\times$ lower cost and $2.3-12.3\times$ lower latency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Onyx inverts the ANN-ORAM optimization targets with a compact pruning representation and shallow tree, delivering claimed cost and latency wins if the two load-bearing assumptions on recall and compatibility hold.

Onyx inverts the usual optimization targets for oblivious ANN search on external storage. Instead of minimizing access count at the ANN layer and bandwidth at the ORAM layer, it does the opposite, using ANN's approximation tolerance for bandwidth savings and ORAM's flexibility on access count. The new pieces are a compact intermediate representation in Onyx-ANNS that prunes most heavy accesses upfront, and a locality-aware shallow tree in Onyx-ORAM that cuts down on accesses while staying compatible with efficient ORAM methods.

This addresses a real bottleneck in deploying privacy-preserving ANN without new hardware, and the reported gains of 1.7-9.9x lower cost and 2.3-12.3x lower latency would matter for cloud setups if they hold. The paper does a good job laying out why prior designs waste resources and how the inversion fits the strengths of each component.

The soft spots are exactly where the stress test points: whether the compact representation can drop most bandwidth-heavy accesses without dropping recall, and whether the shallow tree keeps the ORAM bandwidth-efficient on actual SSD workloads. Both assumptions are load-bearing, and if the experiments don't show clear ablations or edge cases, the gains could be overstated. The full paper includes the implementation and results, which seem to back the claims with comparisons, but I'd want to see the workload details and recall metrics to be sure.

This is for researchers in secure systems and ML serving who deal with external memory and privacy. A reader interested in ORAM applications or disk-based search would get value from the specific designs. It deserves a serious referee because the problem is timely and the approach is a clear departure from prior work, even if it needs more validation on the assumptions. I would recommend sending it out for review rather than issuing a desk rejection.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Onyx, a system for cost-efficient disk-oblivious approximate nearest neighbor (ANN) search over SSDs in trusted execution environments. It identifies that prior ORAM-ANN designs overutilize both bandwidth and access count by minimizing access count at the ANN level and bandwidth at the ORAM level. Onyx inverts the targets: Onyx-ANNS uses a compact intermediate representation to proactively prune the majority of bandwidth-intensive accesses without hurting recall, while Onyx-ORAM uses a locality-aware shallow tree to reduce access count while remaining compatible with bandwidth-efficient ORAM primitives. The paper reports empirical gains of 1.7-9.9× lower cost and 2.3-12.3× lower latency versus the state-of-the-art oblivious ANN search system.

Significance. If the empirical results and underlying assumptions hold under realistic workloads, this work would be significant for practical deployment of secure ANN search on untrusted third-party infrastructure. The co-design that leverages ANN approximation for bandwidth savings and ORAM flexibility for access-count reduction addresses a key resource trade-off in disk-oblivious systems and could enable more cost-effective TEE-based AI services.

major comments (2)

The central performance claims depend on Onyx-ANNS's compact intermediate representation pruning the majority of bandwidth-intensive accesses while keeping recall essentially unchanged. The manuscript must supply concrete recall targets, workload descriptions, ablation results on pruning aggressiveness, and error bars to demonstrate that recall degradation is negligible rather than assumed.
Onyx-ORAM's locality-aware shallow tree is claimed to cut access count without violating the bandwidth efficiency or security of the underlying ORAM primitive on realistic SSD hardware. The evaluation should include access-pattern analysis, SSD-specific microbenchmarks, and direct comparison showing that the shallow tree does not increase bandwidth consumption or introduce new leakage under the tested configurations.
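
Both major comments come down to which SSD resource saturates first. A back-of-the-envelope model, sketched below, takes the sustainable query rate as the smaller of an IOPS bound and a bandwidth bound. The function name `max_qps` and all device and workload numbers are hypothetical and do not describe the paper's evaluation SSD.

```python
def max_qps(ssd_iops, ssd_bandwidth_bytes, accesses_per_query, bytes_per_query):
    """Upper bound on queries/second and the resource that limits it."""
    iops_bound = ssd_iops / accesses_per_query
    bw_bound = ssd_bandwidth_bytes / bytes_per_query
    limit = "IOPS" if iops_bound <= bw_bound else "bandwidth"
    return min(iops_bound, bw_bound), limit

# Hypothetical SSD: 1M random-read IOPS and 6 GB/s of read bandwidth.
many_small = max_qps(1_000_000, 6_000_000_000, accesses_per_query=500, bytes_per_query=256_000)
few_large = max_qps(1_000_000, 6_000_000_000, accesses_per_query=50, bytes_per_query=4_000_000)
print(many_small)  # limited by IOPS
print(few_large)   # limited by bandwidth
```

A design that lowers one multiplier while raising the other can simply move the bottleneck, which is why the referee asks for bandwidth and access count to be reported together.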

minor comments (2)

The abstract and introduction should explicitly name the state-of-the-art baseline system being compared against, rather than referring to it generically.
Notation for the compact intermediate representation and shallow tree parameters should be defined consistently in the design sections with clear mappings to the claimed resource savings.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will incorporate the requested details into the revised manuscript to better substantiate our claims.

Referee: The central performance claims depend on Onyx-ANNS's compact intermediate representation pruning the majority of bandwidth-intensive accesses while keeping recall essentially unchanged. The manuscript must supply concrete recall targets, workload descriptions, ablation results on pruning aggressiveness, and error bars to demonstrate that recall degradation is negligible rather than assumed.

Authors: We agree that these supporting details are necessary. In the revision we will add explicit recall targets (e.g., 0.95 and 0.99), full workload descriptions drawn from standard ANN benchmarks (SIFT, GloVe, etc.), ablation tables varying the pruning aggressiveness parameter, and error bars computed over repeated runs to confirm that recall remains essentially unchanged.

Revision: yes

Referee: Onyx-ORAM's locality-aware shallow tree is claimed to cut access count without violating the bandwidth efficiency or security of the underlying ORAM primitive on realistic SSD hardware. The evaluation should include access-pattern analysis, SSD-specific microbenchmarks, and direct comparison showing that the shallow tree does not increase bandwidth consumption or introduce new leakage under the tested configurations.

Authors: We will expand the evaluation with access-pattern traces for the shallow tree, SSD microbenchmarks on representative hardware, and side-by-side comparisons against the baseline ORAM that quantify bandwidth usage and confirm the absence of new leakage channels. These results will be added to the revised manuscript.

Revision: yes
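
The error bars promised in the first response could be computed along the following lines. This is a minimal sketch using Python's standard `statistics` module; the helper name `mean_with_stderr` and the five recall values are invented for illustration.

```python
import math
import statistics

def mean_with_stderr(samples):
    """Return the sample mean and its standard error (needs at least two samples)."""
    mean = statistics.mean(samples)
    stderr = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, stderr

# Hypothetical recall measurements from five repeated runs of one configuration.
runs = [0.951, 0.949, 0.953, 0.950, 0.952]
mean, stderr = mean_with_stderr(runs)
print(f"recall = {mean:.4f} +/- {stderr:.4f}")
```

Reporting the spread alongside the mean makes it possible to tell whether a small recall difference between pruned and unpruned search is real or within run-to-run noise.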

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents an empirical systems design for disk-oblivious ANN search, introducing Onyx-ANNS (compact IR pruning) and Onyx-ORAM (locality-aware shallow tree) to invert prior minimization targets. Performance claims (1.7-9.9× cost, 2.3-12.3× latency reductions) are framed as results of experimental comparisons to state-of-the-art systems rather than any mathematical derivation, prediction, or fitted parameter. No equations, self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the text. The design rationale relies on layer-specific suitability arguments and external benchmarks, keeping the central claims independent of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, the design rests on standard assumptions from ANN search (approximation tolerance) and ORAM (security definitions) without introducing new free parameters, axioms, or invented entities that are explicitly quantified.

pith-pipeline@v0.9.0 · 5620 in / 1142 out tokens · 26844 ms · 2026-05-10T00:18:42.531710+00:00 · methodology
