Embedding Inference Attack

Cedric Fitiavana Raelijohn; Jean-Francois Rajotte; Sébastien Gambs

arXiv: 2607.01276 · v1 · pith:VHQPHMQCnew · submitted 2026-07-01 · 💻 cs.CR · cs.IR · cs.LG

Pith reviewed 2026-07-03 20:38 UTC · model grok-4.3

keywords: embedding inference attack · black-box attack · information retrieval security · embedding models · RAG vulnerabilities · model identification · reranker defense

The pith

Tailored queries can identify which embedding model powers a black-box IR system from the unordered sets of returned documents alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that an adversary with access only to the unordered document sets returned by an IR system can craft queries that distinguish among a closed set of candidate embedding models. This identification, termed an embedding inference attack, succeeds even when a reranker is added as a defense, and it has been demonstrated on a live RAG pipeline where the queries also evade LLM refusal filters. If correct, the result means that standard API interfaces do not hide model identity, and that model-specific follow-on attacks become feasible once the embedding model is known.

Core claim

In the black-box setting limited to unordered retrieved document sets, queries can be constructed whose returned document collections differ in ways that uniquely fingerprint the underlying embedding model among known candidates; the same queries remain effective when a reranker is present, and they also bypass LLM safeguards in a RAG system.

What carries the argument

Embedding inference attack via tailored queries whose returned document sets serve as distinctive fingerprints across candidate models.
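
As a rough illustration of how returned document sets could act as fingerprints, the sketch below scores each candidate by the mean Jaccard overlap between the sets observed from the target and reference sets computed offline. It assumes the attacker can reproduce each candidate's retrieval over the same corpus, which the summary here does not state. The names `jaccard`, `identify_model`, `model_a`, and `model_b` are hypothetical, and this is not presented as the paper's decision rule.

```python
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two document-ID sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def identify_model(
    observed: List[FrozenSet[str]],
    reference: Dict[str, List[FrozenSet[str]]],
) -> str:
    """Return the candidate whose reference sets best match the observed sets.

    observed[i] is the unordered set the target returned for probe query i.
    reference[name][i] is the set obtained offline for the same query with
    candidate `name` (assumed to be computable by the attacker).
    """
    def score(name: str) -> float:
        pairs = zip(observed, reference[name])
        return sum(jaccard(o, r) for o, r in pairs) / len(observed)

    return max(reference, key=score)


# Toy data: two hypothetical candidates and two probe queries.
reference = {
    "model_a": [frozenset({"d1", "d2", "d3"}), frozenset({"d7", "d8", "d9"})],
    "model_b": [frozenset({"d1", "d4", "d5"}), frozenset({"d6", "d8", "d9"})],
}
observed = [frozenset({"d1", "d4", "d5"}), frozenset({"d6", "d8", "d2"})]

guess = identify_model(observed, reference)
print(guess)  # model_b (mean overlap 0.75 versus 0.2 for model_a)
```

Only set membership is used, never order or scores, which matches the unordered black-box setting described above.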

If this is right

Once the embedding model is identified, the adversary can apply model-specific follow-on attacks such as embedding inversion.
Rerankers alone do not prevent model identification from document-set observations.
Similarity-threshold defenses can be tested as a countermeasure to reduce query effectiveness.
The attack extends to RAG systems where the same queries evade LLM input filters.
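
The similarity-threshold countermeasure mentioned in the list above can be pictured with a minimal sketch: a retriever returns a top-k document only if its cosine similarity to the query reaches a cutoff. The function names, the toy vectors, and the cutoff value `tau` are assumptions for illustration; the paper's actual defense configuration is not described here.

```python
import math
from typing import Dict, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    k: int,
    tau: float,
) -> Set[str]:
    """Return the unordered set of top-k documents whose similarity is >= tau."""
    scored = sorted(
        ((cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()),
        reverse=True,
    )
    return {doc_id for score, doc_id in scored[:k] if score >= tau}


# Toy two-dimensional "embeddings" (hypothetical values).
docs = {"d1": [1.0, 0.0], "d2": [0.8, 0.6], "d3": [0.0, 1.0]}

print(sorted(retrieve([1.0, 0.0], docs, k=3, tau=-1.0)))   # ['d1', 'd2', 'd3']
print(sorted(retrieve([1.0, 0.0], docs, k=3, tau=0.5)))    # ['d1', 'd2']
print(sorted(retrieve([-1.0, -1.0], docs, k=3, tau=0.5)))  # []
```

The last call hints at why a threshold might blunt probe queries: a query that sits far from every document returns nothing, so it carries little signal about the model. Whether real tailored queries behave this way is an empirical question for the paper's evaluation.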

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Production IR APIs may need query monitoring or output randomization to prevent systematic model fingerprinting.
Similar fingerprinting could be attempted against other hidden components such as the reranker itself.
The closed-candidate assumption could be relaxed by first clustering models from public repositories before launching the attack.

Load-bearing premise

The attacker knows a closed list of candidate embedding models and the crafted queries produce document sets that are sufficiently different for each model.

What would settle it

A set of queries that, when issued to each candidate model, produce identical or statistically indistinguishable unordered document sets would falsify the attack's ability to discriminate models.
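
One way to make this falsification test concrete, offered as a sketch rather than the paper's procedure, is to compute the largest pairwise Jaccard overlap among the sets that the candidates return for a query. A value of 1.0 means at least two candidates are indistinguishable on that query. The function name and the toy sets are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet


def max_pairwise_jaccard(sets_by_model: Dict[str, FrozenSet[str]]) -> float:
    """Largest Jaccard overlap between any two candidates' returned sets."""
    worst = 0.0
    for a, b in combinations(sets_by_model.values(), 2):
        union = a | b
        overlap = len(a & b) / len(union) if union else 1.0
        worst = max(worst, overlap)
    return worst


# A query on which every candidate returns the same set: no signal.
uninformative = {
    "model_a": frozenset({"d1", "d2"}),
    "model_b": frozenset({"d1", "d2"}),
    "model_c": frozenset({"d1", "d2"}),
}
# A query on which the candidates mostly disagree: useful for fingerprinting.
informative = {
    "model_a": frozenset({"d1", "d2"}),
    "model_b": frozenset({"d3", "d4"}),
    "model_c": frozenset({"d1", "d5"}),
}

print(max_pairwise_jaccard(uninformative))  # 1.0
print(round(max_pairwise_jaccard(informative), 3))  # 0.333
```

If every available query scored 1.0 under this measure, the document sets alone could not separate the candidates.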

Figures

Figures reproduced from arXiv: 2607.01276 by Cedric Fitiavana Raelijohn, Jean-Francois Rajotte, Sébastien Gambs.

**Figure 2.** Effect of top-k on the performance of the attack.

Accompanying text (Effect of Reranker): From an attacker's perspective, the presence and architecture of a reranker in the target system are typically unknown. Nevertheless, EIA remains effective even in such a situation.

Original abstract

Embedding models are essential components of modern Information Retrieval (IR) systems, yet they are typically hidden behind APIs. Recent works have shown that dense IR systems can lead to security vulnerabilities such as embedding inversion attacks. However, such attacks usually require that the attacker knows the embedding model for the attack to be applicable. In this paper, we study IR systems under a black-box setting in which the adversary observes only the unordered set of retrieved documents, without ranking or similarity scores. We demonstrate that in such contexts, tailored queries allow an adversary to identify which embedding model is in use from a set of known candidate models, which we call an embedding inference attack (EIA). We also show that certain queries remain discriminative even when the system includes a reranker as a potential defense mechanism. We further validate our method on a real Retrieval-Augmented Generation (RAG) system, in which the tailored queries bypass the LLM's tendency to reject inputs it does not recognize as well-formed questions. Finally, we propose and evaluate other mitigation strategies such as similarity thresholds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper shows you can fingerprint which embedding model is in use behind a black-box retrieval API by sending crafted queries and inspecting the unordered document sets returned, even with a reranker present.

The core result is that an adversary with a closed list of candidate models can identify the one in use from the sets of documents a black-box system returns. The authors call this an embedding inference attack and show the queries stay useful even when a reranker is added. They also run the method against a real RAG pipeline and test a couple of mitigation ideas such as similarity thresholds.

What is new is the explicit focus on the unordered black-box regime. Earlier inversion attacks usually assumed the attacker already knew the model or had access to scores and rankings. Here the only signal is the set of documents, which is a realistic constraint for many public APIs.

The work is straightforward and empirical. They pick queries that produce different retrieval sets across models, measure how well those sets separate the candidates, and check that a reranker does not erase the signal. The RAG experiment is a nice practical check because it shows the queries can slip past an LLM that would otherwise refuse malformed input.
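
A toy two-stage pipeline gives one intuition for why a reranker need not erase the signal. The reranker only reorders and truncates the first-stage candidates, so the returned set can still differ whenever the embedding models disagree on the candidate pool. This is a sketch under assumed scores and hypothetical names (`two_stage`, `model_a_scores`, `model_b_scores`, `rerank_scores`), not the paper's analysis.

```python
from typing import Dict, Set


def two_stage(
    first_stage_scores: Dict[str, float],
    rerank_scores: Dict[str, float],
    k: int,
    m: int,
) -> Set[str]:
    """Take the top-k documents by embedding score, then keep the reranker's top-m."""
    candidates = sorted(
        first_stage_scores, key=lambda d: first_stage_scores[d], reverse=True
    )[:k]
    kept = sorted(candidates, key=lambda d: rerank_scores[d], reverse=True)[:m]
    return set(kept)


# Hypothetical first-stage scores from two embedding models for one query.
model_a_scores = {"d1": 0.9, "d2": 0.8, "d3": 0.7, "d4": 0.1}
model_b_scores = {"d4": 0.9, "d2": 0.8, "d3": 0.7, "d1": 0.1}
# The same (hypothetical) reranker is applied on top of either model.
rerank_scores = {"d1": 0.9, "d2": 0.5, "d3": 0.4, "d4": 0.8}

# Candidate pools differ (k=3), so the final sets differ.
print(sorted(two_stage(model_a_scores, rerank_scores, k=3, m=2)))  # ['d1', 'd2']
print(sorted(two_stage(model_b_scores, rerank_scores, k=3, m=2)))  # ['d2', 'd4']

# If k covers the whole corpus, the reranker alone decides and the sets coincide.
print(sorted(two_stage(model_a_scores, rerank_scores, k=4, m=2)))  # ['d1', 'd4']
print(sorted(two_stage(model_b_scores, rerank_scores, k=4, m=2)))  # ['d1', 'd4']
```

In this toy case, whether the signal survives depends on how the first-stage cutoff `k` compares with the corpus size; the paper's measured reranker results would be needed to say how this plays out in practice.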

The main limitation is the closed candidate set and the dependence on corpus and model diversity. If the candidates come from the same family or the corpus is narrow, the returned sets may overlap enough that identification drops toward chance. The paper would be stronger with more detail on how query selection scales and on accuracy numbers across varied corpora and model families. The mitigation section is brief and would benefit from stronger baselines.

This is the kind of paper that matters to people who run or audit retrieval APIs. It is not a theoretical breakthrough, but it flags a concrete exposure. I would send it to peer review; the claim is testable and the setting is realistic enough that referees can check the numbers directly.

Referee Report

2 major / 2 minor

Summary. The manuscript claims to demonstrate an embedding inference attack (EIA) in which tailored queries enable an adversary to identify the embedding model in use within a black-box IR system, based solely on the unordered set of retrieved documents. The attack is asserted to remain effective even when a reranker is present as a defense, is validated on a real RAG system (where queries bypass LLM rejection of non-question inputs), and the authors propose and evaluate mitigations such as similarity thresholds.

Significance. If the empirical demonstration holds with high identification accuracy, the result is significant because it identifies a new, realistic side-channel for model inference in embedding-based retrieval and RAG pipelines that does not require access to scores or rankings. The black-box unordered-set threat model and the extension to real RAG systems are strengths; the explicit evaluation of defenses further increases practical relevance in the security literature.

major comments (2)

[Abstract, paragraph 3] The central claim that tailored queries produce document sets distinctive enough to identify the embedding model from a closed candidate set is load-bearing, yet the manuscript provides no visible quantitative evidence (identification accuracy, set-overlap metrics, or success-rate tables) and no description of the query-generation method, so the claim cannot be checked against data.
[Abstract] The assertion that certain queries remain discriminative even with a reranker is load-bearing for the attack's robustness claim, but without reported accuracy numbers under the reranker condition or details on how document-set distinctiveness is preserved, it is impossible to assess whether the attack generalizes beyond the specific tested models and corpus.

minor comments (2)

The acronym EIA is introduced without a literature check for prior usage; a brief note on novelty of the term would improve clarity.
[Abstract] The abstract contains several long sentences that could be split to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and for highlighting the need for greater transparency in the abstract. We address the two major comments below and will make the requested revisions to improve verifiability.

Referee: [Abstract, paragraph 3] The central claim that tailored queries produce document sets distinctive enough to identify the embedding model from a closed candidate set is load-bearing, yet the manuscript provides no visible quantitative evidence (identification accuracy, set-overlap metrics, or success-rate tables) and no description of the query-generation method, so the claim cannot be checked against data.

Authors: We agree that the abstract should include quantitative support for the central claim. The body of the manuscript reports identification accuracies, set-overlap metrics, and success-rate tables, together with the query-generation procedure. We will revise the abstract to incorporate representative accuracy figures and a brief description of the query-generation method so that the claim can be assessed directly from the abstract. Revision: yes.

Referee: [Abstract] The assertion that certain queries remain discriminative even with a reranker is load-bearing for the attack's robustness claim, but without reported accuracy numbers under the reranker condition or details on how document-set distinctiveness is preserved, it is impossible to assess whether the attack generalizes beyond the specific tested models and corpus.

Authors: We acknowledge that the abstract currently omits numerical results for the reranker setting. The manuscript contains these accuracy numbers and an analysis of how set distinctiveness is retained under reranking. We will update the abstract to report the relevant accuracy figures and a short note on preservation of distinctiveness. Revision: yes.

Circularity Check

0 steps flagged

Empirical demonstration with no derivation chain or load-bearing self-references

The paper is an empirical attack demonstration: it shows via experiments that tailored queries can produce distinctive unordered document sets sufficient to identify an embedding model from a closed candidate set, even with a reranker present. No equations, parameter fitting, predictions derived from fits, or self-citation chains are described in the abstract or claimed structure. The central claim rests on observed experimental distinctiveness rather than any reduction to inputs by construction. This is the most common honest non-finding for attack papers that do not invoke uniqueness theorems or ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the existence of a known candidate pool and the empirical distinctiveness of document sets; no free parameters or invented physical entities are introduced.

axioms (1)

Domain assumption: The target system uses exactly one embedding model from a known finite candidate set. Stated in abstract paragraph 3 as the setting for model identification.

Reference graph

Works this paper leans on

41 extracted references · 29 canonical work pages · 3 internal anchors

[1]

Maya Anderson, Guy Amit, and Abigail Goldsteen. 2025. https://doi.org/10.5220/0013108300003899 Is my data in your retrieval database? membership inference attacks against retrieval augmented generation . International Conference on Information Systems Security and Privacy, 2:474--485. Publisher Copyright: 2025 by SCITEPRESS – Science and Technology Public...

work page doi:10.5220/0013108300003899 2025
[2]

Muhammad Arslan, Hussam Ghanem, Saba Munawar, and Christophe Cruz. 2024. https://doi.org/10.1016/j.procs.2024.09.178 A survey on rag with llms . Procedia Computer Science, 246:3781--3790. 28th International Conference on Knowledge Based and Intelligent information and Engineering Systems (KES 2024)

work page doi:10.1016/j.procs.2024.09.178 2024
[3]

Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. 2016. Ms marco: A human generated machine reading comprehension dataset. In InCoCo@NIPS

2016
[4]

Matan Ben-Tov and Mahmood Sharif. 2024. https://api.semanticscholar.org/CorpusID:275133232 Gasliteing the retrieval: Exploring vulnerabilities in dense embedding-based search . Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security

2024
[5]

Andreea-Elena Bodea, Stephen Meisenbacher, Alexandra Klymenko, and Florian Matthes. 2026. https://arxiv.org/abs/2601.03979 Sok: Privacy risks and mitigations in retrieval-augmented generation systems . Preprint, arXiv:2601.03979

work page arXiv 2026
[6]

Andrew Brown, Muhammad Roman, and Barry Devereux. 2025. https://doi.org/10.3390/bdcc9120320 A systematic literature review of retrieval-augmented generation: Techniques, metrics, and challenges . Big Data and Cognitive Computing, 9(12)

work page doi:10.3390/bdcc9120320 2025
[7]

James Calam. 2023. https://www.pinecone.io/learn/series/rag/rerankers/ Rerankers and two-stage retrieval . Pinecone Learn

2023
[8]

Laura Caspari, Kanishka Ghosh Dastidar, Saber Zerhoudi, Jelena Mitrovic, and Michael Granitzer. 2024. https://ceur-ws.org/Vol-3784/short4.pdf Beyond benchmarks: Evaluating embedding model similarity for retrieval augmented generation systems . In Proceedings of the Workshop Information Retrieval's Role in RAG Systems (IR-RAG 2024) co-located with the 47th...

2024
[9]

Yu, Qiang Yang, and Xing Xie

Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, and Xing Xie. 2024. https://doi.org/10.1145/3641289 A survey on evaluation of large language models . ACM Trans. Intell. Syst. Technol., 15(3)

work page doi:10.1145/3641289 2024
[10]

Yiyi Chen, Qiongkai Xu, and Johannes Bjerva. 2025. https://doi.org/10.18653/v1/2025.acl-long.1185 ALGEN : Few-shot inversion attacks on textual embeddings via cross-model alignment and generation . In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 24330--24348, Vienna, Austria. Associ...

work page doi:10.18653/v1/2025.acl-long.1185 2025
[11]

Cohere. 2026. https://docs.cohere.com/docs/cohere-embed Cohere's embed models (details and application) . Accessed: 2026-06-03

2026
[12]

Florin Cuconasu, Giovanni Trappolini, Federico Siciliano, Simone Filice, Cesare Campagnano, Yoelle Maarek, Nicola Tonellotto, and Fabrizio Silvestri. 2024. Rethinking relevance: How noise and distractors impact retrieval-augmented generation. In Proceedings of the 14th Italian Information Retrieval Workshop (IIR 2024). CEUR-WS

2024
[13]

Adam Dziedzic, Franziska Boenisch, Mingjian Jiang, Haonan Duan, and Nicolas Papernot. 2023. https://openreview.net/forum?id=XN5qOxI8gkz Sentence embedding encoders are easy to steal but hard to defend . In ICLR 2023 Workshop on Pitfalls of limited data and computation for Trustworthy ML

2023
[14]

Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, and Wei Wang. 2022. https://doi.org/10.18653/v1/2022.acl-long.62 Language-agnostic BERT sentence embedding . In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 878--891, Dublin, Ireland. Association for Computational Linguistics

work page doi:10.18653/v1/2022.acl-long.62 2022
[15]

Minghao Hu, Yuxing Peng, Zhen Huang, and Dongsheng Li. 2019. https://doi.org/10.18653/v1/P19-1221 Retrieve, read, rerank: Towards end-to-end multi-document reading comprehension . In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2285--2295, Florence, Italy. Association for Computational Linguistics

work page doi:10.18653/v1/p19-1221 2019
[16]

Yu-Hsiang Huang, Yuche Tsai, Hsiang Hsiao, Hong-Yi Lin, and Shou-De Lin. 2024. https://doi.org/10.18653/v1/2024.acl-long.230 Transferable embedding inversion attack: Uncovering privacy risks in text embeddings without model queries . In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4...

work page doi:10.18653/v1/2024.acl-long.230 2024
[17]

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020 a . https://doi.org/10.18653/v1/2020.emnlp-main.550 Dense passage retrieval for open-domain question answering . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769--6781, Online. ...

work page doi:10.18653/v1/2020.emnlp-main.550 2020
[18]

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick SH Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020 b . Dense passage retrieval for open-domain question answering. In EMNLP (1), pages 6769--6781

2020
[19]

Doohyun Kim, Donghwa Kang, Kyungjae Lee, Hyeongboo Baek, and Brent Byunghoon Kang. 2026. https://arxiv.org/abs/2602.01757 Zero2text: Zero-training cross-domain inversion attacks on textual embeddings . Preprint, arXiv:2602.01757

work page arXiv 2026
[20]

Desmarais

Antoine Lefebvre - Brossard, Stephane Gazaille, and Michel C. Desmarais. 2023. https://doi.org/10.48550/ARXIV.2302.07738 Alloprof: a new french question-answer education dataset and its use in an information retrieval case study . CoRR, abs/2302.07738

work page doi:10.48550/arxiv.2302.07738 2023
[21]

Haoran Li, Mingshi Xu, and Yangqiu Song. 2023. https://doi.org/10.18653/v1/2023.findings-acl.881 Sentence embedding leaks more information than you expect: Generative embedding inversion attack to recover the whole sentence . In Findings of the Association for Computational Linguistics: ACL 2023, pages 14022--14040, Toronto, Canada. Association for Comput...

work page doi:10.18653/v1/2023.findings-acl.881 2023
[22]

Chin-Yew Lin. 2004. https://aclanthology.org/W04-1013/ ROUGE : A package for automatic evaluation of summaries . In Text Summarization Branches Out, pages 74--81, Barcelona, Spain. Association for Computational Linguistics

2004
[23]

Yang Lu, Yue Chen, and Carsten Eickhoff. 2025. https://arxiv.org/html/2502.04645v3 Cross-encoder rediscovers a semantic variant of bm25 . arXiv preprint

work page arXiv 2025
[24]

Manning, Prabhakar Raghavan, and Hinrich Sch \"u tze

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Sch \"u tze. 2008 a . https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-unranked-retrieval-sets-1.html Evaluation of unranked retrieval sets , chapter 8. Cambridge University Press. Section 8.2

2008
[25]

Manning, Prabhakar Raghavan, and Hinrich Sch\" u tze

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Sch\" u tze. 2008 b . https://doi.org/10.1017/cbo9780511809071 Scoring, term weighting, and the vector space model , chapter 6. Cambridge University Press

work page doi:10.1017/cbo9780511809071 2008
[26]

Mintplex Labs . 2024. Anythingllm: A full-stack application that turns any document, resource, or content into context that any llm can use as references during chatting. https://github.com/mintplex-labs/anything-llm. Accessed: 2026-04-29

2024
[27]

John Morris, Volodymyr Kuleshov, Vitaly Shmatikov, and Alexander Rush. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.765 Text embeddings reveal (almost) as much as text . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12448--12460, Singapore. Association for Computational Linguistics

work page doi:10.18653/v1/2023.emnlp-main.765 2023
[28]

Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernandez Abrego, Ji Ma, Vincent Zhao, Yi Luan, Keith Hall, Ming-Wei Chang, and Yinfei Yang. 2022. https://doi.org/10.18653/v1/2022.emnlp-main.669 Large dual encoders are generalizable retrievers . In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9844--9855, A...

work page doi:10.18653/v1/2022.emnlp-main.669 2022
[29]

OpenAI. 2024. https://openai.com/index/new-embedding-models-and-api-updates/ New embedding models and api updates . Accessed: 2026-06-03

2024
[30]

Tejul Pandit, Sakshi Mahendru, Meet Raval, and Dhvani Upadhyay. 2025. https://arxiv.org/abs/2512.16236 The evolution of reranking models in information retrieval: From heuristic methods to large language models . CLNLP'25, page 15

work page arXiv 2025
[31]

Kornaropoulos, and Giuseppe Ateniese

Dario Pasquini, Evgenios M. Kornaropoulos, and Giuseppe Ateniese. 2024. https://api.semanticscholar.org/CorpusID:271328475 Llmmap: Fingerprinting for large language models . ArXiv, abs/2407.15847

work page arXiv 2024
[32]

Nils Reimers and Iryna Gurevych. 2019. https://doi.org/10.18653/v1/D19-1410 Sentence- BERT : Sentence embeddings using S iamese BERT -networks . In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982--3992, Hong Kong, Chi...

work page doi:10.18653/v1/d19-1410 2019
[33]

Sahel Sharifymoghaddam, Ronak Pradeep, Andre Slavescu, Ryan Nguyen, Andrew Xu, Zijian Chen, Yilin Zhang, Yidi Chen, Jasper Xian, and Jimmy Lin. 2025. https://doi.org/10.1145/3726302.3730331 Rankllm: A python package for reranking with llms . In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval,...

work page doi:10.1145/3726302.3730331 2025
[34]

Manveer Singh Tamber, Jasper Xian, and Jimmy Lin. 2025. https://doi.org/10.18653/v1/2025.findings-naacl.104 Can ' t hide behind the API : Stealing black-box commercial embedding models . In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1958--1969, Albuquerque, New Mexico. Association for Computational Linguistics

work page doi:10.18653/v1/2025.findings-naacl.104 2025
[35]

Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. 2024. https://arxiv.org/abs/2212.03533 Text embeddings by weakly-supervised contrastive pre-training . Preprint, arXiv:2212.03533

work page internal anchor Pith review Pith/arXiv arXiv 2024
[36]

Xingrui Xie, Han Liu, Wenzhe Hou, and Hongbin Huang. 2023. https://doi.org/10.1109/BigDIA60676.2023.10429609 A brief survey of vector databases . In 2023 9th International Conference on Big Data and Information Analytics (BigDIA), pages 364--371

work page doi:10.1109/bigdia60676.2023.10429609 2023
[37]

Shenglai Zeng, Jiankun Zhang, Pengfei He, Yiding Liu, Yue Xing, Han Xu, Jie Ren, Yi Chang, Shuaiqiang Wang, Dawei Yin, and Jiliang Tang. 2024. https://doi.org/10.18653/v1/2024.findings-acl.267 The good and the bad: Exploring privacy issues in retrieval-augmented generation ( RAG ) . In Findings of the Association for Computational Linguistics: ACL 2024, p...

work page doi:10.18653/v1/2024.findings-acl.267 2024
[38]

Morris, and Vitaly Shmatikov

Collin Zhang, John X. Morris, and Vitaly Shmatikov. 2025 a . https://api.semanticscholar.org/CorpusID:277467864 Universal zero-shot embedding inversion . ArXiv, abs/2504.00147

work page arXiv 2025
[39]

Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. 2025 b . Qwen3 embedding: Advancing text embedding and reranking through foundation models. arXiv preprint arXiv:2506.05176

work page internal anchor Pith review Pith/arXiv arXiv 2025
[40]

Zexuan Zhong, Ziqing Huang, Alexander Wettig, and Danqi Chen. 2023. https://api.semanticscholar.org/CorpusID:264828956 Poisoning retrieval corpora by injecting adversarial passages . ArXiv, abs/2310.19156

work page arXiv 2023
[41]

Yujia Zhou, Yan Liu, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Zheng Liu, Chaozhuo Li, Zhicheng Dou, Tsung-Yi Ho, and Philip S. Yu. 2024. https://api.semanticscholar.org/CorpusID:272689561 Trustworthiness in retrieval-augmented generation systems: A survey . ArXiv, abs/2409.10102

work page internal anchor Pith review Pith/arXiv arXiv 2024

[1] [1]

Maya Anderson, Guy Amit, and Abigail Goldsteen. 2025. https://doi.org/10.5220/0013108300003899 Is my data in your retrieval database? membership inference attacks against retrieval augmented generation . International Conference on Information Systems Security and Privacy, 2:474--485. Publisher Copyright: 2025 by SCITEPRESS – Science and Technology Public...

work page doi:10.5220/0013108300003899 2025

[2] [2]

Muhammad Arslan, Hussam Ghanem, Saba Munawar, and Christophe Cruz. 2024. https://doi.org/10.1016/j.procs.2024.09.178 A survey on rag with llms . Procedia Computer Science, 246:3781--3790. 28th International Conference on Knowledge Based and Intelligent information and Engineering Systems (KES 2024)

work page doi:10.1016/j.procs.2024.09.178 2024

[3] [3]

Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. 2016. Ms marco: A human generated machine reading comprehension dataset. In InCoCo@NIPS

2016

[4] [4]

Matan Ben-Tov and Mahmood Sharif. 2024. https://api.semanticscholar.org/CorpusID:275133232 Gasliteing the retrieval: Exploring vulnerabilities in dense embedding-based search . Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security

2024

[5] [5]

Andreea-Elena Bodea, Stephen Meisenbacher, Alexandra Klymenko, and Florian Matthes. 2026. https://arxiv.org/abs/2601.03979 Sok: Privacy risks and mitigations in retrieval-augmented generation systems . Preprint, arXiv:2601.03979

work page arXiv 2026

[6] [6]

Andrew Brown, Muhammad Roman, and Barry Devereux. 2025. https://doi.org/10.3390/bdcc9120320 A systematic literature review of retrieval-augmented generation: Techniques, metrics, and challenges . Big Data and Cognitive Computing, 9(12)

work page doi:10.3390/bdcc9120320 2025

[7] [7]

James Calam. 2023. https://www.pinecone.io/learn/series/rag/rerankers/ Rerankers and two-stage retrieval . Pinecone Learn

2023

[8] [8]

Laura Caspari, Kanishka Ghosh Dastidar, Saber Zerhoudi, Jelena Mitrovic, and Michael Granitzer. 2024. https://ceur-ws.org/Vol-3784/short4.pdf Beyond benchmarks: Evaluating embedding model similarity for retrieval augmented generation systems . In Proceedings of the Workshop Information Retrieval's Role in RAG Systems (IR-RAG 2024) co-located with the 47th...

2024

[9] [9]

Yu, Qiang Yang, and Xing Xie

Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, and Xing Xie. 2024. https://doi.org/10.1145/3641289 A survey on evaluation of large language models . ACM Trans. Intell. Syst. Technol., 15(3)

work page doi:10.1145/3641289 2024

[10] [10]

Yiyi Chen, Qiongkai Xu, and Johannes Bjerva. 2025. https://doi.org/10.18653/v1/2025.acl-long.1185 ALGEN : Few-shot inversion attacks on textual embeddings via cross-model alignment and generation . In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 24330--24348, Vienna, Austria. Associ...

work page doi:10.18653/v1/2025.acl-long.1185 2025

[11] [11]

Cohere. 2026. https://docs.cohere.com/docs/cohere-embed Cohere's embed models (details and application) . Accessed: 2026-06-03

2026

[12] [12]

Florin Cuconasu, Giovanni Trappolini, Federico Siciliano, Simone Filice, Cesare Campagnano, Yoelle Maarek, Nicola Tonellotto, and Fabrizio Silvestri. 2024. Rethinking relevance: How noise and distractors impact retrieval-augmented generation. In Proceedings of the 14th Italian Information Retrieval Workshop (IIR 2024). CEUR-WS

2024

[13] [13]

Adam Dziedzic, Franziska Boenisch, Mingjian Jiang, Haonan Duan, and Nicolas Papernot. 2023. https://openreview.net/forum?id=XN5qOxI8gkz Sentence embedding encoders are easy to steal but hard to defend . In ICLR 2023 Workshop on Pitfalls of limited data and computation for Trustworthy ML

2023

[14] [14]

Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, and Wei Wang. 2022. https://doi.org/10.18653/v1/2022.acl-long.62 Language-agnostic BERT sentence embedding . In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 878--891, Dublin, Ireland. Association for Computational Linguistics

work page doi:10.18653/v1/2022.acl-long.62 2022

[15] [15]

Minghao Hu, Yuxing Peng, Zhen Huang, and Dongsheng Li. 2019. https://doi.org/10.18653/v1/P19-1221 Retrieve, read, rerank: Towards end-to-end multi-document reading comprehension . In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2285--2295, Florence, Italy. Association for Computational Linguistics

work page doi:10.18653/v1/p19-1221 2019

[16] [16]

Yu-Hsiang Huang, Yuche Tsai, Hsiang Hsiao, Hong-Yi Lin, and Shou-De Lin. 2024. https://doi.org/10.18653/v1/2024.acl-long.230 Transferable embedding inversion attack: Uncovering privacy risks in text embeddings without model queries . In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4...

work page doi:10.18653/v1/2024.acl-long.230 2024

[17] [17]

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020 a . https://doi.org/10.18653/v1/2020.emnlp-main.550 Dense passage retrieval for open-domain question answering . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769--6781, Online. ...

work page doi:10.18653/v1/2020.emnlp-main.550 2020

[18] [18]

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick SH Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020 b . Dense passage retrieval for open-domain question answering. In EMNLP (1), pages 6769--6781

2020

[19]

Doohyun Kim, Donghwa Kang, Kyungjae Lee, Hyeongboo Baek, and Brent Byunghoon Kang. 2026. https://arxiv.org/abs/2602.01757 Zero2text: Zero-training cross-domain inversion attacks on textual embeddings. Preprint, arXiv:2602.01757.

[20]

Antoine Lefebvre-Brossard, Stephane Gazaille, and Michel C. Desmarais. 2023. https://doi.org/10.48550/ARXIV.2302.07738 Alloprof: A new French question-answer education dataset and its use in an information retrieval case study. CoRR, abs/2302.07738.

[21]

Haoran Li, Mingshi Xu, and Yangqiu Song. 2023. https://doi.org/10.18653/v1/2023.findings-acl.881 Sentence embedding leaks more information than you expect: Generative embedding inversion attack to recover the whole sentence. In Findings of the Association for Computational Linguistics: ACL 2023, pages 14022–14040, Toronto, Canada. Association for Comput...

[22]

Chin-Yew Lin. 2004. https://aclanthology.org/W04-1013/ ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.

[23]

Yang Lu, Yue Chen, and Carsten Eickhoff. 2025. https://arxiv.org/html/2502.04645v3 Cross-encoder rediscovers a semantic variant of BM25. arXiv preprint.

[24]

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008a. https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-unranked-retrieval-sets-1.html Evaluation of unranked retrieval sets, chapter 8. Cambridge University Press. Section 8.2.

[25]

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008b. https://doi.org/10.1017/cbo9780511809071 Scoring, term weighting, and the vector space model, chapter 6. Cambridge University Press.

[26]

Mintplex Labs. 2024. AnythingLLM: A full-stack application that turns any document, resource, or content into context that any LLM can use as references during chatting. https://github.com/mintplex-labs/anything-llm. Accessed: 2026-04-29.

[27]

John Morris, Volodymyr Kuleshov, Vitaly Shmatikov, and Alexander Rush. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.765 Text embeddings reveal (almost) as much as text. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12448–12460, Singapore. Association for Computational Linguistics.

[28]

Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernandez Abrego, Ji Ma, Vincent Zhao, Yi Luan, Keith Hall, Ming-Wei Chang, and Yinfei Yang. 2022. https://doi.org/10.18653/v1/2022.emnlp-main.669 Large dual encoders are generalizable retrievers. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9844–9855, A...

[29]

OpenAI. 2024. https://openai.com/index/new-embedding-models-and-api-updates/ New embedding models and API updates. Accessed: 2026-06-03.

[30]

Tejul Pandit, Sakshi Mahendru, Meet Raval, and Dhvani Upadhyay. 2025. https://arxiv.org/abs/2512.16236 The evolution of reranking models in information retrieval: From heuristic methods to large language models. CLNLP'25, page 15.

[31]

Dario Pasquini, Evgenios M. Kornaropoulos, and Giuseppe Ateniese. 2024. https://api.semanticscholar.org/CorpusID:271328475 LLMmap: Fingerprinting for large language models. ArXiv, abs/2407.15847.

[32]

Nils Reimers and Iryna Gurevych. 2019. https://doi.org/10.18653/v1/D19-1410 Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, Chi...

[33]

Sahel Sharifymoghaddam, Ronak Pradeep, Andre Slavescu, Ryan Nguyen, Andrew Xu, Zijian Chen, Yilin Zhang, Yidi Chen, Jasper Xian, and Jimmy Lin. 2025. https://doi.org/10.1145/3726302.3730331 RankLLM: A Python package for reranking with LLMs. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval,...

[34]

Manveer Singh Tamber, Jasper Xian, and Jimmy Lin. 2025. https://doi.org/10.18653/v1/2025.findings-naacl.104 Can't hide behind the API: Stealing black-box commercial embedding models. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1958–1969, Albuquerque, New Mexico. Association for Computational Linguistics.

[35]

Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. 2024. https://arxiv.org/abs/2212.03533 Text embeddings by weakly-supervised contrastive pre-training. Preprint, arXiv:2212.03533.

[36]

Xingrui Xie, Han Liu, Wenzhe Hou, and Hongbin Huang. 2023. https://doi.org/10.1109/BigDIA60676.2023.10429609 A brief survey of vector databases. In 2023 9th International Conference on Big Data and Information Analytics (BigDIA), pages 364–371.

[37]

Shenglai Zeng, Jiankun Zhang, Pengfei He, Yiding Liu, Yue Xing, Han Xu, Jie Ren, Yi Chang, Shuaiqiang Wang, Dawei Yin, and Jiliang Tang. 2024. https://doi.org/10.18653/v1/2024.findings-acl.267 The good and the bad: Exploring privacy issues in retrieval-augmented generation (RAG). In Findings of the Association for Computational Linguistics: ACL 2024, p...

[38]

Collin Zhang, John X. Morris, and Vitaly Shmatikov. 2025a. https://api.semanticscholar.org/CorpusID:277467864 Universal zero-shot embedding inversion. ArXiv, abs/2504.00147.

[39]

Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. 2025b. Qwen3 embedding: Advancing text embedding and reranking through foundation models. arXiv preprint arXiv:2506.05176.

[40]

Zexuan Zhong, Ziqing Huang, Alexander Wettig, and Danqi Chen. 2023. https://api.semanticscholar.org/CorpusID:264828956 Poisoning retrieval corpora by injecting adversarial passages. ArXiv, abs/2310.19156.

[41]

Yujia Zhou, Yan Liu, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Zheng Liu, Chaozhuo Li, Zhicheng Dou, Tsung-Yi Ho, and Philip S. Yu. 2024. https://api.semanticscholar.org/CorpusID:272689561 Trustworthiness in retrieval-augmented generation systems: A survey. ArXiv, abs/2409.10102.