Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions
Pith reviewed 2026-05-10 17:48 UTC · model grok-4.3
The pith
Secure RAG is fundamentally about protecting the external knowledge-access pipeline rather than the language model alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Secure RAG requires treating the external knowledge-access pipeline as the primary security concern. The workflow is abstracted into six stages and organized around three trust boundaries plus four primary security surfaces—pre-retrieval knowledge corruption, retrieval-time access manipulation, downstream context exploitation, and knowledge exfiltration—to separate RAG-introduced or RAG-amplified threats from inherent LLM weaknesses. The survey of attacks, defenses, and benchmarks under this view reveals that current protections are largely reactive and fragmented, motivating future work on layered, boundary-aware safeguards across the entire pipeline.
What carries the argument
The six-stage RAG workflow abstraction organized by three trust boundaries and four security surfaces that classifies attacks and defenses specific to the external knowledge pipeline.
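The abstraction can be made concrete as a small data model. This is an illustrative sketch only: the six stage names follow the authors' rebuttal (query encoding, retrieval, context augmentation, generation, post-processing, output), and the boundary names and stage-to-boundary anchoring are assumptions, not the paper's own identifiers.

```python
# Illustrative encoding of the survey's abstraction. Stage names are taken
# from the authors' rebuttal; boundary names and the surface-to-boundary
# anchoring below are hypothetical labels for exposition.
STAGES = [
    "query_encoding", "retrieval", "context_augmentation",
    "generation", "post_processing", "output",
]

# Three trust boundaries: interfaces where data crosses between components
# of differing trustworthiness.
BOUNDARIES = {
    "corpus_to_retriever": ("knowledge_store", "retrieval"),
    "retriever_to_llm": ("context_augmentation", "generation"),
    "llm_to_consumer": ("generation", "output"),
}

# Four primary security surfaces, each anchored at one boundary.
SURFACES = {
    "pre_retrieval_knowledge_corruption": "corpus_to_retriever",
    "retrieval_time_access_manipulation": "corpus_to_retriever",
    "downstream_context_exploitation": "retriever_to_llm",
    "knowledge_exfiltration": "llm_to_consumer",
}

# Sanity check: every surface anchors at a declared boundary.
assert all(b in BOUNDARIES for b in SURFACES.values())
assert len(STAGES) == 6 and len(BOUNDARIES) == 3 and len(SURFACES) == 4
```

The point of the encoding is that the taxonomy is a total map: every surface resolves to exactly one boundary, which is what lets the survey file attacks and defenses without residual categories.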
If this is right
- Attacks on RAG can be systematically placed into pre-retrieval corruption, retrieval manipulation, context exploitation, or exfiltration categories.
- Defenses must address each security surface rather than relying on isolated fixes inside the language model.
- Evaluation benchmarks should test resilience at the identified trust boundaries across the full pipeline.
- Remediation should shift from reactive patches to coordinated protection spanning the entire knowledge-access lifecycle.
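The first bullet implies that any RAG-specific threat routes to exactly one of the four categories. A minimal classifier sketch, assuming a hypothetical earliest-surface-touched rule (the category names are the paper's; the routing predicates and their order are illustrative):

```python
def classify_attack(targets_corpus: bool, manipulates_ranking: bool,
                    exploits_prompt_context: bool, leaks_retrieved_data: bool):
    """Route a RAG-specific threat to one of the four security surfaces.

    The decision order is a hypothetical reading of the taxonomy: a threat
    is assigned to the earliest pipeline surface it touches.
    """
    if targets_corpus:
        return "pre-retrieval knowledge corruption"
    if manipulates_ranking:
        return "retrieval-time access manipulation"
    if exploits_prompt_context:
        return "downstream context exploitation"
    if leaks_retrieved_data:
        return "knowledge exfiltration"
    return None  # a None result is exactly the falsifier described under
                 # "What would settle it"

# Corpus poisoning touches the knowledge store first:
assert classify_attack(True, False, False, False) == \
    "pre-retrieval knowledge corruption"
# Membership inference leaks retrieved content without corrupting anything:
assert classify_attack(False, False, False, True) == "knowledge exfiltration"
```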
Where Pith is reading between the lines
- The same pipeline-boundary approach could help secure other external-tool or memory-augmented AI systems beyond RAG.
- Production deployments could reduce risk by enforcing controls at each trust boundary instead of adding post-hoc filters.
- A next step would be to build automated tools that verify whether a given threat truly respects the four-surface taxonomy.
- Design-time choices in how retrieval indexes are built and updated may prove more effective than runtime detection alone.
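One way to read the last two bullets together: enforce a control at the corpus-to-retriever boundary at index-build time rather than detecting poisoning at runtime. A minimal sketch, assuming a hypothetical HMAC-based provenance tag scheme (the key handling and tag format here are illustrative, not from the paper):

```python
import hashlib
import hmac

# Sketch of a design-time control at the corpus->retriever trust boundary:
# only documents carrying a valid provenance tag are admitted to the index.
# Key management and tag format are illustrative assumptions.
INGEST_KEY = b"replace-with-a-managed-secret"

def provenance_tag(doc: bytes) -> str:
    """Tag issued by the trusted ingestion pipeline when a document is vetted."""
    return hmac.new(INGEST_KEY, doc, hashlib.sha256).hexdigest()

def admit_to_index(doc: bytes, tag: str) -> bool:
    """Reject untagged or tampered documents before they can poison retrieval."""
    return hmac.compare_digest(provenance_tag(doc), tag)

doc = b"trusted knowledge-base article"
tag = provenance_tag(doc)
assert admit_to_index(doc, tag)
assert not admit_to_index(b"poisoned variant", tag)
```

The design choice is that the check runs once, at the boundary, instead of asking the generation stage to spot poisoned context after the fact.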
Load-bearing premise
The proposed division of the RAG workflow into six stages and its mapping onto three trust boundaries and four security surfaces fully captures every RAG-specific threat without important omissions or overlaps.
What would settle it
A documented attack that targets the knowledge pipeline yet fits none of the four security surfaces or crosses the stated operational boundary between LLM flaws and RAG-introduced risks.
Original abstract
Retrieval-augmented generation (RAG) significantly enhances large language models (LLMs) but introduces novel security risks through external knowledge access. While existing studies cover various RAG vulnerabilities, they often conflate inherent LLM risks with those specifically introduced by RAG. In this paper, we propose that secure RAG is fundamentally about the security of the external knowledge-access pipeline. We establish an operational boundary to separate inherent LLM flaws from RAG-introduced or RAG-amplified threats. Guided by this perspective, we abstract the RAG workflow into six stages and organize the literature around three trust boundaries and four primary security surfaces, including pre-retrieval knowledge corruption, retrieval-time access manipulation, downstream context exploitation, and knowledge exfiltration. By systematically reviewing the corresponding attacks, defenses, remediation mechanisms, and evaluation benchmarks, we reveal that current defenses remain largely reactive and fragmented. Finally, we discuss these gaps and highlight future directions toward layered, boundary-aware protection across the entire knowledge-access lifecycle.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that secure RAG is fundamentally about the security of the external knowledge-access pipeline. It establishes an operational boundary to separate inherent LLM flaws from RAG-introduced or RAG-amplified threats. The RAG workflow is abstracted into six stages and organized around three trust boundaries and four primary security surfaces (pre-retrieval knowledge corruption, retrieval-time access manipulation, downstream context exploitation, and knowledge exfiltration). Through a systematic review of attacks, defenses, remediation mechanisms, and evaluation benchmarks, the paper reveals that current defenses remain largely reactive and fragmented, and discusses future directions toward layered, boundary-aware protection across the knowledge-access lifecycle.
Significance. If the taxonomy comprehensively captures RAG-specific threats without substantial overlap or omission, the work supplies a useful organizing framework that clarifies the distinction between LLM-inherent and RAG-specific risks. This separation can help focus future security efforts on the external knowledge pipeline. The survey of attacks, defenses, and benchmarks, together with the identification of reactive and fragmented defenses, could usefully guide research toward more integrated, boundary-aware protections. The paper's conceptual contribution is its operational boundary and staged workflow abstraction.
Major comments (2)
- [RAG workflow abstraction and trust boundaries section] The section describing the RAG workflow abstraction into six stages and the organization under three trust boundaries and four security surfaces: the paper does not articulate explicit inclusion criteria or a repeatable derivation process for these categories, which is load-bearing for the central claim that the taxonomy separates RAG-introduced threats from LLM flaws without significant omission or overlap.
- [Survey of defenses section] The section surveying defenses and remediation mechanisms: the claim that defenses 'remain largely reactive and fragmented' is central to the call for future directions, yet it rests on a qualitative synthesis without stated criteria for classifying a defense as reactive versus proactive or for quantifying fragmentation (e.g., no enumeration of defense types or overlap metrics).
Minor comments (3)
- [Abstract] The abstract states that the literature is 'systematically reviewed' but does not indicate the search terms, databases, or date range used; adding this information would allow readers to assess coverage.
- [Introduction and taxonomy sections] The terms 'trust boundaries' and 'security surfaces' are used throughout without an initial consolidated definition or glossary; a brief formal definition at first use would improve accessibility.
- [Future directions section] The future-directions discussion lists high-level recommendations but does not map them back to specific security surfaces or stages; adding such a mapping would strengthen the link to the proposed taxonomy.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for minor revision. We address each major comment point by point below, providing clarification on our methodology while committing to targeted revisions that strengthen the manuscript without altering its core contributions.
Point-by-point responses
Referee: [RAG workflow abstraction and trust boundaries section] The section describing the RAG workflow abstraction into six stages and the organization under three trust boundaries and four security surfaces: the paper does not articulate explicit inclusion criteria or a repeatable derivation process for these categories, which is load-bearing for the central claim that the taxonomy separates RAG-introduced threats from LLM flaws without significant omission or overlap.
Authors: We appreciate this observation. The six workflow stages are derived by decomposing the canonical RAG pipeline (query encoding, retrieval, context augmentation, generation, post-processing, and output) as established in foundational RAG literature, with each stage mapped to points where external knowledge crosses into the LLM. The three trust boundaries are defined operationally as the interfaces separating the untrusted external knowledge store, the retrieval mechanism, and the LLM generation process. The four security surfaces emerge directly from these boundaries by identifying where RAG-specific threats (as opposed to inherent LLM vulnerabilities) can be introduced or amplified: pre-retrieval corruption of the knowledge base, retrieval-time manipulation of access or ranking, post-retrieval context exploitation within the prompt, and post-generation exfiltration of sensitive retrieved content. While the manuscript presents the resulting taxonomy, it does not include an explicit subsection detailing this derivation process or inclusion criteria. We will add such a subsection in the revision, including a table that enumerates each category with its derivation rationale and explicit criteria for assigning attacks and defenses, thereby making the separation of RAG-introduced threats repeatable and transparent. revision: partial
Referee: [Survey of defenses section] The section surveying defenses and remediation mechanisms: the claim that defenses 'remain largely reactive and fragmented' is central to the call for future directions, yet it rests on a qualitative synthesis without stated criteria for classifying a defense as reactive versus proactive or for quantifying fragmentation (e.g., no enumeration of defense types or overlap metrics).
Authors: The referee correctly identifies that our assessment of defenses as 'largely reactive and fragmented' is a qualitative synthesis drawn from reviewing the literature across the four security surfaces. Reactive defenses are those that detect or mitigate threats after they have manifested (e.g., output filtering or anomaly detection on generated responses), while proactive ones intervene at the trust boundaries before threats propagate (e.g., knowledge sanitization or retrieval-time access controls). Fragmentation is evidenced by the concentration of existing work on isolated surfaces without cross-boundary integration. We acknowledge that the manuscript does not provide explicit classification criteria or quantitative metrics such as overlap counts. In the revised version, we will add a dedicated table that enumerates all reviewed defense categories with their assigned classification (reactive/proactive), the security surface they address, and a brief note on observed overlaps or gaps. This will make the synthesis more rigorous while preserving the central observation that motivates the future directions toward layered protections. revision: partial
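The table the authors promise can be sketched as a small ledger. The entries below are hypothetical examples in the spirit of the rebuttal's own illustrations (output filtering, knowledge sanitization, retrieval-time access controls), not the paper's actual defense inventory, and the "fragmentation" measure is one assumed operationalization: surfaces covered by at most one defense.

```python
from collections import Counter

# Hypothetical defense ledger: (defense, security surface, reactive/proactive).
# Entries are illustrative, not the survey's actual inventory.
defenses = [
    ("output filtering",           "knowledge exfiltration",             "reactive"),
    ("response anomaly detection", "downstream context exploitation",    "reactive"),
    ("knowledge sanitization",     "pre-retrieval knowledge corruption", "proactive"),
    ("retrieval access control",   "retrieval-time access manipulation", "proactive"),
    ("context provenance prompts", "downstream context exploitation",    "reactive"),
]

ALL_SURFACES = {
    "pre-retrieval knowledge corruption", "retrieval-time access manipulation",
    "downstream context exploitation", "knowledge exfiltration",
}

by_mode = Counter(mode for _, _, mode in defenses)
per_surface = Counter(surface for _, surface, _ in defenses)

# "Fragmented" operationalized as: surfaces with at most one defense.
thin_coverage = sorted(s for s in ALL_SURFACES if per_surface[s] <= 1)

assert by_mode["reactive"] > by_mode["proactive"]  # "largely reactive"
assert len(thin_coverage) == 3                     # most surfaces thinly covered
```

Even this toy ledger shows how the qualitative claim becomes checkable: "reactive" dominance is a count, and fragmentation is a coverage statistic per surface.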
Circularity Check
No significant circularity
Full rationale
This is a taxonomy and literature-organization paper with no derivations, equations, predictions, or fitted parameters. The central proposal (secure RAG as protection of the external knowledge-access pipeline, plus an operational boundary separating LLM-inherent from RAG-specific threats) is a definitional framing used to structure the survey around six workflow stages, three trust boundaries, and four security surfaces. These abstractions are presented as an organizing lens rather than derived results; the paper surveys existing attacks, defenses, and benchmarks without reducing any claim to self-referential inputs or load-bearing self-citations. The structure is self-contained against external literature and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: The RAG workflow can be abstracted into six distinct stages that align with three trust boundaries and four primary security surfaces.