Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions
Pith reviewed 2026-05-10 17:48 UTC · model grok-4.3
The pith
Secure RAG is fundamentally about protecting the external knowledge-access pipeline rather than the language model alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Secure RAG requires treating the external knowledge-access pipeline as the primary security concern. The workflow is abstracted into six stages and organized around three trust boundaries plus four primary security surfaces—pre-retrieval knowledge corruption, retrieval-time access manipulation, downstream context exploitation, and knowledge exfiltration—to separate RAG-introduced or RAG-amplified threats from inherent LLM weaknesses. The survey of attacks, defenses, and benchmarks under this view reveals that current protections are largely reactive and fragmented, motivating future work on layered, boundary-aware safeguards across the entire pipeline.
What carries the argument
The six-stage RAG workflow abstraction organized by three trust boundaries and four security surfaces that classifies attacks and defenses specific to the external knowledge pipeline.
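The abstraction can be made concrete as a small data model. This is an illustrative sketch only: the six stage names follow the authors' rebuttal (query encoding, retrieval, context augmentation, generation, post-processing, output), and the boundary names and stage-to-boundary anchoring are assumptions, not the paper's own identifiers.

```python
# Illustrative encoding of the survey's abstraction. Stage names are taken
# from the authors' rebuttal; boundary names and the surface-to-boundary
# anchoring below are hypothetical labels for exposition.
STAGES = [
    "query_encoding", "retrieval", "context_augmentation",
    "generation", "post_processing", "output",
]

# Three trust boundaries: interfaces where data crosses between components
# of differing trustworthiness.
BOUNDARIES = {
    "corpus_to_retriever": ("knowledge_store", "retrieval"),
    "retriever_to_llm": ("context_augmentation", "generation"),
    "llm_to_consumer": ("generation", "output"),
}

# Four primary security surfaces, each anchored at one boundary.
SURFACES = {
    "pre_retrieval_knowledge_corruption": "corpus_to_retriever",
    "retrieval_time_access_manipulation": "corpus_to_retriever",
    "downstream_context_exploitation": "retriever_to_llm",
    "knowledge_exfiltration": "llm_to_consumer",
}

# Sanity check: every surface anchors at a declared boundary.
assert all(b in BOUNDARIES for b in SURFACES.values())
assert len(STAGES) == 6 and len(BOUNDARIES) == 3 and len(SURFACES) == 4
```

The point of the encoding is that the taxonomy is a total map: every surface resolves to exactly one boundary, which is what lets the survey file attacks and defenses without residual categories.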
If this is right
- Attacks on RAG can be systematically placed into pre-retrieval corruption, retrieval manipulation, context exploitation, or exfiltration categories.
- Defenses must address each security surface rather than relying on isolated fixes inside the language model.
- Evaluation benchmarks should test resilience at the identified trust boundaries across the full pipeline.
- Remediation should shift from reactive patches to coordinated protection spanning the entire knowledge-access lifecycle.
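The first bullet implies that any RAG-specific threat routes to exactly one of the four categories. A minimal classifier sketch, assuming a hypothetical earliest-surface-touched rule (the category names are the paper's; the routing predicates and their order are illustrative):

```python
def classify_attack(targets_corpus: bool, manipulates_ranking: bool,
                    exploits_prompt_context: bool, leaks_retrieved_data: bool):
    """Route a RAG-specific threat to one of the four security surfaces.

    The decision order is a hypothetical reading of the taxonomy: a threat
    is assigned to the earliest pipeline surface it touches.
    """
    if targets_corpus:
        return "pre-retrieval knowledge corruption"
    if manipulates_ranking:
        return "retrieval-time access manipulation"
    if exploits_prompt_context:
        return "downstream context exploitation"
    if leaks_retrieved_data:
        return "knowledge exfiltration"
    return None  # a None result is exactly the falsifier described under
                 # "What would settle it"

# Corpus poisoning touches the knowledge store first:
assert classify_attack(True, False, False, False) == \
    "pre-retrieval knowledge corruption"
# Membership inference leaks retrieved content without corrupting anything:
assert classify_attack(False, False, False, True) == "knowledge exfiltration"
```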
Where Pith is reading between the lines
- The same pipeline-boundary approach could help secure other external-tool or memory-augmented AI systems beyond RAG.
- Production deployments could reduce risk by enforcing controls at each trust boundary instead of adding post-hoc filters.
- A next step would be to build automated tools that verify whether a given threat truly respects the four-surface taxonomy.
- Design-time choices in how retrieval indexes are built and updated may prove more effective than runtime detection alone.
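One way to read the last two bullets together: enforce a control at the corpus-to-retriever boundary at index-build time rather than detecting poisoning at runtime. A minimal sketch, assuming a hypothetical HMAC-based provenance tag scheme (the key handling and tag format here are illustrative, not from the paper):

```python
import hashlib
import hmac

# Sketch of a design-time control at the corpus->retriever trust boundary:
# only documents carrying a valid provenance tag are admitted to the index.
# Key management and tag format are illustrative assumptions.
INGEST_KEY = b"replace-with-a-managed-secret"

def provenance_tag(doc: bytes) -> str:
    """Tag issued by the trusted ingestion pipeline when a document is vetted."""
    return hmac.new(INGEST_KEY, doc, hashlib.sha256).hexdigest()

def admit_to_index(doc: bytes, tag: str) -> bool:
    """Reject untagged or tampered documents before they can poison retrieval."""
    return hmac.compare_digest(provenance_tag(doc), tag)

doc = b"trusted knowledge-base article"
tag = provenance_tag(doc)
assert admit_to_index(doc, tag)
assert not admit_to_index(b"poisoned variant", tag)
```

The design choice is that the check runs once, at the boundary, instead of asking the generation stage to spot poisoned context after the fact.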
Load-bearing premise
The proposed division of the RAG workflow into six stages and its mapping onto three trust boundaries and four security surfaces fully captures every RAG-specific threat without important omissions or overlaps.
What would settle it
A documented attack that targets the knowledge pipeline yet fits none of the four security surfaces or crosses the stated operational boundary between LLM flaws and RAG-introduced risks.
Original abstract
Retrieval-augmented generation (RAG) significantly enhances large language models (LLMs) but introduces novel security risks through external knowledge access. While existing studies cover various RAG vulnerabilities, they often conflate inherent LLM risks with those specifically introduced by RAG. In this paper, we propose that secure RAG is fundamentally about the security of the external knowledge-access pipeline. We establish an operational boundary to separate inherent LLM flaws from RAG-introduced or RAG-amplified threats. Guided by this perspective, we abstract the RAG workflow into six stages and organize the literature around three trust boundaries and four primary security surfaces, including pre-retrieval knowledge corruption, retrieval-time access manipulation, downstream context exploitation, and knowledge exfiltration. By systematically reviewing the corresponding attacks, defenses, remediation mechanisms, and evaluation benchmarks, we reveal that current defenses remain largely reactive and fragmented. Finally, we discuss these gaps and highlight future directions toward layered, boundary-aware protection across the entire knowledge-access lifecycle.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that secure RAG is fundamentally about the security of the external knowledge-access pipeline. It establishes an operational boundary to separate inherent LLM flaws from RAG-introduced or RAG-amplified threats. The RAG workflow is abstracted into six stages and organized around three trust boundaries and four primary security surfaces (pre-retrieval knowledge corruption, retrieval-time access manipulation, downstream context exploitation, and knowledge exfiltration). Through a systematic review of attacks, defenses, remediation mechanisms, and evaluation benchmarks, the paper reveals that current defenses remain largely reactive and fragmented, and discusses future directions toward layered, boundary-aware protection across the knowledge-access lifecycle.
Significance. If the taxonomy comprehensively captures RAG-specific threats without substantial overlap or omission, the work supplies a useful organizing framework that clarifies the distinction between LLM-inherent and RAG-specific risks. This separation can help focus future security efforts on the external knowledge pipeline. The survey of attacks, defenses, and benchmarks, together with the identification of reactive and fragmented defenses, could usefully guide research toward more integrated, boundary-aware protections. The paper's conceptual contribution is its operational boundary and staged workflow abstraction.
Major comments (2)
- [RAG workflow abstraction and trust boundaries section] The section describing the RAG workflow abstraction into six stages and the organization under three trust boundaries and four security surfaces: the paper does not articulate explicit inclusion criteria or a repeatable derivation process for these categories, which is load-bearing for the central claim that the taxonomy separates RAG-introduced threats from LLM flaws without significant omission or overlap.
- [Survey of defenses section] The section surveying defenses and remediation mechanisms: the claim that defenses 'remain largely reactive and fragmented' is central to the call for future directions, yet it rests on a qualitative synthesis without stated criteria for classifying a defense as reactive versus proactive or for quantifying fragmentation (e.g., no enumeration of defense types or overlap metrics).
Minor comments (3)
- [Abstract] The abstract states that the literature is 'systematically reviewed' but does not indicate the search terms, databases, or date range used; adding this information would allow readers to assess coverage.
- [Introduction and taxonomy sections] The terms 'trust boundaries' and 'security surfaces' are used throughout without an initial consolidated definition or glossary; a brief formal definition at first use would improve accessibility.
- [Future directions section] The future-directions discussion lists high-level recommendations but does not map them back to specific security surfaces or stages; adding such a mapping would strengthen the link to the proposed taxonomy.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for minor revision. We address each major comment point by point below, providing clarification on our methodology while committing to targeted revisions that strengthen the manuscript without altering its core contributions.
Point-by-point responses
Referee: [RAG workflow abstraction and trust boundaries section] The section describing the RAG workflow abstraction into six stages and the organization under three trust boundaries and four security surfaces: the paper does not articulate explicit inclusion criteria or a repeatable derivation process for these categories, which is load-bearing for the central claim that the taxonomy separates RAG-introduced threats from LLM flaws without significant omission or overlap.
Authors: We appreciate this observation. The six workflow stages are derived by decomposing the canonical RAG pipeline (query encoding, retrieval, context augmentation, generation, post-processing, and output) as established in foundational RAG literature, with each stage mapped to points where external knowledge crosses into the LLM. The three trust boundaries are defined operationally as the interfaces separating the untrusted external knowledge store, the retrieval mechanism, and the LLM generation process. The four security surfaces emerge directly from these boundaries by identifying where RAG-specific threats (as opposed to inherent LLM vulnerabilities) can be introduced or amplified: pre-retrieval corruption of the knowledge base, retrieval-time manipulation of access or ranking, post-retrieval context exploitation within the prompt, and post-generation exfiltration of sensitive retrieved content. While the manuscript presents the resulting taxonomy, it does not include an explicit subsection detailing this derivation process or inclusion criteria. We will add such a subsection in the revision, including a table that enumerates each category with its derivation rationale and explicit criteria for assigning attacks and defenses, thereby making the separation of RAG-introduced threats repeatable and transparent. revision: partial
Referee: [Survey of defenses section] The section surveying defenses and remediation mechanisms: the claim that defenses 'remain largely reactive and fragmented' is central to the call for future directions, yet it rests on a qualitative synthesis without stated criteria for classifying a defense as reactive versus proactive or for quantifying fragmentation (e.g., no enumeration of defense types or overlap metrics).
Authors: The referee correctly identifies that our assessment of defenses as 'largely reactive and fragmented' is a qualitative synthesis drawn from reviewing the literature across the four security surfaces. Reactive defenses are those that detect or mitigate threats after they have manifested (e.g., output filtering or anomaly detection on generated responses), while proactive ones intervene at the trust boundaries before threats propagate (e.g., knowledge sanitization or retrieval-time access controls). Fragmentation is evidenced by the concentration of existing work on isolated surfaces without cross-boundary integration. We acknowledge that the manuscript does not provide explicit classification criteria or quantitative metrics such as overlap counts. In the revised version, we will add a dedicated table that enumerates all reviewed defense categories with their assigned classification (reactive/proactive), the security surface they address, and a brief note on observed overlaps or gaps. This will make the synthesis more rigorous while preserving the central observation that motivates the future directions toward layered protections. revision: partial
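The table the authors promise can be sketched as a small ledger. The entries below are hypothetical examples in the spirit of the rebuttal's own illustrations (output filtering, knowledge sanitization, retrieval-time access controls), not the paper's actual defense inventory, and the "fragmentation" measure is one assumed operationalization: surfaces covered by at most one defense.

```python
from collections import Counter

# Hypothetical defense ledger: (defense, security surface, reactive/proactive).
# Entries are illustrative, not the survey's actual inventory.
defenses = [
    ("output filtering",           "knowledge exfiltration",             "reactive"),
    ("response anomaly detection", "downstream context exploitation",    "reactive"),
    ("knowledge sanitization",     "pre-retrieval knowledge corruption", "proactive"),
    ("retrieval access control",   "retrieval-time access manipulation", "proactive"),
    ("context provenance prompts", "downstream context exploitation",    "reactive"),
]

ALL_SURFACES = {
    "pre-retrieval knowledge corruption", "retrieval-time access manipulation",
    "downstream context exploitation", "knowledge exfiltration",
}

by_mode = Counter(mode for _, _, mode in defenses)
per_surface = Counter(surface for _, surface, _ in defenses)

# "Fragmented" operationalized as: surfaces with at most one defense.
thin_coverage = sorted(s for s in ALL_SURFACES if per_surface[s] <= 1)

assert by_mode["reactive"] > by_mode["proactive"]  # "largely reactive"
assert len(thin_coverage) == 3                     # most surfaces thinly covered
```

Even this toy ledger shows how the qualitative claim becomes checkable: "reactive" dominance is a count, and fragmentation is a coverage statistic per surface.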
Circularity Check
No significant circularity
Full rationale
This is a taxonomy and literature-organization paper with no derivations, equations, predictions, or fitted parameters. The central proposal (secure RAG as protection of the external knowledge-access pipeline, plus an operational boundary separating LLM-inherent from RAG-specific threats) is a definitional framing used to structure the survey around six workflow stages, three trust boundaries, and four security surfaces. These abstractions are presented as an organizing lens rather than derived results; the paper surveys existing attacks, defenses, and benchmarks without reducing any claim to self-referential inputs or load-bearing self-citations. The structure is self-contained against external literature and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: The RAG workflow can be abstracted into six distinct stages that align with three trust boundaries and four primary security surfaces.