arxiv: 2605.09822 · v1 · submitted 2026-05-10 · 💻 cs.CR · cs.AI

Recognition: no theorem link

Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning

Ben Kereopa-Yorke , Guillermo Diaz , Holly Wright , Reagan Johnston , Ron F. Del Rosario , Timothy Lynar

Authors on Pith no claims yet

Pith reviewed 2026-05-12 02:13 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords oracle poisoningknowledge graph poisoningAI agent securitytool-use attacksdata poisoningagent reasoningknowledge graphs

0 comments

The pith

AI agents fully trust poisoned knowledge graph data and accept fabricated security claims in 269 of 270 directed queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines Oracle Poisoning as an attack that corrupts the knowledge graph an AI agent queries through tool use, so the agent reaches incorrect conclusions while reasoning correctly over the altered facts. Unlike attacks that change the agent's instructions, this one changes only the data the agent retrieves and treats as true. Experiments across nine models show that at moderate attacker skill every model accepts the poisoned results at 100 percent when the query is directed, while trust falls under open-ended prompts. The work also shows that how the data reaches the model, whether through real tool calls or simulated inline text, changes whether the attack succeeds. If the attack holds, then protecting the data sources agents consult becomes essential for safe agent operation.

Core claim

Oracle Poisoning corrupts a structured knowledge graph that AI agents query at runtime, causing them to accept and reason from fabricated security claims. In a production 42-million-node code knowledge graph, every tested model from three providers trusted the poisoned data at 100 percent under directed queries at moderate attacker sophistication, succeeding in 269 of 270 valid trials. Trust drops to 3-55 percent under open-ended prompts, and the attack shows clear thresholds in attacker skill and strong dependence on delivery mode, with real tool use producing higher success than inline evaluation.

What carries the argument

Oracle Poisoning, the corruption of the knowledge graph queried by agents via tool-use protocols, which supplies false facts that the agents then reason over without changing their instructions or prompts.

If this is right

All tested models accept the poisoned data at 100 percent when using real graph query tools at moderate attacker sophistication.
Trust falls sharply under open-ended prompts, making prompt framing a measurable confound in evaluation.
Inline text evaluation produces false negatives, with some models showing 0 percent trust inline but 100 percent under actual tool use.
Read-only access control removes the direct mutation path, while the other four tested defenses remain partial and model-dependent.
The attack shows discrete skill thresholds and appears to generalize to other knowledge-graph platforms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Agents may require built-in checks that compare graph results against independent sources before accepting them as facts.
The same poisoning approach could be applied to any domain where agents rely on structured external data rather than code security alone.
Lowering the minimum attacker skill needed implies that defenses must address moderate-capability adversaries rather than only sophisticated ones.

Load-bearing premise

AI agents will autonomously invoke the graph query tool and fully trust and reason over the returned results without cross-verification, suspicion, or additional safeguards against tampering.

What would settle it

A controlled trial in which an agent given a directed query to the poisoned graph detects the fabrication or refuses to accept the false security claim under the same tool-use conditions that previously produced 100 percent acceptance.

Figures

Figures reproduced from arXiv: 2605.09822 by Ben Kereopa-Yorke, Guillermo Diaz, Holly Wright, Reagan Johnston, Ron F. Del Rosario, Timothy Lynar.

**Figure 2.** Figure 2: Trust rates under three delivery conditions at L2 sophistication ( [PITH_FULL_IMAGE:figures/full_fig_p013_2.png] view at source ↗

**Figure 3.** Figure 3: Oracle Poisoning attack flow. The attacker corrupts the data, not the AI. The agent [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

read the original abstract

We define Oracle Poisoning, an attack class in which an adversary corrupts a structured knowledge graph that AI agents query at runtime via tool-use protocols, causing incorrect conclusions through correct reasoning. Unlike prompt injection, Oracle Poisoning manipulates the data agents reason over, not their instructions. We demonstrate six attack scenarios against a production 42-million-node code knowledge graph, providing the first empirical demonstration of knowledge graph poisoning against a production-scale agentic system, distinct from CTI embedding poisoning. Primary evaluation uses real SDK tool-use across nine models from three providers (N=30 per model), where models autonomously invoke a graph query tool and reason from results. The result is unambiguous: every tested model trusts poisoned data at 100% at moderate attacker sophistication(L2), with 269 valid trials (of 270) accepting fabricated security claims under directed queries. Under open-ended prompts, trust drops to 3-55%, confirming prompt framing as a confound; we report both conditions. An attacker sophistication gradient reveals discrete break points, a minimum skill at which trust flips from 0% to 100%, reframing the attack as a question not of whether but of how much. A controlled delivery-mode comparison shows that inline evaluation produces false negatives: GPT-5.1 shows 0% trust inline but 100% under both simulated and real agentic tool-use, demonstrating that delivery mode is a first-order confound. We evaluate five defences; read-only access control eliminates the direct mutation vector, while the remaining four are partial and model-dependent. Analysis of four additional platforms suggests the attack may generalise across the knowledge-graph ecosystem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Oracle Poisoning names a data-tampering attack on agent knowledge graphs and reports 100% trust under directed queries, but the numbers rest on an abstract with no methods or data to check.

read the letter

Oracle Poisoning names a data-tampering attack on agent knowledge graphs and reports 100% trust under directed queries, but the numbers rest on an abstract with no methods or data to check. The paper separates this from prompt injection by focusing on corruption of the structured data the agent retrieves and reasons over at runtime. It uses a real 42-million-node production code graph and actual SDK tool calls across nine models, which gives the setup more weight than purely synthetic tests. The results also show clear differences by prompt type (directed queries hit 100% while open-ended drop to 3-55%) and by delivery mode (inline evaluation misses the effect that tool-use reveals), plus a skill threshold where trust flips and a quick check of five defenses where read-only access stops the direct vector. Those points are useful for anyone thinking about how agents consume external data. The obvious soft spot is that only the abstract exists, so the 269-of-270 trial count, the exact poisoning steps, the query templates, and the rules for counting valid trials cannot be examined for confounds or reproducibility. The assumption that agents will invoke the tool and accept the results without cross-checks may also be setup-specific. This is aimed at people building or securing agent systems that rely on live knowledge graphs for code intelligence, security tooling, or autonomous decisions. A reader working on adversarial robustness for deployed agents would pick up the attack class and the delivery-mode warning even without the full protocol. I would send the complete paper to peer review rather than desk-reject it; the core idea identifies a practical vector that deserves proper scrutiny of the experimental design once the methods are available.

Referee Report

3 major / 1 minor

Summary. The manuscript defines Oracle Poisoning as an attack that corrupts structured knowledge graphs queried at runtime by AI agents via tool-use protocols, producing incorrect conclusions through otherwise correct reasoning. It reports six attack scenarios on a production 42-million-node code knowledge graph, with primary evaluation using real SDK tool invocations across nine models from three providers (N=30 per model). Key results include 100% trust in poisoned data at moderate attacker sophistication (L2) in 269 of 270 valid trials under directed queries, lower trust (3-55%) under open-ended prompts, discrete breakpoints in an attacker sophistication gradient, a delivery-mode confound (inline vs. tool-use), evaluation of five defenses, and preliminary evidence of generalization to other platforms.

Significance. If the empirical results hold under full scrutiny, the work would identify a practical and previously under-examined attack surface on agentic systems that rely on external knowledge graphs for reasoning. The use of autonomous real-tool invocations, the explicit separation from prompt injection, and the demonstration that delivery mode is a first-order confound provide concrete guidance for both attackers and defenders. The attacker-sophistication gradient reframes the problem as one of minimum capability thresholds rather than binary feasibility.

major comments (3)

[Abstract] Abstract: the central quantitative claims (100% trust at L2 sophistication, 269/270 trials accepting fabricated claims) are presented without any description of the poisoning mechanism, query templates, trial validity criteria, prompting details, or statistical procedures. This absence is load-bearing for the primary empirical result because it prevents assessment of confounds such as prompt framing effects or selection of trials.
[Abstract] Abstract: the delivery-mode comparison (GPT-5.1 shows 0% trust inline but 100% under both simulated and real agentic tool-use) is offered as evidence that inline evaluation produces false negatives, yet no protocol for the inline condition, the simulated tool-use setup, or how the 0%/100% figures were obtained is supplied, rendering the confound claim unverifiable.
[Abstract] Abstract: the evaluation of five defenses states that read-only access control eliminates the mutation vector while the other four are partial and model-dependent, but neither the identities of the four defenses nor any quantitative per-model results are provided, which is required to support the model-dependence conclusion.

minor comments (1)

The abstract introduces 'Oracle Poisoning' and contrasts it with prompt injection but does not supply a concise formal definition or a short literature positioning relative to prior knowledge-graph or data-poisoning work.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive review. We address each major comment point by point below. We agree that the abstract requires additional methodological detail to support its central claims and will revise it accordingly.

read point-by-point responses

Referee: [Abstract] Abstract: the central quantitative claims (100% trust at L2 sophistication, 269/270 trials accepting fabricated claims) are presented without any description of the poisoning mechanism, query templates, trial validity criteria, prompting details, or statistical procedures. This absence is load-bearing for the primary empirical result because it prevents assessment of confounds such as prompt framing effects or selection of trials.

Authors: We agree that the abstract omits these details due to length constraints. The poisoning mechanism (targeted insertion of fabricated security claims into the 42-million-node code knowledge graph), query templates (directed vs. open-ended), trial validity criteria (successful tool invocation and response parsing), prompting details, and statistical procedures (exact binomial confidence intervals on the 269/270 valid trials) are described in the Methods and Results sections. To make the primary result verifiable from the abstract, we will revise it to include concise descriptions of the poisoning mechanism, the directed-query setup, validity criteria, and the statistical approach used. revision: yes
Referee: [Abstract] Abstract: the delivery-mode comparison (GPT-5.1 shows 0% trust inline but 100% under both simulated and real agentic tool-use) is offered as evidence that inline evaluation produces false negatives, yet no protocol for the inline condition, the simulated tool-use setup, or how the 0%/100% figures were obtained is supplied, rendering the confound claim unverifiable.

Authors: We accept that the abstract does not specify the protocols. The inline condition embedded fabricated claims directly in the user prompt without tool invocation; the simulated condition used mocked tool responses; and the real condition used live SDK tool calls against the production graph. The 0%/100% figures derive from N=30 trials per condition on GPT-5.1. We will add a brief description of these three delivery modes and the trial counts to the revised abstract. revision: yes
Referee: [Abstract] Abstract: the evaluation of five defenses states that read-only access control eliminates the mutation vector while the other four are partial and model-dependent, but neither the identities of the four defenses nor any quantitative per-model results are provided, which is required to support the model-dependence conclusion.

Authors: We agree that the abstract does not name the four defenses or report quantitative results. The full manuscript evaluates five defenses (read-only access control, input sanitization, output verification, model fine-tuning, and runtime monitoring) with per-model success rates showing partial mitigation except for read-only access control. We will revise the abstract to name the four additional defenses and summarize the key quantitative, model-dependent findings. revision: yes

Circularity Check

0 steps flagged

No significant circularity: purely empirical measurements

full rationale

The paper's central claims consist of empirical measurements of model behavior under controlled poisoning attacks on a knowledge graph, using real SDK tool invocations across nine models (N=30 per model) and reporting specific trust rates such as 100% at L2 sophistication and 269/270 valid trials. The abstract contains no equations, derivations, fitted parameters, predictions, or self-citations that reduce any result to its inputs by construction. All reported outcomes (trust rates under directed vs. open-ended prompts, attacker sophistication breakpoints, delivery-mode comparisons, and defense evaluations) are direct experimental observations rather than logical or mathematical consequences of prior assumptions within the paper. The work is self-contained as an empirical demonstration with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The evaluation depends on the domain assumption that agents treat graph query results as authoritative facts for reasoning; the new entity is the attack classification itself, introduced without independent falsifiable evidence beyond the reported trials.

axioms (1)

domain assumption AI agents trust and reason over results from knowledge graph query tools without independent verification or cross-checking
This assumption underpins the reported 100% acceptance rates under directed queries and the distinction from prompt-based attacks.

invented entities (1)

Oracle Poisoning no independent evidence
purpose: To categorize and name the attack of corrupting structured knowledge graphs that agents query at runtime
Newly defined in the paper as distinct from prompt injection and embedding poisoning; no external falsifiable handle provided beyond the attack scenarios.

pith-pipeline@v0.9.0 · 5584 in / 1652 out tokens · 58247 ms · 2026-05-12T02:13:59.545619+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages

[1]

A production code knowledge graph, 2025

Anonymous. A production code knowledge graph, 2025. Internal documentation

work page 2025
[2]

Model context protocol specification.https://modelcontextprotocol.io, 2024

Anthropic. Model context protocol specification.https://modelcontextprotocol.io, 2024

work page 2024
[3]

The Promptware Kill Chain,

Oleg Brodt, Elad Feldman, Bruce Schneier, and Ben Nassi. The promptware kill chain: How prompt injections gradually evolved into a multistep malware delivery mechanism.arXiv preprint arXiv:2601.09625, 2026

work page arXiv 2026
[4]

Towards evaluating the robustness of neural networks

Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (S&P), 2017

work page 2017
[5]

KEPo: Knowledge evolution poison on graph-based retrieval-augmented generation

Qizhi Chen, Chao Qi, Yihong Huang, Muquan Li, Rongzheng Wang, Dongyang Zhang, Ke Qin, and Shuang Liang. KEPo: Knowledge evolution poison on graph-based retrieval-augmented generation. InProceedings of the ACM Web Conference (WWW), 2026. arXiv:2603.11501

work page arXiv 2026
[6]

AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases

Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases. InAdvances in Neural Information Processing Systems (NeurIPS), 2024

work page 2024
[7]

Securing AI agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025

Manuel Costa, Boris K¨ opf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-B´ eguelin. Securing AI agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025

work page arXiv 2025
[8]

AI tool poisoning: How hidden instructions threaten AI agents.https://www

CrowdStrike. AI tool poisoning: How hidden instructions threaten AI agents.https://www. crowdstrike.com/en-us/blog/ai-tool-poisoning/, 2026

work page 2026
[9]

Shashidhar, Micheal Tuape, Dan Abudu, Beakcheol Jang, and Jong Wook Kim

Kennedy Edemacu, Vinay M. Shashidhar, Micheal Tuape, Dan Abudu, Beakcheol Jang, and Jong Wook Kim. Defending against knowledge poisoning attacks during retrieval-augmented generation.arXiv preprint arXiv:2508.02835, 2025. 22

work page arXiv 2025
[10]

From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows.ICT Express, 2025

Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros Maglaras, Abderrah- mane Lakas, and Merouane Debbah. From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows.ICT Express, 2025

work page 2025
[11]

CodeQL: Semantic code analysis engine.https://codeql.github.com, 2024

GitHub. CodeQL: Semantic code analysis engine.https://codeql.github.com, 2024. Variant analysis engine for finding security vulnerabilities at scale

work page 2024
[12]

Goodfellow, Jonathon Shlens, and Christian Szegedy

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adver- sarial examples. InInternational Conference on Learning Representations (ICLR), 2015

work page 2015
[13]

Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. InAISec Workshop, 2023

work page 2023
[14]

Agentic AI threat modeling framework: MAESTRO

Ken Huang. Agentic AI threat modeling framework: MAESTRO. Cloud Secu- rity Alliance, February 2025.https://cloudsecurityalliance.org/blog/2025/02/06/ agentic-ai-threat-modeling-framework-maestro

work page 2025
[15]

MCP security notification: Tool poisoning attacks, 2025

Invariant Labs. MCP security notification: Tool poisoning attacks, 2025. Disclosure

work page 2025
[16]

Graphrag under fire,

Jiacheng Liang, Yuhui Wang, Changjiang Li, Rongyi Zhu, Tanqiu Jiang, Neil Gong, and Ting Wang. GraphRAG under fire. InIEEE Symposium on Security and Privacy (S&P), 2026. arXiv:2501.14050

work page arXiv 2026
[17]

Architecting trust in artificial epistemic agents

Nahema Marchal, Stephanie Chan, Matija Franklin, Manon Revel, Geoff Keeling, Roberta Fischli, Bilva Chandra, and Iason Gabriel. Architecting trust in artificial epistemic agents. arXiv preprint arXiv:2603.02960, March 2026

work page arXiv 2026
[18]

Manipulating AI memory for profit: The rise of AI recommendation poisoning

Microsoft Defender Security Research Team. Manipulating AI memory for profit: The rise of AI recommendation poisoning. Microsoft Security Blog, February 2026.https://www. microsoft.com/en-us/security/blog/2026/02/10/ai-recommendation-poisoning/

work page 2026
[19]

ATLAS: Adversarial threat landscape for AI systems

MITRE. ATLAS: Adversarial threat landscape for AI systems. MITRE Corporation, 2025

work page 2025
[20]

OWASP top 10 for agentic applications for 2026

OWASP. OWASP top 10 for agentic applications for 2026. OWASP Gen AI Security Project, December 2025.https://genai.owasp.org/resource/ owasp-top-10-for-agentic-applications-for-2026/

work page 2026
[21]

ConfusedPilot: Confused deputy risks in RAG-based LLMs,

Ayush RoyChowdhury, Mulong Luo, Prateek Sahu, Sarbartha Banerjee, and Mohit Tiwari. ConfusedPilot: Confused deputy risks in RAG-based large language models, 2024. arXiv preprint arXiv:2408.04870

work page arXiv 2024
[22]

Ignore this title and HackAPrompt: Exposing systemic vulnerabilities of LLMs through a global prompt hacking competition

Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-Fran¸ cois Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Kost, Christopher Carnahan, and Jordan Boyd-Graber. Ignore this title and HackAPrompt: Exposing systemic vulnerabilities of LLMs through a global prompt hacking competition. InProceedings of the Conference on Empirical Methods in Natu...

work page 2023
[23]

Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, and Tom Goldstein

Ali Shafahi, W. Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, and Tom Goldstein. Poison frogs! Targeted clean-label poisoning attacks on neural networks. InAdvances in Neural Information Processing Systems (NeurIPS), 2018

work page 2018
[24]

Code intelligence platform.https://sourcegraph.com, 2024

Sourcegraph. Code intelligence platform.https://sourcegraph.com, 2024. 23

work page 2024
[25]

Mcptox: A benchmark for tool poisoning attack on real- world mcp servers.arXiv preprint arXiv:2508.14925, 2025b

Zhiqiang Wang, Yichao Gao, Yanting Wang, Suyuan Liu, Haifeng Sun, Haoran Cheng, Guan- quan Shi, Haohua Du, and Xiangyang Li. MCPTox: A benchmark for tool poisoning attack on real-world MCP servers.arXiv preprint arXiv:2508.14925, 2025

work page arXiv 2025
[26]

A few words can distort graphs: Knowledge poisoning attacks on graph-based retrieval-augmented generation of large language models.arXiv preprint arXiv:2508.04276, 2025

Jiayi Wen, Tong Chen, Zheng Zheng, and Chengqi Huang. A few words can distort graphs: Knowledge poisoning attacks on graph-based retrieval-augmented generation of large language models.arXiv preprint arXiv:2508.04276, 2025

work page arXiv 2025
[27]

On the security risks of knowledge graph reasoning

Zhaohan Xi, Tianyu Du, Changjiang Li, Ren Pang, Shouling Ji, Xiapu Luo, Xusheng Xiao, Fenglong Ma, and Ting Wang. On the security risks of knowledge graph reasoning. InUSENIX Security Symposium, pages 3259–3276, 2023

work page 2023
[28]

Badrag: Identifying vulnerabilities in retrieval augmented generation of large language models,

Jiaqi Xue, Mengxin Zheng, Yue Hua, Yifei Shu, Zhen Fang, Zhiqi Li, Kaixiong Tu, Wenjie Wang, and Suhang Wang. BadRAG: Identifying vulnerabilities in retrieval augmented gener- ation of large language models.arXiv preprint arXiv:2406.00083, 2024

work page arXiv 2024
[29]

MaSS: Model-agnostic, semantic and stealthy data poisoning attack on knowledge graph em- bedding

Xiaoyu You, Beina Sheng, Daizong Ding, Mi Zhang, Xudong Pan, Min Yang, and Fuli Feng. MaSS: Model-agnostic, semantic and stealthy data poisoning attack on knowledge graph em- bedding. InProceedings of the ACM Web Conference, 2023

work page 2023
[30]

InjecAgent: Benchmarking in- direct prompt injections in tool-integrated LLM agents

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjecAgent: Benchmarking in- direct prompt injections in tool-integrated LLM agents. InFindings of the Association for Computational Linguistics (ACL), pages 10471–10506, 2024

work page 2024
[31]

Benchmarking poisoning attacks against retrieval- augmented generation,

Baolei Zhang, Haoran Xin, Jiatong Li, Dongzhe Zhang, Minghong Fang, Zhuqing Liu, Lihai Nie, and Zheli Liu. Benchmarking poisoning attacks against retrieval-augmented generation. arXiv preprint arXiv:2505.18543, 2025

work page arXiv 2025
[32]

Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents

Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents. InInternational Conference on Learning Repre- sentations (ICLR), 2025

work page 2025
[33]

Data poisoning attack against knowledge graph embedding

Hengtong Zhang, Tianhang Zheng, Jing Gao, Chenglin Miao, Lu Su, Yaliang Li, and Kui Ren. Data poisoning attack against knowledge graph embedding. InProceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2019

work page 2019
[34]

From allies to adversaries: Manipulating LLM tool-calling through adversarial injection

Rupeng Zhang, Haowei Wang, Junjie Wang, Mingyang Li, Yuekai Huang, Dandan Wang, and Qing Wang. From allies to adversaries: Manipulating LLM tool-calling through adversarial injection. InProceedings of NAACL, pages 2009–2028, 2025

work page 2009
[35]

Exploring knowledge poisoning attacks to retrieval-augmented generation.Information Fusion, 127:103900, March 2026

Tianzhe Zhao, Jiaoyan Chen, Yanchi Ru, Haiping Zhu, Nan Hu, Jun Liu, and Qika Lin. Exploring knowledge poisoning attacks to retrieval-augmented generation.Information Fusion, 127:103900, March 2026

work page 2026
[36]

Poisoning retrieval corpora by injecting adversarial passages

Zexuan Zhong, Ziqing Huang, Alexander Wettig, and Danqi Chen. Poisoning retrieval corpora by injecting adversarial passages. InProceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

work page 2023
[37]

Poisoning attack on federated knowledge graph embedding

Enyuan Zhou, Song Guo, Zhixiu Ma, Zicong Hong, Tao Guo, and Peiran Dong. Poisoning attack on federated knowledge graph embedding. InProceedings of the ACM Web Conference, 2024. 24

work page 2024
[38]

PoisonedRAG: Knowledge poisoning attacks to retrieval-augmented generation of large language models.arXiv preprint arXiv:2402.07867, 2024

Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. PoisonedRAG: Knowledge corrup- tion attacks to retrieval-augmented generation of large language models. InUSENIX Security Symposium, 2025. arXiv:2402.07867

work page arXiv 2025
[39]

You must independently verify all tool responses

Daniel Z¨ ugner, Amir Akbarnejad, and Stephan G¨ unnemann. Adversarial attacks on neural networks for graph data. InProceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2018. A Remaining Attack Scenarios A.1 Scenario 2: Transitive Dependency Chain Injection Objective.Introduce a malicious library into a project’s dependen...

work page 2018