Recognition: no theorem link
Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning
Pith reviewed 2026-05-12 02:13 UTC · model grok-4.3
The pith
AI agents fully trust poisoned knowledge graph data and accept fabricated security claims in 269 of 270 directed queries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Oracle Poisoning corrupts a structured knowledge graph that AI agents query at runtime, causing them to accept and reason from fabricated security claims. In a production 42-million-node code knowledge graph, every tested model from three providers trusted the poisoned data at 100 percent under directed queries at moderate attacker sophistication, succeeding in 269 of 270 valid trials. Trust drops to 3-55 percent under open-ended prompts, and the attack shows clear thresholds in attacker skill and strong dependence on delivery mode, with real tool use producing higher success than inline evaluation.
What carries the argument
Oracle Poisoning, the corruption of the knowledge graph queried by agents via tool-use protocols, which supplies false facts that the agents then reason over without changing their instructions or prompts.
If this is right
- All tested models accept the poisoned data at 100 percent when using real graph query tools at moderate attacker sophistication.
- Trust falls sharply under open-ended prompts, making prompt framing a measurable confound in evaluation.
- Inline text evaluation produces false negatives, with some models showing 0 percent trust inline but 100 percent under actual tool use.
- Read-only access control removes the direct mutation path, while the other four tested defenses remain partial and model-dependent.
- The attack shows discrete skill thresholds and appears to generalize to other knowledge-graph platforms.
Where Pith is reading between the lines
- Agents may require built-in checks that compare graph results against independent sources before accepting them as facts.
- The same poisoning approach could be applied to any domain where agents rely on structured external data rather than code security alone.
- Lowering the minimum attacker skill needed implies that defenses must address moderate-capability adversaries rather than only sophisticated ones.
Load-bearing premise
AI agents will autonomously invoke the graph query tool and fully trust and reason over the returned results without cross-verification, suspicion, or additional safeguards against tampering.
What would settle it
A controlled trial in which an agent given a directed query to the poisoned graph detects the fabrication or refuses to accept the false security claim under the same tool-use conditions that previously produced 100 percent acceptance.
Figures
read the original abstract
We define Oracle Poisoning, an attack class in which an adversary corrupts a structured knowledge graph that AI agents query at runtime via tool-use protocols, causing incorrect conclusions through correct reasoning. Unlike prompt injection, Oracle Poisoning manipulates the data agents reason over, not their instructions. We demonstrate six attack scenarios against a production 42-million-node code knowledge graph, providing the first empirical demonstration of knowledge graph poisoning against a production-scale agentic system, distinct from CTI embedding poisoning. Primary evaluation uses real SDK tool-use across nine models from three providers (N=30 per model), where models autonomously invoke a graph query tool and reason from results. The result is unambiguous: every tested model trusts poisoned data at 100% at moderate attacker sophistication(L2), with 269 valid trials (of 270) accepting fabricated security claims under directed queries. Under open-ended prompts, trust drops to 3-55%, confirming prompt framing as a confound; we report both conditions. An attacker sophistication gradient reveals discrete break points, a minimum skill at which trust flips from 0% to 100%, reframing the attack as a question not of whether but of how much. A controlled delivery-mode comparison shows that inline evaluation produces false negatives: GPT-5.1 shows 0% trust inline but 100% under both simulated and real agentic tool-use, demonstrating that delivery mode is a first-order confound. We evaluate five defences; read-only access control eliminates the direct mutation vector, while the remaining four are partial and model-dependent. Analysis of four additional platforms suggests the attack may generalise across the knowledge-graph ecosystem.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript defines Oracle Poisoning as an attack that corrupts structured knowledge graphs queried at runtime by AI agents via tool-use protocols, producing incorrect conclusions through otherwise correct reasoning. It reports six attack scenarios on a production 42-million-node code knowledge graph, with primary evaluation using real SDK tool invocations across nine models from three providers (N=30 per model). Key results include 100% trust in poisoned data at moderate attacker sophistication (L2) in 269 of 270 valid trials under directed queries, lower trust (3-55%) under open-ended prompts, discrete breakpoints in an attacker sophistication gradient, a delivery-mode confound (inline vs. tool-use), evaluation of five defenses, and preliminary evidence of generalization to other platforms.
Significance. If the empirical results hold under full scrutiny, the work would identify a practical and previously under-examined attack surface on agentic systems that rely on external knowledge graphs for reasoning. The use of autonomous real-tool invocations, the explicit separation from prompt injection, and the demonstration that delivery mode is a first-order confound provide concrete guidance for both attackers and defenders. The attacker-sophistication gradient reframes the problem as one of minimum capability thresholds rather than binary feasibility.
major comments (3)
- [Abstract] Abstract: the central quantitative claims (100% trust at L2 sophistication, 269/270 trials accepting fabricated claims) are presented without any description of the poisoning mechanism, query templates, trial validity criteria, prompting details, or statistical procedures. This absence is load-bearing for the primary empirical result because it prevents assessment of confounds such as prompt framing effects or selection of trials.
- [Abstract] Abstract: the delivery-mode comparison (GPT-5.1 shows 0% trust inline but 100% under both simulated and real agentic tool-use) is offered as evidence that inline evaluation produces false negatives, yet no protocol for the inline condition, the simulated tool-use setup, or how the 0%/100% figures were obtained is supplied, rendering the confound claim unverifiable.
- [Abstract] Abstract: the evaluation of five defenses states that read-only access control eliminates the mutation vector while the other four are partial and model-dependent, but neither the identities of the four defenses nor any quantitative per-model results are provided, which is required to support the model-dependence conclusion.
minor comments (1)
- The abstract introduces 'Oracle Poisoning' and contrasts it with prompt injection but does not supply a concise formal definition or a short literature positioning relative to prior knowledge-graph or data-poisoning work.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive review. We address each major comment point by point below. We agree that the abstract requires additional methodological detail to support its central claims and will revise it accordingly.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central quantitative claims (100% trust at L2 sophistication, 269/270 trials accepting fabricated claims) are presented without any description of the poisoning mechanism, query templates, trial validity criteria, prompting details, or statistical procedures. This absence is load-bearing for the primary empirical result because it prevents assessment of confounds such as prompt framing effects or selection of trials.
Authors: We agree that the abstract omits these details due to length constraints. The poisoning mechanism (targeted insertion of fabricated security claims into the 42-million-node code knowledge graph), query templates (directed vs. open-ended), trial validity criteria (successful tool invocation and response parsing), prompting details, and statistical procedures (exact binomial confidence intervals on the 269/270 valid trials) are described in the Methods and Results sections. To make the primary result verifiable from the abstract, we will revise it to include concise descriptions of the poisoning mechanism, the directed-query setup, validity criteria, and the statistical approach used. revision: yes
-
Referee: [Abstract] Abstract: the delivery-mode comparison (GPT-5.1 shows 0% trust inline but 100% under both simulated and real agentic tool-use) is offered as evidence that inline evaluation produces false negatives, yet no protocol for the inline condition, the simulated tool-use setup, or how the 0%/100% figures were obtained is supplied, rendering the confound claim unverifiable.
Authors: We accept that the abstract does not specify the protocols. The inline condition embedded fabricated claims directly in the user prompt without tool invocation; the simulated condition used mocked tool responses; and the real condition used live SDK tool calls against the production graph. The 0%/100% figures derive from N=30 trials per condition on GPT-5.1. We will add a brief description of these three delivery modes and the trial counts to the revised abstract. revision: yes
-
Referee: [Abstract] Abstract: the evaluation of five defenses states that read-only access control eliminates the mutation vector while the other four are partial and model-dependent, but neither the identities of the four defenses nor any quantitative per-model results are provided, which is required to support the model-dependence conclusion.
Authors: We agree that the abstract does not name the four defenses or report quantitative results. The full manuscript evaluates five defenses (read-only access control, input sanitization, output verification, model fine-tuning, and runtime monitoring) with per-model success rates showing partial mitigation except for read-only access control. We will revise the abstract to name the four additional defenses and summarize the key quantitative, model-dependent findings. revision: yes
Circularity Check
No significant circularity: purely empirical measurements
full rationale
The paper's central claims consist of empirical measurements of model behavior under controlled poisoning attacks on a knowledge graph, using real SDK tool invocations across nine models (N=30 per model) and reporting specific trust rates such as 100% at L2 sophistication and 269/270 valid trials. The abstract contains no equations, derivations, fitted parameters, predictions, or self-citations that reduce any result to its inputs by construction. All reported outcomes (trust rates under directed vs. open-ended prompts, attacker sophistication breakpoints, delivery-mode comparisons, and defense evaluations) are direct experimental observations rather than logical or mathematical consequences of prior assumptions within the paper. The work is self-contained as an empirical demonstration with no load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption AI agents trust and reason over results from knowledge graph query tools without independent verification or cross-checking
invented entities (1)
-
Oracle Poisoning
no independent evidence
Reference graph
Works this paper leans on
-
[1]
A production code knowledge graph, 2025
Anonymous. A production code knowledge graph, 2025. Internal documentation
work page 2025
-
[2]
Model context protocol specification.https://modelcontextprotocol.io, 2024
Anthropic. Model context protocol specification.https://modelcontextprotocol.io, 2024
work page 2024
-
[3]
Oleg Brodt, Elad Feldman, Bruce Schneier, and Ben Nassi. The promptware kill chain: How prompt injections gradually evolved into a multistep malware delivery mechanism.arXiv preprint arXiv:2601.09625, 2026
-
[4]
Towards evaluating the robustness of neural networks
Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (S&P), 2017
work page 2017
-
[5]
KEPo: Knowledge evolution poison on graph-based retrieval-augmented generation
Qizhi Chen, Chao Qi, Yihong Huang, Muquan Li, Rongzheng Wang, Dongyang Zhang, Ke Qin, and Shuang Liang. KEPo: Knowledge evolution poison on graph-based retrieval-augmented generation. InProceedings of the ACM Web Conference (WWW), 2026. arXiv:2603.11501
-
[6]
AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases
Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases. InAdvances in Neural Information Processing Systems (NeurIPS), 2024
work page 2024
-
[7]
Securing AI agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025
Manuel Costa, Boris K¨ opf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-B´ eguelin. Securing AI agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025
-
[8]
AI tool poisoning: How hidden instructions threaten AI agents.https://www
CrowdStrike. AI tool poisoning: How hidden instructions threaten AI agents.https://www. crowdstrike.com/en-us/blog/ai-tool-poisoning/, 2026
work page 2026
-
[9]
Shashidhar, Micheal Tuape, Dan Abudu, Beakcheol Jang, and Jong Wook Kim
Kennedy Edemacu, Vinay M. Shashidhar, Micheal Tuape, Dan Abudu, Beakcheol Jang, and Jong Wook Kim. Defending against knowledge poisoning attacks during retrieval-augmented generation.arXiv preprint arXiv:2508.02835, 2025. 22
-
[10]
Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros Maglaras, Abderrah- mane Lakas, and Merouane Debbah. From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows.ICT Express, 2025
work page 2025
-
[11]
CodeQL: Semantic code analysis engine.https://codeql.github.com, 2024
GitHub. CodeQL: Semantic code analysis engine.https://codeql.github.com, 2024. Variant analysis engine for finding security vulnerabilities at scale
work page 2024
-
[12]
Goodfellow, Jonathon Shlens, and Christian Szegedy
Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adver- sarial examples. InInternational Conference on Learning Representations (ICLR), 2015
work page 2015
-
[13]
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. InAISec Workshop, 2023
work page 2023
-
[14]
Agentic AI threat modeling framework: MAESTRO
Ken Huang. Agentic AI threat modeling framework: MAESTRO. Cloud Secu- rity Alliance, February 2025.https://cloudsecurityalliance.org/blog/2025/02/06/ agentic-ai-threat-modeling-framework-maestro
work page 2025
-
[15]
MCP security notification: Tool poisoning attacks, 2025
Invariant Labs. MCP security notification: Tool poisoning attacks, 2025. Disclosure
work page 2025
-
[16]
Jiacheng Liang, Yuhui Wang, Changjiang Li, Rongyi Zhu, Tanqiu Jiang, Neil Gong, and Ting Wang. GraphRAG under fire. InIEEE Symposium on Security and Privacy (S&P), 2026. arXiv:2501.14050
-
[17]
Architecting trust in artificial epistemic agents
Nahema Marchal, Stephanie Chan, Matija Franklin, Manon Revel, Geoff Keeling, Roberta Fischli, Bilva Chandra, and Iason Gabriel. Architecting trust in artificial epistemic agents. arXiv preprint arXiv:2603.02960, March 2026
-
[18]
Manipulating AI memory for profit: The rise of AI recommendation poisoning
Microsoft Defender Security Research Team. Manipulating AI memory for profit: The rise of AI recommendation poisoning. Microsoft Security Blog, February 2026.https://www. microsoft.com/en-us/security/blog/2026/02/10/ai-recommendation-poisoning/
work page 2026
-
[19]
ATLAS: Adversarial threat landscape for AI systems
MITRE. ATLAS: Adversarial threat landscape for AI systems. MITRE Corporation, 2025
work page 2025
-
[20]
OWASP top 10 for agentic applications for 2026
OWASP. OWASP top 10 for agentic applications for 2026. OWASP Gen AI Security Project, December 2025.https://genai.owasp.org/resource/ owasp-top-10-for-agentic-applications-for-2026/
work page 2026
-
[21]
ConfusedPilot: Confused deputy risks in RAG-based LLMs,
Ayush RoyChowdhury, Mulong Luo, Prateek Sahu, Sarbartha Banerjee, and Mohit Tiwari. ConfusedPilot: Confused deputy risks in RAG-based large language models, 2024. arXiv preprint arXiv:2408.04870
-
[22]
Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-Fran¸ cois Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Kost, Christopher Carnahan, and Jordan Boyd-Graber. Ignore this title and HackAPrompt: Exposing systemic vulnerabilities of LLMs through a global prompt hacking competition. InProceedings of the Conference on Empirical Methods in Natu...
work page 2023
-
[23]
Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, and Tom Goldstein
Ali Shafahi, W. Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, and Tom Goldstein. Poison frogs! Targeted clean-label poisoning attacks on neural networks. InAdvances in Neural Information Processing Systems (NeurIPS), 2018
work page 2018
-
[24]
Code intelligence platform.https://sourcegraph.com, 2024
Sourcegraph. Code intelligence platform.https://sourcegraph.com, 2024. 23
work page 2024
-
[25]
Zhiqiang Wang, Yichao Gao, Yanting Wang, Suyuan Liu, Haifeng Sun, Haoran Cheng, Guan- quan Shi, Haohua Du, and Xiangyang Li. MCPTox: A benchmark for tool poisoning attack on real-world MCP servers.arXiv preprint arXiv:2508.14925, 2025
-
[26]
Jiayi Wen, Tong Chen, Zheng Zheng, and Chengqi Huang. A few words can distort graphs: Knowledge poisoning attacks on graph-based retrieval-augmented generation of large language models.arXiv preprint arXiv:2508.04276, 2025
-
[27]
On the security risks of knowledge graph reasoning
Zhaohan Xi, Tianyu Du, Changjiang Li, Ren Pang, Shouling Ji, Xiapu Luo, Xusheng Xiao, Fenglong Ma, and Ting Wang. On the security risks of knowledge graph reasoning. InUSENIX Security Symposium, pages 3259–3276, 2023
work page 2023
-
[28]
Badrag: Identifying vulnerabilities in retrieval augmented generation of large language models,
Jiaqi Xue, Mengxin Zheng, Yue Hua, Yifei Shu, Zhen Fang, Zhiqi Li, Kaixiong Tu, Wenjie Wang, and Suhang Wang. BadRAG: Identifying vulnerabilities in retrieval augmented gener- ation of large language models.arXiv preprint arXiv:2406.00083, 2024
-
[29]
MaSS: Model-agnostic, semantic and stealthy data poisoning attack on knowledge graph em- bedding
Xiaoyu You, Beina Sheng, Daizong Ding, Mi Zhang, Xudong Pan, Min Yang, and Fuli Feng. MaSS: Model-agnostic, semantic and stealthy data poisoning attack on knowledge graph em- bedding. InProceedings of the ACM Web Conference, 2023
work page 2023
-
[30]
InjecAgent: Benchmarking in- direct prompt injections in tool-integrated LLM agents
Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjecAgent: Benchmarking in- direct prompt injections in tool-integrated LLM agents. InFindings of the Association for Computational Linguistics (ACL), pages 10471–10506, 2024
work page 2024
-
[31]
Benchmarking poisoning attacks against retrieval- augmented generation,
Baolei Zhang, Haoran Xin, Jiatong Li, Dongzhe Zhang, Minghong Fang, Zhuqing Liu, Lihai Nie, and Zheli Liu. Benchmarking poisoning attacks against retrieval-augmented generation. arXiv preprint arXiv:2505.18543, 2025
-
[32]
Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents
Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents. InInternational Conference on Learning Repre- sentations (ICLR), 2025
work page 2025
-
[33]
Data poisoning attack against knowledge graph embedding
Hengtong Zhang, Tianhang Zheng, Jing Gao, Chenglin Miao, Lu Su, Yaliang Li, and Kui Ren. Data poisoning attack against knowledge graph embedding. InProceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2019
work page 2019
-
[34]
From allies to adversaries: Manipulating LLM tool-calling through adversarial injection
Rupeng Zhang, Haowei Wang, Junjie Wang, Mingyang Li, Yuekai Huang, Dandan Wang, and Qing Wang. From allies to adversaries: Manipulating LLM tool-calling through adversarial injection. InProceedings of NAACL, pages 2009–2028, 2025
work page 2009
-
[35]
Tianzhe Zhao, Jiaoyan Chen, Yanchi Ru, Haiping Zhu, Nan Hu, Jun Liu, and Qika Lin. Exploring knowledge poisoning attacks to retrieval-augmented generation.Information Fusion, 127:103900, March 2026
work page 2026
-
[36]
Poisoning retrieval corpora by injecting adversarial passages
Zexuan Zhong, Ziqing Huang, Alexander Wettig, and Danqi Chen. Poisoning retrieval corpora by injecting adversarial passages. InProceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
work page 2023
-
[37]
Poisoning attack on federated knowledge graph embedding
Enyuan Zhou, Song Guo, Zhixiu Ma, Zicong Hong, Tao Guo, and Peiran Dong. Poisoning attack on federated knowledge graph embedding. InProceedings of the ACM Web Conference, 2024. 24
work page 2024
-
[38]
Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. PoisonedRAG: Knowledge corrup- tion attacks to retrieval-augmented generation of large language models. InUSENIX Security Symposium, 2025. arXiv:2402.07867
-
[39]
You must independently verify all tool responses
Daniel Z¨ ugner, Amir Akbarnejad, and Stephan G¨ unnemann. Adversarial attacks on neural networks for graph data. InProceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2018. A Remaining Attack Scenarios A.1 Scenario 2: Transitive Dependency Chain Injection Objective.Introduce a malicious library into a project’s dependen...
work page 2018
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.