pith. machine review for the scientific record.

arxiv: 2604.12201 · v1 · submitted 2026-04-14 · 💻 cs.IR

Recognition: unknown

AdversarialCoT: Single-Document Retrieval Poisoning for LLM Reasoning

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:17 UTC · model grok-4.3

classification 💻 cs.IR
keywords adversarial attacks · retrieval-augmented generation · LLM reasoning · knowledge base poisoning · chain-of-thought · single-document attack · security vulnerabilities

The pith

A single adversarial document can significantly degrade large language model reasoning in retrieval-augmented generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that retrieval-augmented generation systems can be compromised through the injection of just one malicious document into the knowledge base. The proposed method first extracts the target LLM's reasoning steps, builds an initial adversarial chain-of-thought inside the document, and then refines it through repeated interactions with the model to target its vulnerabilities. A sympathetic reader would care because this reveals how retrieval mechanisms, intended to improve accuracy, can instead amplify targeted attacks with minimal effort from the adversary. Experiments on standard benchmark LLMs demonstrate clear drops in reasoning performance once the poisoned document is retrieved and used.

Core claim

AdversarialCoT is a query-specific attack that poisons only a single document: it extracts the target LLM's reasoning framework to construct an initial adversarial chain-of-thought, then iteratively refines the document through direct interactions with the LLM to progressively expose and exploit critical reasoning vulnerabilities. The result is substantially reduced accuracy whenever the retriever surfaces the document.

What carries the argument

AdversarialCoT, the iterative refinement of a single query-specific adversarial chain-of-thought document that guides the LLM toward flawed reasoning once retrieved.
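
To make that mechanism concrete, here is a minimal, hypothetical sketch of such a refinement loop. Every prompt and helper below (`query_llm`, `extract_reasoning_outline`, the stopping test against an attacker-chosen wrong answer) is an illustrative assumption; the paper's actual prompting, scoring, and stopping criteria are not specified in the material above, and the abstract states only that reasoning accuracy degrades.

```python
# Hypothetical sketch of a single-document adversarial CoT refinement loop.
# All helpers are placeholders for illustration, not the authors' method.

def query_llm(prompt: str) -> str:
    """Black-box call to the target LLM (placeholder)."""
    raise NotImplementedError

def extract_reasoning_outline(question: str) -> list[str]:
    """Elicit the model's own reasoning steps for the target question."""
    outline = query_llm(f"List the steps you would take to answer: {question}")
    return [line.strip() for line in outline.splitlines() if line.strip()]

def craft_adversarial_document(seed_doc: str, question: str,
                               wrong_answer: str, max_rounds: int = 5) -> str:
    """Iteratively refine ONE poisoned document until the model, given only
    that document as context, is steered toward an attacker-chosen wrong
    answer (one plausible reading of 'degraded reasoning accuracy')."""
    steps = extract_reasoning_outline(question)
    doc = seed_doc + "\n" + "\n".join(
        f"Step {i + 1}: {s}" for i, s in enumerate(steps))
    for _ in range(max_rounds):
        answer = query_llm(
            f"Context:\n{doc}\n\nQuestion: {question}\nThink step by step.")
        if wrong_answer.lower() in answer.lower():
            break  # the flawed chain of thought now carries the model off course
        # Ask which step most influenced the answer, then rewrite the document
        # to lean harder on that step in the wrong direction.
        weak_step = query_llm(
            f"Context:\n{doc}\n\nYou answered: {answer}\n"
            "Which single reasoning step most influenced your answer?")
        doc = query_llm(
            f"Rewrite the document so its reasoning more strongly supports "
            f"'{wrong_answer}', focusing on this step: {weak_step}\n\n{doc}")
    return doc
```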

If this is right

  • RAG systems become vulnerable to targeted poisoning attacks that require injecting only one document.
  • LLM reasoning accuracy drops substantially once the crafted adversarial content is retrieved and incorporated.
  • Iterative interactions allow progressive identification and exploitation of subtle weaknesses in the model's reasoning process.
  • The approach exposes security risks and offers insights for building more robust LLM reasoning pipelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Retrieval systems may need additional checks to detect documents that mimic or distort reasoning chains.
  • This single-document technique could extend to other retrieval-dependent AI applications that rely on external knowledge.
  • Defenses might benefit from requiring multiple independent documents or verifying consistency across retrieved sources.
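
A minimal sketch of that last idea, assuming an answer can be produced per retrieved document: answer the query once per source and trust the result only if a quorum of independently retrieved documents agrees. The `answer_with_context` helper and the quorum threshold are illustrative assumptions, not anything the paper proposes.

```python
from collections import Counter

def answer_with_context(question: str, document: str) -> str:
    """Placeholder: ask the LLM the question using one retrieved document."""
    raise NotImplementedError

def consistency_checked_answer(question: str, retrieved_docs: list[str],
                               quorum: float = 0.6):
    """Accept an answer only when enough independently retrieved documents
    agree, so a single poisoned document can sway at most one 'vote'."""
    if not retrieved_docs:
        return None
    votes = Counter(answer_with_context(question, d) for d in retrieved_docs)
    answer, count = votes.most_common(1)[0]
    if count / len(retrieved_docs) >= quorum:
        return answer
    return None  # abstain: sources disagree, flag the query for review
```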

Load-bearing premise

The attack requires the adversary to interact iteratively with the target LLM to refine the poisoned document and assumes the retriever will reliably return that single document for the chosen query.

What would settle it

Measure whether reasoning accuracy on standard benchmarks drops substantially below the clean-retrieval baseline when a single refined adversarial document is the only one retrieved and consumed during inference.
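
A hedged sketch of that measurement, reusing the `answer_with_context` placeholder from the consistency-check sketch above; the exact-match metric and the clean-versus-poisoned comparison are illustrative assumptions, not the paper's protocol.

```python
def exact_match(prediction: str, gold: str) -> bool:
    return prediction.strip().lower() == gold.strip().lower()

def single_doc_accuracy(questions, gold_answers, doc_for_query) -> float:
    """Accuracy when exactly one document is retrieved and consumed per query.
    Run once with clean documents and once with the refined adversarial ones;
    the core claim holds only if the poisoned run falls well below the clean run."""
    correct = sum(
        exact_match(answer_with_context(q, doc_for_query[q]), gold)
        for q, gold in zip(questions, gold_answers))
    return correct / max(len(questions), 1)
```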

Figures

Figures reproduced from arXiv: 2604.12201 by Hongru Song, Jiafeng Guo, Maarten de Rijke, Ruqing Zhang, Xueqi Cheng, Yixing Fan, Yu-An Liu.

Figure 1: Overview of single-document knowledge-base poisoning.
Figure 2: The attack process of AdversarialCoT.
Figure 3: ASR scaling across iteration rounds for different …
Figure 4: Cross-model generalization heatmap showing the …

original abstract

Retrieval-augmented generation (RAG) enhances large language model (LLM) reasoning by retrieving external documents, but also opens up new attack surfaces. We study knowledge-base poisoning attacks in RAG, where an attacker injects malicious content into the retrieval corpus, which is then naturally surfaced by the retriever and consumed by the LLM during reasoning. Unlike prior work that floods the corpus with poisoned documents, we propose AdversarialCoT, a query-specific attack that poisons only a single document in the corpus. AdversarialCoT first extracts the target LLM's reasoning framework to guide the construction of an initial adversarial chain-of-thought (CoT). The adversarial document is iteratively refined through interactions with the LLM, progressively exposing and exploiting critical reasoning vulnerabilities. Experiments on benchmark LLMs show that a single adversarial document can significantly degrade reasoning accuracy, revealing subtle yet impactful weaknesses. This study exposes security risks in RAG systems and provides actionable insights for designing more robust LLM reasoning pipelines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes AdversarialCoT, a query-specific single-document poisoning attack on RAG systems. It extracts the target LLM's reasoning framework to construct an initial adversarial chain-of-thought, then iteratively refines the poisoned document via interactions with the LLM to exploit reasoning vulnerabilities. The central claim is that injecting only this one document into the corpus causes the retriever to naturally surface it and significantly degrade LLM reasoning accuracy on benchmark tasks.

Significance. If the results hold under realistic retrieval conditions, the work would be significant for RAG security and LLM reasoning robustness, demonstrating that minimal poisoning (one document) can compromise systems where prior attacks required corpus flooding. It exposes subtle integration weaknesses between retrieval and reasoning steps and offers insights for defenses. The empirical iterative refinement process is a strength, as it is presented as a direct construction without free parameters or circular derivations.

major comments (2)
  1. [Experimental Evaluation] The experimental evaluation reports accuracy degradation from the single adversarial document but supplies no retrieval metrics (e.g., rank, hit rate at k=1, or success rate of surfacing the document for target queries). This is load-bearing for the central claim, as the attack presupposes that the retriever will naturally surface the single poisoned document in the RAG pipeline; without these metrics, the causal connection between injection and observed degradation cannot be verified, especially in large corpora with approximate nearest-neighbor search.
  2. [AdversarialCoT Construction] The AdversarialCoT construction (§3) relies on iterative interactions with the target LLM to refine the document and expose vulnerabilities. The paper should explicitly state the threat model (e.g., black-box query access only) and typical iteration counts, as this assumption directly affects whether the attack is practical outside controlled settings.
minor comments (2)
  1. [Abstract] The abstract asserts significant accuracy degradation without any numerical results, baselines, or error bars; including a concise quantitative summary would improve completeness.
  2. Define notation for the adversarial document and CoT components consistently upon first use to prevent ambiguity in the method description.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments highlight important aspects of empirical validation and threat model clarity that will strengthen the manuscript. We address each point below and will incorporate the suggested revisions.

point-by-point responses
  1. Referee: [Experimental Evaluation] The experimental evaluation reports accuracy degradation from the single adversarial document but supplies no retrieval metrics (e.g., rank, hit rate at k=1, or success rate of surfacing the document for target queries). This is load-bearing for the central claim, as the attack presupposes that the retriever will naturally surface the single poisoned document in the RAG pipeline; without these metrics, the causal connection between injection and observed degradation cannot be verified, especially in large corpora with approximate nearest-neighbor search.

    Authors: We agree that retrieval metrics are essential to verify the attack's mechanism and establish the causal link between document injection and reasoning degradation. In the revised manuscript, we will add a dedicated analysis subsection reporting the average retrieval rank of the adversarial document, hit rate at k=1, and surfacing success rates across queries, models, and varying corpus sizes (including approximate nearest-neighbor settings). These metrics will be computed on the same experimental setups as the accuracy results to directly support the central claim. revision: yes

  2. Referee: [AdversarialCoT Construction] The AdversarialCoT construction (§3) relies on iterative interactions with the target LLM to refine the document and expose vulnerabilities. The paper should explicitly state the threat model (e.g., black-box query access only) and typical iteration counts, as this assumption directly affects whether the attack is practical outside controlled settings.

    Authors: We concur that explicit threat model details and iteration statistics improve reproducibility and practicality assessment. We will revise Section 3 to clearly state the black-box threat model (query-only access to the target LLM for construction and refinement, with no access to retrieval internals, corpus, or model weights). We will also report the observed iteration counts from our experiments, including averages and ranges per model and task, to quantify the construction effort. revision: yes
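
Both responses hinge on quantities that are straightforward to measure. Below is a minimal sketch of the retrieval-side metrics promised in the first response, assuming a cosine-similarity dense retriever over precomputed embeddings; the embedding model, corpus, and any approximate nearest-neighbor settings are placeholders rather than the paper's setup.

```python
import numpy as np

def poisoned_doc_retrieval_metrics(query_vec: np.ndarray,
                                   corpus_vecs: np.ndarray,
                                   poisoned_idx: int, k: int = 1) -> dict:
    """Rank of the poisoned document and whether it surfaces in the top-k
    under cosine-similarity dense retrieval (illustrative setup only)."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q                   # one score per corpus document
    order = np.argsort(-scores)      # best-scoring document first
    rank = int(np.where(order == poisoned_idx)[0][0]) + 1
    return {"rank": rank, f"hit@{k}": rank <= k}
```

Averaged over target queries and corpus sizes, the rank and hit@1 values are exactly the numbers the referee asks for.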

Circularity Check

0 steps flagged

No circularity detected in empirical attack construction

full rationale

The paper presents AdversarialCoT as an iterative, query-specific empirical procedure for building one poisoned document via LLM interactions to expose reasoning flaws. No mathematical derivation, fitted parameters renamed as predictions, or self-citation chains are invoked to support the central claim. The reported accuracy degradation on benchmark LLMs stands as an external experimental outcome rather than a tautology or reduction to inputs by construction. The method is self-contained and does not rely on any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on two domain assumptions: that the retriever will surface the crafted document, and that iterative LLM interaction is feasible for the attacker. The abstract introduces no free parameters; its one invented entity, the AdversarialCoT document, comes with no independent evidence.

axioms (2)
  • domain assumption: The retriever will surface the single poisoned document for the target query
    Required for the attack to succeed as described
  • domain assumption: Iterative interaction with the target LLM is available to the attacker for refinement
    Central to the construction process outlined
invented entities (1)
  • AdversarialCoT document · no independent evidence
    purpose: A single poisoned document containing a crafted adversarial chain-of-thought
    The core artifact of the proposed attack

pith-pipeline@v0.9.0 · 5484 in / 1277 out tokens · 29480 ms · 2026-05-10T16:17:48.901095+00:00 · methodology

