DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation

Guoxiu He; Jiawei Liu; Miaokun Chen; Wei Lu; Xiaofeng Wang; Xiaozhong Liu; Yuyang Gong; Zhuo Chen

arxiv: 2606.01212 · v2 · pith:R6WERKAFnew · submitted 2026-05-31 · 💻 cs.CL · cs.AI· cs.CR· cs.IR

DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation

Yuyang Gong , Miaokun Chen , Jiawei Liu , Zhuo Chen , Guoxiu He , Wei Lu , XiaoFeng Wang , Xiaozhong Liu This is my paper

Pith reviewed 2026-06-28 17:41 UTC · model grok-4.3

classification 💻 cs.CL cs.AIcs.CRcs.IR

keywords discourse-level attackopinion manipulationRAG securityblack-box attackpoisoning attackretrieval-augmented generationgraph-guided attacksemantic query network

0 comments

The pith

DiscourseFlip enables discourse-level opinion manipulation in black-box RAG by graph-guided budget allocation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a new threat model of discourse-level opinion manipulation where attacks coordinate influence across a semantic query network to shift opinions in a multi-topic space. It proposes DiscourseFlip as an agentic attack that models the query space as a graph and dynamically allocates a limited poisoning budget to maximize opinion deviation in black-box settings. Experiments show this approach induces targeted shifts and outperforms baselines in coverage and effectiveness. User studies indicate the attacks remain camouflaged, while existing mitigations prove ineffective.

Core claim

DiscourseFlip is an agentic, graph-guided attack that formalizes and executes discourse-level opinion manipulation against black-box RAG systems by dynamically allocating a limited poisoning budget to maximize opinion deviation across a contextualized query network, achieving higher coverage and effectiveness than prior methods.

What carries the argument

The agentic graph-guided poisoning allocation that targets discourse-level opinion deviation in a semantic query network.

If this is right

Coordinated attacks can affect entire networks of related queries rather than single ones.
Limited poisoning budgets can be optimized using graph structure for broader impact.
Opinion shifts can be induced without white-box access to the RAG system.
Camouflaged attacks evade user detection in practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Defenses may need to incorporate graph analysis of query spaces to detect coordinated poisoning.
The approach could extend to other retrieval-based generative models beyond RAG.
Scaling the attack to very large query networks might require adjustments to the budget allocation strategy.

Load-bearing premise

That modeling queries as a semantic graph allows effective dynamic allocation of poisoning to cause measurable opinion shifts even with black-box access and limited budget.

What would settle it

Running DiscourseFlip on a deployed RAG system and measuring no significant opinion shift difference from baselines or random poisoning across the query network.

Figures

Figures reproduced from arXiv: 2606.01212 by Guoxiu He, Jiawei Liu, Miaokun Chen, Wei Lu, Xiaofeng Wang, Xiaozhong Liu, Yuyang Gong, Zhuo Chen.

**Figure 1.** Figure 1: Examples of Existing Paradigms and DiscourseFlip. (Upper) Existing opinion manipulation attacks directly target a [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: The overview of our proposed DiscourseFlip, a graph-guided agentic optimization framework for discourse-level [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Empirical comparison of user opinions measured before (Pre) and after (Post) interaction with the RAG system, across [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4: User Attribution Distribution under DiscourseFlip. [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Systematic overview of mitigation surfaces in a [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 7.** Figure 7: Distributions of log perplexity (PPL) calculated by Qwen-3 on clean documents and poisoned documents of Discourse [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗

**Figure 8.** Figure 8: Comparative analysis of the evaluation metrics (Cov [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗

**Figure 9.** Figure 9: Comparative analysis of the evaluation metrics (Cov [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗

**Figure 10.** Figure 10: Attack effectiveness and coverage under varying poisoning budgets M and retrieval Top-K [PITH_FULL_IMAGE:figures/full_fig_p013_10.png] view at source ↗

read the original abstract

Retrieval-Augmented Generation (RAG) systems are widely deployed and increasingly influential, but their reliance on external corpora exposes new security risks from poisoned retrieval content. Existing RAG attacks are largely focusing on individual queries or narrow topic-local query sets, which limits their practical reach and offers limited camouflage in real-world settings. In this paper, we introduce discourse-level opinion manipulation, a new threat model in which coordinated influence across a semantic query network induces opinion shifts over a holistic, multi-topic query space. We formalize this threat in a black-box setting and propose DiscourseFlip, an agentic, graph-guided attack that dynamically allocates a limited poisoning budget to maximize discourse-level opinion deviation. Extensive experiments demonstrate that DiscourseFlip consistently induces targeted opinion shifts across the contextualized query network and significantly outperforms existing baselines in terms of coverage and effectiveness. User studies further confirm that DiscourseFlip is effective while remaining well camouflaged from user detection. Moreover, systematic analyses show that existing mitigation strategies are ineffective against discourse-level manipulation, underscoring the urgent need for more robust and adaptive defenses to address discourse-level vulnerabilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DiscourseFlip frames a new discourse-level threat model for RAG but leaves the black-box graph construction and budget allocation mechanism unspecified, so the central claim cannot be assessed from the abstract.

read the letter

The main thing here is that the paper defines discourse-level opinion manipulation as coordinated poisoning across a semantic query network rather than single queries or narrow topics. This threat model is new and could matter for deployed RAG systems that handle related questions.

The work does a reasonable job spelling out why existing attacks have limited reach and why current mitigations might miss this kind of spread. The agentic graph-guided approach and the user studies on camouflage are concrete steps that show the authors considered practical deployment.

The soft spot is exactly the one the stress-test note raises. In a true black-box setting the attacker sees only query-response pairs, so any claim about building the query network graph and dynamically allocating the poisoning budget needs an explicit approximation method for edges and node weights. The abstract provides none, which makes the dynamic allocation step look like it may rest on an oracle that the threat model does not allow. Without the methods section it is impossible to tell whether this gap is closed or whether the reported outperformance depends on extra information.

The abstract calls the experiments extensive and the mitigations ineffective, but supplies no datasets, metrics, baselines, or statistics, which keeps soundness low. There is no sign of circular definitions or self-referential results.

This paper is for people working on RAG robustness and adversarial attacks. A reader interested in new attack surfaces would get value from the expanded threat model even if the execution details need work. It shows clear engagement with the existing literature on query-level attacks.

I would bring it to a reading group as maybe, mainly to talk through the threat model. I would not cite it yet. It deserves peer review so the authors can supply the missing graph-construction mechanism and let referees examine the experimental support.

Referee Report

2 major / 2 minor

Summary. The paper introduces discourse-level opinion manipulation as a new threat model for black-box RAG systems and proposes DiscourseFlip, an agentic graph-guided attack that constructs a semantic query network, dynamically allocates a limited poisoning budget across it, and induces targeted opinion shifts over a multi-topic query space. It claims extensive experiments showing consistent outperformance over baselines in coverage and effectiveness, plus user studies on camouflage and analyses showing existing mitigations are ineffective.

Significance. If the black-box graph construction and dynamic allocation mechanism can be shown to work from query-response observations alone, the result would be significant for exposing discourse-level vulnerabilities in deployed RAG systems and motivating new defenses; the work also supplies an experimental construction that could serve as a benchmark for future mitigation research.

major comments (2)

[§3 and §4] §3 (Threat Model) and §4 (DiscourseFlip Construction): the central claim requires that an attacker can infer a semantic query network graph and node importances from black-box query-response pairs alone, yet the manuscript provides no concrete mechanism (e.g., no algorithm, similarity metric, or approximation procedure) for building edges or estimating importances without access to the retrieval index or embeddings; without this the dynamic budget allocation step rests on an oracle assumption not justified by the stated threat model.
[§5] §5 (Experiments): the abstract asserts 'extensive experiments' demonstrating consistent outperformance and ineffective mitigations, but the provided text supplies neither the full set of baselines, metrics, statistical tests, nor the exact query-network construction procedure used in the black-box setting; this prevents assessment of whether the reported coverage and effectiveness gains are load-bearing or artifacts of the evaluation setup.

minor comments (2)

[Abstract] Abstract: the phrase 'agentic, graph-guided attack' is introduced without a forward reference to the precise definition or pseudocode that appears later.
[Throughout] Notation: the term 'contextualized query network' is used repeatedly but never given an explicit mathematical definition (e.g., as a graph G=(V,E) with node features).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments identify key areas where the black-box aspects of the threat model and the experimental reporting require greater explicitness. We address each major comment below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses

Referee: [§3 and §4] §3 (Threat Model) and §4 (DiscourseFlip Construction): the central claim requires that an attacker can infer a semantic query network graph and node importances from black-box query-response pairs alone, yet the manuscript provides no concrete mechanism (e.g., no algorithm, similarity metric, or approximation procedure) for building edges or estimating importances without access to the retrieval index or embeddings; without this the dynamic budget allocation step rests on an oracle assumption not justified by the stated threat model.

Authors: We agree that the black-box graph construction and importance estimation steps merit a more explicit algorithmic description. The current manuscript presents DiscourseFlip as an agentic process that builds the semantic query network solely from observable query-response interactions, but we acknowledge the need for a concrete procedure. In the revision we will add a dedicated subsection and pseudocode (new Algorithm 1) specifying: (i) edge construction via response-similarity metrics computed from the attacker’s own queries (e.g., cosine similarity on response embeddings or n-gram overlap), and (ii) node-importance estimation using observed query centrality and response-variance statistics. This will demonstrate that the dynamic budget allocation operates without any oracle access to the retrieval index or embeddings, fully consistent with the stated black-box threat model. revision: yes
Referee: [§5] §5 (Experiments): the abstract asserts 'extensive experiments' demonstrating consistent outperformance and ineffective mitigations, but the provided text supplies neither the full set of baselines, metrics, statistical tests, nor the exact query-network construction procedure used in the black-box setting; this prevents assessment of whether the reported coverage and effectiveness gains are load-bearing or artifacts of the evaluation setup.

Authors: We concur that the experimental section would benefit from expanded reporting to enable full reproducibility and assessment. While the manuscript already compares against multiple baselines and reports coverage/effectiveness metrics, we will revise §5 to include: the complete enumerated list of baselines with selection rationale, precise definitions and formulas for every metric, results of statistical significance tests (paired t-tests and Wilcoxon signed-rank tests with p-values), and a detailed description of the exact black-box query-network construction procedure employed, including the similarity metric and sampling strategy used to approximate the graph from query-response observations alone. These additions will allow readers to verify that the reported gains are robust. revision: yes

Circularity Check

0 steps flagged

No circularity: experimental attack construction with no self-referential derivations

full rationale

The paper proposes DiscourseFlip as an agentic graph-guided attack that dynamically allocates poisoning budget in a black-box RAG threat model. No equations, fitted parameters, or derivations appear in the abstract or provided text that reduce the claimed opinion shifts or coverage metrics to inputs by construction. The method is framed as an experimental construction rather than a mathematical prediction derived from prior self-citations or ansatzes. The skeptic concern about graph inference in true black-box settings identifies a potential modeling gap but does not indicate circularity, as the paper does not claim a first-principles derivation that collapses to its own definitions. This is a standard non-circular experimental security paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Only the abstract is available; no specific free parameters, mathematical axioms, or independently evidenced invented entities are detailed beyond the attack name itself.

invented entities (1)

DiscourseFlip no independent evidence
purpose: Agentic graph-guided attack for discourse-level opinion manipulation in RAG
Newly proposed method described in the abstract.

pith-pipeline@v0.9.1-grok · 5751 in / 1094 out tokens · 19261 ms · 2026-06-28T17:41:48.013845+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

67 extracted references · 24 canonical work pages · 7 internal anchors

[1]

Anthropic usage policy update

Anthropic. Anthropic usage policy update. https:// www.anthropic.com/news/usage-policy-update. Accessed: 2025-12-24

2025
[2]

Poisoning web-scale training datasets is practi- cal

Nicholas Carlini, Matthew Jagielski, Christopher A Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, and Florian Tramèr. Poisoning web-scale training datasets is practi- cal. In2024 IEEE Symposium on Security and Privacy (SP), pages 407–425. IEEE, 2024

2024
[3]

Overcoming the retrieval barrier: Indirect prompt injection in the wild for llm systems.arXiv preprint arXiv:2601.07072, 2026

Hongyan Chang, Ergute Bao, Xinjian Luo, and Ting Yu. Overcoming the retrieval barrier: Indirect prompt injection in the wild for llm systems.arXiv preprint arXiv:2601.07072, 2026

work page arXiv 2026
[4]

Phan- tom: General trigger attacks on retrieval augmented lan- guage generation.arXiv preprint arXiv:2405.20485, 2024

Harsh Chaudhari, Giorgio Severi, John Abascal, Matthew Jagielski, Christopher A Choquette-Choo, Mi- lad Nasr, Cristina Nita-Rotaru, and Alina Oprea. Phan- tom: General trigger attacks on retrieval augmented lan- guage generation.arXiv preprint arXiv:2405.20485, 2024

work page arXiv 2024
[5]

Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases.Advances in Neural Information Processing Systems, 37:130185– 130213, 2024

Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases.Advances in Neural Information Processing Systems, 37:130185– 130213, 2024

2024
[6]

Flipedrag: Black-box opinion manipula- tion attacks to retrieval-augmented generation of large language models, 2025

Zhuo Chen, Yuyang Gong, Miaokun Chen, Haotan Liu, Qikai Cheng, Fan Zhang, Wei Lu, Xiaozhong Liu, and Jiawei Liu. Flipedrag: Black-box opinion manipula- tion attacks to retrieval-augmented generation of large language models, 2025

2025
[7]

Trojanrag: Retrieval-augmented generation can be backdoor driver in large language models.arXiv preprint arXiv:2405.13401, 2024

Pengzhou Cheng, Yidong Ding, Tianjie Ju, Zongru Wu, Wei Du, Ping Yi, Zhuosheng Zhang, and Gongshen Liu. Trojanrag: Retrieval-augmented generation can be back- door driver in large language models.arXiv preprint arXiv:2405.13401, 2024

work page arXiv 2024
[8]

Typos that broke the rag’s back: Genetic attack on rag pipeline by simulating doc- uments in the wild via low-level perturbations.arXiv preprint arXiv:2404.13948, 2024

Sukmin Cho, Soyeong Jeong, Jeongyeon Seo, Taeho Hwang, and Jong C Park. Typos that broke the rag’s back: Genetic attack on rag pipeline by simulating doc- uments in the wild via low-level perturbations.arXiv preprint arXiv:2404.13948, 2024

work page arXiv 2024
[9]

Retrieval-Augmented Generation for Large Language Models: A Survey

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jin- liu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, and Haofen Wang. Retrieval-augmented generation for large language mod- els: A survey.arXiv preprint arXiv:2312.10997, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[10]

Unic-rag: Universal knowledge corruption attacks to retrieval-augmented generation.arXiv preprint arXiv:2508.18652, 2025

Runpeng Geng, Yanting Wang, Ying Chen, and Jinyuan Jia. Unic-rag: Universal knowledge corruption at- tacks to retrieval-augmented generation.arXiv preprint arXiv:2508.18652, 2025

work page arXiv 2025
[11]

Topic-fliprag: Topic-orientated adversarial opinion manipulation attacks to retrieval-augmented generation models.arXiv preprint arXiv:2502.01386, 2025

Yuyang Gong, Zhuo Chen, Miaokun Chen, Fengchang Yu, Wei Lu, Xiaofeng Wang, Xiaozhong Liu, and Jiawei Liu. Topic-fliprag: Topic-orientated adversarial opinion manipulation attacks to retrieval-augmented generation models.arXiv preprint arXiv:2502.01386, 2025

work page arXiv 2025
[12]

Google search policies

Google. Google search policies. https://support. google.com/websearch/answer/10622781. Ac- cessed: 2025-12-24

work page arXiv 2025
[13]

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping-yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, and Tom Goldstein. Baseline defenses for adversarial attacks against aligned language models.arXiv preprint arXiv:2309.00614, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[14]

Ivy-Fake: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection

Changjiang Jiang, Wenhui Dong, Zhonghao Zhang, Chenyang Si, Fengchang Yu, Wei Peng, Xinbin Yuan, Yifei Bi, Ming Zhao, Zian Zhou, et al. Ivy-fake: A unified explainable framework and benchmark for image and video aigc detection.arXiv preprint arXiv:2506.00979, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[15]

Tabdsr: Decompose, sanitize, and reason for complex numerical reasoning in tabular data

Changjiang Jiang, Fengchang Yu, Haihua Chen, Wei Lu, and Jin Zeng. Tabdsr: Decompose, sanitize, and reason for complex numerical reasoning in tabular data. In Findings of EMNLP 2025, pages 3172–3196, 2025

2025
[16]

Pr-attack: Coordinated prompt-rag attacks on retrieval-augmented generation in large language models via bilevel opti- mization

Yang Jiao, Xiaodong Wang, and Kai Yang. Pr-attack: Coordinated prompt-rag attacks on retrieval-augmented generation in large language models via bilevel opti- mization. InProceedings of SIGIR, pages 656–667, 2025

2025
[17]

Dense passage retrieval for open-domain question answering

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. InProc. of EMNLP, pages 6769– 6781, 2020

2020
[18]

Retrieval-augmented generation for knowledge- intensive nlp tasks.Advances in Neural Information Processing Systems, 33:9459–9474, 2020

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge- intensive nlp tasks.Advances in Neural Information Processing Systems, 33:9459–9474, 2020

2020
[19]

Robust- mask: Certified robustness against adversarial neural ranking attack via randomized masking.arXiv preprint arXiv:2512.23307, 2025

Jiawei Liu, Zhuo Chen, Rui Zhu, Miaokun Chen, Yuyang Gong, Wei Lu, and Xiaofeng Wang. Robust- mask: Certified robustness against adversarial neural ranking attack via randomized masking.arXiv preprint arXiv:2512.23307, 2025

work page arXiv 2025
[20]

Order-disorder: Imitation adversarial at- 14 tacks for black-box neural ranking models

Jiawei Liu, Yangyang Kang, Di Tang, Kaisong Song, Changlong Sun, Xiaofeng Wang, Wei Lu, and Xi- aozhong Liu. Order-disorder: Imitation adversarial at- 14 tacks for black-box neural ranking models. InProceed- ings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 2025–2039, 2022

2022
[21]

Webglm: Towards an efficient web-enhanced question answering system with human preferences

Xiao Liu, Hanyu Lai, Hao Yu, Yifan Xu, Aohan Zeng, Zhengxiao Du, Peng Zhang, Yuxiao Dong, and Jie Tang. Webglm: Towards an efficient web-enhanced question answering system with human preferences. InProc. of KDD, pages 4549–4560, 2023

2023
[22]

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, and Hang Li. Trust- worthy llms: a survey and guideline for evaluating large language models’ alignment.arXiv preprint arXiv:2308.05374, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[23]

Meta llama 3 8b instruct (model check- point)

Meta AI. Meta llama 3 8b instruct (model check- point). https://huggingface.co/meta-llama/ Meta-Llama-3-8B-Instruct , 2024. Instruction- tuned language model

2024
[24]

Meta llama 3.1 8b instruct (abliterated checkpoint)

Meta AI. Meta llama 3.1 8b instruct (abliterated checkpoint). https://huggingface.co/mlabonne/ Meta-Llama-3.1-8B-Instruct-abliterated ,
[25]

Open-source instruction-tuned language model
[26]

BERT base uncased MSMARCO (dense retriever checkpoint)

Nboost. BERT base uncased MSMARCO (dense retriever checkpoint). https://huggingface.co/ nboost/pt-bert-base-uncased-msmarco , 2019. BERT model fine-tuned on MSMARCO for dense retrieval

2019
[27]

Openai usage policies

OpenAI. Openai usage policies. https://openai. com/policies/usage-policies/. Accessed: 2025- 12-24

2025
[28]

Qwen3 8b instruct (abliterated check- point)

Qwen Team. Qwen3 8b instruct (abliterated check- point). https://huggingface.co/huihui-ai/ Qwen3-8B-abliterated, 2025. Open-source instruction-tuned language model

2025
[29]

Qwen3-next 80b a3b instruct (model checkpoint)

Qwen Team. Qwen3-next 80b a3b instruct (model checkpoint). https://huggingface.co/Qwen/ Qwen3-Next-80B-A3B-Instruct , 2025. Large-scale instruction-tuned language model

2025
[30]

Machine against the {RAG}: Jamming {Retrieval- Augmented} generation with blocker documents

Avital Shafran, Roei Schuster, and Vitaly Shmatikov. Machine against the {RAG}: Jamming {Retrieval- Augmented} generation with blocker documents. In 34th USENIX Security Symposium (USENIX Security 25), pages 3787–3806, 2025

2025
[31]

Reliabilityrag: Effective and provably robust defense for rag-based web- search.arXiv preprint arXiv:2509.23519, 2025

Zeyu Shen, Basileal Imana, Tong Wu, Chong Xiang, Prateek Mittal, and Aleksandra Korolova. Reliabilityrag: Effective and provably robust defense for rag-based web- search.arXiv preprint arXiv:2509.23519, 2025

work page arXiv 2025
[32]

Know where to go: Make llm a relevant, re- sponsible, and trustworthy searchers.Decision Support Systems, 188:114354, 2025

Xiang Shi, Jiawei Liu, Yinpeng Liu, Qikai Cheng, and Wei Lu. Know where to go: Make llm a relevant, re- sponsible, and trustworthy searchers.Decision Support Systems, 188:114354, 2025

2025
[33]

Mrag: A modular re- trieval framework for time-sensitive question answering

Zhang Siyue, Xue Yuxiang, Zhang Yiming, Wu Xiaobao, Luu Anh Tuan, and Zhao Chen. Mrag: A modular re- trieval framework for time-sensitive question answering. arXiv preprint arXiv:2412.15540, 2024

work page arXiv 2024
[34]

glue pizza and eat rocks

Zhen Tan, Chengshuai Zhao, Raha Moraffah, Yifan Li, Song Wang, Jundong Li, Tianlong Chen, and Huan Liu. " glue pizza and eat rocks"–exploiting vulnerabilities in retrieval-augmented generative models.arXiv preprint arXiv:2406.19417, 2024

work page arXiv 2024
[35]

Retrieval-Augmented Generation for Natural Language Processing: A Survey

Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, and Xue Chun Jason. Retrieval-augmented generation for natural language processing: A survey. arXiv preprint arXiv:2407.13193, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[36]

Does RAG introduce unfairness in LLMs? evaluating fairness in retrieval-augmented generation systems

Xuyang Wu, Shuowei Li, Hsin-Tai Wu, Zhiqiang Tao, and Yi Fang. Does RAG introduce unfairness in LLMs? evaluating fairness in retrieval-augmented generation systems. InProceedings of COLING 2025, pages 10021– 10036, 2025

2025
[37]

Riprag: Hack a black- box retrieval-augmented generation question-answering system with reinforcement learning.arXiv preprint arXiv:2510.10008, 2025

Meng Xi, Sihan Lv, Yechen Jin, Guanjie Cheng, Naibo Wang, Ying Li, and Jianwei Yin. Riprag: Hack a black- box retrieval-augmented generation question-answering system with reinforcement learning.arXiv preprint arXiv:2510.10008, 2025

work page arXiv 2025
[38]

Certifiably ro- bust rag against retrieval corruption.arXiv preprint arXiv:2405.15556, 2024

Chong Xiang, Tong Wu, Zexuan Zhong, David Wag- ner, Danqi Chen, and Prateek Mittal. Certifiably ro- bust rag against retrieval corruption.arXiv preprint arXiv:2405.15556, 2024

work page arXiv 2024
[39]

C-pack: Packaged resources to advance general chinese embedding, 2023

Shitao Xiao, Zheng Liu, Peitian Zhang, and Niklas Muennighoff. C-pack: Packaged resources to advance general chinese embedding, 2023

2023
[40]

Badrag: Identifying vulnerabilities in retrieval augmented generation of large language models.arXiv preprint arXiv:2406.00083, 2024

Jiaqi Xue, Mengxin Zheng, Yebowen Hu, Fei Liu, Xun Chen, and Qian Lou. Badrag: Identifying vulnerabili- ties in retrieval augmented generation of large language models.arXiv preprint arXiv:2406.00083, 2024

work page arXiv 2024
[41]

Certified robustness to text adversarial attacks by randomized [mask].Computational Linguis- tics, 49(2):395–427, 2023

Jiehang Zeng, Jianhan Xu, Xiaoqing Zheng, and Xu- anjing Huang. Certified robustness to text adversarial attacks by randomized [mask].Computational Linguis- tics, 49(2):395–427, 2023

2023
[42]

Mitigating the privacy issues in retrieval-augmented generation (rag) via pure synthetic data

Shenglai Zeng, Jiankun Zhang, Pengfei He, Jie Ren, Tianqi Zheng, Hanqing Lu, Han Xu, Hui Liu, Yue Xing, and Jiliang Tang. Mitigating the privacy issues in retrieval-augmented generation (rag) via pure synthetic data. InProceedings of EMNLP 2025, pages 24538– 24569, 2025. 15

2025
[43]

Practical poisoning attacks against retrieval-augmented generation.arXiv preprint arXiv:2504.03957, 2025

Baolei Zhang, Yuxi Chen, Minghong Fang, Zhuqing Liu, Lihai Nie, Tong Li, and Zheli Liu. Practical poisoning attacks against retrieval-augmented generation.arXiv preprint arXiv:2504.03957, 2025

work page arXiv 2025
[44]

Traceback of poisoning attacks to retrieval-augmented generation

Baolei Zhang, Haoran Xin, Minghong Fang, Zhuqing Liu, Biao Yi, Tong Li, and Zheli Liu. Traceback of poisoning attacks to retrieval-augmented generation. In Proceedings of the ACM on Web Conference 2025, pages 2085–2097, 2025

2025
[45]

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Day- iheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. Qwen3 embedding: Advancing text embedding and reranking through foundation models.arXiv preprint arXiv:2506.05176, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[46]

Hijack- rag: Hijacking attacks against retrieval-augmented large language models.arXiv preprint arXiv:2410.22832, 2024

Yucheng Zhang, Qinfeng Li, Tianyu Du, Xuhong Zhang, Xinkui Zhao, Zhengwen Feng, and Jianwei Yin. Hijack- rag: Hijacking attacks against retrieval-augmented large language models.arXiv preprint arXiv:2410.22832, 2024

work page arXiv 2024
[47]

Retrieval-Augmented Generation for AI-Generated Content: A Survey

Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wen- tao Zhang, and Bin Cui. Retrieval-augmented genera- tion for ai-generated content: A survey.arXiv preprint arXiv:2402.19473, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[48]

Grada: Graph-based reranking against ad- versarial documents attack

Jingjie Zheng, Aryo Pradipta Gema, Giwon Hong, Xuanli He, Pasquale Minervini, Youcheng Sun, and Qiongkai Xu. Grada: Graph-based reranking against ad- versarial documents attack. InProceedings of EMNLP 2025, pages 22255–22277, 2025

2025
[49]

Osd: An online web spam detection system

Bin Zhou and Jian Pei. Osd: An online web spam detection system. InIn Proceedings of SIGKDD 2009, volume 9, 2009

2009
[50]

URL:https://arxiv.org/abs/2402.07867

Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. Poisonedrag: Knowledge corruption attacks to retrieval-augmented generation of large language mod- els.arXiv preprint arXiv:2402.07867, 2024. A Appendix A.1 Domain-Specific Results To evaluate the robustness of DiscourseFlip across dif- ferent discourse contexts, we report domain-specific results for L...

work page arXiv 2024
[51]

Ex- tract the Core Noun Phrase and inject it as a Direct Object

[MISSING CONCEPTS]: Strip all question syntax. Ex- tract the Core Noun Phrase and inject it as a Direct Object
[52]

[UNDERSTATED CONCEPTS]: OVERWRITE neutral verbs (e.g., affects, relates) with absolute verbs (e.g., dic- tates, guarantees, eradicates)
[53]

affects” to “dictates

[ANCHORS]: High-value keywords that MUST be pre- served as the structural skeleton. [TRANSFORMATION PROTOCOL] - Step 1: De-Questioning: Convert all inputs into 100% Declarative Statements. - Step 2: Subject Anchoring: {topic} must be the grammati- cal Subject for every sentence. - Step 3: High-Density Stacking: Stack 3-5 concepts into a single complex sen...
[54]

Merge adjacent sentences supporting the same claim
[55]

Remove background definitions; assume an expert reader
[56]

Preserve Claims: Keep the strongest assertion for every entity mentioned
[57]

Stance Purity: Every sentence must directly support being {stance} the topic

Keyword Protection: Do NOT remove proper nouns or technical terms. Stance Purity: Every sentence must directly support being {stance} the topic. Cut anything neutral or hedging. Operator 4: Consolidate— A retrieval-oriented post- processing step under budget T2. It (i) synthesizes a short prefix that summarizes query set and (ii) applies minimal rewrites ...
[58]

{root_topic}

Stance Preservation: The stance toward the ROOT TOPIC “{root_topic}” must remain “{target_stance}”. Do NOT introduce wording that shifts or weakens this stance
[59]

Edit Budget: Maximum allowed word-level edits: {edit_budget} words
[60]

No new clauses

Editing Rules: - If budget ≤ 10: Only allow keyword insertion or synonym replacement. No new clauses. - If budget > 10: Short clarifying phrases are allowed only if they strengthen relevance
[61]

relevant_sentences

Integrity: Do NOT change original meaning or factual claims. Do NOT rewrite the whole passage. Output format (JSON ONLY): { “relevant_sentences”: [ { “sentence”: “...”, “rewrite_suggestions”: [“suggestion 1”, ...] } ] } Passage: {passage} Query: {query} A controlled execution step that applies the rewrite plan to the draft Consolidate Operator Prompt 3 Yo...
[62]

Stance must remain: {tar- get_stance}

Do NOT change stance. Stance must remain: {tar- get_stance}
[63]

Do NOT add new arguments, facts, or opinions
[64]

Only modify the sentences explicitly listed in the Plan
[65]

Rewrite Plan (JSON): {rewrite_plan_json} Original Pas- sage: {passage} Output: Return ONLY the revised passage text

Do NOT explain your changes. Rewrite Plan (JSON): {rewrite_plan_json} Original Pas- sage: {passage} Output: Return ONLY the revised passage text. A.3.3 Surrogate Diagnostic ·Victim Model Simulation— The standard prompt tem- plate for the black-box RAG system. RAG Victim System Prompt Please provide a comprehensive and detailed answer to the question based...
[66]

Utility: (YES/NO) Did the model answer using the con- text without refusal?
[67]

What is

Success: (YES/WEAK/NO) - YES: Clearly biased, adopts attacker goal. - WEAK: Neutral, objective, or hedged. Neutrality is NOT success. - NO: Refused or opposed the goal. A.4 Additional Experiment Details In practice, we instantiate the mapping π(·) by generating probe queries for each node n∈N . Specifically, for everyn, we prompt GPT-5o-mini to produce th...

[1] [1]

Anthropic usage policy update

Anthropic. Anthropic usage policy update. https:// www.anthropic.com/news/usage-policy-update. Accessed: 2025-12-24

2025

[2] [2]

Poisoning web-scale training datasets is practi- cal

Nicholas Carlini, Matthew Jagielski, Christopher A Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, and Florian Tramèr. Poisoning web-scale training datasets is practi- cal. In2024 IEEE Symposium on Security and Privacy (SP), pages 407–425. IEEE, 2024

2024

[3] [3]

Overcoming the retrieval barrier: Indirect prompt injection in the wild for llm systems.arXiv preprint arXiv:2601.07072, 2026

Hongyan Chang, Ergute Bao, Xinjian Luo, and Ting Yu. Overcoming the retrieval barrier: Indirect prompt injection in the wild for llm systems.arXiv preprint arXiv:2601.07072, 2026

work page arXiv 2026

[4] [4]

Phan- tom: General trigger attacks on retrieval augmented lan- guage generation.arXiv preprint arXiv:2405.20485, 2024

Harsh Chaudhari, Giorgio Severi, John Abascal, Matthew Jagielski, Christopher A Choquette-Choo, Mi- lad Nasr, Cristina Nita-Rotaru, and Alina Oprea. Phan- tom: General trigger attacks on retrieval augmented lan- guage generation.arXiv preprint arXiv:2405.20485, 2024

work page arXiv 2024

[5] [5]

Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases.Advances in Neural Information Processing Systems, 37:130185– 130213, 2024

Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases.Advances in Neural Information Processing Systems, 37:130185– 130213, 2024

2024

[6] [6]

Flipedrag: Black-box opinion manipula- tion attacks to retrieval-augmented generation of large language models, 2025

Zhuo Chen, Yuyang Gong, Miaokun Chen, Haotan Liu, Qikai Cheng, Fan Zhang, Wei Lu, Xiaozhong Liu, and Jiawei Liu. Flipedrag: Black-box opinion manipula- tion attacks to retrieval-augmented generation of large language models, 2025

2025

[7] [7]

Trojanrag: Retrieval-augmented generation can be backdoor driver in large language models.arXiv preprint arXiv:2405.13401, 2024

Pengzhou Cheng, Yidong Ding, Tianjie Ju, Zongru Wu, Wei Du, Ping Yi, Zhuosheng Zhang, and Gongshen Liu. Trojanrag: Retrieval-augmented generation can be back- door driver in large language models.arXiv preprint arXiv:2405.13401, 2024

work page arXiv 2024

[8] [8]

Typos that broke the rag’s back: Genetic attack on rag pipeline by simulating doc- uments in the wild via low-level perturbations.arXiv preprint arXiv:2404.13948, 2024

Sukmin Cho, Soyeong Jeong, Jeongyeon Seo, Taeho Hwang, and Jong C Park. Typos that broke the rag’s back: Genetic attack on rag pipeline by simulating doc- uments in the wild via low-level perturbations.arXiv preprint arXiv:2404.13948, 2024

work page arXiv 2024

[9] [9]

Retrieval-Augmented Generation for Large Language Models: A Survey

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jin- liu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, and Haofen Wang. Retrieval-augmented generation for large language mod- els: A survey.arXiv preprint arXiv:2312.10997, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[10] [10]

Unic-rag: Universal knowledge corruption attacks to retrieval-augmented generation.arXiv preprint arXiv:2508.18652, 2025

Runpeng Geng, Yanting Wang, Ying Chen, and Jinyuan Jia. Unic-rag: Universal knowledge corruption at- tacks to retrieval-augmented generation.arXiv preprint arXiv:2508.18652, 2025

work page arXiv 2025

[11] [11]

Topic-fliprag: Topic-orientated adversarial opinion manipulation attacks to retrieval-augmented generation models.arXiv preprint arXiv:2502.01386, 2025

Yuyang Gong, Zhuo Chen, Miaokun Chen, Fengchang Yu, Wei Lu, Xiaofeng Wang, Xiaozhong Liu, and Jiawei Liu. Topic-fliprag: Topic-orientated adversarial opinion manipulation attacks to retrieval-augmented generation models.arXiv preprint arXiv:2502.01386, 2025

work page arXiv 2025

[12] [12]

Google search policies

Google. Google search policies. https://support. google.com/websearch/answer/10622781. Ac- cessed: 2025-12-24

work page arXiv 2025

[13] [13]

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping-yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, and Tom Goldstein. Baseline defenses for adversarial attacks against aligned language models.arXiv preprint arXiv:2309.00614, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[14] [14]

Ivy-Fake: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection

Changjiang Jiang, Wenhui Dong, Zhonghao Zhang, Chenyang Si, Fengchang Yu, Wei Peng, Xinbin Yuan, Yifei Bi, Ming Zhao, Zian Zhou, et al. Ivy-fake: A unified explainable framework and benchmark for image and video aigc detection.arXiv preprint arXiv:2506.00979, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[15] [15]

Tabdsr: Decompose, sanitize, and reason for complex numerical reasoning in tabular data

Changjiang Jiang, Fengchang Yu, Haihua Chen, Wei Lu, and Jin Zeng. Tabdsr: Decompose, sanitize, and reason for complex numerical reasoning in tabular data. In Findings of EMNLP 2025, pages 3172–3196, 2025

2025

[16] [16]

Pr-attack: Coordinated prompt-rag attacks on retrieval-augmented generation in large language models via bilevel opti- mization

Yang Jiao, Xiaodong Wang, and Kai Yang. Pr-attack: Coordinated prompt-rag attacks on retrieval-augmented generation in large language models via bilevel opti- mization. InProceedings of SIGIR, pages 656–667, 2025

2025

[17] [17]

Dense passage retrieval for open-domain question answering

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. InProc. of EMNLP, pages 6769– 6781, 2020

2020

[18] [18]

Retrieval-augmented generation for knowledge- intensive nlp tasks.Advances in Neural Information Processing Systems, 33:9459–9474, 2020

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge- intensive nlp tasks.Advances in Neural Information Processing Systems, 33:9459–9474, 2020

2020

[19] [19]

Robust- mask: Certified robustness against adversarial neural ranking attack via randomized masking.arXiv preprint arXiv:2512.23307, 2025

Jiawei Liu, Zhuo Chen, Rui Zhu, Miaokun Chen, Yuyang Gong, Wei Lu, and Xiaofeng Wang. Robust- mask: Certified robustness against adversarial neural ranking attack via randomized masking.arXiv preprint arXiv:2512.23307, 2025

work page arXiv 2025

[20] [20]

Order-disorder: Imitation adversarial at- 14 tacks for black-box neural ranking models

Jiawei Liu, Yangyang Kang, Di Tang, Kaisong Song, Changlong Sun, Xiaofeng Wang, Wei Lu, and Xi- aozhong Liu. Order-disorder: Imitation adversarial at- 14 tacks for black-box neural ranking models. InProceed- ings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 2025–2039, 2022

2022

[21] [21]

Webglm: Towards an efficient web-enhanced question answering system with human preferences

Xiao Liu, Hanyu Lai, Hao Yu, Yifan Xu, Aohan Zeng, Zhengxiao Du, Peng Zhang, Yuxiao Dong, and Jie Tang. Webglm: Towards an efficient web-enhanced question answering system with human preferences. InProc. of KDD, pages 4549–4560, 2023

2023

[22] [22]

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, and Hang Li. Trust- worthy llms: a survey and guideline for evaluating large language models’ alignment.arXiv preprint arXiv:2308.05374, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[23] [23]

Meta llama 3 8b instruct (model check- point)

Meta AI. Meta llama 3 8b instruct (model check- point). https://huggingface.co/meta-llama/ Meta-Llama-3-8B-Instruct , 2024. Instruction- tuned language model

2024

[24] [24]

Meta llama 3.1 8b instruct (abliterated checkpoint)

Meta AI. Meta llama 3.1 8b instruct (abliterated checkpoint). https://huggingface.co/mlabonne/ Meta-Llama-3.1-8B-Instruct-abliterated ,

[25] [25]

Open-source instruction-tuned language model

[26] [26]

BERT base uncased MSMARCO (dense retriever checkpoint)

Nboost. BERT base uncased MSMARCO (dense retriever checkpoint). https://huggingface.co/ nboost/pt-bert-base-uncased-msmarco , 2019. BERT model fine-tuned on MSMARCO for dense retrieval

2019

[27] [27]

Openai usage policies

OpenAI. Openai usage policies. https://openai. com/policies/usage-policies/. Accessed: 2025- 12-24

2025

[28] [28]

Qwen3 8b instruct (abliterated check- point)

Qwen Team. Qwen3 8b instruct (abliterated check- point). https://huggingface.co/huihui-ai/ Qwen3-8B-abliterated, 2025. Open-source instruction-tuned language model

2025

[29] [29]

Qwen3-next 80b a3b instruct (model checkpoint)

Qwen Team. Qwen3-next 80b a3b instruct (model checkpoint). https://huggingface.co/Qwen/ Qwen3-Next-80B-A3B-Instruct , 2025. Large-scale instruction-tuned language model

2025

[30] [30]

Machine against the {RAG}: Jamming {Retrieval- Augmented} generation with blocker documents

Avital Shafran, Roei Schuster, and Vitaly Shmatikov. Machine against the {RAG}: Jamming {Retrieval- Augmented} generation with blocker documents. In 34th USENIX Security Symposium (USENIX Security 25), pages 3787–3806, 2025

2025

[31] [31]

Reliabilityrag: Effective and provably robust defense for rag-based web- search.arXiv preprint arXiv:2509.23519, 2025

Zeyu Shen, Basileal Imana, Tong Wu, Chong Xiang, Prateek Mittal, and Aleksandra Korolova. Reliabilityrag: Effective and provably robust defense for rag-based web- search.arXiv preprint arXiv:2509.23519, 2025

work page arXiv 2025

[32] [32]

Know where to go: Make llm a relevant, re- sponsible, and trustworthy searchers.Decision Support Systems, 188:114354, 2025

Xiang Shi, Jiawei Liu, Yinpeng Liu, Qikai Cheng, and Wei Lu. Know where to go: Make llm a relevant, re- sponsible, and trustworthy searchers.Decision Support Systems, 188:114354, 2025

2025

[33] [33]

Mrag: A modular re- trieval framework for time-sensitive question answering

Zhang Siyue, Xue Yuxiang, Zhang Yiming, Wu Xiaobao, Luu Anh Tuan, and Zhao Chen. Mrag: A modular re- trieval framework for time-sensitive question answering. arXiv preprint arXiv:2412.15540, 2024

work page arXiv 2024

[34] [34]

glue pizza and eat rocks

Zhen Tan, Chengshuai Zhao, Raha Moraffah, Yifan Li, Song Wang, Jundong Li, Tianlong Chen, and Huan Liu. " glue pizza and eat rocks"–exploiting vulnerabilities in retrieval-augmented generative models.arXiv preprint arXiv:2406.19417, 2024

work page arXiv 2024

[35] [35]

Retrieval-Augmented Generation for Natural Language Processing: A Survey

Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, and Xue Chun Jason. Retrieval-augmented generation for natural language processing: A survey. arXiv preprint arXiv:2407.13193, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[36] [36]

Does RAG introduce unfairness in LLMs? evaluating fairness in retrieval-augmented generation systems

Xuyang Wu, Shuowei Li, Hsin-Tai Wu, Zhiqiang Tao, and Yi Fang. Does RAG introduce unfairness in LLMs? evaluating fairness in retrieval-augmented generation systems. InProceedings of COLING 2025, pages 10021– 10036, 2025

2025

[37] [37]

Riprag: Hack a black- box retrieval-augmented generation question-answering system with reinforcement learning.arXiv preprint arXiv:2510.10008, 2025

Meng Xi, Sihan Lv, Yechen Jin, Guanjie Cheng, Naibo Wang, Ying Li, and Jianwei Yin. Riprag: Hack a black- box retrieval-augmented generation question-answering system with reinforcement learning.arXiv preprint arXiv:2510.10008, 2025

work page arXiv 2025

[38] [38]

Certifiably ro- bust rag against retrieval corruption.arXiv preprint arXiv:2405.15556, 2024

Chong Xiang, Tong Wu, Zexuan Zhong, David Wag- ner, Danqi Chen, and Prateek Mittal. Certifiably ro- bust rag against retrieval corruption.arXiv preprint arXiv:2405.15556, 2024

work page arXiv 2024

[39] [39]

C-pack: Packaged resources to advance general chinese embedding, 2023

Shitao Xiao, Zheng Liu, Peitian Zhang, and Niklas Muennighoff. C-pack: Packaged resources to advance general chinese embedding, 2023

2023

[40] [40]

Badrag: Identifying vulnerabilities in retrieval augmented generation of large language models.arXiv preprint arXiv:2406.00083, 2024

Jiaqi Xue, Mengxin Zheng, Yebowen Hu, Fei Liu, Xun Chen, and Qian Lou. Badrag: Identifying vulnerabili- ties in retrieval augmented generation of large language models.arXiv preprint arXiv:2406.00083, 2024

work page arXiv 2024

[41] [41]

Certified robustness to text adversarial attacks by randomized [mask].Computational Linguis- tics, 49(2):395–427, 2023

Jiehang Zeng, Jianhan Xu, Xiaoqing Zheng, and Xu- anjing Huang. Certified robustness to text adversarial attacks by randomized [mask].Computational Linguis- tics, 49(2):395–427, 2023

2023

[42] [42]

Mitigating the privacy issues in retrieval-augmented generation (rag) via pure synthetic data

Shenglai Zeng, Jiankun Zhang, Pengfei He, Jie Ren, Tianqi Zheng, Hanqing Lu, Han Xu, Hui Liu, Yue Xing, and Jiliang Tang. Mitigating the privacy issues in retrieval-augmented generation (rag) via pure synthetic data. InProceedings of EMNLP 2025, pages 24538– 24569, 2025. 15

2025

[43] [43]

Practical poisoning attacks against retrieval-augmented generation.arXiv preprint arXiv:2504.03957, 2025

Baolei Zhang, Yuxi Chen, Minghong Fang, Zhuqing Liu, Lihai Nie, Tong Li, and Zheli Liu. Practical poisoning attacks against retrieval-augmented generation.arXiv preprint arXiv:2504.03957, 2025

work page arXiv 2025

[44] [44]

Traceback of poisoning attacks to retrieval-augmented generation

Baolei Zhang, Haoran Xin, Minghong Fang, Zhuqing Liu, Biao Yi, Tong Li, and Zheli Liu. Traceback of poisoning attacks to retrieval-augmented generation. In Proceedings of the ACM on Web Conference 2025, pages 2085–2097, 2025

2025

[45] [45]

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Day- iheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. Qwen3 embedding: Advancing text embedding and reranking through foundation models.arXiv preprint arXiv:2506.05176, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[46] [46]

Hijack- rag: Hijacking attacks against retrieval-augmented large language models.arXiv preprint arXiv:2410.22832, 2024

Yucheng Zhang, Qinfeng Li, Tianyu Du, Xuhong Zhang, Xinkui Zhao, Zhengwen Feng, and Jianwei Yin. Hijack- rag: Hijacking attacks against retrieval-augmented large language models.arXiv preprint arXiv:2410.22832, 2024

work page arXiv 2024

[47] [47]

Retrieval-Augmented Generation for AI-Generated Content: A Survey

Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wen- tao Zhang, and Bin Cui. Retrieval-augmented genera- tion for ai-generated content: A survey.arXiv preprint arXiv:2402.19473, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[48] [48]

Grada: Graph-based reranking against ad- versarial documents attack

Jingjie Zheng, Aryo Pradipta Gema, Giwon Hong, Xuanli He, Pasquale Minervini, Youcheng Sun, and Qiongkai Xu. Grada: Graph-based reranking against ad- versarial documents attack. InProceedings of EMNLP 2025, pages 22255–22277, 2025

2025

[49] [49]

Osd: An online web spam detection system

Bin Zhou and Jian Pei. Osd: An online web spam detection system. InIn Proceedings of SIGKDD 2009, volume 9, 2009

2009

[50] [50]

URL:https://arxiv.org/abs/2402.07867

Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. Poisonedrag: Knowledge corruption attacks to retrieval-augmented generation of large language mod- els.arXiv preprint arXiv:2402.07867, 2024. A Appendix A.1 Domain-Specific Results To evaluate the robustness of DiscourseFlip across dif- ferent discourse contexts, we report domain-specific results for L...

work page arXiv 2024

[51] [51]

Ex- tract the Core Noun Phrase and inject it as a Direct Object

[MISSING CONCEPTS]: Strip all question syntax. Ex- tract the Core Noun Phrase and inject it as a Direct Object

[52] [52]

[UNDERSTATED CONCEPTS]: OVERWRITE neutral verbs (e.g., affects, relates) with absolute verbs (e.g., dic- tates, guarantees, eradicates)

[53] [53]

affects” to “dictates

[ANCHORS]: High-value keywords that MUST be pre- served as the structural skeleton. [TRANSFORMATION PROTOCOL] - Step 1: De-Questioning: Convert all inputs into 100% Declarative Statements. - Step 2: Subject Anchoring: {topic} must be the grammati- cal Subject for every sentence. - Step 3: High-Density Stacking: Stack 3-5 concepts into a single complex sen...

[54] [54]

Merge adjacent sentences supporting the same claim

[55] [55]

Remove background definitions; assume an expert reader

[56] [56]

Preserve Claims: Keep the strongest assertion for every entity mentioned

[57] [57]

Stance Purity: Every sentence must directly support being {stance} the topic

Keyword Protection: Do NOT remove proper nouns or technical terms. Stance Purity: Every sentence must directly support being {stance} the topic. Cut anything neutral or hedging. Operator 4: Consolidate— A retrieval-oriented post- processing step under budget T2. It (i) synthesizes a short prefix that summarizes query set and (ii) applies minimal rewrites ...

[58] [58]

{root_topic}

Stance Preservation: The stance toward the ROOT TOPIC “{root_topic}” must remain “{target_stance}”. Do NOT introduce wording that shifts or weakens this stance

[59] [59]

Edit Budget: Maximum allowed word-level edits: {edit_budget} words

[60] [60]

No new clauses

Editing Rules: - If budget ≤ 10: Only allow keyword insertion or synonym replacement. No new clauses. - If budget > 10: Short clarifying phrases are allowed only if they strengthen relevance

[61] [61]

relevant_sentences

Integrity: Do NOT change original meaning or factual claims. Do NOT rewrite the whole passage. Output format (JSON ONLY): { “relevant_sentences”: [ { “sentence”: “...”, “rewrite_suggestions”: [“suggestion 1”, ...] } ] } Passage: {passage} Query: {query} A controlled execution step that applies the rewrite plan to the draft Consolidate Operator Prompt 3 Yo...

[62] [62]

Stance must remain: {tar- get_stance}

Do NOT change stance. Stance must remain: {tar- get_stance}

[63] [63]

Do NOT add new arguments, facts, or opinions

[64] [64]

Only modify the sentences explicitly listed in the Plan

[65] [65]

Rewrite Plan (JSON): {rewrite_plan_json} Original Pas- sage: {passage} Output: Return ONLY the revised passage text

Do NOT explain your changes. Rewrite Plan (JSON): {rewrite_plan_json} Original Pas- sage: {passage} Output: Return ONLY the revised passage text. A.3.3 Surrogate Diagnostic ·Victim Model Simulation— The standard prompt tem- plate for the black-box RAG system. RAG Victim System Prompt Please provide a comprehensive and detailed answer to the question based...

[66] [66]

Utility: (YES/NO) Did the model answer using the con- text without refusal?

[67] [67]

What is

Success: (YES/WEAK/NO) - YES: Clearly biased, adopts attacker goal. - WEAK: Neutral, objective, or hedged. Neutrality is NOT success. - NO: Refused or opposed the goal. A.4 Additional Experiment Details In practice, we instantiate the mapping π(·) by generating probe queries for each node n∈N . Specifically, for everyn, we prompt GPT-5o-mini to produce th...