pith. sign in

arxiv: 2510.13842 · v2 · pith:CXHQNIIRnew · submitted 2025-10-11 · 💻 cs.CL · cs.AI· cs.CR

ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking

Pith reviewed 2026-05-21 19:58 UTC · model grok-4.3

classification 💻 cs.CL cs.AIcs.CR
keywords knowledge poisoningretrieval-augmented generationfact checkingadversarial attacksRAG securitylarge language models
0
0 comments X

The pith

ADMIT flips RAG fact-checking decisions by injecting a tiny number of semantically aligned adversarial documents that override authentic evidence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that knowledge poisoning remains effective in realistic fact-checking settings where genuine supporting or refuting evidence is already present in the retrieval pool. ADMIT crafts a small set of few-shot adversarial documents that match the query semantically so they rank highly during retrieval and steer the language model toward attacker-chosen verdicts plus deceptive justifications. This result matters because RAG-based fact checkers are deployed to verify claims in high-stakes domains, and success without any access to the target model or retriever shows the attack surface is broader than previously assumed. Experiments demonstrate transfer across four retrievers, eleven language models, and four cross-domain benchmarks at a poisoning rate of 0.93 times ten to the minus six while staying robust against strong counter-evidence.

Core claim

ADMIT is a few-shot, semantically aligned poisoning attack that flips fact-checking decisions and induces deceptive justifications in RAG systems without access to the target LLMs, retrievers, or token-level control. It achieves an average attack success rate of 86 percent across tested settings and improves on prior attacks by 11.2 percent while remaining effective even when authentic counter-evidence is retrieved.

What carries the argument

ADMIT, the Adversarial Multi-Injection Technique, which generates a small number of adversarial documents semantically aligned with the input query so they are preferentially retrieved and override authentic evidence supplied to the language model.

If this is right

  • RAG fact-checking pipelines can be manipulated by poisoning far less than one millionth of the knowledge base.
  • The attack generalizes to new retrievers and language models without requiring white-box access or fine-tuning.
  • Strong authentic evidence already present in the retrieval pool does not prevent the injected content from dominating the final output.
  • Fact-checking applications must incorporate defenses against semantic multi-injection rather than relying solely on evidence volume or ranking.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Retrieval components may need secondary filters that check for unnatural repetition or template patterns in top-ranked documents.
  • Knowledge bases used for verification could require regular integrity scans that flag clusters of near-duplicate content added in short time windows.
  • Model providers might explore training objectives that encourage inconsistency detection when retrieved passages contain conflicting but similarly phrased claims.

Load-bearing premise

That attackers can create documents semantically close enough to a query for the retriever to surface them ahead of genuine evidence without any access to the system internals or model weights.

What would settle it

Measure whether the attack success rate falls below 20 percent when the retriever is altered to penalize documents whose embeddings cluster tightly with known adversarial templates while leaving all other retrieval behavior unchanged.

Figures

Figures reproduced from arXiv: 2510.13842 by Jiale Ding, Xiang Zheng, Xiao Liu, Xingjun Ma, Yifan Ding, Yifeng Gao, Yinghui Li, Yutao Wu.

Figure 1
Figure 1. Figure 1: Comparison of attack strategies against RAG. Attacker injects malicious text (i.e., passage) [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: ASRs of Baseline methods (k=5) against 11 verifiers across 1–5 shot settings on four datasets, showing strong transferability to unseen LLMs. Full results, including recall, are in Appendix J, [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Distribution of optimization iterations required by [PITH_FULL_IMAGE:figures/full_fig_p021_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Fake News Detection 24 [PITH_FULL_IMAGE:figures/full_fig_p024_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: ROC curves for PPL-based detection under statistical consistency defenses. [PITH_FULL_IMAGE:figures/full_fig_p025_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Evaluation of statistical consistency defenses: ROUGE-N F1 scores across different types [PITH_FULL_IMAGE:figures/full_fig_p025_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Average recall over 1–5 shot injection across four datasets using BM25 and three dense [PITH_FULL_IMAGE:figures/full_fig_p027_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: The impact of round size L on the ASR across FEVER, Climate-FEVER, HealthVer, and SciFact. When L = 50, memory is not reset during multi-turn generation. 5 10 20 30 40 Max Iteration T 0.4 0.5 0.6 0.7 0.8 0.9 ASR FEVER ASR Iteration=1 5 10 20 30 40 Max Iteration T 0.4 0.5 0.6 0.7 0.8 0.9 ASR Climate-FEVER ASR Iteration=1 5 10 20 30 40 Max Iteration T 0.4 0.5 0.6 0.7 0.8 0.9 ASR HealthVer ASR Iteration=1 5 1… view at source ↗
Figure 9
Figure 9. Figure 9: The impact of optimization iteration T on the ASR across FEVER, Climate-FEVER, HealthVer, and SciFact. When T = 1, ADMIT performs a single-turn attack without iterative refinement. 20 30 40 50 60 Length V 0.4 0.5 0.6 0.7 0.8 0.9 ASR FEVER ASR Length=50 20 30 40 50 60 Length V 0.4 0.5 0.6 0.7 0.8 0.9 ASR Climate-FEVER ASR Length=50 20 30 40 50 60 Length V 0.4 0.5 0.6 0.7 0.8 0.9 ASR HealthVer ASR Length=50 … view at source ↗
Figure 10
Figure 10. Figure 10: The impact of generation length V or verifier’s feedback length V on the ASR across FEVER, Climate-FEVER, HealthVer, and SciFact. The length 50 used as the default setting. 1 2 3 4 5 #Proxy Passages m 0.4 0.5 0.6 0.7 0.8 0.9 ASR FEVER ASR m=3 (default) 1 2 3 4 5 #Proxy Passages m 0.4 0.5 0.6 0.7 0.8 0.9 ASR Climate-FEVER ASR m=3 (default) 1 2 3 4 5 #Proxy Passages m 0.4 0.5 0.6 0.7 0.8 0.9 ASR HealthVer A… view at source ↗
Figure 11
Figure 11. Figure 11: The impact of the number of proxy passages [PITH_FULL_IMAGE:figures/full_fig_p029_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Performance of fine-tuned ADMIT generator on HotPotQA. 30 [PITH_FULL_IMAGE:figures/full_fig_p030_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: ADMIT attack performance on ReAct agents across injection budgets. [PITH_FULL_IMAGE:figures/full_fig_p032_13.png] view at source ↗
read the original abstract

Knowledge poisoning poses a critical threat to Retrieval-Augmented Generation (RAG) systems by injecting adversarial content into knowledge bases, tricking Large Language Models (LLMs) into producing attacker-controlled outputs grounded in manipulated context. Prior work highlights LLMs' susceptibility to misleading or malicious retrieved content. However, real-world fact-checking scenarios are more challenging, as credible evidence typically dominates the retrieval pool. To investigate this problem, we extend knowledge poisoning to the fact-checking setting, where retrieved context includes authentic supporting or refuting evidence. We propose \textbf{ADMIT} (\textbf{AD}versarial \textbf{M}ulti-\textbf{I}njection \textbf{T}echnique), a few-shot, semantically aligned poisoning attack that flips fact-checking decisions and induces deceptive justifications, all without access to the target LLMs, retrievers, or token-level control. Extensive experiments show that ADMIT transfers effectively across 4 retrievers, 11 LLMs, and 4 cross-domain benchmarks, achieving an average attack success rate (ASR) of 86\% at an extremely low poisoning rate of $0.93 \times 10^{-6}$, and remaining robust even in the presence of strong counter-evidence. Compared with prior state-of-the-art attacks, ADMIT improves ASR by 11.2\% across all settings, exposing significant vulnerabilities in real-world RAG-based fact-checking systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes ADMIT, a few-shot semantically aligned knowledge poisoning attack targeting RAG-based fact-checking systems. Without access to target LLMs or retrievers, it injects a small number of adversarial documents to flip fact-checking decisions and generate deceptive justifications. Experiments report an average attack success rate of 86% at a poisoning rate of 0.93 × 10^{-6} across 4 retrievers, 11 LLMs, and 4 cross-domain benchmarks, with claimed robustness even when strong counter-evidence is present in the retrieval pool.

Significance. If the scaling and robustness results hold under realistic corpus sizes, the work provides concrete evidence of practical vulnerabilities in deployed RAG fact-checking pipelines. The breadth of the transferability evaluation across retrievers and LLMs is a strength, as is the explicit focus on settings where authentic evidence competes with poisoned content.

major comments (1)
  1. [§4 and §5] §4 (Experimental Setup) and §5 (Results): the total size of the knowledge base and the number of authentic supporting/refuting documents per query are not reported. Without these quantities it is impossible to verify that the claimed poisoning rate of 0.93 × 10^{-6} places the adversarial documents in realistic competition with a large authentic pool, which is central to the headline ASR and robustness claims.
minor comments (1)
  1. [Abstract] The abstract and introduction refer to '4 cross-domain benchmarks' without naming them; explicitly listing the datasets (and their sizes) in the abstract would improve readability.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive feedback. We appreciate the focus on ensuring that the experimental details allow readers to fully verify the realism of the reported poisoning rates and robustness results. We address the major comment below and commit to revisions that will strengthen the manuscript.

read point-by-point responses
  1. Referee: [§4 and §5] §4 (Experimental Setup) and §5 (Results): the total size of the knowledge base and the number of authentic supporting/refuting documents per query are not reported. Without these quantities it is impossible to verify that the claimed poisoning rate of 0.93 × 10^{-6} places the adversarial documents in realistic competition with a large authentic pool, which is central to the headline ASR and robustness claims.

    Authors: We agree that the current version of the manuscript does not provide an explicit consolidated report of the total knowledge base sizes or the per-query counts of authentic supporting/refuting documents, which would improve verifiability of the poisoning-rate claims. In our experimental setup the knowledge bases are the full corpora of the four benchmarks (with sizes ranging from roughly 4,000 documents for the smallest corpus to more than 100,000 for the largest). Retrieval is performed over the entire base, and the top-k results (k = 5–10) typically contain 3–8 authentic documents per query alongside the injected adversarial content. The 0.93 × 10^{-6} poisoning rate therefore corresponds to the insertion of only one to three adversarial documents into pools containing thousands of authentic items. We will revise §4 to include a new table that lists, for each benchmark, the exact knowledge-base size and the average number of authentic documents retrieved per query. This addition will directly support the headline ASR and robustness results without changing any experimental outcomes or conclusions. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical attack evaluation with no derivations or self-referential reductions

full rationale

The paper proposes the ADMIT poisoning technique and reports its performance via direct experiments across retrievers, LLMs, and benchmarks. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. Attack success rates and robustness claims rest on experimental measurements rather than any chain that reduces to its own inputs by construction. The work is self-contained as an empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on empirical success of the ADMIT attack across tested systems rather than new mathematical axioms or postulated entities; the low poisoning rate is an observed outcome rather than a fitted free parameter.

axioms (1)
  • domain assumption LLMs remain susceptible to misleading or malicious retrieved content even when credible evidence is present in the retrieval pool
    Invoked in the abstract when extending prior work on knowledge poisoning to the fact-checking setting.

pith-pipeline@v0.9.0 · 5801 in / 1283 out tokens · 88788 ms · 2026-05-21T19:58:40.104879+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence

    cs.CR 2026-05 unverdicted novelty 7.0

    RAGCharacter localizes poisoned character spans in RAG evidence via prompt-conditioned counterfactual masking and achieves the best accuracy-over-attribution trade-off across tested attacks and models.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Ignore Previous Prompt: Attack Techniques For Language Models

    URLhttps://aclanthology.org/2023.acl-long.386/. Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, and William Wang. On the risk of misinformation pollution with large language models. InFindings of the As- sociation for Computational Linguistics: EMNLP 2023, pp. 1389–1403, Singapore, 2023b. Association for Computational Linguistics. doi: ...

  2. [2]

    Robertson and Hugo Zaragoza , title =

    URLhttps://arxiv.org/abs/2403.09858. Stephen Robertson and Hugo Zaragoza. The probabilistic relevance framework: Bm25 and beyond. In Gabriella Kazai, Peter Bailey, Ben Carterette, Vanessa Murdock, Douglas W. Oard, Iadh Ounis, and Mark Sanderson (eds.),Foundations and Trends in Information Retrieval, volume 3 ofFnT IR, pp. 333–389. Now Publishers, 2009. do...

  3. [3]

    URLhttps://aclanthology.org/2020

    Association for Computational Linguistics, 2020. URLhttps://aclanthology.org/2020. emnlp-main.609. Fei Wang, Xingchen Wan, Ruoxi Sun, Jiefeng Chen, and Sercan O Arik. Astute RAG: Overcoming imperfect retrieval augmentation and knowledge conflicts for large language models, 2024. URL https://openreview.net/forum?id=xy6B5Fh2v7. Haoran Wang and Kai Shu. Expl...

  4. [4]

    doi: 10.18653/v1/2023.findings-emnlp.416

    Association for Computational Linguistics. doi: 10.18653/v1/2023.findings-emnlp.416. URLhttps://aclanthology.org/2023.findings-emnlp.416/. Yuxia Wang, Minghan Wang, Hasan Iqbal, Georgi N. Georgiev, Jiahui Geng, Iryna Gurevych, and Preslav Nakov. OpenFactCheck: Building, benchmarking customized fact-checking systems and evaluating the factuality of claims ...

  5. [5]

    Universal and Transferable Adversarial Attacks on Aligned Language Models

    URLhttps://arxiv.org/abs/2307.15043. Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. Poisonedrag: Knowledge corruption attacks to retrieval-augmented generation of large language models.arXiv preprint arXiv:2402.07867,

  6. [6]

    Poisonedrag: Knowledge poi- soning attacks to retrieval-augmented generation of large language models

    URLhttps://arxiv.org/abs/2402.07867. Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. Poisonedrag: Knowledge corruption attacks to retrieval-augmented generation of large language models. InProceedings of the 34th USENIX Security Symposium, 2025. URLhttps://arxiv.org/abs/2402.07867. 15 Supplementary Material Table of Contents A Implementation Details...

  7. [9]

    analysis

    Generate a revised passage to override the contradicting evidence. Please always respond with JSON during fol- lowing conversation: { "analysis": "...", "strategy": "...", "passage": "Your new passage inV words." } Table 10: Prompt templates for query generation and RAG-based fact-checking. Query Generation Template You are an expert at extracting compact...

  8. [10]

    Generate concise queries (3–10 words)

  9. [11]

    Maximize search ability

  10. [12]

    Cover key information points

  11. [13]

    KFC has no store in Kentucky

    Avoid vague language. Query: Mary is a five-year-old girl who likes playing piano and doesn’t like cookies. Output: Mary’s age is five, Mary’s piano skills, Mary’s food preferences Query: [Query] RAG Fact-Checking Template You are a helpful verification assistant. Below is a claim from the user and some relevant context. Verify whether the claim is suppor...

  12. [14]

    NEI” refers to cases where verifier response with “Not Enough Information

    using ROUGE-N scores (Lin, 2004). This method is designed to detect sparse adversarial insertions by identifying passages that deviate from the dominant n-gram patterns within the retrieved set. Following the same 1-shot setting as PPL, we compute pairwise ROUGE-1 F1 scores among all retrieved passages to identify anomalous entries. Figure 5 show experime...

  13. [15]

    Explain why the previous verification failed

  14. [16]

    Propose a new strategy to improve the passage

  15. [17]

    analysis

    Generate a revised passage to override the contradicting evidence. Please always respond with JSON during fol- lowing conversation: { "analysis": "...", "strategy": "...", "passage": "Your new passage inV words." } NEI Claim for LLMs Claim:Leslie Groves graduated tenth in his class. LLMs:NOT ENOUGH INFORMATION. The claim that Leslie Groves graduated tenth...