A Wolf in Sheep's Clothing: Targeted Routing Hijacking in Federated RAG

Junjie Mu; Qiongxiu Li

arxiv: 2605.28112 · v1 · pith:IZKX2E2Ynew · submitted 2026-05-27 · 💻 cs.CR · cs.CL · cs.IR

Pith reviewed 2026-06-29 11:35 UTC · model grok-4.3

classification 💻 cs.CR · cs.CL · cs.IR

keywords: federated RAG, routing hijacking, retrieval-augmented generation, security attack, data poisoning, query routing, federated learning, adversarial manipulation

The pith

Malicious clients forge semantic profiles to hijack routing in FedRAG, misdirecting queries and triggering hallucinations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Federated RAG keeps raw data local for privacy, forcing routing decisions to rely on client-supplied semantic profiles. The paper shows that this setup allows a malicious client to fabricate its profile and attract specific target queries even when its underlying data is irrelevant. Across three representative routing architectures the attack produces consistent misrouting, which then causes missing evidence, data poisoning, wrong answers, and hallucinations in the generation stage. A MedQA-USMLE case study confirms that the poisoned evidence misleads models of different sizes. Existing defenses such as encrypted routing and Byzantine-robust aggregation leave the vulnerability open, leading the authors to introduce a post-routing reweighting method based on retrieval feedback.
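To make the mechanism concrete, here is a minimal sketch of profile-based routing, assuming each client advertises one embedding vector as its profile and the router ranks clients by cosine similarity to the query embedding. The toy 3-d vectors, client names, and the `route` helper are hypothetical and are not the paper's routers; the sketch only shows that a router which sees profiles but never the data will rank a forged profile first.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def route(query_emb, profiles, k=1):
    """Rank clients by similarity between the query and each client's
    self-reported profile; the router never inspects the underlying data."""
    ranked = sorted(profiles, key=lambda cid: cosine(query_emb, profiles[cid]), reverse=True)
    return ranked[:k]

# Hypothetical toy embedding space: axes roughly mean medicine, physics, law.
profiles = {
    "hospital_a": [0.9, 0.1, 0.0],   # honest client holding medical data
    "physics_lab": [0.1, 0.9, 0.0],  # honest client holding physics data
}
target_query = [1.0, 0.0, 0.0]       # a medical query the attacker wants

# The attacker holds irrelevant (e.g. legal) data but forges its profile
# to sit exactly on the target query embedding.
profiles["attacker"] = list(target_query)

print(route(target_query, profiles))  # ['attacker']
```

Nothing in `route` can tell the forged profile apart from an honest one, which is exactly the unverified-profile premise the paper relies on.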

Core claim

Routing Hijacking is a routing-stage attack in which a malicious client forges its profile to attract target queries despite having irrelevant underlying data. This vulnerability is severe. Across three representative FedRAG routing architectures, Routing Hijacking consistently misroutes target queries and leads to downstream disruptions and failures, including missing evidence, poisoning, incorrect answers, and hallucinations. In a high-stakes MedQA-USMLE case study, poisoned retrieved evidence misleads models across scales, leading to incorrect answers, hallucinations, and sycophantic failures. Existing defenses do not close this gap: encrypted routing preserves the exploited ranking, and Byzantine-robust Federated Learning (FL) rules transfer poorly to heterogeneous routing profiles.
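The figures below report HR@1. Assuming HR@1 is the fraction of target queries whose top-ranked client is the attacker, a misrouting rate of that kind could be measured as in this sketch. Both that metric definition and the `forge_profile` strategy (placing the profile at the mean of the target query embeddings) are illustrative assumptions, not taken from the paper; all vectors and names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top1(query, profiles):
    """Client whose advertised profile is most similar to the query."""
    return max(profiles, key=lambda cid: cosine(query, profiles[cid]))

def hit_rate_at_1(target_queries, profiles, attacker_id):
    """Assumed HR@1: share of target queries routed first to the attacker."""
    if not target_queries:
        return 0.0
    hits = sum(1 for q in target_queries if top1(q, profiles) == attacker_id)
    return hits / len(target_queries)

def forge_profile(target_queries):
    """Hypothetical forging strategy: the mean of the target query embeddings."""
    return [sum(col) / len(target_queries) for col in zip(*target_queries)]

honest = {"hospital_a": [0.6, 0.4, 0.0], "physics_lab": [0.1, 0.9, 0.0]}
targets = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.95, 0.05, 0.0]]
hijacked = dict(honest, attacker=forge_profile(targets))

print(hit_rate_at_1(targets, honest, "attacker"))    # 0.0
print(hit_rate_at_1(targets, hijacked, "attacker"))  # 1.0
```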

What carries the argument

The Routing Hijacking attack, which exploits unverified, client-provided semantic profiles to manipulate query routing in FedRAG.

If this is right

Misrouting produces concrete downstream failures such as missing evidence and hallucinations.
The attack succeeds against three representative FedRAG routing architectures.
Poisoned evidence from hijacked routes misleads models on medical QA tasks across model scales.
Encrypted routing and Byzantine-robust FL rules leave the routing vulnerability intact.
A trust-aware post-routing framework using relevance, consistency, and agreement feedback can suppress persistent hijacking over recurring queries and transfer to a learned neural router.
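The bullet on Byzantine-robust FL rules can be illustrated with a Krum-style score applied to routing profiles instead of model updates. This transfer is a hypothetical setup chosen for illustration; the paper's actual baselines and profile format may differ. With heterogeneous honest clients (one per domain), a forged profile that mimics an honest medical client looks like one of the most "typical" vectors, and discarding the highest-scoring outlier removes an honest client instead of the attacker.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def krum_scores(profiles, f):
    """Krum-style score per client: sum of squared distances to its
    n - f - 2 nearest other clients (lower looks more 'normal')."""
    n = len(profiles)
    m = n - f - 2
    if m < 1:
        raise ValueError("need n - f - 2 >= 1")
    scores = {}
    for cid, p in profiles.items():
        dists = sorted(sq_dist(p, q) for oid, q in profiles.items() if oid != cid)
        scores[cid] = sum(dists[:m])
    return scores

def drop_suspects(profiles, f):
    """Discard the f clients with the highest Krum-style scores."""
    scores = krum_scores(profiles, f)
    suspects = sorted(scores, key=scores.get, reverse=True)[:f]
    return {cid: p for cid, p in profiles.items() if cid not in suspects}

# Hypothetical heterogeneous profiles: one honest client per domain.
profiles = {
    "medicine": [1.0, 0.0, 0.0],
    "physics": [0.0, 1.0, 0.0],
    "law": [0.0, 0.0, 1.0],
    "attacker": [0.95, 0.05, 0.0],  # forged to mimic the medical profile
}
print(krum_scores(profiles, f=1))
print(sorted(drop_suspects(profiles, f=1)))  # ['attacker', 'medicine', 'physics']
```

Because honest profiles are legitimately far apart, distance-based outlier rules end up penalizing domain diversity rather than forgery, which is one plausible reading of why such rules "transfer poorly to heterogeneous routing profiles".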

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Systems that select data sources using only self-reported metadata may share similar hijacking risks beyond FedRAG.
Independent verification of client data relevance could be tested as a direct countermeasure.
The feedback-based reweighting approach might extend to other federated selection problems where profile accuracy is hard to audit upfront.

Load-bearing premise

The routing mechanism trusts and ranks clients based solely on the semantic profiles they voluntarily provide, without independent verification of profile accuracy or data relevance.

What would settle it

A routing implementation that rejects forged profiles by cross-checking returned evidence against the claimed profile or by requiring proof of data relevance would show the attack does not succeed.
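One way to implement the cross-check described above is to compare the centroid of a client's returned evidence embeddings with the profile it advertised. The sketch assumes embeddings are plain vectors and uses a hypothetical cosine threshold of 0.5; it is an illustration, not the paper's consistency signal.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def profile_consistent(claimed_profile, evidence_embs, threshold=0.5):
    """True if the evidence a client actually returns lies near the profile
    it advertised. `threshold` is a hypothetical tuning knob."""
    if not evidence_embs:
        return False  # returning nothing cannot support the claimed profile
    return cosine(claimed_profile, centroid(evidence_embs)) >= threshold

claimed = [1.0, 0.0, 0.0]                                # forged "medical" profile
attacker_evidence = [[0.0, 0.1, 0.9], [0.0, 0.0, 1.0]]   # actually legal text
honest_evidence = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2]]
print(profile_consistent(claimed, attacker_evidence))  # False
print(profile_consistent(claimed, honest_evidence))    # True
```

This check only detects a mismatch between profile and evidence. An attacker that returns on-topic but poisoned passages would pass it, so on its own it would not settle the poisoning case.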

Figures

Figures reproduced from arXiv: 2605.28112 by Junjie Mu, Qiongxiu Li.

**Figure 2.** Failure Mode Distribution Under Poisoned…

**Figure 3.** Harmful Content Injection. A selected mali…

**Figure 4.** Missing Information Attack. A selected mali…

**Figure 5.** Data Poisoning Attack. A selected malicious…

**Figure 6.** HR@1 on Physics under single-domain and multi-domain client configurations

**Figure 7.** HR@1 of Byzantine-robust baselines under…

**Figure 8.** Effective trust trajectory of the malicious…

Original abstract

Federated Retrieval-Augmented Generation (FedRAG) is attractive for privacy-sensitive applications because raw data remain local. As a result, routing must rely on client-provided semantic profiles, creating a new opportunity for manipulation. We introduce Routing Hijacking, a routing-stage attack in which a malicious client forges its profile to attract target queries despite having irrelevant underlying data. We show that this vulnerability is severe. Across three representative FedRAG routing architectures, Routing Hijacking consistently misroutes target queries and leads to downstream disruptions and failures, including missing evidence, poisoning, incorrect answers, and hallucinations. In a high-stakes MedQA-USMLE case study, we further show that poisoned retrieved evidence can mislead models across scales, leading to incorrect answers, hallucinations, and sycophantic failures. Existing defenses do not close this gap: encrypted routing preserves the exploited ranking, and Byzantine-robust Federated Learning (FL) rules transfer poorly to heterogeneous routing profiles. To address this gap, we propose a trust-aware post-routing framework that reweights clients using returned-evidence feedback, including retrieval relevance, profile consistency, and cross-client agreement; online experiments show that it suppresses persistent hijacking over recurring queries and transfers to a learned neural router. Our findings establish routing integrity as a new security challenge in FedRAG and highlight the need for stronger defenses for secure federated retrieval.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows that forged client profiles let attackers hijack routing in federated RAG, producing consistent misrouting and downstream failures that standard defenses miss.

The core point is that FedRAG routing trusts unverified client profiles, so a malicious client can forge one to pull in queries it has no relevant data for. This leads to missing evidence, poisoning, wrong answers, and hallucinations, and the paper demonstrates the effect across three routing architectures plus a MedQA-USMLE case study.

What is new is the explicit framing of profile forgery as a targeted routing-stage attack rather than a general poisoning or retrieval issue. The experiments report that the attack succeeds reliably and that encrypted routing plus Byzantine FL rules do not close it. The proposed trust-aware post-routing reweighting, which uses retrieval relevance, profile consistency, and cross-client agreement, is shown to reduce persistent hijacking on recurring queries and to transfer to a learned router.

The evaluation appears to rest on empirical runs rather than formal proofs, and the abstract gives limited detail on query selection, statistical tests, or exact success rates, so the scale of the problem is hard to judge precisely from the summary alone. The mitigation is post hoc, which means the initial hijack still occurs before feedback can correct it.
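The post hoc behaviour described above can be seen in a toy online loop. The sketch assumes one recurring query, top-1 routing by `trust * similarity`, and an exponential moving average of a per-round feedback score in [0, 1]. The update rule, `alpha`, and the fixed feedback values are hypothetical stand-ins for the paper's retrieval-feedback signals.

```python
def run_online(similarity, feedback, rounds, alpha=0.5, init_trust=1.0):
    """Route one recurring query `rounds` times; after each round, update the
    routed client's trust with an exponential moving average of its feedback."""
    trust = {cid: init_trust for cid in similarity}
    routed = []
    for _ in range(rounds):
        # Effective score: trust-weighted profile similarity (top-1 routing).
        winner = max(similarity, key=lambda cid: trust[cid] * similarity[cid])
        routed.append(winner)
        # Only the routed client returns evidence, so only its trust moves.
        trust[winner] = (1 - alpha) * trust[winner] + alpha * feedback[winner]
    return routed, trust

# Hypothetical values: the forged profile matches the query better, but the
# evidence it returns scores 0 on feedback, while the honest client scores 1.
similarity = {"attacker": 1.0, "hospital_a": 0.9}
feedback = {"attacker": 0.0, "hospital_a": 1.0}

routed, trust = run_online(similarity, feedback, rounds=4)
print(routed)  # ['attacker', 'hospital_a', 'hospital_a', 'hospital_a']
print(trust)   # {'attacker': 0.5, 'hospital_a': 1.0}
```

The first query is still routed to the attacker, matching the caveat that the initial hijack occurs before feedback can correct it; after one low-feedback round the attacker's effective score falls below the honest client's. A practical system would likely also need some exploration, since in this sketch a client that stops being routed never receives feedback again.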

This is relevant for anyone building or securing federated retrieval systems, especially in high-stakes domains. The central claim is internally consistent with the stated premise about unverified profiles, so the paper deserves a serious referee to check the full methods and defense results.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces Routing Hijacking, an attack in which a malicious client in Federated RAG forges its semantic profile to attract target queries despite holding irrelevant data. It reports that the attack succeeds across three representative FedRAG routing architectures, produces downstream failures (missing evidence, poisoning, incorrect answers, hallucinations), demonstrates these effects in a MedQA-USMLE case study across model scales, shows that encrypted routing and Byzantine-robust FL do not close the gap, and proposes a trust-aware post-routing framework that reweights clients via retrieval relevance, profile consistency, and cross-client agreement; online experiments indicate the framework suppresses persistent hijacking and transfers to learned neural routers.

Significance. If the empirical results hold, the work is significant for establishing routing integrity as a distinct security challenge in FedRAG systems that rely on unverified client profiles to preserve privacy. The cross-architecture evaluation, the high-stakes MedQA case study, and the concrete post-routing mitigation (with online experiments) provide falsifiable evidence and a practical starting point for defenses. The explicit premise that routing trusts voluntarily provided profiles without verification is stated directly and underpins the attack surface analysis.

minor comments (3)

[Abstract] The phrase 'sycophantic failures' is used without a brief parenthetical definition or example; adding one would improve accessibility for readers outside the immediate subfield.
[Evaluation] The three representative routing architectures are described at a high level; a short table or paragraph in the evaluation section listing their key differences (e.g., profile representation, ranking function) would aid reproducibility.
[Defense Proposal] The trust-aware framework description would benefit from an explicit equation or pseudocode for the reweighting function that combines the three feedback signals.
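As a concrete form of what the third minor comment asks for, here is one hypothetical reweighting function. Each client's weight is an equal-weighted average of retrieval relevance (mean query-to-evidence cosine), profile consistency (profile vs. evidence centroid), and cross-client agreement (mean cosine to other clients' evidence centroids), clipped to [0, 1]; the final routing score is weight times profile similarity. The equal weights and the exact form of each signal are assumptions, not the authors' equations.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def feedback_weights(query, profiles, evidence, w=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical reweighting: combine retrieval relevance, profile
    consistency and cross-client agreement into one weight per client."""
    cents = {cid: centroid(ev) for cid, ev in evidence.items()}
    weights = {}
    for cid, ev in evidence.items():
        relevance = sum(cosine(query, e) for e in ev) / len(ev)
        consistency = cosine(profiles[cid], cents[cid])
        others = [cosine(cents[cid], cents[o]) for o in cents if o != cid]
        agreement = sum(others) / len(others) if others else 0.0
        score = w[0] * relevance + w[1] * consistency + w[2] * agreement
        weights[cid] = min(1.0, max(0.0, score))  # clip to [0, 1]
    return weights

def rerank(query, profiles, evidence):
    """Order clients by feedback weight times profile similarity."""
    weights = feedback_weights(query, profiles, evidence)
    return sorted(profiles, key=lambda cid: weights[cid] * cosine(query, profiles[cid]), reverse=True)

query = [1.0, 0.0, 0.0]
profiles = {"hospital_a": [0.9, 0.1, 0.0], "hospital_b": [0.8, 0.2, 0.0], "attacker": [1.0, 0.0, 0.0]}
evidence = {
    "hospital_a": [[0.9, 0.1, 0.0]],
    "hospital_b": [[0.85, 0.15, 0.0]],
    "attacker": [[0.0, 0.0, 1.0]],  # irrelevant passages behind a forged profile
}
print(rerank(query, profiles, evidence))  # ['hospital_a', 'hospital_b', 'attacker']
```

Profile similarity alone would rank the attacker first; the feedback weight drives it to zero here because its returned evidence is irrelevant, inconsistent with its profile, and disagrees with the other clients.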

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, accurate summary of the contributions, and recommendation for minor revision. We appreciate the recognition that routing integrity represents a distinct security challenge in FedRAG.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is an empirical security analysis of a routing attack in FedRAG. It states the core premise (routing trusts unverified client semantic profiles) directly in the abstract and introduction, then reports experimental outcomes across three architectures, a MedQA case study, and a proposed mitigation. No equations, fitted parameters, self-definitional reductions, or load-bearing self-citations appear in the provided text. The central claims follow from the stated attack surface and observed results rather than any internal redefinition or renaming of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim depends on the domain assumption that client profiles are forgeable and that routing decisions are made without external verification of data quality.

axioms (1)

domain assumption: Routing decisions in FedRAG are made exclusively from client-provided semantic profiles.
Stated in abstract as the reason the attack is possible.

invented entities (1)

Routing Hijacking attack (no independent evidence)
purpose: Demonstrate targeted misrouting via profile forgery
New attack concept introduced to explain the vulnerability.

pith-pipeline@v0.9.1-grok · 5776 in / 1181 out tokens · 25491 ms · 2026-06-29T11:35:46.347474+00:00 · methodology
