A Wolf in Sheep's Clothing: Targeted Routing Hijacking in Federated RAG
Pith reviewed 2026-06-29 11:35 UTC · model grok-4.3
The pith
Malicious clients forge semantic profiles to hijack routing in FedRAG, misdirecting queries and triggering hallucinations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Routing Hijacking is a routing-stage attack in which a malicious client forges its profile to attract target queries despite having irrelevant underlying data. This vulnerability is severe. Across three representative FedRAG routing architectures, Routing Hijacking consistently misroutes target queries and leads to downstream disruptions and failures, including missing evidence, poisoning, incorrect answers, and hallucinations. In a high-stakes MedQA-USMLE case study, poisoned retrieved evidence misleads models across scales, leading to incorrect answers, hallucinations, and sycophantic failures. Existing defenses do not close this gap: encrypted routing preserves the exploited ranking, and Byzantine-robust Federated Learning (FL) rules transfer poorly to heterogeneous routing profiles.
What carries the argument
Routing Hijacking attack that exploits unverified, client-provided semantic profiles to manipulate query routing in FedRAG.
If this is right
- Misrouting produces concrete downstream failures such as missing evidence and hallucinations.
- The attack succeeds against three representative FedRAG routing architectures.
- Poisoned evidence from hijacked routes misleads models on medical QA tasks across model scales.
- Encrypted routing and Byzantine-robust FL rules leave the routing vulnerability intact.
- A trust-aware post-routing framework using relevance, consistency, and agreement feedback can suppress persistent hijacking and transfer to neural routers.
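The last point can be made concrete with a small sketch. The paper's actual reweighting function is not given in the text reviewed here, so everything below is an assumption for illustration: the equal-weight average of the three feedback signals, the exponential moving average update, and the names `feedback_score`, `update_trust`, and `reweighted_ranking` are all hypothetical.

```python
# Hypothetical sketch of trust-aware post-routing reweighting.
# The combination rule and update rule are assumptions, not the paper's method.

def feedback_score(relevance, consistency, agreement, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine the three feedback signals (each assumed to lie in [0, 1])."""
    signals = (relevance, consistency, agreement)
    return sum(w * s for w, s in zip(weights, signals))

def update_trust(trust, score, rate=0.5):
    """Exponential moving average of per-client trust (assumed update rule)."""
    return (1 - rate) * trust + rate * score

def reweighted_ranking(similarities, trust):
    """Rank clients by profile similarity scaled by their current trust."""
    return sorted(similarities, key=lambda cid: similarities[cid] * trust[cid], reverse=True)

# Toy values: the hijacker's forged profile matches the query best,
# but the evidence it returns scores poorly on every feedback signal.
similarities = {"honest": 0.85, "hijacker": 0.99}
trust = {"honest": 1.0, "hijacker": 1.0}
feedback = {"honest": (0.9, 0.9, 0.9), "hijacker": (0.1, 0.1, 0.1)}

first_ranking = reweighted_ranking(similarities, trust)

# The same target query recurs three times; trust is updated after each round.
for _ in range(3):
    for cid, signals in feedback.items():
        trust[cid] = update_trust(trust[cid], feedback_score(*signals))

later_ranking = reweighted_ranking(similarities, trust)
print(first_ranking, later_ranking)
```

In this toy setting the hijacker wins the first round, which is consistent with a post-routing defense: it can only act on returned evidence, so suppression appears over recurring queries rather than on first contact.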
Where Pith is reading between the lines
- Systems that select data sources using only self-reported metadata may share similar hijacking risks beyond FedRAG.
- Independent verification of client data relevance could be tested as a direct countermeasure.
- The feedback-based reweighting approach might extend to other federated selection problems where profile accuracy is hard to audit upfront.
Load-bearing premise
The routing mechanism trusts and ranks clients based solely on the semantic profiles they voluntarily provide, without independent verification of profile accuracy or data relevance.
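A toy example may help show why this premise matters. The sketch below assumes a router that ranks clients by cosine similarity between a query embedding and each self-reported profile vector; the three-dimensional vectors, the client names, and the `route` function are hypothetical and do not correspond to any of the three architectures evaluated in the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def route(query_vec, profiles, top_k=1):
    """Rank clients purely by similarity to their self-reported profile."""
    ranked = sorted(profiles, key=lambda cid: cosine(query_vec, profiles[cid]), reverse=True)
    return ranked[:top_k]

# Hypothetical embeddings: the query is close to the cardiology client's data.
query = [0.9, 0.1, 0.0]
profiles = {
    "cardiology": [0.8, 0.2, 0.1],
    "recipes": [0.0, 0.1, 0.9],
}
honest_choice = route(query, profiles)

# The "recipes" client forges a profile aligned with the target query.
# Its underlying data is unchanged, but the router never sees that data.
forged_profiles = dict(profiles, recipes=[0.9, 0.1, 0.0])
hijacked_choice = route(query, forged_profiles)

print(honest_choice, hijacked_choice)
```

Nothing in `route` inspects what a client actually holds, so the forged profile wins the ranking even though the client's data is irrelevant to the query.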
What would settle it
A routing implementation that rejects forged profiles by cross-checking returned evidence against the claimed profile or by requiring proof of data relevance would show the attack does not succeed.
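One way such a cross-check could look is sketched below. It assumes the router can embed the returned evidence itself and compares the centroid of those embeddings with the claimed profile; the function names, the toy vectors, and the 0.5 threshold are hypothetical choices for illustration, not something the paper specifies.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def profile_is_consistent(profile, evidence_vecs, threshold=0.5):
    """Accept a client only if its returned evidence resembles its claimed profile.

    The threshold is an assumed value; a client returning nothing is rejected.
    """
    if not evidence_vecs:
        return False
    return cosine(profile, centroid(evidence_vecs)) >= threshold

claimed_profile = [0.9, 0.1, 0.0]
matching_evidence = [[0.8, 0.2, 0.0], [1.0, 0.0, 0.1]]
unrelated_evidence = [[0.0, 0.1, 0.9], [0.1, 0.0, 1.0]]

honest_ok = profile_is_consistent(claimed_profile, matching_evidence)
forged_ok = profile_is_consistent(claimed_profile, unrelated_evidence)
print(honest_ok, forged_ok)
```

Whether a check of this kind would defeat the attack in practice is exactly the open question this section raises; the sketch only illustrates the mechanism.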
Original abstract
Federated Retrieval-Augmented Generation (FedRAG) is attractive for privacy-sensitive applications because raw data remain local. As a result, routing must rely on client-provided semantic profiles, creating a new opportunity for manipulation. We introduce Routing Hijacking, a routing-stage attack in which a malicious client forges its profile to attract target queries despite having irrelevant underlying data. We show that this vulnerability is severe. Across three representative FedRAG routing architectures, Routing Hijacking consistently misroutes target queries and leads to downstream disruptions and failures, including missing evidence, poisoning, incorrect answers, and hallucinations. In a high-stakes MedQA-USMLE case study, we further show that poisoned retrieved evidence can mislead models across scales, leading to incorrect answers, hallucinations, and sycophantic failures. Existing defenses do not close this gap: encrypted routing preserves the exploited ranking, and Byzantine-robust Federated Learning (FL) rules transfer poorly to heterogeneous routing profiles. To address this gap, we propose a trust-aware post-routing framework that reweights clients using returned-evidence feedback, including retrieval relevance, profile consistency, and cross-client agreement; online experiments show that it suppresses persistent hijacking over recurring queries and transfers to a learned neural router. Our findings establish routing integrity as a new security challenge in FedRAG and highlight the need for stronger defenses for secure federated retrieval.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Routing Hijacking, an attack in which a malicious client in Federated RAG forges its semantic profile to attract target queries despite holding irrelevant data. It reports that the attack succeeds across three representative FedRAG routing architectures, produces downstream failures (missing evidence, poisoning, incorrect answers, hallucinations), demonstrates these effects in a MedQA-USMLE case study across model scales, shows that encrypted routing and Byzantine-robust FL do not close the gap, and proposes a trust-aware post-routing framework that reweights clients via retrieval relevance, profile consistency, and cross-client agreement; online experiments indicate the framework suppresses persistent hijacking and transfers to learned neural routers.
Significance. If the empirical results hold, the work is significant for establishing routing integrity as a distinct security challenge in FedRAG systems that rely on unverified client profiles to preserve privacy. The cross-architecture evaluation, the high-stakes MedQA case study, and the concrete post-routing mitigation (with online experiments) provide falsifiable evidence and a practical starting point for defenses. The explicit premise that routing trusts voluntarily provided profiles without verification is stated directly and underpins the attack surface analysis.
minor comments (3)
- [Abstract] The phrase 'sycophantic failures' is used without a brief parenthetical definition or example; adding one would improve accessibility for readers outside the immediate subfield.
- [Evaluation] The three representative routing architectures are described at a high level; a short table or paragraph in the evaluation section listing their key differences (e.g., profile representation, ranking function) would aid reproducibility.
- [Defense Proposal] The trust-aware framework description would benefit from an explicit equation or pseudocode for the reweighting function that combines the three feedback signals.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript, accurate summary of the contributions, and recommendation for minor revision. We appreciate the recognition that routing integrity represents a distinct security challenge in FedRAG.
Circularity Check
No significant circularity
The paper is an empirical security analysis of a routing attack in FedRAG. It states the core premise (routing trusts unverified client semantic profiles) directly in the abstract and introduction, then reports experimental outcomes across three architectures, a MedQA case study, and a proposed mitigation. No equations, fitted parameters, self-definitional reductions, or load-bearing self-citations appear in the provided text. The central claims follow from the stated attack surface and observed results rather than any internal redefinition or renaming of inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: routing decisions in FedRAG are made exclusively from client-provided semantic profiles.
invented entities (1)
- Routing Hijacking attack (no independent evidence)