R³AG: Retriever Routing for Retrieval-Augmented Generation
Pith reviewed 2026-05-09 23:47 UTC · model grok-4.3
The pith
R³AG routes each query to the retriever whose documents best support generating a correct answer, not merely the retriever whose documents are most semantically relevant.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
R³AG is a routing framework that explicitly models the dynamic alignment between queries and retriever capabilities. Unlike prior methods, which assume each retriever has a single, static capability, it decomposes retriever capability into two learnable dimensions, retrieval quality and generation utility, and trains them with a contrastive objective that draws on complementary supervision signals (document assessments and downstream answer correctness) to capture query-specific preference shifts.
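One plausible way to write the decomposition down, as a sketch rather than the paper's own notation (the mixing weight λ, temperature τ, and the InfoNCE form below are our assumptions):

```latex
% Assumed combined routing score for query q and retriever r
s(q, r) = \lambda \, s_{\mathrm{ret}}(q, r) + (1 - \lambda) \, s_{\mathrm{gen}}(q, r)

% Contrastive objective over the retriever pool \mathcal{R}, where r^{+} is the
% retriever whose documents led to a correct downstream answer for q
\mathcal{L}(q) = -\log \frac{\exp\left( s(q, r^{+}) / \tau \right)}
                            {\sum_{r \in \mathcal{R}} \exp\left( s(q, r) / \tau \right)}
```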
What carries the argument
Two learnable dimensions of retriever capability (retrieval quality and generation utility) trained via contrastive learning on complementary supervision signals.
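To make that concrete, here is a minimal PyTorch sketch of what such a two-headed router could look like; the architecture, layer sizes, and loss form are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadRouter(nn.Module):
    """Scores each candidate retriever on two learned capability dimensions."""

    def __init__(self, query_dim: int, num_retrievers: int, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(query_dim, hidden), nn.ReLU())
        # One score per retriever from each capability head.
        self.retrieval_quality = nn.Linear(hidden, num_retrievers)
        self.generation_utility = nn.Linear(hidden, num_retrievers)

    def forward(self, query_emb: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
        h = self.encoder(query_emb)
        # Convex combination of the two capability dimensions.
        return lam * self.retrieval_quality(h) + (1 - lam) * self.generation_utility(h)

def contrastive_loss(scores: torch.Tensor, positive: torch.Tensor, tau: float = 0.1):
    """InfoNCE over retrievers: `positive` holds, per query, the index of the
    retriever whose documents yielded a correct answer."""
    return F.cross_entropy(scores / tau, positive)

# Toy usage: route a batch of 8 query embeddings over 4 retrievers.
router = TwoHeadRouter(query_dim=768, num_retrievers=4)
scores = router(torch.randn(8, 768))
loss = contrastive_loss(scores, torch.randint(0, 4, (8,)))
chosen = scores.argmax(dim=-1)  # retriever picked for each query
```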
If this is right
- The router can select different retrievers for different queries based on their specific needs.
- Performance improves on knowledge-intensive tasks like question answering.
- It avoids the one-size-fits-all problem by considering both relevance and utility for generation.
- It outperforms static routing methods on several tasks.
Where Pith is reading between the lines
- This two-dimensional view might apply to other selection tasks where relevance alone is insufficient.
- Future systems could incorporate more dimensions like latency or cost into the routing.
- It highlights the need for supervision that includes end-task performance rather than just retrieval metrics.
Load-bearing premise
That modeling retriever capability with only the retrieval-quality and generation-utility dimensions, trained contrastively on the available supervision, is enough to learn accurate dynamic alignments without overfitting or requiring impractical amounts of labeled data.
What would settle it
A test where R³AG's performance does not exceed that of the best fixed retriever or a relevance-only router on a new set of queries and tasks, or where ablating the generation utility dimension yields equivalent results.
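As a sketch of how that decisive comparison could be scripted (everything here, including the `evaluate` helper and policy names, is a placeholder we introduce for illustration):

```python
def settle_the_claim(queries, retrievers, learned_router, relevance_router, evaluate):
    """Compare learned routing against the two decisive baselines.

    `evaluate(policy, queries)` is assumed to run the full RAG pipeline under
    a routing policy and return end-task accuracy. If neither gap below is
    positive, the paper's central claim fails on this query set.
    """
    best_fixed = max(
        evaluate(lambda q, r=r: r, queries)  # policy that always uses retriever r
        for r in retrievers
    )
    relevance_only = evaluate(relevance_router, queries)
    full = evaluate(learned_router, queries)
    return {
        "gap_vs_best_fixed": full - best_fixed,
        "gap_vs_relevance_only": full - relevance_only,
    }
```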
Original abstract
Retrieval-augmented generation (RAG) has become a cornerstone for knowledge-intensive tasks. However, the efficacy of RAG is often bottlenecked by the "one-size-fits-all" retrieval paradigm, as different queries exhibit distinct preferences for different retrievers. While recent routing techniques attempt to select the optimal retriever dynamically, they typically operate under a "single and static capability" assumption, selecting retrievers solely based on semantic relevance. This overlooks a critical distinction in RAG: a retrieved document must not only be relevant but also effectively support the generator in producing correct answers. To address this limitation, we propose R³AG, a novel routing framework that explicitly models the dynamic alignment between queries and retriever capabilities. Unlike previous approaches, R³AG decomposes retriever capability into two learnable dimensions: retrieval quality and generation utility. We employ a contrastive learning objective that leverages complementary supervision signals, i.e., document assessments and downstream answer correctness, to capture query-specific preference shifts. Extensive experiments on several knowledge-intensive tasks show that R³AG consistently outperforms both the best individual retrievers and state-of-the-art static routing methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes R³AG, a retriever routing framework for retrieval-augmented generation (RAG) that addresses the limitations of one-size-fits-all or static routing by decomposing retriever capability into two learnable dimensions—retrieval quality and generation utility—and training via a contrastive learning objective that incorporates complementary supervision from document assessments and downstream answer correctness. This is intended to capture query-specific preference shifts, with the central claim being consistent outperformance over the best individual retrievers and state-of-the-art static routing methods across several knowledge-intensive tasks.
Significance. If the results hold under scrutiny, R³AG offers a meaningful step forward in RAG by explicitly separating relevance from generative utility in routing decisions, which could improve answer quality on tasks where retriever choice matters. The contrastive objective leveraging two supervision signals is a clear strength over purely semantic approaches, and the framework's focus on dynamic alignment has potential for broader applicability if it proves robust to varying data regimes.
major comments (2)
- [Experiments] The central claim of consistent outperformance and effective dynamic alignment rests on the assumption that the two-dimensional decomposition plus contrastive objective suffices without large amounts of labeled data or overfitting. However, the experimental evaluation does not include ablations varying the volume of supervision signals (document assessments and answer correctness labels) or tests for generalization to unseen query distributions, leaving the load-bearing assumption unverified.
- [Method] The method description indicates reliance on external supervision for training the router, but provides no analysis of how these signals are obtained at scale or what they cost, which directly impacts whether the dynamic routing advantage can be realized in practice without reducing to an expensive static alternative.
minor comments (2)
- [Method] Notation for the two learnable dimensions (retrieval quality and generation utility) should be defined more explicitly with equations to avoid ambiguity in how they are combined during inference.
- [Introduction] The abstract and introduction would benefit from a brief comparison table summarizing how R³AG differs from prior routing methods in terms of supervision requirements.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, providing clarifications from the existing work where possible and indicating revisions to strengthen the paper.
Point-by-point responses
- Referee: [Experiments] The central claim of consistent outperformance and effective dynamic alignment rests on the assumption that the two-dimensional decomposition plus contrastive objective suffices without large amounts of labeled data or overfitting. However, the experimental evaluation does not include ablations varying the volume of supervision signals (document assessments and answer correctness labels) or tests for generalization to unseen query distributions, leaving the load-bearing assumption unverified.
Authors: We appreciate the referee's point that explicit verification of robustness to supervision volume and generalization would better support the central claim. Our original experiments evaluate R³AG across multiple knowledge-intensive tasks with differing characteristics and data regimes, which provides evidence that the contrastive objective with two-dimensional decomposition generalizes without requiring excessive labels. However, we agree that dedicated ablations were not presented. In the revised manuscript we will add (i) controlled ablations that vary the fraction of document assessments and answer correctness labels available for router training and (ii) cross-domain generalization tests that hold out entire query distributions during router training. These additions will directly test the load-bearing assumptions while remaining consistent with the paper's focus on efficient dynamic routing.
revision: yes
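A sketch of what ablation (i) could look like in practice; `train_router` and `evaluate` are hypothetical helpers, and the fractions are arbitrary:

```python
import random

def supervision_ablation(train_data, test_data, train_router, evaluate,
                         fractions=(0.1, 0.25, 0.5, 1.0), seed=0):
    """Retrain the router on shrinking label budgets and measure test accuracy.

    A flat curve would support the claim that the contrastive objective does
    not need large amounts of supervision; a steep one would undermine it.
    """
    rng = random.Random(seed)
    results = {}
    for frac in fractions:
        subset = rng.sample(train_data, int(len(train_data) * frac))
        results[frac] = evaluate(train_router(subset), test_data)
    return results
```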
- Referee: [Method] The method description indicates reliance on external supervision for training the router, but provides no analysis of how these signals are obtained at scale or what they cost, which directly impacts whether the dynamic routing advantage can be realized in practice without reducing to an expensive static alternative.
Authors: The referee correctly notes that practical deployment considerations around supervision acquisition were not analyzed in the original submission. We will revise the method section to include a dedicated paragraph describing signal acquisition: document assessments can be obtained via LLM-as-a-judge pipelines or limited human annotation on representative query-document pairs, while answer correctness labels are derived directly from the ground-truth answers already present in standard benchmarks. We will also add a brief cost discussion clarifying that supervision collection is a one-time offline cost for router training; once trained, the router performs efficient inference without per-query overhead, preserving the dynamic advantage over static baselines. This revision will make the practical feasibility explicit.
revision: yes
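For the answer-correctness signal in particular, a standard recipe is exact match against the benchmark's gold answers; a minimal sketch (the SQuAD-style normalization is common QA practice, not anything the paper specifies):

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase and strip articles, punctuation, and extra whitespace."""
    text = re.sub(r"\b(a|an|the)\b", " ", text.lower())
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def correctness_label(generated: str, gold_answers: list[str]) -> int:
    """1 if the generated answer exactly matches any gold answer, else 0.

    Applied per (query, retriever) pair, this yields the downstream
    answer-correctness supervision used to train the router."""
    return int(any(normalize(generated) == normalize(g) for g in gold_answers))
```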
Circularity Check
No significant circularity detected
full rationale
The paper introduces R³AG by explicitly decomposing retriever capability into two learnable dimensions (retrieval quality and generation utility) and training via a contrastive objective on external supervision signals (document assessments and downstream answer correctness). These signals are independent of the model's internal fitted parameters and are not defined in terms of the routing decisions themselves. No equations or claims reduce the central prediction to a self-referential fit, renamed known result, or load-bearing self-citation chain. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Retriever capabilities can be meaningfully decomposed into retrieval quality and generation utility dimensions.
- domain assumption: Contrastive learning can effectively align queries with retrievers using complementary signals from document assessments and answer correctness.