A Survey of Reasoning-Intensive Retrieval: Progress and Challenges
Pith reviewed 2026-05-09 20:49 UTC · model grok-4.3
The pith
The survey organizes reasoning-intensive retrieval around benchmarks grouped by domain and modality, plus a taxonomy of how reasoning enters the retrieval pipeline.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that Reasoning-Intensive Retrieval research, which incorporates large-language-model reasoning into retrieval to handle inferential relevance, can be made coherent by grouping benchmarks by domain and modality and by classifying methods according to where and how reasoning is inserted into the pipeline, thereby supplying a usable roadmap.
What carries the argument
The structured taxonomy that places methods into categories based on where and how reasoning is integrated into the retrieval pipeline.
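To make the taxonomy concrete, here is a minimal sketch of what such a classification scheme might look like in code. The stage names and example methods are assumptions inferred from the abstract's description (query rewriting, retrieval, reranking), not the survey's actual category labels or tables.

```python
from dataclasses import dataclass
from enum import Enum
from collections import defaultdict

class Stage(Enum):
    """Where reasoning enters the retrieval pipeline (stage names assumed)."""
    QUERY_REWRITING = "query rewriting"
    RETRIEVAL = "retrieval"
    RERANKING = "reranking"

@dataclass(frozen=True)
class Method:
    name: str
    stage: Stage   # where reasoning is integrated
    manner: str    # how: e.g. "prompted", "distilled", "trained-in"

def group_by_stage(methods):
    """Group surveyed methods under their taxonomy stage label."""
    groups = defaultdict(list)
    for m in methods:
        groups[m.stage].append(m.name)
    return dict(groups)

# Illustrative entries only, not drawn from the survey itself:
methods = [
    Method("think-then-expand", Stage.QUERY_REWRITING, "prompted"),
    Method("reasoning-aware-encoder", Stage.RETRIEVAL, "trained-in"),
    Method("llm-listwise-reranker", Stage.RERANKING, "distilled"),
]
groups = group_by_stage(methods)
```

Once methods share labels of this kind, the cross-paper comparisons listed below become a matter of joining on the `stage` field.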
If this is right
- Developers can select or design methods according to the stage of reasoning integration that best matches their accuracy and efficiency needs.
- Benchmark creators can target gaps in specific domains or modalities identified by the systematization.
- Comparisons across papers become possible once methods share the same taxonomy labels.
- Research priorities can focus on the challenges and directions the survey lists as most pressing.
Where Pith is reading between the lines
- The taxonomy may need new branches once methods begin combining reasoning at multiple pipeline stages simultaneously.
- Extending the same grouping to non-text modalities such as images or tables could expose whether current categories generalize.
- If the taxonomy proves stable, it could serve as the basis for standardized evaluation protocols that measure inferential reasoning quality directly.
Load-bearing premise
That the authors' selection and categorization of the literature is sufficiently complete and unbiased to serve as a reliable roadmap for the field.
What would settle it
A substantial new benchmark or method that cannot be placed in any of the taxonomy categories or domain-modality groups would show the roadmap is incomplete.
Original abstract
Reasoning-Intensive Retrieval (RIR) targets retrieval settings where relevance is mediated by latent inferential links between a query and supporting evidence, rather than semantic similarity. Motivated by the emergent reasoning abilities of Large Language Models (LLMs), recent work integrates these capabilities into the IR field, spanning the entire pipeline from benchmarks to retrievers and rerankers. Despite this progress, the field lacks a systematic framework to organize current efforts and articulate a clear path forward. To provide a clear roadmap for this rapidly growing yet fragmented area, this survey (1) systematizes existing RIR benchmarks by knowledge domains and modalities, providing a detailed analysis of the current landscape; (2) introduces a structured taxonomy that categorizes methods based on where and how reasoning is integrated into the retrieval pipeline, alongside an analysis of their trade-offs and practical applications; and (3) summarizes challenges and future directions to guide research in this evolving field.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript surveys Reasoning-Intensive Retrieval (RIR), where relevance depends on latent inferential links rather than direct semantic similarity, often leveraging LLMs. It (1) systematizes existing benchmarks by knowledge domains and modalities, (2) introduces a taxonomy categorizing methods by the location and manner of reasoning integration in the retrieval pipeline, and (3) summarizes challenges and future directions to guide the field.
Significance. If the benchmark systematization proves representative and the taxonomy is shown to be both natural and consistently applied, the survey could serve as a useful organizing framework for an emerging, fragmented subfield. It would help researchers identify gaps in reasoning-enhanced IR and accelerate work on LLM-augmented retrievers and rerankers.
major comments (2)
- [§2] §2 (Benchmarks): No literature search protocol, queried databases, date cutoff, or explicit inclusion/exclusion criteria are described for selecting the surveyed benchmarks. This omission directly undermines the claim that the systematization by domains and modalities provides a reliable landscape analysis.
- [§3] §3 (Taxonomy): The taxonomy categories (e.g., reasoning at query rewriting, retrieval, or reranking stages) are presented without justification of how they were derived, how boundary cases were handled, or any validation (such as application to a held-out set of papers). This makes it difficult to evaluate whether the taxonomy reflects genuine divisions or post-hoc grouping, which is load-bearing for the roadmap value asserted in the abstract.
minor comments (2)
- [Abstract] The abstract would be clearer if it included one concrete example distinguishing RIR from standard semantic retrieval.
- A summary table listing all discussed benchmarks with columns for domain, modality, size, and reasoning requirements would improve accessibility and allow readers to quickly assess coverage.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments, which help clarify how to improve the rigor of our survey. We address each major comment below and will revise the manuscript to incorporate the suggested enhancements.
Point-by-point responses
Referee: [§2] §2 (Benchmarks): No literature search protocol, queried databases, date cutoff, or explicit inclusion/exclusion criteria are described for selecting the surveyed benchmarks. This omission directly undermines the claim that the systematization by domains and modalities provides a reliable landscape analysis.
Authors: We agree that an explicit literature search protocol would strengthen the reproducibility and credibility of the benchmark systematization. While the benchmarks were identified through a comprehensive review of recent publications in top IR venues, arXiv, and related workshops (covering works up to early 2024), the original manuscript did not document the process in detail. In the revised version, we will add a dedicated subsection at the start of §2 describing the search strategy, including queried sources (Google Scholar, arXiv, ACL Anthology, SIGIR/TOIS proceedings), date cutoff, and inclusion/exclusion criteria (e.g., focus on tasks requiring multi-hop or latent inference rather than direct semantic match). This will make the landscape analysis more transparent without altering the core categorization.
Revision: yes
Referee: [§3] §3 (Taxonomy): The taxonomy categories (e.g., reasoning at query rewriting, retrieval, or reranking stages) are presented without justification of how they were derived, how boundary cases were handled, or any validation (such as application to a held-out set of papers). This makes it difficult to evaluate whether the taxonomy reflects genuine divisions or post-hoc grouping, which is load-bearing for the roadmap value asserted in the abstract.
Authors: The taxonomy was constructed by mapping reasoning integration points onto the canonical stages of the retrieval pipeline (query formulation, initial retrieval, and reranking), which follows naturally from standard IR system architectures and the ways LLMs are currently applied in the literature. Boundary cases (e.g., hybrid methods) were resolved by primary stage of reasoning application. We acknowledge that the manuscript presents the taxonomy without sufficient methodological justification or illustrative validation. In revision, we will expand the opening of §3 with a new paragraph explaining the derivation rationale, provide explicit examples of boundary-case handling, and include a short validation table applying the taxonomy to a representative sample of papers (including some not used in the initial development) to demonstrate consistency. This will clarify that the categories capture genuine pipeline distinctions rather than arbitrary groupings.
Revision: yes
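The boundary-case rule the authors describe ("resolved by primary stage of reasoning application") could be operationalized as a simple argmax over a method's per-stage reasoning share. The stage names and weights below are invented for illustration; the survey does not quantify reasoning effort this way.

```python
def primary_stage(stage_effort: dict) -> str:
    """Assign a hybrid method to the pipeline stage carrying the
    largest share of its reasoning (hypothetical tie-break rule)."""
    return max(stage_effort, key=stage_effort.get)

# A hypothetical hybrid method that reasons mostly while reranking:
hybrid = {"query rewriting": 0.2, "retrieval": 0.1, "reranking": 0.7}
label = primary_stage(hybrid)
```

Under this rule the hybrid method above would be filed under reranking, even though it also touches the other two stages.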
Circularity Check
No circularity in organizational survey
Rationale
This survey paper organizes existing RIR literature by domains/modalities and introduces a taxonomy of reasoning integration points in the retrieval pipeline. No derivations, equations, predictions, fitted parameters, or self-referential reductions appear anywhere in the text. The central claims are descriptive and classificatory rather than derived from prior results within the paper; completeness of coverage is presented as an external literature-review task, not as a constructed output. The work is therefore self-contained as an organizational contribution with no load-bearing circular steps.