arxiv: 2509.07794 · v3 · submitted 2025-09-09 · 💻 cs.IR

Query Expansion in the Age of Pre-trained and Large Language Models: A Comprehensive Survey

Minghan Li, Xinxuan Lv, Junjie Zou, Tongna Chen, Chao Zhang, Suchao An, Ercong Nie, Guodong Zhou

Pith reviewed 2026-05-18 18:03 UTC · model grok-4.3

classification 💻 cs.IR

keywords query expansion · pre-trained language models · large language models · information retrieval · survey · taxonomy · design dimensions


The pith

Query expansion methods for pre-trained and large language models are unified by four design dimensions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey reviews how query expansion has changed with pre-trained and large language models, noting new behaviors such as stronger contextualization, controllable generation, and instruction following. It organizes recent techniques into a unified framework using four complementary dimensions to map the current landscape. The dimensions cover the location of expansion in the retrieval pipeline, its grounding with corpus evidence, its learning or alignment process, and its use of structured knowledge like knowledge graphs. The paper synthesizes application patterns across retrieval settings and discusses practical trade-offs in effectiveness, controllability, and cost. It concludes by identifying open challenges for building more reliable and adaptive query expansion systems.

Core claim

The paper claims that query expansion in the PLM and LLM era can be comprehensively understood through a taxonomy built on four design dimensions: where expansion is injected in the pipeline, how it is grounded and interacts with corpus evidence, how it is learned or aligned, and how structured knowledge such as knowledge graphs is incorporated.

What carries the argument

Four complementary design dimensions for classifying query expansion techniques by injection point, grounding to evidence, learning method, and structured knowledge use.
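As a minimal sketch of how the four dimensions could serve as a classification record, the Python below tags one method along each axis. The class name, field names, enum values, and the example method are hypothetical illustrations, not categories or entries taken from the survey.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class InjectionPoint(Enum):
    """Hypothetical values for where expansion enters the retrieval pipeline."""
    BEFORE_RETRIEVAL = "before_retrieval"
    DURING_RETRIEVAL = "during_retrieval"
    AFTER_RETRIEVAL = "after_retrieval"


@dataclass(frozen=True)
class QEMethodProfile:
    """One query expansion method tagged along the four design dimensions."""
    name: str
    injection_point: InjectionPoint             # dimension 1: injection point
    corpus_grounded: bool                       # dimension 2: grounding in corpus evidence
    learning: str = "none"                      # dimension 3: learning or alignment
    structured_knowledge: Optional[str] = None  # dimension 4: e.g. "knowledge graph"

    def covered_dimensions(self) -> List[str]:
        """List the dimensions this method explicitly addresses."""
        covered = ["injection"]  # every method is injected somewhere
        if self.corpus_grounded:
            covered.append("grounding")
        if self.learning != "none":
            covered.append("learning")
        if self.structured_knowledge is not None:
            covered.append("structured_knowledge")
        return covered


# Hypothetical method, invented for illustration only.
example = QEMethodProfile(
    name="toy-prompted-expander",
    injection_point=InjectionPoint.BEFORE_RETRIEVAL,
    corpus_grounded=False,
    learning="zero-shot prompting",
)
print(example.covered_dimensions())
```

A record like this makes the "resists classification" test concrete: a method that cannot be given a sensible value on any axis would be a counter-example to the taxonomy.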

If this is right

Application patterns and deployment considerations become clearer across representative retrieval settings.
Trade-offs among effectiveness, controllability, grounding quality, and operating cost can be compared systematically.
Open challenges for reliable, safe, efficient, and continually adaptive query expansion can be prioritized under real-world constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The four-dimension framework could serve as a checklist for designing hybrid query expansion systems that combine multiple strategies.
Future empirical studies might measure whether techniques that explicitly address all four dimensions outperform those focused on only one or two.
The taxonomy may help surface gaps in current research, such as methods that maintain performance across changing corpora without retraining.

Load-bearing premise

The selected papers and four proposed design dimensions together give a comprehensive and unbiased picture of the entire field.

What would settle it

Discovery of multiple recent query expansion methods that resist classification into any of the four dimensions, or identification of major omitted papers that alter the synthesized view, would falsify the claim of a unified landscape.

Figures

Figures reproduced from arXiv: 2509.07794 by Chao Zhang, Ercong Nie, Guodong Zhou, Junjie Zou, Minghan Li, Suchao An, Tongna Chen, Xinxuan Lv.

**Figure 1.** A Taxonomy of Query Expansion Techniques: From Traditional Methods to PLM/LLM-Driven Techniques and Applications.

**Figure 2.** Application Scenarios of Query Expansion in Information Retrieval.

**Figure 3.** Application Scenarios of Query Expansion in Information Retrieval.

Original abstract

Modern information retrieval must reconcile short, ambiguous queries with increasingly diverse and dynamic corpora. Query expansion (QE) remains a core technique for mitigating vocabulary mismatch, but its design space has been reshaped by pre-trained and large language models (PLMs/LLMs). This survey reviews QE methods in the PLM/LLM era and provides a unified view of the emerging landscape. We first summarize how different model families enable new expansion behaviors, including stronger contextualization, more controllable generation, and instruction-following. We then organize recent techniques along four complementary design dimensions: where expansion is injected in the pipeline, how it is grounded and interacts with corpus evidence, how it is learned or aligned, and how structured knowledge such as knowledge graphs is incorporated. Beyond taxonomy, we synthesize application patterns and deployment considerations across representative retrieval settings, highlighting practical trade-offs among effectiveness, controllability, grounding quality, and operating cost. Finally, we outline open challenges and future directions toward more reliable, safe, efficient, and continually adaptive QE under real-world constraints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This survey organizes QE methods for PLMs and LLMs into a four-axis taxonomy and flags practical trade-offs, which is helpful for the subfield but stays within standard survey bounds.


This paper is a survey that maps out query expansion in the current era of pre-trained and large language models. The core contribution is a taxonomy built around four dimensions: where the expansion gets injected in the pipeline, how it stays grounded in corpus evidence, how the model learns or aligns the expansion, and how structured knowledge like graphs gets folded in. That framing lines up with the abstract's description of complementary axes and gives a way to compare effectiveness, controllability, and cost across settings. The authors also pull together patterns from different retrieval tasks and list open challenges around reliability and efficiency, which is the kind of synthesis that can save time for someone entering the area.

What the paper does well is keep the focus on concrete design choices rather than just listing papers. It notes how instruction-following and controllable generation change what is possible compared with older methods, and it flags deployment considerations without overclaiming. The coverage appears broad enough from the abstract to cover post-2022 work, and the circularity risk looks low since the taxonomy draws from cited prior results rather than depending on the authors' own earlier papers.

Soft spots are the usual ones for surveys. Selection of which papers to include can tilt the picture, and without a clear protocol for exhaustive search it is possible some efficiency-focused or multilingual lines get less attention. The abstract does not show internal contradictions or unsupported claims, but the strength of the unified view will depend on how accurately the full text represents the landscape.

This is the sort of paper that helps IR researchers who want an organized starting point rather than a single new algorithm. A reader who needs to place recent work in context or spot gaps for their own projects will find it useful. It is not paradigm-shifting, but the taxonomy is grounded enough to serve as a reference.

I would send it to peer review so the community can check the completeness and sharpen the dimensions if needed.

Referee Report

2 major / 3 minor

Summary. The manuscript surveys query expansion (QE) techniques in the pre-trained and large language model (PLM/LLM) era. It first summarizes how different model families enable new expansion behaviors such as stronger contextualization, controllable generation, and instruction following. It then organizes recent methods along four complementary design dimensions: where expansion is injected in the pipeline, how it is grounded and interacts with corpus evidence, how it is learned or aligned, and how structured knowledge (e.g., knowledge graphs) is incorporated. The survey synthesizes application patterns and deployment considerations across retrieval settings, highlights trade-offs among effectiveness, controllability, grounding quality, and cost, and outlines open challenges for reliable, safe, efficient, and adaptive QE.

Significance. If the taxonomy and synthesis hold, the paper offers a timely unified view of a rapidly evolving subfield in information retrieval. The four-dimension organization and explicit discussion of practical trade-offs could help researchers and practitioners navigate design choices; the coverage of open challenges provides a useful roadmap. Strengths include the focus on post-2022 literature and the attempt to move beyond isolated technique descriptions toward cross-cutting patterns.

major comments (2)

[§3] §3 (Taxonomy): The claim that the four dimensions are 'complementary' and provide a 'unified view' would be strengthened by an explicit analysis of their orthogonality or overlap. For instance, grounding with corpus evidence and learning/alignment appear intertwined in many LLM-based methods; without a concrete mapping or counter-example table, the taxonomy risks being descriptive rather than prescriptive.
[§4] §4 (Application patterns and trade-offs): The synthesis of effectiveness vs. operating cost is central to the practical contribution, yet the manuscript provides only qualitative discussion. Adding a summary table that aggregates reported metrics (e.g., nDCG deltas and latency) from representative papers would make the trade-off claims more falsifiable and load-bearing for the deployment considerations section.
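The "nDCG deltas" named in the second comment can be made concrete with a short sketch. The functions below use one common formulation of DCG (linear gain, log2 rank discount); individual papers may use other variants. As a simplifying assumption, the ideal ranking is computed from the retrieved list itself rather than from the full set of judged documents, and the relevance lists are invented for illustration only.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so position 1 divides by log2(2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the same labels sorted best-first."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Invented relevance labels for one query, in ranked order.
baseline_run = [0, 1, 0, 2]   # without expansion
expanded_run = [2, 1, 0, 0]   # with expansion

ndcg_delta = ndcg_at_k(expanded_run, 4) - ndcg_at_k(baseline_run, 4)
print(f"nDCG@4 delta: {ndcg_delta:.3f}")
```

A table of such deltas is only comparable across papers when the cutoff, gain function, and judgment pool match.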

minor comments (3)

[Throughout] Ensure consistent use of 'PLM' versus 'LLM' terminology; some passages appear to treat them interchangeably while others distinguish scale and capabilities.
[Open challenges] The open-challenges section lists 'continually adaptive QE' but does not reference specific continual-learning techniques from the broader IR or NLP literature; adding 2–3 targeted citations would improve concreteness.
[Figures] Figure captions and axis labels in any taxonomy diagrams should explicitly state the source papers or selection criteria used to populate each cell.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive comments on our survey. We address each major comment below and will incorporate revisions to strengthen the manuscript.


Referee: [§3] §3 (Taxonomy): The claim that the four dimensions are 'complementary' and provide a 'unified view' would be strengthened by an explicit analysis of their orthogonality or overlap. For instance, grounding with corpus evidence and learning/alignment appear intertwined in many LLM-based methods; without a concrete mapping or counter-example table, the taxonomy risks being descriptive rather than prescriptive.

Authors: We agree that an explicit discussion of relationships among the dimensions would make the taxonomy more prescriptive. In the revised manuscript we will add a dedicated paragraph plus a mapping table in §3. The table will list representative methods, indicate primary and secondary dimensions for each, and provide counter-examples showing cases of clear orthogonality versus entanglement (e.g., pure corpus-grounded expansion versus jointly learned alignment). revision: yes
Referee: [§4] §4 (Application patterns and trade-offs): The synthesis of effectiveness vs. operating cost is central to the practical contribution, yet the manuscript provides only qualitative discussion. Adding a summary table that aggregates reported metrics (e.g., nDCG deltas and latency) from representative papers would make the trade-off claims more falsifiable and load-bearing for the deployment considerations section.

Authors: We acknowledge the value of making trade-off claims more concrete. Because of heterogeneous experimental protocols, we cannot produce a single comparable aggregate. We will therefore add a table in §4 that reports the key effectiveness and efficiency figures exactly as published for a curated set of representative papers, accompanied by an explicit caveats paragraph on non-comparability. This revision will be partial but will directly address the request for falsifiability. revision: partial

Circularity Check

0 steps flagged

No significant circularity in survey synthesis and taxonomy


This is a survey paper whose central contribution is a review of existing QE literature in the PLM/LLM era plus a taxonomy organized along four complementary design dimensions drawn from the reviewed works. No equations, predictions, or derivations are present that could reduce to the paper's own inputs by construction. The dimensions (injection point, grounding, learning/alignment, structured knowledge) are presented as an organizational lens synthesized from external citations rather than self-defined or fitted from the paper's data. Any author self-citations are incidental and non-load-bearing for the core claims, which remain externally grounded and falsifiable against the broader literature.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

This is a survey paper with no new mathematical derivations or empirical claims. It relies on the body of prior QE literature for its taxonomy and synthesis without introducing free parameters, new axioms beyond standard IR assumptions, or invented entities.

axioms (1)

domain assumption: Query expansion mitigates vocabulary mismatch between short queries and diverse corpora.
Stated in the opening of the abstract as the core motivation for QE.
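A toy sketch of this assumption: a query term that never appears in a relevant document only matches once the query is expanded. The synonym table, documents, and function names are hypothetical and hand-written. This is a simple dictionary-based expansion for illustration, not a PLM/LLM method from the survey.

```python
from typing import Dict, List

# Hypothetical, hand-written synonym table and two-document corpus.
SYNONYMS: Dict[str, List[str]] = {"car": ["automobile", "vehicle"]}
DOCS: Dict[str, str] = {
    "d1": "used automobile prices in berlin",
    "d2": "bicycle repair guide",
}


def expand(query: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Return the query terms followed by any listed synonyms, without duplicates."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for synonym in synonyms.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


def match(terms: List[str], docs: Dict[str, str]) -> List[str]:
    """Return ids of documents sharing at least one exact term with the query."""
    wanted = set(terms)
    return sorted(doc_id for doc_id, text in docs.items() if wanted & set(text.split()))


original_hits = match("car".split(), DOCS)          # exact-term match on the raw query
expanded_hits = match(expand("car", SYNONYMS), DOCS)  # match after expansion
print(original_hits, expanded_hits)
```

The raw query misses the document about automobiles because the two share no surface term; the expanded query retrieves it.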

pith-pipeline@v0.9.0 · 5733 in / 1150 out tokens · 34525 ms · 2026-05-18T18:03:19.045228+00:00 · methodology

