arXiv: 2509.07794 · v3 · submitted 2025-09-09 · 💻 cs.IR

Query Expansion in the Age of Pre-trained and Large Language Models: A Comprehensive Survey

Pith reviewed 2026-05-18 18:03 UTC · model grok-4.3

classification 💻 cs.IR
keywords query expansion · pre-trained language models · large language models · information retrieval · survey · taxonomy · design dimensions

The pith

Query expansion methods built on pre-trained and large language models can be organized along four design dimensions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey reviews how query expansion has changed with pre-trained and large language models, noting new behaviors such as stronger contextualization, controllable generation, and instruction following. It organizes recent techniques into a unified framework using four complementary dimensions to map the current landscape. The dimensions cover where expansion is injected in the retrieval pipeline, how it is grounded in corpus evidence, how it is learned or aligned, and how it incorporates structured knowledge such as knowledge graphs. The paper synthesizes application patterns across retrieval settings and discusses practical trade-offs in effectiveness, controllability, and cost. It concludes by identifying open challenges for building more reliable and adaptive query expansion systems.

Core claim

The paper claims that query expansion in the PLM and LLM era can be comprehensively understood through a taxonomy built on four design dimensions: where expansion is injected in the pipeline, how it is grounded and interacts with corpus evidence, how it is learned or aligned, and how structured knowledge such as knowledge graphs is incorporated.

What carries the argument

Four complementary design dimensions for classifying query expansion techniques by injection point, grounding to evidence, learning method, and structured knowledge use.
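
To make the four dimensions concrete, the sketch below tags each method with the dimensions it touches and counts how often pairs of dimensions co-occur, which gives a rough signal of whether the dimensions behave independently or are entangled. It is illustrative only: the dimension labels paraphrase the paper's wording, and the method names and tags are hypothetical placeholders, not classifications taken from the survey.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations

# Short labels paraphrasing the survey's four design dimensions.
DIMENSIONS = (
    "injection_point",
    "grounding",
    "learning_alignment",
    "structured_knowledge",
)


@dataclass(frozen=True)
class MethodEntry:
    """One row of a hypothetical method-to-dimension mapping."""
    name: str
    primary: str
    secondary: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        # Reject tags that are not one of the four dimensions.
        unknown = ({self.primary} | self.secondary) - set(DIMENSIONS)
        if unknown:
            raise ValueError(f"unknown dimension(s): {sorted(unknown)}")

    def dimensions(self):
        return {self.primary} | set(self.secondary)


def cooccurrence(entries):
    """Count how often each pair of dimensions is tagged on the same method."""
    counts = Counter()
    for entry in entries:
        for pair in combinations(sorted(entry.dimensions()), 2):
            counts[pair] += 1
    return counts


# Placeholder rows; names and tags are invented for illustration.
table = [
    MethodEntry("method_a", "grounding"),
    MethodEntry("method_b", "learning_alignment", frozenset({"grounding"})),
    MethodEntry("method_c", "grounding", frozenset({"learning_alignment"})),
    MethodEntry("method_d", "structured_knowledge", frozenset({"injection_point"})),
]

pair_counts = cooccurrence(table)
for pair, n in sorted(pair_counts.items()):
    print(pair, n)
```

A pair that co-occurs often would suggest entanglement between those two dimensions, while pairs that never co-occur are candidates for orthogonality, subject to how the rows were selected.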

If this is right

  • Application patterns and deployment considerations become clearer across representative retrieval settings.
  • Trade-offs among effectiveness, controllability, grounding quality, and operating cost can be compared systematically.
  • Open challenges for reliable, safe, efficient, and continually adaptive query expansion can be prioritized under real-world constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The four-dimension framework could serve as a checklist for designing hybrid query expansion systems that combine multiple strategies.
  • Future empirical studies might measure whether techniques that explicitly address all four dimensions outperform those focused on only one or two.
  • The taxonomy may help surface gaps in current research, such as methods that maintain performance across changing corpora without retraining.

Load-bearing premise

The selected papers and four proposed design dimensions together give a comprehensive and unbiased picture of the entire field.

What would settle it

Discovery of multiple recent query expansion methods that resist classification into any of the four dimensions, or identification of major omitted papers that alter the synthesized view, would falsify the claim of a unified landscape.

Figures

Figures reproduced from arXiv: 2509.07794 by Chao Zhang, Ercong Nie, Guodong Zhou, Junjie Zou, Minghan Li, Suchao An, Tongna Chen, Xinxuan Lv.

Figure 1: A Taxonomy of Query Expansion Techniques: From Traditional Methods to PLM/LLM-Driven Techniques and Applications.
Figure 2: Application Scenarios of Query Expansion in Information Retrieval.
Figure 3: Application Scenarios of Query Expansion in Information Retrieval.

Original abstract

Modern information retrieval must reconcile short, ambiguous queries with increasingly diverse and dynamic corpora. Query expansion (QE) remains a core technique for mitigating vocabulary mismatch, but its design space has been reshaped by pre-trained and large language models (PLMs/LLMs). This survey reviews QE methods in the PLM/LLM era and provides a unified view of the emerging landscape. We first summarize how different model families enable new expansion behaviors, including stronger contextualization, more controllable generation, and instruction-following. We then organize recent techniques along four complementary design dimensions: where expansion is injected in the pipeline, how it is grounded and interacts with corpus evidence, how it is learned or aligned, and how structured knowledge such as knowledge graphs is incorporated. Beyond taxonomy, we synthesize application patterns and deployment considerations across representative retrieval settings, highlighting practical trade-offs among effectiveness, controllability, grounding quality, and operating cost. Finally, we outline open challenges and future directions toward more reliable, safe, efficient, and continually adaptive QE under real-world constraints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript surveys query expansion (QE) techniques in the pre-trained and large language model (PLM/LLM) era. It first summarizes how different model families enable new expansion behaviors such as stronger contextualization, controllable generation, and instruction following. It then organizes recent methods along four complementary design dimensions: where expansion is injected in the pipeline, how it is grounded and interacts with corpus evidence, how it is learned or aligned, and how structured knowledge (e.g., knowledge graphs) is incorporated. The survey synthesizes application patterns and deployment considerations across retrieval settings, highlights trade-offs among effectiveness, controllability, grounding quality, and cost, and outlines open challenges for reliable, safe, efficient, and adaptive QE.

Significance. If the taxonomy and synthesis hold, the paper offers a timely unified view of a rapidly evolving subfield in information retrieval. The four-dimension organization and explicit discussion of practical trade-offs could help researchers and practitioners navigate design choices; the coverage of open challenges provides a useful roadmap. Strengths include the focus on post-2022 literature and the attempt to move beyond isolated technique descriptions toward cross-cutting patterns.

major comments (2)
  1. [§3] §3 (Taxonomy): The claim that the four dimensions are 'complementary' and provide a 'unified view' would be strengthened by an explicit analysis of their orthogonality or overlap. For instance, grounding with corpus evidence and learning/alignment appear intertwined in many LLM-based methods; without a concrete mapping or counter-example table, the taxonomy risks being descriptive rather than prescriptive.
  2. [§4] §4 (Application patterns and trade-offs): The synthesis of effectiveness vs. operating cost is central to the practical contribution, yet the manuscript provides only qualitative discussion. Adding a summary table that aggregates reported metrics (e.g., nDCG deltas and latency) from representative papers would make the trade-off claims more falsifiable and load-bearing for the deployment considerations section.
minor comments (3)
  1. [Throughout] Ensure consistent use of 'PLM' versus 'LLM' terminology; some passages appear to treat them interchangeably while others distinguish scale and capabilities.
  2. [Open challenges] The open-challenges section lists 'continually adaptive QE' but does not reference specific continual-learning techniques from the broader IR or NLP literature; adding 2–3 targeted citations would improve concreteness.
  3. [Figures] Figure captions and axis labels in any taxonomy diagrams should explicitly state the source papers or selection criteria used to populate each cell.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive comments on our survey. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§3] §3 (Taxonomy): The claim that the four dimensions are 'complementary' and provide a 'unified view' would be strengthened by an explicit analysis of their orthogonality or overlap. For instance, grounding with corpus evidence and learning/alignment appear intertwined in many LLM-based methods; without a concrete mapping or counter-example table, the taxonomy risks being descriptive rather than prescriptive.

    Authors: We agree that an explicit discussion of relationships among the dimensions would make the taxonomy more prescriptive. In the revised manuscript we will add a dedicated paragraph plus a mapping table in §3. The table will list representative methods, indicate primary and secondary dimensions for each, and provide counter-examples showing cases of clear orthogonality versus entanglement (e.g., pure corpus-grounded expansion versus jointly learned alignment). revision: yes

  2. Referee: [§4] §4 (Application patterns and trade-offs): The synthesis of effectiveness vs. operating cost is central to the practical contribution, yet the manuscript provides only qualitative discussion. Adding a summary table that aggregates reported metrics (e.g., nDCG deltas and latency) from representative papers would make the trade-off claims more falsifiable and load-bearing for the deployment considerations section.

    Authors: We acknowledge the value of making trade-off claims more concrete. Because of heterogeneous experimental protocols, we cannot produce a single comparable aggregate. We will therefore add a table in §4 that reports the key effectiveness and efficiency figures exactly as published for a curated set of representative papers, accompanied by an explicit caveats paragraph on non-comparability. This revision will be partial but will directly address the request for falsifiability. revision: partial
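
For readers unfamiliar with the "nDCG deltas" the referee asks for, the sketch below computes nDCG@k for a baseline ranking and an expanded-query ranking and reports the difference. It assumes the common linear-gain formulation (gain equal to the graded label, discount of log2(rank + 1)); individual papers may use an exponential-gain variant or different cutoffs, which is one source of the non-comparability the authors mention. The relevance labels are made up for illustration.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(ranked_relevances, judged_relevances, k):
    """DCG of a ranking divided by the DCG of the ideal ordering of all judged labels."""
    ideal = dcg_at_k(sorted(judged_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Made-up graded labels (0 = not relevant, 3 = highly relevant) for one query.
judged = [3, 2, 1, 0, 0]
baseline_ranking = [1, 0, 3, 2, 0]  # labels in the order the baseline returned them
expanded_ranking = [3, 2, 1, 0, 0]  # labels in the order the expanded query returned them

k = 5
baseline_ndcg = ndcg_at_k(baseline_ranking, judged, k)
expanded_ndcg = ndcg_at_k(expanded_ranking, judged, k)
delta = expanded_ndcg - baseline_ndcg
print(f"nDCG@{k}: baseline={baseline_ndcg:.3f} expanded={expanded_ndcg:.3f} delta={delta:+.3f}")
```

A delta computed this way is only meaningful when both runs share the same collection, judgments, and cutoff, which is why a table of figures "exactly as published" needs the caveats paragraph the authors propose.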

Circularity Check

0 steps flagged

No significant circularity in survey synthesis and taxonomy

Full rationale

This is a survey paper whose central contribution is a review of existing QE literature in the PLM/LLM era plus a taxonomy organized along four complementary design dimensions drawn from the reviewed works. No equations, predictions, or derivations are present that could reduce to the paper's own inputs by construction. The dimensions (injection point, grounding, learning/alignment, structured knowledge) are presented as an organizational lens synthesized from external citations rather than self-defined or fitted from the paper's data. Any author self-citations are incidental and non-load-bearing for the core claims, which remain externally grounded and falsifiable against the broader literature.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is a survey paper with no new mathematical derivations or empirical claims. It relies on the body of prior QE literature for its taxonomy and synthesis without introducing free parameters, new axioms beyond standard IR assumptions, or invented entities.

axioms (1)
  • Domain assumption: Query expansion mitigates vocabulary mismatch between short queries and diverse corpora.
    Stated in the opening of the abstract as the core motivation for QE.
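
The vocabulary-mismatch assumption can be illustrated with a toy lexical matcher. In the sketch below, a query and a relevant document share no terms until the query is expanded. The hand-written synonym table is purely hypothetical and stands in for whatever expansion source a real system would use (a thesaurus, pseudo-relevance feedback, or a language model).

```python
def tokenize(text):
    """Lowercase whitespace tokenization; a stand-in for a real analyzer."""
    return set(text.lower().split())


def overlap(query, document):
    """Number of distinct terms shared by the query and the document."""
    return len(tokenize(query) & tokenize(document))


# Hypothetical synonym table used only for this illustration.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "repair": ["maintenance", "fix"],
}


def expand(query, synonyms=SYNONYMS):
    """Append known synonyms of each query term to the original query."""
    terms = query.lower().split()
    extra = [syn for term in terms for syn in synonyms.get(term, [])]
    return " ".join(terms + extra)


document = "automobile maintenance schedule"
query = "car repair"

before = overlap(query, document)         # no shared terms
after = overlap(expand(query), document)  # matches via added synonyms
print(expand(query), before, after)
```

Here the original query matches none of the document's terms, while the expanded query matches two of them. The same mechanism can also add off-topic terms, which is the drift risk that grounding and controllability are meant to limit.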

pith-pipeline@v0.9.0 · 5733 in / 1150 out tokens · 34525 ms · 2026-05-18T18:03:19.045228+00:00 · methodology

