ICICLE: Expanding Retrieval with In-Context Documents
Pith reviewed 2026-06-29 15:54 UTC · model grok-4.3
The pith
ICICLE adds new documents to generative retrieval at inference time by supplying them as context instead of retraining the model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ICICLE performs source-aware docid generation over both parametric memory and context-provided document-docid pairs by combining a [COPY]-based routing mechanism, preference-based calibration, and large context adaptation to distinguish context-grounded retrieval from parametric retrieval.
What carries the argument
The [COPY]-based routing mechanism that decides at each generation step whether to copy a document identifier from the provided context or generate from parametric knowledge.
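A minimal sketch of what such routing could look like, assuming a single routing decision per query rather than one per generation step. Only the `[COPY]` token comes from the source; `route_and_retrieve`, `toy_router`, `overlap`, and `toy_parametric` are hypothetical stand-ins for the model's routing, context scoring, and parametric decoding, not the paper's implementation.

```python
from typing import Callable, Dict

COPY = "[COPY]"  # routing token named in the source; all other names are hypothetical


def overlap(query: str, doc: str) -> float:
    """Toy relevance score: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_router(query: str, context_pairs: Dict[str, str]) -> str:
    """Stand-in for the model's routing token: copy if any context doc overlaps."""
    hit = any(overlap(query, doc) > 0 for doc in context_pairs.values())
    return COPY if hit else "<gen>"


def toy_parametric(query: str) -> str:
    """Stand-in for decoding a docid from parametric memory."""
    return "seen-001"


def route_and_retrieve(
    query: str,
    context_pairs: Dict[str, str],  # docid -> document text supplied at inference
    router: Callable[[str, Dict[str, str]], str] = toy_router,
    scorer: Callable[[str, str], float] = overlap,
    parametric: Callable[[str], str] = toy_parametric,
) -> str:
    """Return a docid, choosing the source from the routing token."""
    if context_pairs and router(query, context_pairs) == COPY:
        # Copy branch: the answer is restricted to docids present in the context.
        return max(context_pairs, key=lambda d: scorer(query, context_pairs[d]))
    # Parametric branch: fall back to docids memorised during training.
    return parametric(query)


pairs = {
    "new-1": "icicle adds documents in context",
    "new-2": "bm25 ranking function",
}
print(route_and_retrieve("what is bm25", pairs))       # new-2
print(route_and_retrieve("capital of france", pairs))  # seen-001
```

The point of the sketch is the separation of concerns: a wrong routing token sends the query to the wrong source regardless of how good either branch is on its own.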
If this is right
- New documents can be indexed without any parameter updates or corpus-specific retraining.
- Retrieval accuracy on newly introduced documents increases while accuracy on earlier documents is maintained.
- High numbers of in-context examples mainly degrade performance through routing errors rather than through context overload.
- Source-selection calibration becomes the central engineering target for scaling this style of retrieval.
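The claim that degradation comes mainly from routing errors suggests splitting mistakes by cause. The following is a sketch of one such decomposition, assuming each prediction records which source the model chose and that context docids and parametric docids are disjoint. The `Prediction` record and `decompose_errors` function are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Prediction:
    gold_source: str  # "context" or "parametric": where the gold docid lives
    pred_source: str  # source the model actually chose
    gold_docid: str
    pred_docid: str


def decompose_errors(preds: List[Prediction]) -> Dict[str, float]:
    """Split the error rate into wrong-source and right-source-wrong-docid parts."""
    n = len(preds)
    if n == 0:
        return {"accuracy": 0.0, "routing_error": 0.0, "in_source_error": 0.0}
    correct = sum(p.pred_docid == p.gold_docid for p in preds)
    # Routing error: the model picked the wrong source.
    routing = sum(p.pred_source != p.gold_source for p in preds)
    # In-source error: right source, wrong docid within it.
    in_source = sum(
        p.pred_source == p.gold_source and p.pred_docid != p.gold_docid
        for p in preds
    )
    return {
        "accuracy": correct / n,
        "routing_error": routing / n,
        "in_source_error": in_source / n,
    }


preds = [
    Prediction("context", "context", "new-1", "new-1"),
    Prediction("parametric", "parametric", "seen-3", "seen-3"),
    Prediction("context", "parametric", "new-2", "seen-7"),  # routing error
    Prediction("context", "context", "new-4", "new-5"),      # in-source error
]
print(decompose_errors(preds))
```

If the routing share grows with the number of in-context documents while the in-source share stays flat, that pattern would be consistent with the routing-failure reading above.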
Where Pith is reading between the lines
- The same routing idea could be tested on other generative tasks that mix fixed knowledge with supplied evidence at inference.
- If routing improves, the method might support continuously growing knowledge bases without periodic full retraining cycles.
- The distinction between parametric and context-grounded output could be measured directly in other decoder-only models to check transfer.
Load-bearing premise
The combination of routing, calibration, and context adaptation can reliably separate context-provided documents from the model's trained knowledge during inference.
What would settle it
A test set where newly supplied documents are consistently ignored or where performance on previously seen documents drops sharply once context examples are added.
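Such a test could be scored roughly as follows. This is a sketch only: the function names and the two thresholds `min_new_hit` and `max_seen_drop` are illustrative assumptions, not values reported in the paper.

```python
from typing import Dict, List, Tuple

Pairs = List[Tuple[str, str]]  # (gold docid, predicted docid)


def hit_rate(pairs: Pairs) -> float:
    """Fraction of predictions whose top docid matches the gold docid."""
    return sum(g == p for g, p in pairs) / len(pairs) if pairs else 0.0


def settling_check(
    new_with_ctx: Pairs,      # new-document queries, context supplied
    seen_without_ctx: Pairs,  # seen-document queries, no context
    seen_with_ctx: Pairs,     # same seen-document queries, context supplied
    min_new_hit: float = 0.1,    # hypothetical threshold
    max_seen_drop: float = 0.1,  # hypothetical threshold
) -> Dict[str, object]:
    new_hit = hit_rate(new_with_ctx)
    seen_drop = hit_rate(seen_without_ctx) - hit_rate(seen_with_ctx)
    return {
        "new_hit": new_hit,
        "seen_drop": seen_drop,
        "new_docs_ignored": new_hit < min_new_hit,
        "retention_collapsed": seen_drop > max_seen_drop,
    }


report = settling_check(
    new_with_ctx=[("n1", "n1"), ("n2", "s9")],
    seen_without_ctx=[("s1", "s1"), ("s2", "s2")],
    seen_with_ctx=[("s1", "s1"), ("s2", "n1")],
)
print(report)
```

Either flag being true on a held-out set would count against the premise; both being false does not prove it, since the thresholds are arbitrary.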
Original abstract
Generative retrieval (GR) maps queries directly to document identifiers (docids) using parametric knowledge. However, this design makes corpus expansion costly: adding new documents requires updating model parameters to encode new document-docid associations, which incurs repeated training and catastrophic forgetting of previously indexed documents. In this work, we revisit incremental GR as an in-context retrieval problem, where newly added documents are supplied as inference-time document-docid evidence. We propose ICICLE, an in-context indexing framework that performs source-aware docid generation over both parametric memory and context-provided document-docid pairs. ICICLE combines a `[COPY]`-based routing mechanism, preference-based calibration, and large context adaptation to distinguish context-grounded retrieval from parametric retrieval. Experiments on MS MARCO and NQ320K show that ICICLE improves retrieval of newly introduced documents while preserving seen-document retention without corpus-specific retraining. Our analysis further shows that high-shot degradation is mainly caused by routing failure, highlighting source-selection calibration as a key bottleneck for scaling in-context generative retrieval.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ICICLE, an in-context indexing framework for generative retrieval (GR) that treats corpus expansion as an inference-time problem. New documents are supplied as context-provided document-docid pairs; ICICLE performs source-aware docid generation by combining a [COPY]-based routing mechanism, preference-based calibration, and large-context adaptation to distinguish context-grounded from parametric retrieval. Experiments on MS MARCO and NQ320K are reported to show improved retrieval of newly introduced documents while preserving performance on previously seen documents, without corpus-specific retraining. An analysis attributes high-shot degradation primarily to routing failure.
Significance. If the empirical results hold, the work directly addresses a central practical limitation of generative retrieval—costly retraining and catastrophic forgetting upon corpus growth—by shifting expansion to inference-time in-context evidence. The explicit attribution of scaling bottlenecks to routing failure supplies a concrete, falsifiable direction for follow-on work. The approach is notable for avoiding any additional training while maintaining a clear separation between parametric and context-grounded sources.
minor comments (2)
- The abstract states that experiments 'show that ICICLE improves retrieval' but supplies no numerical deltas, baselines, or error bars; while the full manuscript reportedly contains these results, the abstract should include at least the headline quantitative gains to allow readers to assess the claim at a glance.
- Notation for the [COPY] token and the preference-calibration objective is introduced without an explicit equation or pseudocode block in the early sections; a compact formal definition would improve reproducibility.
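One way the requested formal definition might read is sketched below. It is an illustration under stated assumptions, not the paper's notation: the mixture form of the routing, the symbols $\rho$, $\mathcal{Z}_C$, $p^{\text{copy}}_\theta$, $p^{\text{par}}_\theta$, and the choice of a DPO-style pairwise loss for the calibration step are all assumed here.

```latex
% Context: C = {(d_i, z_i)} are the document-docid pairs supplied at inference,
% with docid set Z_C = {z_i}; q is the query; theta are the model parameters.

% Routing probability: the mass the model puts on the [COPY] token.
\rho(q, C) = p_\theta(\texttt{[COPY]} \mid q, C)

% Source-aware docid distribution: a copy branch supported on Z_C
% and a parametric branch over docids learned in training.
p(z \mid q, C) = \rho(q, C)\, p^{\text{copy}}_\theta(z \mid q, C)
               + \bigl(1 - \rho(q, C)\bigr)\, p^{\text{par}}_\theta(z \mid q),
\qquad p^{\text{copy}}_\theta(z \mid q, C) = 0 \ \text{for } z \notin \mathcal{Z}_C

% Assumed DPO-style calibration: y^+ routes to the correct source, y^- to the
% wrong one; pi_ref is a frozen reference model and beta a temperature.
\mathcal{L}_{\text{pref}}(\theta) =
  -\,\mathbb{E}_{(q, C, y^+, y^-)}
  \log \sigma\!\left(
    \beta \left[
      \log \frac{\pi_\theta(y^+ \mid q, C)}{\pi_{\text{ref}}(y^+ \mid q, C)}
      - \log \frac{\pi_\theta(y^- \mid q, C)}{\pi_{\text{ref}}(y^- \mid q, C)}
    \right]
  \right)
```

Under this reading, calibration amounts to shaping $\rho(q, C)$ so that it is high exactly when the gold docid lies in $\mathcal{Z}_C$.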
Simulated Author's Rebuttal
We thank the referee for their positive assessment of ICICLE and for recommending minor revision. The report correctly identifies the core contribution as shifting corpus expansion to inference-time in-context evidence while preserving parametric performance. No specific major comments were listed in the provided report, so we have no point-by-point responses to offer. We appreciate the recognition that routing failure is a key scaling bottleneck and will continue to explore calibration improvements in follow-up work.
Circularity Check
No significant circularity; empirical claims rest on experiments
The manuscript contains no equations, derivations, or mathematical claims that could reduce to inputs by construction. It proposes an empirical framework (ICICLE) combining routing, calibration, and adaptation, then reports performance on MS MARCO and NQ320K. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations appear. The central claim is directly supported by the supplied experimental results and analysis of routing failure, with no internal reduction to prior fitted values or author-only uniqueness theorems.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption New documents can be supplied as inference-time context without requiring model parameter updates for docid associations.
invented entities (1)
- ICICLE framework: no independent evidence