pith. sign in

arxiv: 2605.26902 · v2 · pith:5Z3BNPWVnew · submitted 2026-05-26 · 💻 cs.IR · cs.AI

ICICLE: Expanding Retrieval with In-Context Documents

Pith reviewed 2026-06-29 15:54 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords generative retrievalin-context indexingdocument expansionincremental retrievalsource-aware generationMS MARCONQ320K
0
0 comments X

The pith

ICICLE adds new documents to generative retrieval at inference time by supplying them as context instead of retraining the model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reframes incremental generative retrieval as an in-context problem where new document-docid pairs are provided directly at inference. It introduces routing and calibration steps that let the model decide whether to pull a document identifier from its trained parameters or from the supplied context. If successful, this removes the need to retrain when the corpus grows and avoids the forgetting that normally occurs with parameter updates. The approach is tested on MS MARCO and NQ320K, showing gains on newly added documents while old-document performance stays stable. The analysis identifies routing failures as the main limit when many examples are supplied.

Core claim

ICICLE performs source-aware docid generation over both parametric memory and context-provided document-docid pairs by combining a [COPY]-based routing mechanism, preference-based calibration, and large context adaptation to distinguish context-grounded retrieval from parametric retrieval.

What carries the argument

The [COPY]-based routing mechanism that decides at each generation step whether to copy a document identifier from the provided context or generate from parametric knowledge.

If this is right

  • New documents can be indexed without any parameter updates or corpus-specific retraining.
  • Retrieval accuracy on newly introduced documents increases while accuracy on earlier documents is maintained.
  • High numbers of in-context examples mainly degrade performance through routing errors rather than through context overload.
  • Source-selection calibration becomes the central engineering target for scaling this style of retrieval.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same routing idea could be tested on other generative tasks that mix fixed knowledge with supplied evidence at inference.
  • If routing improves, the method might support continuously growing knowledge bases without periodic full retraining cycles.
  • The distinction between parametric and context-grounded output could be measured directly in other decoder-only models to check transfer.

Load-bearing premise

The combination of routing, calibration, and context adaptation can reliably separate context-provided documents from the model's trained knowledge during inference.

What would settle it

A test set where newly supplied documents are consistently ignored or where performance on previously seen documents drops sharply once context examples are added.

read the original abstract

Generative retrieval (GR) maps queries directly to document identifiers (docids) using parametric knowledge, However, this design makes corpus expansion costly: adding new documents requires updating model parameters to encode new document-docid associations incurs repeated training and catastrophic forgetting of previously indexed documents. In this work, we revisit incremental GR as an in-context retrieval problem, where newly added documents are supplied as inference-time document-docid evidence. We propose ICICLE, an in-context indexing framework that performs source-aware docid generation over both parametric memory and context-provided document-docid pairs. ICICLE combines a `[COPY]`-based routing mechanism, preference-based calibration, and large context adaptation to distinguish context-grounded retrieval from parametric retrieval. Experiments on MS MARCO and NQ320K show that ICICLE improves retrieval of newly introduced documents while preserving seen-document retention without corpus-specific retraining. Our analysis further shows that high-shot degradation is mainly caused by routing failure, highlighting source-selection calibration as a key bottleneck for scaling in-context generative retrieval.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper proposes ICICLE, an in-context indexing framework for generative retrieval (GR) that treats corpus expansion as an inference-time problem. New documents are supplied as context-provided document-docid pairs; ICICLE performs source-aware docid generation by combining a [COPY]-based routing mechanism, preference-based calibration, and large-context adaptation to distinguish context-grounded from parametric retrieval. Experiments on MS MARCO and NQ320K are reported to show improved retrieval of newly introduced documents while preserving performance on previously seen documents, without corpus-specific retraining. An analysis attributes high-shot degradation primarily to routing failure.

Significance. If the empirical results hold, the work directly addresses a central practical limitation of generative retrieval—costly retraining and catastrophic forgetting upon corpus growth—by shifting expansion to inference-time in-context evidence. The explicit attribution of scaling bottlenecks to routing failure supplies a concrete, falsifiable direction for follow-on work. The approach is notable for avoiding any additional training while maintaining a clear separation between parametric and context-grounded sources.

minor comments (2)
  1. The abstract states that experiments 'show that ICICLE improves retrieval' but supplies no numerical deltas, baselines, or error bars; while the full manuscript reportedly contains these results, the abstract should include at least the headline quantitative gains to allow readers to assess the claim at a glance.
  2. Notation for the [COPY] token and the preference-calibration objective is introduced without an explicit equation or pseudocode block in the early sections; a compact formal definition would improve reproducibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of ICICLE and for recommending minor revision. The report correctly identifies the core contribution as shifting corpus expansion to inference-time in-context evidence while preserving parametric performance. No specific major comments were listed in the provided report, so we have no point-by-point responses to offer. We appreciate the recognition that routing failure is a key scaling bottleneck and will continue to explore calibration improvements in follow-up work.

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on experiments

full rationale

The manuscript contains no equations, derivations, or mathematical claims that could reduce to inputs by construction. It proposes an empirical framework (ICICLE) combining routing, calibration, and adaptation, then reports performance on MS MARCO and NQ320K. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations appear. The central claim is directly supported by the supplied experimental results and analysis of routing failure, with no internal reduction to prior fitted values or author-only uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are detailed beyond the high-level proposal of the ICICLE method itself.

axioms (1)
  • domain assumption New documents can be supplied as inference-time context without requiring model parameter updates for docid associations.
    Core premise of reframing incremental GR as in-context retrieval.
invented entities (1)
  • ICICLE framework no independent evidence
    purpose: In-context indexing with source-aware docid generation
    New method proposed in the work; no independent evidence outside the paper.

pith-pipeline@v0.9.1-grok · 5724 in / 1152 out tokens · 25859 ms · 2026-06-29T15:54:03.154250+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

35 extracted references · 6 canonical work pages · 4 internal anchors

  1. [1]

    Language models are few-shot learners.Ad- vances in Neural Information Processing Sys- tems, 33:1877–1901, 2020

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Pra- fulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Ad- vances in Neural Information Processing Sys- tems, 33:1877–1901, 2020

  2. [2]

    M3-embedding: Multi-linguality, multi- functionality, multi-granularity text embed- dings through self-knowledge distillation

    Jianlyu Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. M3-embedding: Multi-linguality, multi- functionality, multi-granularity text embed- dings through self-knowledge distillation. In Findings of the association for computational linguistics: ACL 2024, pages 2318–2335, 2024

  3. [3]

    Autore- gressive entity retrieval.arXiv preprint arXiv:2010.00904, 2020

    Nicola De Cao, Gautier Izacard, Sebas- tian Riedel, and Fabio Petroni. Autore- gressive entity retrieval.arXiv preprint arXiv:2010.00904, 2020

  4. [4]

    A survey on in-context learning

    Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, et al. A survey on in-context learning. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

  5. [5]

    Ada-kv: Optimizing kv cache eviction by adaptive budget allocation for efficient llm inference.Advances in Neural Information Processing Systems, 38:113152– 113188, 2026

    Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, and S Kevin Zhou. Ada-kv: Optimizing kv cache eviction by adaptive budget allocation for efficient llm inference.Advances in Neural Information Processing Systems, 38:113152– 113188, 2026

  6. [6]

    How to train long-context language models (effectively)

    Tianyu Gao, Alexander Wettig, Howard Yen, and Danqi Chen. How to train long-context language models (effectively). InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7376–7399, 2025

  7. [7]

    Corpusbrain++: A continual generative pre-training framework for knowledge-intensive language tasks.ACM Transactions on Information Systems, 44(1): 1–35, 2025

    Jiafeng Guo, Changjiang Zhou, Ruqing Zhang, Jiangui Chen, Maarten de Rijke, Yix- ingFan,andXueqiCheng. Corpusbrain++: A continual generative pre-training framework for knowledge-intensive language tasks.ACM Transactions on Information Systems, 44(1): 1–35, 2025

  8. [8]

    Ruler: What’s the real context size of your long-context lan- guage models? InFirst Conference on Lan- guage Modeling, 2024

    Cheng-Ping Hsieh, Simeng Sun, Samuel Kri- man, Shantanu Acharya, Dima Rekesh, Fei 10 Jia, and Boris Ginsburg. Ruler: What’s the real context size of your long-context lan- guage models? InFirst Conference on Lan- guage Modeling, 2024

  9. [9]

    Mixlora-dsi: Dynam- ically expandable mixture-of-lora experts for rehearsal-free generative retrieval over dy- namic corpora

    Tuan-Luc Huynh, Thuy Vu, Weiqing Wang, Trung Le, Dragan Gasevic, Yuan-Fang Li, and Thanh-Toan Do. Mixlora-dsi: Dynam- ically expandable mixture-of-lora experts for rehearsal-free generative retrieval over dy- namic corpora. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 380–396, 2025

  10. [10]

    Dense passage retrieval for open-domain question answer- ing

    VladimirKarpukhin,BarlasOguz,SewonMin, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answer- ing. InProceedings of the 2020 conference on empirical methods in natural language process- ing (EMNLP), pages 6769–6781, 2020

  11. [11]

    Incdsi: Incrementally updatable document retrieval

    Varsha Kishore, Chao Wan, Justin Lovelace, Yoav Artzi, and Kilian Q Weinberger. Incdsi: Incrementally updatable document retrieval. InInternational Conference on Machine Learn- ing, pages 17122–17134. PMLR, 2023

  12. [12]

    Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov

    Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Il- lia Polosukhin, Matthew Kelcey, Jacob De- vlin, Kenton Lee, Kristina N. Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. Natural questions: a benchmark for question answer- ing research.Tr...

  13. [13]

    Plaid shirttt for large-scale streaming dense re- trieval

    Dawn Lawrie, Efsun Kayi, Eugene Yang, James Mayfield, and Douglas W Oard. Plaid shirttt for large-scale streaming dense re- trieval. InProceedings of the 47th Interna- tional ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2574–2578, 2024

  14. [14]

    Nonparametric decoding for generative retrieval

    Hyunji Lee, Jaeyoung Kim, Hoyeon Chang, Hanseok Oh, Sohee Yang, Vladimir Karpukhin, Yi Lu, and Minjoon Seo. Nonparametric decoding for generative retrieval. InFindings of the Association for Computational Linguistics: ACL 2023, pages 12642–12661, 2023

  15. [15]

    Glen: Generative retrieval via lexical in- dex learning

    Sunkyung Lee, Minjin Choi, and Jongwuk Lee. Glen: Generative retrieval via lexical in- dex learning. InProceedings of the 2023 Con- ference on Empirical Methods in Natural Lan- guage Processing, pages 7693–7704, 2023

  16. [16]

    Dsi++: Updating transformer mem- orywithnewdocuments

    Sanket Vaibhav Mehta, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Q Tran, Jinfeng Rao, Marc Najork, Emma Strubell, and Donald Metzler. Dsi++: Updating transformer mem- orywithnewdocuments. InProceedingsofthe 2023 conference on Empirical Methods in Nat- ural Language Processing, pages 8198–8213, 2023

  17. [17]

    A Parametric Memory Head for Continual Generative Retrieval

    Kidist Amde Mekonnen, Yubao Tang, and Maarten de Rijke. A parametric memory head for continual generative retrieval.arXiv preprint arXiv:2604.23388, 2026

  18. [18]

    Rethinking the role of demonstrations: What makes in-context learning work? InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

    Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. Rethinking the role of demonstrations: What makes in-context learning work? InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

  19. [19]

    MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

    Tri Nguyen, Mir Rosenberg, Xia Song, Jian- feng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. MS MARCO: A human gener- atedmachinereadingcomprehensiondataset. CoRR, abs/1611.09268, 2016

  20. [20]

    Direct preference optimiza- tion: Your language model is secretly a re- ward model.Advances in neural information processing systems, 36:53728–53741, 2023

    Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimiza- tion: Your language model is secretly a re- ward model.Advances in neural information processing systems, 36:53728–53741, 2023

  21. [21]

    Recommender systems with generative retrieval.Advances in Neu- ral Information Processing Systems, 36:10299– 10315, 2023

    Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, 11 Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al. Recommender systems with generative retrieval.Advances in Neu- ral Information Processing Systems, 36:10299– 10315, 2023

  22. [22]

    In-context retrieval-augmented language models.Trans- actions of the Association for Computational Linguistics, 11:1316–1331, 2023

    Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton- Brown, and Yoav Shoham. In-context retrieval-augmented language models.Trans- actions of the Association for Computational Linguistics, 11:1316–1331, 2023

  23. [23]

    Now Publishers Inc, 2009

    Stephen Robertson and Hugo Zaragoza.The probabilistic relevance framework: BM25 and beyond, volume 4. Now Publishers Inc, 2009

  24. [24]

    Trusting your evidence: Hallucinate less with context-aware decoding

    Weijia Shi, Xiaochuang Han, Mike Lewis, Yu- lia Tsvetkov, Luke Zettlemoyer, and Wen-tau Yih. Trusting your evidence: Hallucinate less with context-aware decoding. InProceedings of the 2024 Conference of the North Ameri- can Chapter of the Association for Computa- tional Linguistics: Human Language Technolo- gies (Volume 2: Short Papers), pages 783–791, 2024

  25. [25]

    Cream: Continual retrieval on dy- namic streaming corpora with adaptive soft memory

    HuiJeong Son, Hyeongu Kang, Sunho Kim, Subeen Ho, SeongKu Kang, Dongha Lee, and Susik Yoon. Cream: Continual retrieval on dy- namic streaming corpora with adaptive soft memory. InProceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1, pages 1297–1308, 2026

  26. [26]

    Learning to tokenize for gen- erative retrieval.Advances in Neural Infor- mation Processing Systems, 36:46345–46361, 2023

    Weiwei Sun, Lingyong Yan, Zheng Chen, Shuaiqiang Wang, Haichao Zhu, Pengjie Ren, Zhumin Chen, Dawei Yin, Maarten Rijke, and Zhaochun Ren. Learning to tokenize for gen- erative retrieval.Advances in Neural Infor- mation Processing Systems, 36:46345–46361, 2023

  27. [27]

    Transformer memory as a differentiable search index.Ad- vances in Neural Information Processing Sys- tems, 35:21831–21843, 2022

    Yi Tay, Vinh Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, et al. Transformer memory as a differentiable search index.Ad- vances in Neural Information Processing Sys- tems, 35:21831–21843, 2022

  28. [28]

    A neural corpus indexer for doc- ument retrieval.Advances in Neural Infor- mation Processing Systems, 35:25600–25614, 2022

    Yujing Wang, Yingyan Hou, Haonan Wang, Ziming Miao, Shibin Wu, Qi Chen, Yuqing Xia, Chengmin Chi, Guoshuai Zhao, Zheng Liu, et al. A neural corpus indexer for doc- ument retrieval.Advances in Neural Infor- mation Processing Systems, 35:25600–25614, 2022

  29. [29]

    In- fllm: Training-freelong-contextextrapolation for llms with an efficient context memory.Ad- vances in neural information processing sys- tems, 37:119638–119661, 2024

    Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, and Maosong Sun. In- fllm: Training-freelong-contextextrapolation for llms with an efficient context memory.Ad- vances in neural information processing sys- tems, 37:119638–119661, 2024

  30. [30]

    C-Pack: Packed Resources For General Chinese Embeddings

    Shitao Xiao, Zheng Liu, Peitian Zhang, and Niklas Muennighoff. C-pack: Packaged re- sources to advance general chinese embed- ding.arXiv preprint arXiv:2309.07597, 2023

  31. [31]

    Knowledge conflicts for llms: A sur- vey

    Rongwu Xu, Zehan Qi, Zhijiang Guo, Cunx- iang Wang, Hongru Wang, Yue Zhang, and Wei Xu. Knowledge conflicts for llms: A sur- vey. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Pro- cessing, pages 8541–8565, 2024

  32. [32]

    Qwen3 Technical Report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

  33. [33]

    Longqlora: Efficient and effective method to extend context length of large language models.arXiv preprint arXiv:2311.04879, 2023

    Jianxin Yang. Longqlora: Efficient and effective method to extend context length of large language models.arXiv preprint arXiv:2311.04879, 2023

  34. [34]

    Replication and exploration of generative re- trieval over dynamic corpora

    Zhen Zhang, Xinyu Ma, Weiwei Sun, Pengjie Ren, Zhumin Chen, Shuaiqiang Wang, Dawei Yin, Maarten de Rijke, and Zhaochun Ren. Replication and exploration of generative re- trieval over dynamic corpora. InProceedings 12 of the 48th International ACM SIGIR Confer- ence on Research and Development in Informa- tion Retrieval, pages 3325–3334, 2025

  35. [35]

    Model editing for new document integration in generative information retrieval

    Zhen Zhang, Zihan Wang, Xinyu Ma, ShuaiqiangWang, DaweiYin, XinXin, Pengjie Ren, Maarten de Rijke, and Zhaochun Ren. Model editing for new document integration in generative information retrieval. InPro- ceedings of the ACM Web Conference 2026, pages 1993–2003, 2026. 13 A. In-Context Template Learning A.1. In-Context Template We use a unified in-context t...