ICICLE: Expanding Retrieval with In-Context Documents

Eugene Yang; Kuan-Yu Chen; Pu-Jen Cheng; Yu-Chen Den; Yung-Yu Shih; Yun-Nung Chen; Zhi Rui Tam

arxiv: 2605.26902 · v2 · pith:5Z3BNPWVnew · submitted 2026-05-26 · 💻 cs.IR · cs.AI

ICICLE: Expanding Retrieval with In-Context Documents

Yu-Chen Den , Yung-Yu Shih , Zhi Rui Tam , Kuan-Yu Chen , Pu-Jen Cheng , Yun-Nung Chen , Eugene Yang This is my paper

Pith reviewed 2026-06-29 15:54 UTC · model grok-4.3

classification 💻 cs.IR cs.AI

keywords generative retrievalin-context indexingdocument expansionincremental retrievalsource-aware generationMS MARCONQ320K

0 comments

The pith

ICICLE adds new documents to generative retrieval at inference time by supplying them as context instead of retraining the model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reframes incremental generative retrieval as an in-context problem where new document-docid pairs are provided directly at inference. It introduces routing and calibration steps that let the model decide whether to pull a document identifier from its trained parameters or from the supplied context. If successful, this removes the need to retrain when the corpus grows and avoids the forgetting that normally occurs with parameter updates. The approach is tested on MS MARCO and NQ320K, showing gains on newly added documents while old-document performance stays stable. The analysis identifies routing failures as the main limit when many examples are supplied.

Core claim

ICICLE performs source-aware docid generation over both parametric memory and context-provided document-docid pairs by combining a [COPY]-based routing mechanism, preference-based calibration, and large context adaptation to distinguish context-grounded retrieval from parametric retrieval.

What carries the argument

The [COPY]-based routing mechanism that decides at each generation step whether to copy a document identifier from the provided context or generate from parametric knowledge.

If this is right

New documents can be indexed without any parameter updates or corpus-specific retraining.
Retrieval accuracy on newly introduced documents increases while accuracy on earlier documents is maintained.
High numbers of in-context examples mainly degrade performance through routing errors rather than through context overload.
Source-selection calibration becomes the central engineering target for scaling this style of retrieval.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same routing idea could be tested on other generative tasks that mix fixed knowledge with supplied evidence at inference.
If routing improves, the method might support continuously growing knowledge bases without periodic full retraining cycles.
The distinction between parametric and context-grounded output could be measured directly in other decoder-only models to check transfer.

Load-bearing premise

The combination of routing, calibration, and context adaptation can reliably separate context-provided documents from the model's trained knowledge during inference.

What would settle it

A test set where newly supplied documents are consistently ignored or where performance on previously seen documents drops sharply once context examples are added.

read the original abstract

Generative retrieval (GR) maps queries directly to document identifiers (docids) using parametric knowledge, However, this design makes corpus expansion costly: adding new documents requires updating model parameters to encode new document-docid associations incurs repeated training and catastrophic forgetting of previously indexed documents. In this work, we revisit incremental GR as an in-context retrieval problem, where newly added documents are supplied as inference-time document-docid evidence. We propose ICICLE, an in-context indexing framework that performs source-aware docid generation over both parametric memory and context-provided document-docid pairs. ICICLE combines a `[COPY]`-based routing mechanism, preference-based calibration, and large context adaptation to distinguish context-grounded retrieval from parametric retrieval. Experiments on MS MARCO and NQ320K show that ICICLE improves retrieval of newly introduced documents while preserving seen-document retention without corpus-specific retraining. Our analysis further shows that high-shot degradation is mainly caused by routing failure, highlighting source-selection calibration as a key bottleneck for scaling in-context generative retrieval.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ICICLE reframes incremental generative retrieval as an in-context task using [COPY] routing and calibration, which is a practical step but the reported gains look incremental rather than decisive.

read the letter

ICICLE treats corpus expansion in generative retrieval as an inference-time problem instead of a retraining one. New documents come in as context, and the model uses a [COPY] token plus preference calibration to decide whether to generate from its parameters or from the supplied evidence.

The routing mechanism and the explicit analysis of high-shot degradation are the clearest contributions. The paper shows that routing failures, not context length itself, drive the drop-off, which gives a useful diagnostic for anyone working on source-aware generation. Experiments on MS MARCO and NQ320K are said to improve new-document recall while keeping seen-document performance stable, without any corpus-specific fine-tuning.

The main limitation is that the quantitative improvements are not detailed enough in the available text to judge their size or consistency. It is also unclear how sensitive the calibration is to document similarity or to context lengths beyond the tested range. These are engineering questions rather than fatal ones, but they matter for adoption.

The work is aimed at people already building generative retrieval systems who need to handle growing collections. It is narrow but directly relevant to that group.

I would send it to referees. The idea is coherent, the problem is real, and the routing analysis adds something concrete even if the headline numbers need verification.

Referee Report

0 major / 2 minor

Summary. The paper proposes ICICLE, an in-context indexing framework for generative retrieval (GR) that treats corpus expansion as an inference-time problem. New documents are supplied as context-provided document-docid pairs; ICICLE performs source-aware docid generation by combining a [COPY]-based routing mechanism, preference-based calibration, and large-context adaptation to distinguish context-grounded from parametric retrieval. Experiments on MS MARCO and NQ320K are reported to show improved retrieval of newly introduced documents while preserving performance on previously seen documents, without corpus-specific retraining. An analysis attributes high-shot degradation primarily to routing failure.

Significance. If the empirical results hold, the work directly addresses a central practical limitation of generative retrieval—costly retraining and catastrophic forgetting upon corpus growth—by shifting expansion to inference-time in-context evidence. The explicit attribution of scaling bottlenecks to routing failure supplies a concrete, falsifiable direction for follow-on work. The approach is notable for avoiding any additional training while maintaining a clear separation between parametric and context-grounded sources.

minor comments (2)

The abstract states that experiments 'show that ICICLE improves retrieval' but supplies no numerical deltas, baselines, or error bars; while the full manuscript reportedly contains these results, the abstract should include at least the headline quantitative gains to allow readers to assess the claim at a glance.
Notation for the [COPY] token and the preference-calibration objective is introduced without an explicit equation or pseudocode block in the early sections; a compact formal definition would improve reproducibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of ICICLE and for recommending minor revision. The report correctly identifies the core contribution as shifting corpus expansion to inference-time in-context evidence while preserving parametric performance. No specific major comments were listed in the provided report, so we have no point-by-point responses to offer. We appreciate the recognition that routing failure is a key scaling bottleneck and will continue to explore calibration improvements in follow-up work.

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on experiments

full rationale

The manuscript contains no equations, derivations, or mathematical claims that could reduce to inputs by construction. It proposes an empirical framework (ICICLE) combining routing, calibration, and adaptation, then reports performance on MS MARCO and NQ320K. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations appear. The central claim is directly supported by the supplied experimental results and analysis of routing failure, with no internal reduction to prior fitted values or author-only uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are detailed beyond the high-level proposal of the ICICLE method itself.

axioms (1)

domain assumption New documents can be supplied as inference-time context without requiring model parameter updates for docid associations.
Core premise of reframing incremental GR as in-context retrieval.

invented entities (1)

ICICLE framework no independent evidence
purpose: In-context indexing with source-aware docid generation
New method proposed in the work; no independent evidence outside the paper.

pith-pipeline@v0.9.1-grok · 5724 in / 1152 out tokens · 25859 ms · 2026-06-29T15:54:03.154250+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 6 canonical work pages · 4 internal anchors

[1]

Language models are few-shot learners.Ad- vances in Neural Information Processing Sys- tems, 33:1877–1901, 2020

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Pra- fulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Ad- vances in Neural Information Processing Sys- tems, 33:1877–1901, 2020

1901
[2]

M3-embedding: Multi-linguality, multi- functionality, multi-granularity text embed- dings through self-knowledge distillation

Jianlyu Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. M3-embedding: Multi-linguality, multi- functionality, multi-granularity text embed- dings through self-knowledge distillation. In Findings of the association for computational linguistics: ACL 2024, pages 2318–2335, 2024

2024
[3]

Autore- gressive entity retrieval.arXiv preprint arXiv:2010.00904, 2020

Nicola De Cao, Gautier Izacard, Sebas- tian Riedel, and Fabio Petroni. Autore- gressive entity retrieval.arXiv preprint arXiv:2010.00904, 2020

work page arXiv 2010
[4]

A survey on in-context learning

Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, et al. A survey on in-context learning. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2024
[5]

Ada-kv: Optimizing kv cache eviction by adaptive budget allocation for efficient llm inference.Advances in Neural Information Processing Systems, 38:113152– 113188, 2026

Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, and S Kevin Zhou. Ada-kv: Optimizing kv cache eviction by adaptive budget allocation for efficient llm inference.Advances in Neural Information Processing Systems, 38:113152– 113188, 2026

2026
[6]

How to train long-context language models (effectively)

Tianyu Gao, Alexander Wettig, Howard Yen, and Danqi Chen. How to train long-context language models (effectively). InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7376–7399, 2025

2025
[7]

Corpusbrain++: A continual generative pre-training framework for knowledge-intensive language tasks.ACM Transactions on Information Systems, 44(1): 1–35, 2025

Jiafeng Guo, Changjiang Zhou, Ruqing Zhang, Jiangui Chen, Maarten de Rijke, Yix- ingFan,andXueqiCheng. Corpusbrain++: A continual generative pre-training framework for knowledge-intensive language tasks.ACM Transactions on Information Systems, 44(1): 1–35, 2025

2025
[8]

Ruler: What’s the real context size of your long-context lan- guage models? InFirst Conference on Lan- guage Modeling, 2024

Cheng-Ping Hsieh, Simeng Sun, Samuel Kri- man, Shantanu Acharya, Dima Rekesh, Fei 10 Jia, and Boris Ginsburg. Ruler: What’s the real context size of your long-context lan- guage models? InFirst Conference on Lan- guage Modeling, 2024

2024
[9]

Mixlora-dsi: Dynam- ically expandable mixture-of-lora experts for rehearsal-free generative retrieval over dy- namic corpora

Tuan-Luc Huynh, Thuy Vu, Weiqing Wang, Trung Le, Dragan Gasevic, Yuan-Fang Li, and Thanh-Toan Do. Mixlora-dsi: Dynam- ically expandable mixture-of-lora experts for rehearsal-free generative retrieval over dy- namic corpora. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 380–396, 2025

2025
[10]

Dense passage retrieval for open-domain question answer- ing

VladimirKarpukhin,BarlasOguz,SewonMin, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answer- ing. InProceedings of the 2020 conference on empirical methods in natural language process- ing (EMNLP), pages 6769–6781, 2020

2020
[11]

Incdsi: Incrementally updatable document retrieval

Varsha Kishore, Chao Wan, Justin Lovelace, Yoav Artzi, and Kilian Q Weinberger. Incdsi: Incrementally updatable document retrieval. InInternational Conference on Machine Learn- ing, pages 17122–17134. PMLR, 2023

2023
[12]

Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov

Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Il- lia Polosukhin, Matthew Kelcey, Jacob De- vlin, Kenton Lee, Kristina N. Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. Natural questions: a benchmark for question answer- ing research.Tr...

2019
[13]

Plaid shirttt for large-scale streaming dense re- trieval

Dawn Lawrie, Efsun Kayi, Eugene Yang, James Mayfield, and Douglas W Oard. Plaid shirttt for large-scale streaming dense re- trieval. InProceedings of the 47th Interna- tional ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2574–2578, 2024

2024
[14]

Nonparametric decoding for generative retrieval

Hyunji Lee, Jaeyoung Kim, Hoyeon Chang, Hanseok Oh, Sohee Yang, Vladimir Karpukhin, Yi Lu, and Minjoon Seo. Nonparametric decoding for generative retrieval. InFindings of the Association for Computational Linguistics: ACL 2023, pages 12642–12661, 2023

2023
[15]

Glen: Generative retrieval via lexical in- dex learning

Sunkyung Lee, Minjin Choi, and Jongwuk Lee. Glen: Generative retrieval via lexical in- dex learning. InProceedings of the 2023 Con- ference on Empirical Methods in Natural Lan- guage Processing, pages 7693–7704, 2023

2023
[16]

Dsi++: Updating transformer mem- orywithnewdocuments

Sanket Vaibhav Mehta, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Q Tran, Jinfeng Rao, Marc Najork, Emma Strubell, and Donald Metzler. Dsi++: Updating transformer mem- orywithnewdocuments. InProceedingsofthe 2023 conference on Empirical Methods in Nat- ural Language Processing, pages 8198–8213, 2023

2023
[17]

A Parametric Memory Head for Continual Generative Retrieval

Kidist Amde Mekonnen, Yubao Tang, and Maarten de Rijke. A parametric memory head for continual generative retrieval.arXiv preprint arXiv:2604.23388, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[18]

Rethinking the role of demonstrations: What makes in-context learning work? InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. Rethinking the role of demonstrations: What makes in-context learning work? InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2022
[19]

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

Tri Nguyen, Mir Rosenberg, Xia Song, Jian- feng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. MS MARCO: A human gener- atedmachinereadingcomprehensiondataset. CoRR, abs/1611.09268, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[20]

Direct preference optimiza- tion: Your language model is secretly a re- ward model.Advances in neural information processing systems, 36:53728–53741, 2023

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimiza- tion: Your language model is secretly a re- ward model.Advances in neural information processing systems, 36:53728–53741, 2023

2023
[21]

Recommender systems with generative retrieval.Advances in Neu- ral Information Processing Systems, 36:10299– 10315, 2023

Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, 11 Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al. Recommender systems with generative retrieval.Advances in Neu- ral Information Processing Systems, 36:10299– 10315, 2023

2023
[22]

In-context retrieval-augmented language models.Trans- actions of the Association for Computational Linguistics, 11:1316–1331, 2023

Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton- Brown, and Yoav Shoham. In-context retrieval-augmented language models.Trans- actions of the Association for Computational Linguistics, 11:1316–1331, 2023

2023
[23]

Now Publishers Inc, 2009

Stephen Robertson and Hugo Zaragoza.The probabilistic relevance framework: BM25 and beyond, volume 4. Now Publishers Inc, 2009

2009
[24]

Trusting your evidence: Hallucinate less with context-aware decoding

Weijia Shi, Xiaochuang Han, Mike Lewis, Yu- lia Tsvetkov, Luke Zettlemoyer, and Wen-tau Yih. Trusting your evidence: Hallucinate less with context-aware decoding. InProceedings of the 2024 Conference of the North Ameri- can Chapter of the Association for Computa- tional Linguistics: Human Language Technolo- gies (Volume 2: Short Papers), pages 783–791, 2024

2024
[25]

Cream: Continual retrieval on dy- namic streaming corpora with adaptive soft memory

HuiJeong Son, Hyeongu Kang, Sunho Kim, Subeen Ho, SeongKu Kang, Dongha Lee, and Susik Yoon. Cream: Continual retrieval on dy- namic streaming corpora with adaptive soft memory. InProceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1, pages 1297–1308, 2026

2026
[26]

Learning to tokenize for gen- erative retrieval.Advances in Neural Infor- mation Processing Systems, 36:46345–46361, 2023

Weiwei Sun, Lingyong Yan, Zheng Chen, Shuaiqiang Wang, Haichao Zhu, Pengjie Ren, Zhumin Chen, Dawei Yin, Maarten Rijke, and Zhaochun Ren. Learning to tokenize for gen- erative retrieval.Advances in Neural Infor- mation Processing Systems, 36:46345–46361, 2023

2023
[27]

Transformer memory as a differentiable search index.Ad- vances in Neural Information Processing Sys- tems, 35:21831–21843, 2022

Yi Tay, Vinh Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, et al. Transformer memory as a differentiable search index.Ad- vances in Neural Information Processing Sys- tems, 35:21831–21843, 2022

2022
[28]

A neural corpus indexer for doc- ument retrieval.Advances in Neural Infor- mation Processing Systems, 35:25600–25614, 2022

Yujing Wang, Yingyan Hou, Haonan Wang, Ziming Miao, Shibin Wu, Qi Chen, Yuqing Xia, Chengmin Chi, Guoshuai Zhao, Zheng Liu, et al. A neural corpus indexer for doc- ument retrieval.Advances in Neural Infor- mation Processing Systems, 35:25600–25614, 2022

2022
[29]

In- fllm: Training-freelong-contextextrapolation for llms with an efficient context memory.Ad- vances in neural information processing sys- tems, 37:119638–119661, 2024

Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, and Maosong Sun. In- fllm: Training-freelong-contextextrapolation for llms with an efficient context memory.Ad- vances in neural information processing sys- tems, 37:119638–119661, 2024

2024
[30]

C-Pack: Packed Resources For General Chinese Embeddings

Shitao Xiao, Zheng Liu, Peitian Zhang, and Niklas Muennighoff. C-pack: Packaged re- sources to advance general chinese embed- ding.arXiv preprint arXiv:2309.07597, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[31]

Knowledge conflicts for llms: A sur- vey

Rongwu Xu, Zehan Qi, Zhijiang Guo, Cunx- iang Wang, Hongru Wang, Yue Zhang, and Wei Xu. Knowledge conflicts for llms: A sur- vey. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Pro- cessing, pages 8541–8565, 2024

2024
[32]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[33]

Longqlora: Efficient and effective method to extend context length of large language models.arXiv preprint arXiv:2311.04879, 2023

Jianxin Yang. Longqlora: Efficient and effective method to extend context length of large language models.arXiv preprint arXiv:2311.04879, 2023

work page arXiv 2023
[34]

Replication and exploration of generative re- trieval over dynamic corpora

Zhen Zhang, Xinyu Ma, Weiwei Sun, Pengjie Ren, Zhumin Chen, Shuaiqiang Wang, Dawei Yin, Maarten de Rijke, and Zhaochun Ren. Replication and exploration of generative re- trieval over dynamic corpora. InProceedings 12 of the 48th International ACM SIGIR Confer- ence on Research and Development in Informa- tion Retrieval, pages 3325–3334, 2025

2025
[35]

Model editing for new document integration in generative information retrieval

Zhen Zhang, Zihan Wang, Xinyu Ma, ShuaiqiangWang, DaweiYin, XinXin, Pengjie Ren, Maarten de Rijke, and Zhaochun Ren. Model editing for new document integration in generative information retrieval. InPro- ceedings of the ACM Web Conference 2026, pages 1993–2003, 2026. 13 A. In-Context Template Learning A.1. In-Context Template We use a unified in-context t...

2026

[1] [1]

Language models are few-shot learners.Ad- vances in Neural Information Processing Sys- tems, 33:1877–1901, 2020

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Pra- fulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Ad- vances in Neural Information Processing Sys- tems, 33:1877–1901, 2020

1901

[2] [2]

M3-embedding: Multi-linguality, multi- functionality, multi-granularity text embed- dings through self-knowledge distillation

Jianlyu Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. M3-embedding: Multi-linguality, multi- functionality, multi-granularity text embed- dings through self-knowledge distillation. In Findings of the association for computational linguistics: ACL 2024, pages 2318–2335, 2024

2024

[3] [3]

Autore- gressive entity retrieval.arXiv preprint arXiv:2010.00904, 2020

Nicola De Cao, Gautier Izacard, Sebas- tian Riedel, and Fabio Petroni. Autore- gressive entity retrieval.arXiv preprint arXiv:2010.00904, 2020

work page arXiv 2010

[4] [4]

A survey on in-context learning

Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, et al. A survey on in-context learning. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2024

[5] [5]

Ada-kv: Optimizing kv cache eviction by adaptive budget allocation for efficient llm inference.Advances in Neural Information Processing Systems, 38:113152– 113188, 2026

Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, and S Kevin Zhou. Ada-kv: Optimizing kv cache eviction by adaptive budget allocation for efficient llm inference.Advances in Neural Information Processing Systems, 38:113152– 113188, 2026

2026

[6] [6]

How to train long-context language models (effectively)

Tianyu Gao, Alexander Wettig, Howard Yen, and Danqi Chen. How to train long-context language models (effectively). InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7376–7399, 2025

2025

[7] [7]

Corpusbrain++: A continual generative pre-training framework for knowledge-intensive language tasks.ACM Transactions on Information Systems, 44(1): 1–35, 2025

Jiafeng Guo, Changjiang Zhou, Ruqing Zhang, Jiangui Chen, Maarten de Rijke, Yix- ingFan,andXueqiCheng. Corpusbrain++: A continual generative pre-training framework for knowledge-intensive language tasks.ACM Transactions on Information Systems, 44(1): 1–35, 2025

2025

[8] [8]

Ruler: What’s the real context size of your long-context lan- guage models? InFirst Conference on Lan- guage Modeling, 2024

Cheng-Ping Hsieh, Simeng Sun, Samuel Kri- man, Shantanu Acharya, Dima Rekesh, Fei 10 Jia, and Boris Ginsburg. Ruler: What’s the real context size of your long-context lan- guage models? InFirst Conference on Lan- guage Modeling, 2024

2024

[9] [9]

Mixlora-dsi: Dynam- ically expandable mixture-of-lora experts for rehearsal-free generative retrieval over dy- namic corpora

Tuan-Luc Huynh, Thuy Vu, Weiqing Wang, Trung Le, Dragan Gasevic, Yuan-Fang Li, and Thanh-Toan Do. Mixlora-dsi: Dynam- ically expandable mixture-of-lora experts for rehearsal-free generative retrieval over dy- namic corpora. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 380–396, 2025

2025

[10] [10]

Dense passage retrieval for open-domain question answer- ing

VladimirKarpukhin,BarlasOguz,SewonMin, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answer- ing. InProceedings of the 2020 conference on empirical methods in natural language process- ing (EMNLP), pages 6769–6781, 2020

2020

[11] [11]

Incdsi: Incrementally updatable document retrieval

Varsha Kishore, Chao Wan, Justin Lovelace, Yoav Artzi, and Kilian Q Weinberger. Incdsi: Incrementally updatable document retrieval. InInternational Conference on Machine Learn- ing, pages 17122–17134. PMLR, 2023

2023

[12] [12]

Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov

Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Il- lia Polosukhin, Matthew Kelcey, Jacob De- vlin, Kenton Lee, Kristina N. Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. Natural questions: a benchmark for question answer- ing research.Tr...

2019

[13] [13]

Plaid shirttt for large-scale streaming dense re- trieval

Dawn Lawrie, Efsun Kayi, Eugene Yang, James Mayfield, and Douglas W Oard. Plaid shirttt for large-scale streaming dense re- trieval. InProceedings of the 47th Interna- tional ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2574–2578, 2024

2024

[14] [14]

Nonparametric decoding for generative retrieval

Hyunji Lee, Jaeyoung Kim, Hoyeon Chang, Hanseok Oh, Sohee Yang, Vladimir Karpukhin, Yi Lu, and Minjoon Seo. Nonparametric decoding for generative retrieval. InFindings of the Association for Computational Linguistics: ACL 2023, pages 12642–12661, 2023

2023

[15] [15]

Glen: Generative retrieval via lexical in- dex learning

Sunkyung Lee, Minjin Choi, and Jongwuk Lee. Glen: Generative retrieval via lexical in- dex learning. InProceedings of the 2023 Con- ference on Empirical Methods in Natural Lan- guage Processing, pages 7693–7704, 2023

2023

[16] [16]

Dsi++: Updating transformer mem- orywithnewdocuments

Sanket Vaibhav Mehta, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Q Tran, Jinfeng Rao, Marc Najork, Emma Strubell, and Donald Metzler. Dsi++: Updating transformer mem- orywithnewdocuments. InProceedingsofthe 2023 conference on Empirical Methods in Nat- ural Language Processing, pages 8198–8213, 2023

2023

[17] [17]

A Parametric Memory Head for Continual Generative Retrieval

Kidist Amde Mekonnen, Yubao Tang, and Maarten de Rijke. A parametric memory head for continual generative retrieval.arXiv preprint arXiv:2604.23388, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[18] [18]

Rethinking the role of demonstrations: What makes in-context learning work? InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. Rethinking the role of demonstrations: What makes in-context learning work? InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2022

[19] [19]

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

Tri Nguyen, Mir Rosenberg, Xia Song, Jian- feng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. MS MARCO: A human gener- atedmachinereadingcomprehensiondataset. CoRR, abs/1611.09268, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[20] [20]

Direct preference optimiza- tion: Your language model is secretly a re- ward model.Advances in neural information processing systems, 36:53728–53741, 2023

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimiza- tion: Your language model is secretly a re- ward model.Advances in neural information processing systems, 36:53728–53741, 2023

2023

[21] [21]

Recommender systems with generative retrieval.Advances in Neu- ral Information Processing Systems, 36:10299– 10315, 2023

Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, 11 Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Tran, Jonah Samost, et al. Recommender systems with generative retrieval.Advances in Neu- ral Information Processing Systems, 36:10299– 10315, 2023

2023

[22] [22]

In-context retrieval-augmented language models.Trans- actions of the Association for Computational Linguistics, 11:1316–1331, 2023

Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton- Brown, and Yoav Shoham. In-context retrieval-augmented language models.Trans- actions of the Association for Computational Linguistics, 11:1316–1331, 2023

2023

[23] [23]

Now Publishers Inc, 2009

Stephen Robertson and Hugo Zaragoza.The probabilistic relevance framework: BM25 and beyond, volume 4. Now Publishers Inc, 2009

2009

[24] [24]

Trusting your evidence: Hallucinate less with context-aware decoding

Weijia Shi, Xiaochuang Han, Mike Lewis, Yu- lia Tsvetkov, Luke Zettlemoyer, and Wen-tau Yih. Trusting your evidence: Hallucinate less with context-aware decoding. InProceedings of the 2024 Conference of the North Ameri- can Chapter of the Association for Computa- tional Linguistics: Human Language Technolo- gies (Volume 2: Short Papers), pages 783–791, 2024

2024

[25] [25]

Cream: Continual retrieval on dy- namic streaming corpora with adaptive soft memory

HuiJeong Son, Hyeongu Kang, Sunho Kim, Subeen Ho, SeongKu Kang, Dongha Lee, and Susik Yoon. Cream: Continual retrieval on dy- namic streaming corpora with adaptive soft memory. InProceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1, pages 1297–1308, 2026

2026

[26] [26]

Learning to tokenize for gen- erative retrieval.Advances in Neural Infor- mation Processing Systems, 36:46345–46361, 2023

Weiwei Sun, Lingyong Yan, Zheng Chen, Shuaiqiang Wang, Haichao Zhu, Pengjie Ren, Zhumin Chen, Dawei Yin, Maarten Rijke, and Zhaochun Ren. Learning to tokenize for gen- erative retrieval.Advances in Neural Infor- mation Processing Systems, 36:46345–46361, 2023

2023

[27] [27]

Transformer memory as a differentiable search index.Ad- vances in Neural Information Processing Sys- tems, 35:21831–21843, 2022

Yi Tay, Vinh Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, et al. Transformer memory as a differentiable search index.Ad- vances in Neural Information Processing Sys- tems, 35:21831–21843, 2022

2022

[28] [28]

A neural corpus indexer for doc- ument retrieval.Advances in Neural Infor- mation Processing Systems, 35:25600–25614, 2022

Yujing Wang, Yingyan Hou, Haonan Wang, Ziming Miao, Shibin Wu, Qi Chen, Yuqing Xia, Chengmin Chi, Guoshuai Zhao, Zheng Liu, et al. A neural corpus indexer for doc- ument retrieval.Advances in Neural Infor- mation Processing Systems, 35:25600–25614, 2022

2022

[29] [29]

In- fllm: Training-freelong-contextextrapolation for llms with an efficient context memory.Ad- vances in neural information processing sys- tems, 37:119638–119661, 2024

Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, and Maosong Sun. In- fllm: Training-freelong-contextextrapolation for llms with an efficient context memory.Ad- vances in neural information processing sys- tems, 37:119638–119661, 2024

2024

[30] [30]

C-Pack: Packed Resources For General Chinese Embeddings

Shitao Xiao, Zheng Liu, Peitian Zhang, and Niklas Muennighoff. C-pack: Packaged re- sources to advance general chinese embed- ding.arXiv preprint arXiv:2309.07597, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[31] [31]

Knowledge conflicts for llms: A sur- vey

Rongwu Xu, Zehan Qi, Zhijiang Guo, Cunx- iang Wang, Hongru Wang, Yue Zhang, and Wei Xu. Knowledge conflicts for llms: A sur- vey. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Pro- cessing, pages 8541–8565, 2024

2024

[32] [32]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[33] [33]

Longqlora: Efficient and effective method to extend context length of large language models.arXiv preprint arXiv:2311.04879, 2023

Jianxin Yang. Longqlora: Efficient and effective method to extend context length of large language models.arXiv preprint arXiv:2311.04879, 2023

work page arXiv 2023

[34] [34]

Replication and exploration of generative re- trieval over dynamic corpora

Zhen Zhang, Xinyu Ma, Weiwei Sun, Pengjie Ren, Zhumin Chen, Shuaiqiang Wang, Dawei Yin, Maarten de Rijke, and Zhaochun Ren. Replication and exploration of generative re- trieval over dynamic corpora. InProceedings 12 of the 48th International ACM SIGIR Confer- ence on Research and Development in Informa- tion Retrieval, pages 3325–3334, 2025

2025

[35] [35]

Model editing for new document integration in generative information retrieval

Zhen Zhang, Zihan Wang, Xinyu Ma, ShuaiqiangWang, DaweiYin, XinXin, Pengjie Ren, Maarten de Rijke, and Zhaochun Ren. Model editing for new document integration in generative information retrieval. InPro- ceedings of the ACM Web Conference 2026, pages 1993–2003, 2026. 13 A. In-Context Template Learning A.1. In-Context Template We use a unified in-context t...

2026