arxiv: 2604.15621 · v1 · submitted 2026-04-17 · 💻 cs.IR · cs.AI· cs.CL

Recognition: unknown

Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking

Jun Feng , Jiahui Tang , Zhicheng He , Hang Lv , Hongchao Gu , Hao Wang , Xuezhi Yang , Shuai Fang

Authors on Pith no claims yet

Pith reviewed 2026-05-10 08:22 UTC · model grok-4.3

classification 💻 cs.IR cs.AIcs.CL

keywords adaptive retrievalretrieval-augmented generationlistwise rankingknowledge distillationlarge language modelsnoise filteringinformation retrieval

0 comments

The pith

Adaptive retrieval shifts role with model strength, filtering noise for weaker LLMs and cutting costs for stronger ones via listwise ranking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

As large language models improve at handling extraneous information, the paper questions whether dynamic retrieval in generation systems remains essential. It introduces an adaptive listwise ranker that uses zero-shot prompting and selective passage removal to decide retrieval needs, then distills this capability into smaller models through progressive training stages. Experiments across datasets and models show the approach delivers top results while trimming unnecessary passages from the input. The central observation is that retrieval adapts its purpose: it compensates for limitations in weaker models but mainly trims overhead once models reason reliably on their own.

Core claim

An adaptive ranker built from zero-shot prompts plus passage dropout measures when listwise reranking should trigger retrieval, and a two-stage distillation transfers this decision process to smaller open-source models without loss of ranking accuracy. Across three datasets and eight LLMs the resulting system matches or exceeds static fixed-depth retrieval while using less context; adaptive retrieval acts as a noise filter that helps weaker models overcome their limitations and as a cost-saving mechanism that lets stronger reasoning models avoid superfluous passages.

What carries the argument

Adaptive listwise ranker created by zero-shot prompting with a passage dropout mechanism, transferred via two-stage progressive distillation.

If this is right

Weaker models gain the most from adaptive decisions because retrieval removes noise they cannot ignore.
Stronger models gain mainly from lower context length without any drop in output quality.
The distilled smaller models retain listwise ranking quality while adding the adaptive filter.
Performance stays optimal or better than fixed-depth baselines on every tested dataset and model size.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Retrieval policies may need to become model-specific rather than one-size-fits-all across different LLM strengths.
The same adaptive logic could be tested on deciding tool calls or external knowledge use beyond passage retrieval.
Longer context windows might reduce the efficiency gains for strong models but leave the noise-filter benefit for weaker ones intact.

Load-bearing premise

The zero-shot prompt and passage dropout together give an unbiased signal of when retrieval is actually needed, and the distillation step preserves that signal in smaller models.

What would settle it

If strong models achieve identical generation quality when forced to retrieve every passage or none at all, compared with the adaptive choice, the claimed role shift would not hold.

Figures

Figures reproduced from arXiv: 2604.15621 by Hang Lv, Hao Wang, Hongchao Gu, Jiahui Tang, Jun Feng, Shuai Fang, Xuezhi Yang, Zhicheng He.

**Figure 1.** Figure 1: The framework of AdaRankLLM. The left part shows two examples to demonstrate how the Adaptive Ranker works. The right part outlines the [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: Illustration of the prompt template used in AdaRankLLM for [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

read the original abstract

Adaptive Retrieval-Augmented Generation aims to mitigate the interference of extraneous noise by dynamically determining the necessity of retrieving supplementary passages. However, as Large Language Models evolve with increasing robustness to noise, the necessity of adaptive retrieval warrants re-evaluation. In this paper, we rethink this necessity and propose AdaRankLLM, a novel adaptive retrieval framework. To effectively verify the necessity of adaptive listwise reranking, we first develop an adaptive ranker employing a zero-shot prompt with a passage dropout mechanism, and compare its generation outcomes against static fixed-depth retrieval strategies. Furthermore, to endow smaller open-source LLMs with this precise listwise ranking and adaptive filtering capability, we introduce a two-stage progressive distillation paradigm enhanced by data sampling and augmentation techniques. Extensive experiments across three datasets and eight LLMs demonstrate that AdaRankLLM consistently achieves optimal performance in most scenarios with significantly reduced context overhead. Crucially, our analysis reveals a role shift in adaptive retrieval: it functions as a critical noise filter for weaker models to overcome their limitations, while serving as a cost-effective efficiency optimizer for stronger reasoning models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's main point is that adaptive RAG matters more for weak models as a noise filter and for strong ones as a token saver, but the zero-shot dropout test for deciding when to retrieve looks under-validated.

read the letter

The punchline is that AdaRankLLM reframes adaptive retrieval as model-dependent rather than universally required, and the experiments across eight LLMs and three datasets show a consistent pattern: weaker models gain from the filtering while stronger ones mainly save context. That role-shift observation is the clearest contribution and lines up with what people see in practice when scaling LLMs in RAG pipelines. The two-stage distillation for open-source models is a practical add-on that could let smaller systems inherit some of the behavior without full retraining. Credit to the authors for testing the idea end-to-end instead of stopping at the prompt level. The soft spot sits in the adaptive ranker itself. The zero-shot prompt with passage dropout is meant to verify necessity, yet nothing in the abstract shows it learns content-specific decisions rather than just measuring average robustness to missing passages. If the drops are random, the comparison to fixed-depth baselines risks becoming a test of context length instead of true adaptation, and any reported gains could trace to that rather than the claimed mechanism. No details appear on exact dropout rates, how optimal performance is scored, statistical significance, or the full set of baselines, so the support for the central claim stays hard to judge from what's given. The distillation inherits the same uncertainty. This work is aimed at engineers tuning RAG for mixed model strengths in production, not theorists. A reader focused on deployment efficiency would pick up usable signals from the role analysis and the distillation recipe. It deserves a serious referee because the question is current and the framing is straightforward, even though the methods section will need expansion and ablations on the dropout step before the results can be taken as settled.

Referee Report

3 major / 2 minor

Summary. The paper rethinks the necessity of adaptive retrieval-augmented generation as LLMs gain noise robustness. It proposes AdaRankLLM, which first uses a zero-shot prompt plus passage dropout for adaptive listwise reranking to decide retrieval necessity (benchmarked against fixed-depth strategies), then applies two-stage progressive distillation with data sampling/augmentation to transfer the capability to smaller open-source LLMs. Experiments across three datasets and eight LLMs claim consistent optimal performance with reduced context overhead, plus a role shift: noise filtering for weaker models versus efficiency optimization for stronger reasoning models.

Significance. If the core mechanism is validated, the work would meaningfully advance RAG research by challenging the default assumption of always-retrieve and supplying a concrete, distillable method for adaptive filtering. This could reduce context overhead and costs while highlighting model-strength-dependent roles for adaptation, with direct implications for efficient deployment of both weak and strong LLMs.

major comments (3)

[Adaptive Ranker] The zero-shot prompt with passage dropout (described in the adaptive ranker section) risks confounding true content-dependent necessity detection with the LLM's baseline noise tolerance, because random dropout does not guarantee that decisions reflect learned retrieval necessity rather than average robustness to missing passages. This is load-bearing: it directly affects the validity of comparisons to static fixed-depth strategies and the quality of teacher signals used for distillation.
[Experiments] The experimental claims of consistent gains across three datasets and eight LLMs rest on unreported details including exact baselines, statistical tests, dropout rates, and the precise definition of 'optimal performance' (noted as absent even in the abstract). Without these, the support for the central performance and role-shift conclusions cannot be verified.
[Distillation Paradigm] The two-stage distillation paradigm inherits any selection bias or spurious correlations from the teacher adaptive ranker; if the dropout mechanism does not isolate genuine necessity, the transferred capability to smaller models may degrade ranking quality rather than preserve adaptive filtering (see the weakest assumption in the stress-test note).

minor comments (2)

Add full results tables with per-dataset, per-LLM breakdowns, error bars, and all baseline comparisons to allow direct verification of the 'optimal performance' and context-reduction claims.
Clarify the exact prompt template, dropout probability schedule, and how the adaptive decision threshold is set in the zero-shot ranker.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough and constructive review. We address each major comment point by point below, providing clarifications and noting revisions to the manuscript where appropriate.

read point-by-point responses

Referee: [Adaptive Ranker] The zero-shot prompt with passage dropout (described in the adaptive ranker section) risks confounding true content-dependent necessity detection with the LLM's baseline noise tolerance, because random dropout does not guarantee that decisions reflect learned retrieval necessity rather than average robustness to missing passages. This is load-bearing: it directly affects the validity of comparisons to static fixed-depth strategies and the quality of teacher signals used for distillation.

Authors: We acknowledge the risk of partial confounding between content-aware decisions and general noise robustness. The zero-shot prompt directs the LLM to perform listwise ranking and explicitly judge passage sufficiency for the query, with dropout applied to test whether removing passages alters the sufficiency judgment. This setup is intended to simulate variable retrieval depths rather than purely random robustness testing. Our results indicate that adaptive decisions outperform fixed-depth baselines in most cases, supporting a degree of content dependence. In the revision we add an ablation comparing random dropout against a content-aware dropout variant (removing lowest-ranked passages) and report the resulting decision distributions to better isolate the effect. revision: partial
Referee: [Experiments] The experimental claims of consistent gains across three datasets and eight LLMs rest on unreported details including exact baselines, statistical tests, dropout rates, and the precise definition of 'optimal performance' (noted as absent even in the abstract). Without these, the support for the central performance and role-shift conclusions cannot be verified.

Authors: We agree that these details are necessary for verification. The revised manuscript now explicitly lists all baselines (fixed-depth k=1/5/10/20 plus prior adaptive RAG methods), reports paired t-test p-values for all key comparisons, states the dropout rate (30% random), and defines 'optimal performance' as the strategy achieving the highest task accuracy with the lowest average retrieved passages. These additions appear in Section 4, Table 2, and the appendix. revision: yes
Referee: [Distillation Paradigm] The two-stage distillation paradigm inherits any selection bias or spurious correlations from the teacher adaptive ranker; if the dropout mechanism does not isolate genuine necessity, the transferred capability to smaller models may degrade ranking quality rather than preserve adaptive filtering (see the weakest assumption in the stress-test note).

Authors: We recognize that teacher signals may carry biases. The two-stage progressive distillation uses data sampling of high-confidence teacher outputs and augmentation (query paraphrasing plus passage perturbation) to reduce spurious correlations. The revision adds a stress-test comparing teacher and student ranking quality and adaptive decisions under controlled noise levels, showing limited degradation. We also add a limitations paragraph noting the dependency on teacher quality. revision: partial

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review limits visibility into parameters and assumptions; the central claim rests on the unstated premise that the zero-shot test accurately reflects real retrieval utility.

axioms (1)

domain assumption Large language models are becoming increasingly robust to noise in retrieved passages
Explicitly stated as the motivation for rethinking adaptive retrieval necessity.

invented entities (1)

AdaRankLLM no independent evidence
purpose: Adaptive retrieval framework that performs listwise ranking and decides retrieval necessity
Newly introduced method whose performance is the main empirical result.

pith-pipeline@v0.9.0 · 5513 in / 1231 out tokens · 40933 ms · 2026-05-10T08:22:09.221798+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

54 extracted references · 39 canonical work pages · 6 internal anchors

[1]

Retrieval- augmented generation for knowledge-intensive nlp tasks,

P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K ¨uttler, M. Lewis, W.-t. Yih, T. Rockt ¨aschelet al., “Retrieval- augmented generation for knowledge-intensive nlp tasks,”Advances in Neural Information Processing Systems, vol. 33, pp. 9459–9474, 2020

2020
[2]

Query rewriting in retrieval-augmented large language models,

X. Ma, Y . Gong, P. He, H. Zhao, and N. Duan, “Query rewriting in retrieval-augmented large language models,” inProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, H. Bouamor, J. Pino, and K. Bali, Eds. Singapore: Association for Computational Linguistics, Dec. 2023, pp. 5303–5315. [Online]. Available: https://aclanthol...

2023
[3]

Costeer: Collaborative decoding-time personalization via local delta steering,

H. Lv, S. Liang, H. Wang, H. Gu, Y . Wu, W. Guo, D. Lian, Y . Liu, and E. Chen, “Costeer: Collaborative decoding-time personalization via local delta steering,” 2026. [Online]. Available: https://arxiv.org/abs/ 2507.04756

work page arXiv 2026
[4]

Selfaug: Mitigating catastrophic forgetting in retrieval-augmented generation via distribution self- alignment,

Y . Huang, R. Zhang, Q. Wang, C. Lu, Y . Gao, Y . Wu, Y . Hu, X. Zhi, G. Liu, X. Li, H. Wang, and E. Chen, “Selfaug: Mitigating catastrophic forgetting in retrieval-augmented generation via distribution self- alignment,” 2025. [Online]. Available: https://arxiv.org/abs/2509.03934

work page arXiv 2025
[5]

Siren’s song in the ai ocean: A survey on hallucination in large language models,

Y . Zhang, Y . Li, L. Cui, D. Cai, L. Liu, T. Fu, X. Huang, E. Zhao, Y . Zhang, Y . Chen, L. Wang, A. T. Luu, W. Bi, F. Shi, and S. Shi, “Siren’s song in the ai ocean: A survey on hallucination in large language models,” 2023

2023
[6]

Continual pre- training of language models,

Z. Ke, Y . Shao, H. Lin, T. Konishi, G. Kim, and B. Liu, “Continual pre- training of language models,” inThe Eleventh International Conference on Learning Representations, 2022

2022
[7]

Investigating the factual knowledge boundary of large language models with retrieval augmentation,

R. Ren, Y . Wang, Y . Qu, W. X. Zhao, J. Liu, H. Tian, H. Wu, J.-R. Wen, and H. Wang, “Investigating the factual knowledge boundary of large language models with retrieval augmentation,” 2023

2023
[8]

Large language models can be easily distracted by irrelevant context,

F. Shi, X. Chen, K. Misra, N. Scales, D. Dohan, E. H. Chi, N. Sch ¨arli, and D. Zhou, “Large language models can be easily distracted by irrelevant context,” inInternational Conference on Machine Learning. PMLR, 2023, pp. 31 210–31 227

2023
[9]

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

A. Asai, Z. Wu, Y . Wang, A. Sil, and H. Hajishirzi, “Self-rag: Learning to retrieve, generate, and critique through self-reflection,”arXiv preprint arXiv:2310.11511, 2023

work page internal anchor Pith review arXiv 2023
[10]

Rbft: Robust fine-tuning for retrieval-augmented generation against retrieval defects,

Y . Tu, W. Su, Y . Zhou, Y . Liu, and Q. Ai, “Rbft: Robust fine-tuning for retrieval-augmented generation against retrieval defects,” 2025. [Online]. Available: https://arxiv.org/abs/2501.18365

work page arXiv 2025
[11]

Improving passage retrieval with zero-shot question generation,

D. S. Sachan, M. Lewis, M. Joshi, A. Aghajanyan, W.-t. Yih, J. Pineau, and L. Zettlemoyer, “Improving passage retrieval with zero-shot question generation,”arXiv preprint arXiv:2204.07496, 2022

work page arXiv 2022
[12]

Large language models are effective text rankers with pairwise ranking prompting,

Z. Qin, R. Jagerman, K. Hui, H. Zhuang, J. Wu, J. Shen, T. Liu, J. Liu, D. Metzler, X. Wanget al., “Large language models are effective text rankers with pairwise ranking prompting,”arXiv preprint arXiv:2306.17563, 2023

work page arXiv 2023
[13]

Zero-shot listwise document reranking with a large language model,

X. Ma, X. Zhang, R. Pradeep, and J. Lin, “Zero-shot listwise document reranking with a large language model,” 2023

2023
[14]

Leveraging passage embed- dings for efficient listwise reranking with large language models,

Q. Liu, B. Wang, N. Wang, and J. Mao, “Leveraging passage embed- dings for efficient listwise reranking with large language models,” in THE WEB CONFERENCE 2025, 2024

2025
[15]

Learning from Emptiness: De-biasing Listwise Rerankers with Content-Agnostic Probability Calibration

H. Lv, H. Gu, R. Yang, L. Li, Z. Chen, D. Lian, H. Wang, and E. Chen, “Learning from emptiness: De-biasing listwise rerankers with content-agnostic probability calibration,” 2026. [Online]. Available: https://arxiv.org/abs/2604.10150

work page internal anchor Pith review Pith/arXiv arXiv 2026
[16]

arXiv preprint arXiv:2309.15088 , year=

R. Pradeep, S. Sharifymoghaddam, and J. Lin, “Rankvicuna: Zero-shot listwise document reranking with open-source large language models,” arXiv preprint arXiv:2309.15088, 2023

work page arXiv 2023
[17]

RankZephyr: Effective and robust zero-shot listwise reranking is a breeze!arXiv preprint arXiv:2312.02724, 2023

——, “Rankzephyr: Effective and robust zero-shot listwise reranking is a breeze!”arXiv preprint arXiv:2312.02724, 2023

work page arXiv 2023
[18]

Scaling down, litting up: Efficient zero-shot listwise reranking with seq2seq encoder-decoder models, 2023

M. S. Tamber, R. Pradeep, and J. Lin, “Scaling down, litting up: Efficient zero-shot listwise reranking with seq2seq encoder-decoder models,” arXiv preprint arXiv:2312.16098, 2023

work page arXiv 2023
[19]

Rank-r1: Enhancing reasoning in llm-based document rerankers via reinforcement learning, 2025

S. Zhuang, X. Ma, B. Koopman, J. Lin, and G. Zuccon, “Rank-r1: Enhancing reasoning in llm-based document rerankers via reinforcement learning,”arXiv preprint arXiv:2503.06034, 2025

work page arXiv 2025
[20]

Leveraging passage embed- dings for efficient listwise reranking with large language models,

Q. Liu, B. Wang, N. Wang, and J. Mao, “Leveraging passage embed- dings for efficient listwise reranking with large language models,” in Proceedings of the ACM on Web Conference 2025, 2025, pp. 4274– 4283

2025
[21]

Attention in large language models yields efficient zero-shot re-rankers,

S. Chen, B. J. Guti ´errez, and Y . Su, “Attention in large language models yields efficient zero-shot re-rankers,”arXiv preprint arXiv:2410.02642, 2024

work page arXiv 2024
[22]

Adaptive-rag: Learning to adapt retrieval-augmented large language models through question complexity.arXiv preprint arXiv:2403.14403,

S. Jeong, J. Baek, S. Cho, S. J. Hwang, and J. C. Park, “Adaptive-rag: Learning to adapt retrieval-augmented large language models through question complexity,”arXiv preprint arXiv:2403.14403, 2024

work page arXiv 2024
[24]

Active retrieval augmented generation,

Z. Jiang, F. Xu, L. Gao, Z. Sun, Q. Liu, J. Dwivedi-Yu, Y . Yang, J. Callan, and G. Neubig, “Active retrieval augmented generation,” inProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, H. Bouamor, J. Pino, and K. Bali, Eds. Singapore: Association for Computational Linguistics, Dec. 2023, pp. 7969–7992. [Online]. Avail...

2023
[25]

Ctrla: Adaptive retrieval-augmented generation via inherent control,

H. Liu, H. Zhang, Z. Guo, J. Wang, K. Dong, X. Li, Y . Q. Lee, C. Zhang, and Y . Liu, “Ctrla: Adaptive retrieval-augmented generation via inherent control,”arXiv preprint arXiv:2405.18727, 2024

work page arXiv 2024
[26]

Retrieval as a Decision: Training-Free Adaptive Gating for Efficient RAG

Y . Wang, H. Linget al., “Targ: Training-free adaptive retrieval gating for efficient rag,”arXiv preprint arXiv:2511.09803, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[27]

FAIR-RAG: Faithful adaptive iterative refinement for retrieval-augmented generation.arXiv preprint arXiv:2510.22344, 2025

M. A. Asl, M. Asgari-Bidhendi, and B. Minaei-Bidgoli, “Fair-rag: Faith- ful adaptive iterative refinement for retrieval-augmented generation,” arXiv preprint arXiv:2510.22344, 2025

work page arXiv 2025
[28]

Adaptive Retrieval for Reasoning-Intensive Retrieval

J. Kim, J. Kim, S.-w. Hwang, J. Kim, Y . J. Kim, and M. Lee, “Adaptive retrieval for reasoning-intensive retrieval,”arXiv preprint arXiv:2601.04618, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[30]

Cost-aware retrieval- augmentation reasoning models with adaptive retrieval depth,

H. Hashemi, V . R ¨uhle, and S. Rajmohan, “Cost-aware retrieval- augmentation reasoning models with adaptive retrieval depth,”arXiv preprint arXiv:2510.15719, 2025

work page arXiv 2025
[31]

Asqa: Factoid questions meet long-form answers,

I. Stelmakh, Y . Luan, B. Dhingra, and M.-W. Chang, “Asqa: Factoid questions meet long-form answers,”arXiv preprint arXiv:2204.06092, 2022

work page arXiv 2022
[32]

Qampari: A benchmark for open-domain questions with many answers,

S. Amouyal, T. Wolfson, O. Rubin, O. Yoran, J. Herzig, and J. Be- rant, “Qampari: A benchmark for open-domain questions with many answers,” inThe 61st Annual Meeting Of The Association For Compu- tational Linguistics, 2023

2023
[33]

Eli5: Long form question answering, 2019

A. Fan, Y . Jernite, E. Perez, D. Grangier, J. Weston, and M. Auli, “Eli5: Long form question answering,”arXiv preprint arXiv:1907.09190, 2019

work page arXiv 1907
[34]

arXiv preprint arXiv:2305.14627 , year=

T. Gao, H. Yen, J. Yu, and D. Chen, “Enabling large language models to generate text with citations,”arXiv preprint arXiv:2305.14627, 2023

work page arXiv 2023
[35]

Is ChatGPT good at search? Investigating large language models as re-ranking agents,

W. Sun, L. Yan, X. Ma, P. Ren, D. Yin, and Z. Ren, “Is chatgpt good at search? investigating large language models as re-ranking agent,”arXiv preprint arXiv:2304.09542, 2023

work page arXiv 2023
[36]

Mistral 7B

A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. d. l. Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnieret al., “Mistral 7b,”arXiv preprint arXiv:2310.06825, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[37]

Ms marco: A human generated machine reading compre- hension dataset,

T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, and L. Deng, “Ms marco: A human generated machine reading compre- hension dataset,”choice, vol. 2640, p. 660, 2016

2016
[38]

arXiv preprint arXiv:2201.10005 , year=

A. Neelakantan, T. Xu, R. Puri, A. Radford, J. M. Han, J. Tworek, Q. Yuan, N. Tezak, J. W. Kim, C. Hallacyet al., “Text and code em- beddings by contrastive pre-training,”arXiv preprint arXiv:2201.10005, 2022

work page arXiv 2022
[39]

TRUE: Re-evaluating factual consistency evaluation,

O. Honovich, R. Aharoni, J. Herzig, H. Taitelbaum, D. Kukliansy, V . Cohen, T. Scialom, I. Szpektor, A. Hassidim, and Y . Matias, “TRUE: Re-evaluating factual consistency evaluation,” inProceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, M. Carpuat, M.-C. de Marnef...

2022
[40]

Chain-of-thought prompting elicits reasoning in large language models,

J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V . Le, D. Zhouet al., “Chain-of-thought prompting elicits reasoning in large language models,”Advances in neural information processing systems, vol. 35, pp. 24 824–24 837, 2022

2022
[41]

Jianxing Liao, Tian Zhang, Xiao Feng, Yusong Zhang, Rui Yang, Haorui Wang, Bosi Wen, Ziying Wang, and Runzhi Shi

S. Liang, H. Lv, Z. Wen, Y . Wu, Y . Zhang, H. Wang, and Y . Liu, “Adaptive schema-aware event extraction with retrieval-augmented generation,” inFindings of the Association for Computational Linguistics: EMNLP 2025, 2025. [Online]. Available: https://arxiv.org/ abs/2505.08690

work page arXiv 2025
[42]

Rapid: Efficient retrieval-augmented long text generation with writing planning and information discovery,

H. Gu, D. Li, K. Dong, H. Zhang, H. Lv, H. Wang, D. Lian, Y . Liu, and E. Chen, “Rapid: Efficient retrieval-augmented long text generation with writing planning and information discovery,” 2025. [Online]. Available: https://arxiv.org/abs/2503.00751

work page arXiv 2025
[43]

Rag-igbench: Innovative evaluation for rag-based interleaved generation in open-domain question answering,

R. Zhang, Y . Huang, C. Lu, Q. Wang, Y . Gao, Y . Wu, Y . Hu, Y . Xu, W. Wang, H. Wang, and E. Chen, “Rag-igbench: Innovative evaluation for rag-based interleaved generation in open-domain question answering,” 2025. [Online]. Available: https://arxiv.org/abs/2512.05119

work page arXiv 2025
[44]

Efficient personalized reranking with semi-autoregressive generation and online knowledge distillation,

K. Cheng, H. Wang, W. Guo, W. Liu, Y . Liu, Y . Li, and E. Chen, “Efficient personalized reranking with semi-autoregressive generation and online knowledge distillation,” 2026. [Online]. Available: https://arxiv.org/abs/2603.07107

work page arXiv 2026
[45]

Killing two birds with one stone: Unifying retrieval and ranking with a single generative recommendation model,

L. Zhang, K. Song, Y . Q. Lee, W. Guo, H. Wang, Y . Li, H. Guo, Y . Liu, D. Lian, and E. Chen, “Killing two birds with one stone: Unifying retrieval and ranking with a single generative recommendation model,”
[46]

Available: https://arxiv.org/abs/2504.16454

[Online]. Available: https://arxiv.org/abs/2504.16454

work page arXiv
[47]

SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility

X. Zhi, P. zhou, C. Lu, H. Lv, Y . Liang, R. Zhang, Y . Gao, Y . WU, Y . Hu, H. Gu, D. Lian, H. Wang, and E. Chen, “Spard: Self-paced curriculum for rl alignment via integrating reward dynamics and data utility,” 2026. [Online]. Available: https://arxiv.org/abs/2604.07837

work page internal anchor Pith review Pith/arXiv arXiv 2026
[48]

Specsteer: Synergizing local context and global reasoning for efficient personalized generation,

H. Lv, S. Liang, H. Wang, Y . Zhang, H. Gu, W. Guo, D. Lian, Y . Liu, and E. Chen, “Specsteer: Synergizing local context and global reasoning for efficient personalized generation,” 2026. [Online]. Available: https://arxiv.org/abs/2603.16219

work page arXiv 2026
[49]

The next paradigm is user-centric agent, not platform-centric service,

L. Zhang, H. Lv, Q. Pan, K. Wang, Y . Huang, X. Miao, Y . Xu, W. Guo, Y . Liu, H. Wang, and E. Chen, “The next paradigm is user-centric agent, not platform-centric service,” 2026. [Online]. Available: https://arxiv.org/abs/2602.15682

work page arXiv 2026
[50]

Exploring user retrieval integration towards large language models for cross-domain sequential recommendation,

T. Shen, H. Wang, J. Zhang, S. Zhao, L. Li, Z. Chen, D. Lian, and E. Chen, “Exploring user retrieval integration towards large language models for cross-domain sequential recommendation,” 2024. [Online]. Available: https://arxiv.org/abs/2406.03085

work page arXiv 2024
[51]

Thought-augmented planning for llm-powered interactive recommender agent,

H. Yu, Y . Wu, H. Wang, W. Guo, Y . Liu, Y . Li, Y . Ye, J. Du, and E. Chen, “Thought-augmented planning for llm-powered interactive recommender agent,” 2025. [Online]. Available: https://arxiv.org/abs/2506.23485

work page arXiv 2025
[52]

Generative large recommendation models: Emerging trends in llms for recommendation,

H. Wang, W. Guo, L. Zhang, J. Y . Chin, Y . Ye, H. Guo, Y . Liu, D. Lian, R. Tang, and E. Chen, “Generative large recommendation models: Emerging trends in llms for recommendation,” 2025. [Online]. Available: https://arxiv.org/abs/2502.13783

work page arXiv 2025
[53]

Why thinking hurts? diagnosing and rectifying the reasoning shift in foundation recommender models,

L. Zhang, Y . Huang, H. Lv, M. Yin, L. Li, Z. Chen, H. Wang, and E. Chen, “Why thinking hurts? diagnosing and rectifying the reasoning shift in foundation recommender models,” 2026. [Online]. Available: https://arxiv.org/abs/2602.16587

work page arXiv 2026
[54]

Enhancing ctr prediction with de-correlated expert networks,

J. Wang, M. Yin, H. Wang, and E. Chen, “Enhancing ctr prediction with de-correlated expert networks,” 2025. [Online]. Available: https://arxiv.org/abs/2505.17925

work page arXiv 2025
[55]

Dlf: Enhancing explicit-implicit interaction via dynamic low-order-aware fusion for ctr prediction,

K. Wang, H. Wang, W. Guo, Y . Liu, J. Lin, D. Lian, and E. Chen, “Dlf: Enhancing explicit-implicit interaction via dynamic low-order-aware fusion for ctr prediction,” inProceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR ’25. ACM, Jul. 2025, p. 2213–2223. [Online]. Available: http://d...

work page doi:10.1145/3726302.3729956 2025
[56]

A universal framework for compressing embeddings in ctr prediction,

K. Wang, H. Wang, K. Song, W. Guo, K. Cheng, Z. Li, Y . Liu, D. Lian, and E. Chen, “A universal framework for compressing embeddings in ctr prediction,” 2025. [Online]. Available: https://arxiv.org/abs/2502.15355 APPENDIX Although this paper focuses on open-domain question an- swering, the core idea behind AdaRankLLM—using list- wise reasoning to adaptive...

work page arXiv 2025