Learning to Route Queries to Heads for Attention-based Re-ranking with Large Language Models
Pith reviewed 2026-05-08 01:56 UTC · model grok-4.3
The pith
A lightweight router learns to pick query-specific attention heads in LLMs for improved document re-ranking.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose RouteHead, a query-dependent head selection method for attention-based re-ranking with LLMs. We learn a lightweight router that maps each query to an optimal head set by representing heads with learnable embeddings and queries with embeddings from the hidden states of the frozen LLM. The router is trained on pseudo labels constructed via offline search together with a sparsity regularizer. Relevance scores are computed by aggregating attention signals only from the selected heads. Experiments on diverse benchmarks and multiple LLM backbones show that the proposed method consistently outperforms strong baselines.
What carries the argument
The lightweight router, which uses learnable head embeddings and query embeddings extracted from LLM hidden states to predict and select per-query optimal head sets from pseudo labels.
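The routing step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the projection matrix `W`, the dot-product scoring, and the fixed top-K budget are all assumptions filled in where the review leaves the forward pass unspecified.

```python
import numpy as np

rng = np.random.default_rng(0)

N_HEADS, D = 32, 64          # hypothetical head count and embedding width
K = 4                        # assumed per-query head budget

head_emb = rng.normal(size=(N_HEADS, D))    # learnable head embeddings
W = rng.normal(size=(D, D)) / np.sqrt(D)    # assumed router projection

def route(query_hidden):
    """Score every head for one query and keep the top-K."""
    q = query_hidden @ W                    # project the frozen-LLM hidden state
    logits = head_emb @ q                   # affinity between query and each head
    selected = np.argsort(logits)[-K:]      # query-specific head subset
    return selected, logits

query_hidden = rng.normal(size=D)           # stand-in for a frozen-LLM hidden state
selected, logits = route(query_hidden)
print(sorted(selected.tolist()))
```

Relevance scores would then be aggregated only over `selected`, leaving the backbone LLM untouched.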
If this is right
- Re-ranking quality rises on diverse benchmarks because only non-redundant, query-relevant heads contribute to the final score.
- The same router architecture works across multiple LLM backbones without retraining the underlying model.
- Sparsity regularization during training limits the number of heads used, reducing potential signal conflicts.
- Attention signals become more fine-grained and effective for zero-shot relevance estimation.
Where Pith is reading between the lines
- The routing mechanism could be adapted to other attention-heavy LLM tasks such as summarization or question answering where head utility also varies.
- If the router generalizes reliably, inference cost could drop by computing attention only for the selected heads rather than the full set.
- This approach resembles dynamic routing in mixture-of-experts models and might benefit from similar load-balancing techniques.
Load-bearing premise
The pseudo labels generated by offline search accurately identify the optimal head sets for each query, and the router trained on those labels generalizes to new queries without inheriting search biases or overfitting.
What would settle it
Run the router on a held-out set of queries for which an exhaustive offline search can be performed to find the true best head combinations; if the router-selected heads produce lower re-ranking quality than those true optima or fail to beat fixed-head baselines, the central claim is falsified.
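At toy scale, where exhaustive search over head subsets is feasible, this falsification test can be sketched directly. Everything here is illustrative (the per-head scores, the binary quality metric, and the stand-in router pick are not from the paper):

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(1)
N_HEADS, K, N_DOCS = 8, 2, 5

# Toy per-head relevance signals: head h assigns each document a score.
head_scores = rng.normal(size=(N_HEADS, N_DOCS))
true_rank = np.argsort(-head_scores[:2].mean(axis=0))  # truth favors heads 0 and 1

def quality(head_set):
    """Toy re-ranking quality: does the aggregated top document match the truth?"""
    agg = head_scores[list(head_set)].mean(axis=0)
    return float(np.argsort(-agg)[0] == true_rank[0])

# Exhaustive search over all K-subsets yields the oracle head set.
oracle = max(combinations(range(N_HEADS), K), key=quality)

router_pick = (0, 1)  # stand-in for the trained router's selection
gap = quality(oracle) - quality(router_pick)
print(oracle, gap)
```

A persistently large `gap` across many held-out queries, or router picks that underperform a fixed-head baseline, would falsify the central claim.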
Original abstract
Large Language Models (LLMs) have recently been explored as fine-grained zero-shot re-rankers by leveraging attention signals to estimate document relevance. However, existing methods either aggregate attention signals across all heads or rely on a statically selected subset identified by heuristic rules. This solution can be suboptimal because the informative heads can vary across queries or domains. Moreover, naively combining multiple heads can degrade performance due to redundancy or conflicting ranking signals. In this paper, we propose a query-dependent head selection method, RouteHead, for attention-based re-ranking with LLMs. Specifically, we learn a lightweight router that can map each query to an optimal head set, and relevance scores are computed by aggregating attention signals only from these heads. Since query-to-head optimal labels are unavailable, we first construct pseudo labels via an offline search. The router represents each head with a learnable embedding and represents each query using an embedding extracted from the hidden states of the frozen LLM. Then it is trained on the pseudo labels with a sparsity regularizer. Experiments on diverse benchmarks and multiple LLM backbones show that the proposed method consistently outperforms strong baselines.
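The abstract leaves the offline search unspecified; one common choice it could plausibly denote is greedy forward selection of heads against a ranking metric. The sketch below is a guess at that procedure, with synthetic per-head scores and relevance labels; none of it is confirmed by the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
N_HEADS, N_DOCS, BUDGET = 16, 10, 3

head_scores = rng.normal(size=(N_HEADS, N_DOCS))  # per-head doc scores for one query
relevance = rng.random(N_DOCS)                    # graded relevance labels

def ndcg_at_k(scores, rel, k=5):
    order = np.argsort(-scores)[:k]
    gains = (2 ** rel[order] - 1) / np.log2(np.arange(2, k + 2))
    ideal = (2 ** np.sort(rel)[::-1][:k] - 1) / np.log2(np.arange(2, k + 2))
    return float(gains.sum() / ideal.sum())

def greedy_search(budget):
    """Greedy forward selection: repeatedly add the head that most improves nDCG."""
    chosen = []
    for _ in range(budget):
        best = max(
            (h for h in range(N_HEADS) if h not in chosen),
            key=lambda h: ndcg_at_k(head_scores[chosen + [h]].mean(axis=0), relevance),
        )
        chosen.append(best)
    return chosen

pseudo_label = greedy_search(BUDGET)   # one pseudo label: a per-query head set
print(pseudo_label)
```

Run per training query, this produces the query-to-head-set pseudo labels the router is then supervised on.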
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce RouteHead, a query-dependent head selection method for attention-based re-ranking using LLMs. It learns a lightweight router that maps each query to an optimal set of attention heads by using query embeddings from frozen LLM hidden states and learnable head embeddings. Pseudo labels for training are generated via an offline search, and the router is trained with a supervised loss plus sparsity regularizer to prevent using redundant heads. Experiments on diverse benchmarks with multiple LLM backbones show consistent outperformance compared to baselines that aggregate all heads or use static subsets.
Significance. If the central assumptions hold, this work could advance attention-based re-ranking by enabling dynamic, query-specific selection of informative heads, addressing the suboptimality of full aggregation or heuristic static selection. The lightweight nature of the router makes it practical for deployment. It highlights the potential of learning to route within LLM internals for IR tasks, and if the pseudo-label approach generalizes well, it could inspire similar techniques for other LLM components.
Major comments (2)
- §3.1 (Pseudo-label construction): The offline search for generating pseudo labels lacks detail on the enumeration strategy, budget, heuristics, or any verification that selected head sets are optimal (e.g., no comparison to exhaustive search on small cases). This is load-bearing because the router is supervised directly on these labels; if the search misses superior combinations or introduces bias, reported gains may reflect label artifacts rather than learned query-dependent routing.
- §4 (Experiments): No ablation is reported comparing the trained router against an oracle head set (optimal heads found by search on held-out queries) or measuring router generalization error on unseen queries. Without this, it is impossible to confirm that outperformance stems from effective routing rather than the router inheriting search biases or overfitting to the pseudo-label distribution.
Minor comments (2)
- Abstract: The claim of 'consistent outperformance' would be stronger with explicit mention of the specific benchmarks, LLM backbones, and baseline implementations used.
- Notation: The router architecture (query embedding extraction and head embedding interaction) would benefit from a figure or explicit pseudocode to clarify the forward pass and sparsity application.
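In the spirit of the pseudocode the second minor comment asks for, one plausible training objective is a multi-label classification loss against the pseudo-label head set plus an L1 sparsity penalty. The BCE form, the sigmoid gating, and the coefficient `lam` are all assumptions, not the paper's stated objective:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def router_loss(logits, pseudo_label, lam=0.01):
    """Multi-label BCE against the pseudo-label head set, plus L1 sparsity.

    `logits` are per-head router scores; `pseudo_label` is a 0/1 vector marking
    the head set found by offline search. This combination is a guess at the
    paper's "supervised loss plus sparsity regularizer".
    """
    p = sigmoid(logits)
    eps = 1e-9
    bce = -(pseudo_label * np.log(p + eps)
            + (1 - pseudo_label) * np.log(1 - p + eps)).mean()
    sparsity = np.abs(p).mean()   # pushes the gate toward few active heads
    return bce + lam * sparsity

logits = np.array([3.0, -2.0, 0.5, -1.0])
label = np.array([1.0, 0.0, 1.0, 0.0])
print(router_loss(logits, label))
```

The sparsity term is what limits how many heads survive selection, which is the lever the review's ledger lists as the method's one free parameter.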
Simulated Author's Rebuttal
We thank the referee for the insightful comments and the recommendation for major revision. We address the major comments point by point below, proposing revisions where appropriate to improve the clarity and rigor of the manuscript.
Point-by-point responses
Referee: §3.1 (Pseudo-label construction): The offline search for generating pseudo labels lacks detail on the enumeration strategy, budget, heuristics, or any verification that selected head sets are optimal (e.g., no comparison to exhaustive search on small cases). This is load-bearing because the router is supervised directly on these labels; if the search misses superior combinations or introduces bias, reported gains may reflect label artifacts rather than learned query-dependent routing.
Authors: We agree that more details are needed on the pseudo-label construction in §3.1. In the revised manuscript, we will provide a detailed description of the enumeration strategy, including the specific search algorithm, budget constraints, and heuristics employed. Additionally, we will include a verification experiment comparing our search results to exhaustive search on small-scale cases to demonstrate that the selected head sets are optimal or near-optimal. This will help confirm that the pseudo labels are reliable and not artifacts of the search process. (revision: yes)
Referee: §4 (Experiments): No ablation is reported comparing the trained router against an oracle head set (optimal heads found by search on held-out queries) or measuring router generalization error on unseen queries. Without this, it is impossible to confirm that outperformance stems from effective routing rather than the router inheriting search biases or overfitting to the pseudo-label distribution.
Authors: We acknowledge the importance of validating the router against an oracle and assessing generalization. We will add an ablation study reporting the router's prediction accuracy on a held-out validation set of queries to measure generalization error. However, performing the full offline search to obtain oracle head sets for all held-out test queries is computationally prohibitive given the scale of our experiments. We will explicitly discuss this limitation in the revised paper and argue that the consistent performance gains over static baselines indicate effective query-dependent routing rather than mere inheritance of search biases. (revision: partial)
- Revision declined: obtaining oracle optimal head sets via search on the full held-out test queries, due to excessive computational requirements.
Circularity Check
No significant circularity detected in derivation chain
Full rationale
The paper constructs pseudo labels for optimal head sets via an independent offline search process, then trains a router on query embeddings (from frozen LLM) and head embeddings using supervised loss plus sparsity regularizer. No equations or steps reduce the router's output or final performance claims to the inputs by construction. No self-citations, uniqueness theorems, or smuggled ansatzes are invoked as load-bearing. Experimental outperformance on benchmarks is presented as empirical validation rather than a mathematical identity. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Free parameters (1)
- Sparsity regularizer coefficient
Axioms (1)
- Domain assumption: attention heads produce varying and sometimes conflicting relevance signals across queries.
Invented entities (1)
- Lightweight query-to-head router with learnable head embeddings (no independent evidence).