Pith · machine review for the scientific record

arxiv: 2506.03487 · v3 · submitted 2025-06-04 · 💻 cs.IR · cs.CL

Recognition: unknown

ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking

Authors on Pith: no claims yet
classification 💻 cs.IR cs.CL
keywords: reranking · models · ProRank · computational · document · language · learning · SLMs
Abstract

Reranking is fundamental to information retrieval and retrieval-augmented generation, with recent Large Language Models (LLMs) significantly advancing reranking quality. Most current work relies on large-scale LLMs (>7B parameters), incurring high computational costs. Small Language Models (SLMs) offer a promising alternative due to their computational efficiency. However, our preliminary quantitative analysis reveals key limitations of SLMs: their representation space is narrow, leading to reduced expressiveness, and they struggle to understand task prompts without fine-tuning. To address these issues, we introduce ProRank, a novel two-stage training approach for SLM-based document reranking. We propose using reinforcement learning to improve the understanding of task prompts. Additionally, we introduce fine-grained score learning to enhance representation expressiveness and further improve document reranking quality. Extensive experiments suggest that ProRank consistently outperforms both the most advanced open-source and proprietary reranking models. Notably, our 0.5B ProRank even surpasses powerful LLM reranking models on the BEIR benchmark, establishing that properly trained SLMs can achieve superior document reranking performance while maintaining computational efficiency.
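The abstract describes pointwise reranking: a language model scores each (query, document) pair independently, and documents are sorted by score. Below is a minimal sketch of that interface. The `toy_overlap_score` stand-in scorer is purely illustrative, not from the paper; in ProRank, its role would be played by an RL-warmed SLM emitting fine-grained relevance scores.

```python
from typing import Callable, List, Tuple

def rerank(query: str,
           docs: List[str],
           score_fn: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Pointwise reranking: score each (query, doc) pair independently,
    then sort documents by descending relevance score."""
    scored = [(doc, score_fn(query, doc)) for doc in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def toy_overlap_score(query: str, doc: str) -> float:
    """Illustrative stand-in scorer: fraction of query terms that appear
    in the document. An SLM-based scorer (as in ProRank) would replace
    this with a learned relevance estimate."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

ranked = rerank(
    "small language model reranking",
    ["a survey of large language models",
     "efficient reranking with small language models",
     "image classification benchmarks"],
    toy_overlap_score,
)
# The document with the highest term overlap ranks first.
```

Because scoring is pointwise, any scorer with this signature slots in unchanged, which is what makes swapping a 7B LLM for a 0.5B SLM a drop-in change.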

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Semantic Recall for Vector Search

cs.IR · 2026-04 · unverdicted · novelty 7.0

    Semantic Recall is a new evaluation metric for approximate nearest neighbor search that focuses only on semantically relevant results, with Tolerant Recall as a proxy when relevance labels are unavailable.

  2. Your LLM Agent Can Leak Your Data: Data Exfiltration via Backdoored Tool Use

cs.CR · 2026-04 · unverdicted · novelty 6.0

    Back-Reveal shows that LLM agents with tool access can be backdoored via fine-tuning to exfiltrate stored user context through memory and retrieval tool calls, with multi-turn interactions enabling sustained leakage.
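The Semantic Recall summary in item 1 above suggests a recall variant computed only over semantically relevant ground-truth neighbors. One plausible reading can be sketched as follows; the function name, signature, and edge-case handling are assumptions for illustration, not definitions from that paper.

```python
from typing import Iterable, Set

def semantic_recall(retrieved: Iterable[int],
                    exact_neighbors: Iterable[int],
                    relevant: Set[int]) -> float:
    """Recall restricted to relevant ground-truth neighbors: of the exact
    nearest neighbors that are semantically relevant, what fraction did
    the approximate index retrieve? (One plausible reading of the metric.)"""
    relevant_truth = [n for n in exact_neighbors if n in relevant]
    if not relevant_truth:
        # Assumption: vacuously perfect when no relevant neighbors exist.
        return 1.0
    retrieved_set = set(retrieved)
    hits = sum(1 for n in relevant_truth if n in retrieved_set)
    return hits / len(relevant_truth)
```

Under this reading, missing an exact neighbor that is not semantically relevant does not hurt the score, which is the distinction the summary draws against plain recall.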