Benchmarking information retrieval models on complex retrieval tasks. arXiv preprint arXiv:2509.07253, 2025.
Fields: cs.IR. 4 representative citing papers.
Representative citing papers:
- LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving. LeanSearch v2 recovers 46.1% of ground-truth premise groups for research-level Lean 4 theorems within 10 candidates and raises fixed-loop proof success to 20%.
- Multilingual and Domain-Agnostic Tip-of-the-Tongue Query Generation for Simulated Evaluation. An LLM simulation framework generates multilingual tip-of-the-tongue queries, validated by rank correlation with real queries, producing the first large-scale ToT benchmarks for four languages.
- Can QPP Choose the Right Query Variant? Evaluating Query Variant Selection for RAG Pipelines. QPP methods can select query variants that boost end-to-end RAG quality over the original query, though retrieval-optimized variants often fail to produce the best generated answers, revealing a utility gap.
- Reproducing Adaptive Reranking for Reasoning-Intensive IR. Reproducing GAR on BRIGHT shows it boosts reasoning-intensive retrieval effectiveness with low overhead when the reranker's signal quality is strong.