arXiv preprint arXiv:2506.12364
8 Pith papers cite this work.
Citing papers by year: 2026 (8).
Citing papers
- Very Efficient Listwise Multimodal Reranking for Long Documents
  ZipRerank delivers state-of-the-art multimodal listwise reranking accuracy for long documents at up to 10x lower latency via early interaction and single-pass scoring.
- Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG
  FES-RAG reframes multimodal RAG as fragment-level evidence selection, using Fragment Information Gain to outperform document-level methods with up to 27% relative CIDEr gains on M²RAG while shortening the context.
- ELVA: Exploring Ranking-Driven Universal Multimodal Retrieval
  ELVA applies ranking-driven RLVR (reinforcement learning with verifiable rewards) to multimodal retrieval to reduce grain blindness in contrastive learning, reporting state-of-the-art results and a 13.1% gain on the new MRBench benchmark.
- miniReranker: Efficient Multimodal Reranking through Visual Cache Reuse and Interaction Sparsity
  miniReranker reduces multimodal reranking runtime to under 1% of the dense baseline under high-reuse conditions while retaining over 96% of performance via vision-first prompting, early exit, sparse cross-segment attention, and embedder-guided token pruning.
- MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG
  MEG-RAG defines a new MEG metric based on Semantic Certainty Anchoring and trains a multimodal reranker to select evidence aligned with ground-truth semantic anchors, yielding higher accuracy and consistency on the M²RAG benchmark.
- GR2 Technical Report
  GR2 applies mid-training on semantic IDs, reasoning distillation, RL with conditional verifiable rewards, and a context compressor to re-ranking in industrial recommender systems, reporting a +18.7% R@1 improvement over baselines.
- DocRetriever: A Plug-and-Play Framework for Multimodal Document Retrieval with Comprehensive Benchmark
  DocRetriever introduces a framework that combines layout-aware sparse embeddings for OCR-free hybrid encoding with a generalizable reasoning-augmented reranker for few-shot settings, and contributes the MultiDocR benchmark for evaluation.
- Rich-Media Re-Ranker: A User Satisfaction-Driven LLM Re-ranking Framework for Rich-Media Search
  Rich-Media Re-Ranker is a re-ranking system for rich-media search that plans query intents from user sessions, adds visual signals from VLMs, and uses an LLM to score results on multiple facets before multi-task RL adaptation; the authors report engagement gains after industrial deployment.
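GR2's headline result is stated as R@1. For readers unfamiliar with the metric, the following Python sketch computes Recall@k under its common definition: the share of each query's relevant items that appear in the top k, macro-averaged over queries. It is an assumption that GR2 uses exactly this definition, and `recall_at_k` and `mean_recall_at_k` are hypothetical helper names, not code from any of the listed papers.

```python
from typing import Hashable, Sequence, Set


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Share of a query's relevant items that appear among the top-k ranked items."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not relevant:
        raise ValueError("each query needs at least one relevant item")
    hits = set(ranked[:k]) & relevant  # relevant items retrieved in the top k
    return len(hits) / len(relevant)


def mean_recall_at_k(
    rankings: Sequence[Sequence[Hashable]],
    relevants: Sequence[Set[Hashable]],
    k: int,
) -> float:
    """Average recall_at_k over queries (macro-average, one score per query)."""
    if len(rankings) != len(relevants) or not rankings:
        raise ValueError("need one non-empty ranking/relevance pair per query")
    scores = [recall_at_k(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two toy queries: the first ranks its relevant item first, the second ranks it third.
    rankings = [["a", "b", "c"], ["x", "y", "z"]]
    relevants = [{"a"}, {"z"}]
    print(mean_recall_at_k(rankings, relevants, k=1))  # 0.5
    print(mean_recall_at_k(rankings, relevants, k=3))  # 1.0
```

With a single relevant item per query, as is typical for next-item re-ranking, R@1 reduces to the fraction of queries whose relevant item is ranked first.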