MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings

Jason Lee; Laxman Dhulipala; Majid Hadian; Rajesh Jayaram; Vahab Mirrokni

arXiv: 2405.19504 · v2 · pith:2CCYNU7Qnew · submitted 2024-05-29 · 💻 cs.DS · cs.DB · cs.IR

classification 💻 cs.DS · cs.DB · cs.IR

keywords multi-vector · retrieval · models · muvera · similarity · embedding · fdes · recall

Neural embedding models have become a fundamental component of modern information retrieval (IR) pipelines. These models produce a single embedding $x \in \mathbb{R}^d$ per data point, allowing for fast retrieval via highly optimized maximum inner product search (MIPS) algorithms. Recently, beginning with the landmark ColBERT paper, multi-vector models, which produce a set of embeddings per data point, have achieved markedly superior performance for IR tasks. Unfortunately, using these models for IR is computationally expensive due to the increased complexity of multi-vector retrieval and scoring. In this paper, we introduce MUVERA (MUlti-VEctor Retrieval Algorithm), a retrieval mechanism that reduces multi-vector similarity search to single-vector similarity search. This enables the use of off-the-shelf MIPS solvers for multi-vector retrieval. MUVERA asymmetrically generates Fixed Dimensional Encodings (FDEs) of queries and documents, which are vectors whose inner product approximates multi-vector similarity. We prove that FDEs give high-quality $\epsilon$-approximations, thus providing the first single-vector proxy for multi-vector similarity with theoretical guarantees. Empirically, we find that FDEs achieve the same recall as prior state-of-the-art heuristics while retrieving 2-5$\times$ fewer candidates. Compared to prior state-of-the-art implementations, MUVERA achieves consistently good end-to-end recall and latency across a diverse set of the BEIR retrieval datasets, with an average of $10\%$ higher recall and $90\%$ lower latency.
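
To make the abstract's central idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the multi-vector similarity is the ColBERT-style Chamfer (MaxSim) score, and it assumes a simplified FDE built from a single random-hyperplane (SimHash) partition in which the query side stores the per-bucket sum and the document side stores the per-bucket mean. The abstract does not spell out this construction. The function names (`chamfer`, `simhash_bucket`, `fde`) and all parameters are hypothetical, and details such as repetitions, dimensionality reduction, and empty-bucket handling are deliberately left out.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def chamfer(Q, P):
    """Multi-vector similarity: for each query vector, take the best
    inner product with any document vector, then sum over the query."""
    return sum(max(dot(q, p) for p in P) for q in Q)

def simhash_bucket(x, planes):
    """Bucket index given by the sign pattern of x against random hyperplanes."""
    idx = 0
    for g in planes:
        idx = (idx << 1) | (1 if dot(g, x) > 0 else 0)
    return idx

def fde(vectors, planes, is_query):
    """Simplified fixed dimensional encoding (assumed construction):
    one d-dimensional block per bucket, concatenated into a single vector."""
    d = len(vectors[0])
    n_buckets = 2 ** len(planes)
    sums = [[0.0] * d for _ in range(n_buckets)]
    counts = [0] * n_buckets
    for x in vectors:
        b = simhash_bucket(x, planes)
        counts[b] += 1
        for j in range(d):
            sums[b][j] += x[j]
    out = []
    for b in range(n_buckets):
        # Asymmetry: queries keep the per-bucket sum, documents the per-bucket mean.
        scale = 1.0 if is_query or counts[b] == 0 else 1.0 / counts[b]
        out.extend(s * scale for s in sums[b])
    return out

rng = random.Random(0)
d, n_bits = 8, 3
planes = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_bits)]
Q = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(4)]    # query token embeddings
P = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(16)]   # document token embeddings

exact = chamfer(Q, P)                                           # multi-vector score
approx = dot(fde(Q, planes, True), fde(P, planes, False))       # single inner product
```

Both encodings have the fixed length `d * 2**n_bits` regardless of how many vectors each set contains, which is what would let a standard MIPS index store the document side. With this toy setup `approx` is only a rough proxy for `exact`; the approximation guarantees stated in the abstract apply to the paper's full construction, not to this sketch.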

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Multi-Vector Embeddings are Provably More Expressive than Single Vector Embeddings
cs.DS 2026-06 unverdicted novelty 8.0

Proves that for bounded n, there exist multi-vector (MV) embeddings with m vectors each whose Chamfer matrix requires single-vector (SV) dimension D = (ε² m)^Ω(1/ε) to approximate within ε, separating MV from SV expressiveness.

ANN Search: Recall What Matters
cs.IR 2026-06 conditional novelty 6.0

ANN search quality is better assessed by 1/Ratio@k than Recall@k because the former tracks downstream task utility more closely while allowing substantially lower computational cost.

Diagnosable ColBERT: Debugging Late-Interaction Retrieval Models Using a Learned Latent Space as Reference
cs.IR 2026-04 unverdicted novelty 5.0

Diagnosable ColBERT aligns ColBERT embeddings to an expert-grounded clinical latent space to enable direct diagnosis of model misunderstandings and better training data curation.

ClinicalAligner26AM: A Cross-Lingual Aligner for Dataset Translation; Evidences from the MultiClinCorpus Shared Task
cs.CL 2026-06 unverdicted novelty 4.0

ClinicalAligner26AM tops the MultiClinCorpus shared task by distilling Sinkhorn-sharpened multi-level alignments into a clinical encoder for projecting Spanish entity annotations to six target languages with F1 above 0.95.