hub Mixed citations

Multilingual E5 Text Embeddings: A Technical Report

Mixed citation behavior. Most common role is method (43%).

65 Pith papers citing it
Method 43% of classified citations
abstract

This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023. Three embedding models of different sizes (small / base / large) are provided, offering a balance between inference efficiency and embedding quality. The training procedure follows the English E5 model recipe: contrastive pre-training on 1 billion multilingual text pairs, followed by fine-tuning on a combination of labeled datasets. Additionally, we introduce a new instruction-tuned embedding model, whose performance is on par with state-of-the-art, English-only models of similar sizes. Information regarding the model release can be found at https://github.com/microsoft/unilm/tree/master/e5.

citation-role summary

method 6 · baseline 4 · background 2 · dataset 1 · other 1

years

2026: 59 · 2025: 6

representative citing papers

LMEB: Long-horizon Memory Embedding Benchmark

cs.CL · 2026-03-13 · unverdicted · novelty 7.0

LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.

Boosting Self-Consistency with Ranking

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.

MIMO: Multilingual Information Retrieval via Monolingual Objectives

cs.IR · 2026-05-29 · unverdicted · novelty 6.0

MIMO is a two-stage distillation-plus-contrastive framework that anchors multilingual embeddings to a monolingual English space and outperforms prior cross-lingual baselines on MLIR and multi-monolingual benchmarks.

An Annotation Scheme and Classifier for Personal Facts in Dialogue

cs.CL · 2026-05-11 · accept · novelty 6.0

An extended annotation scheme with new categories and attributes plus a Gemma-300M-based multi-head classifier achieves 81.6% macro F1 on personal fact classification, outperforming few-shot LLM baselines by nearly 9 points with lower compute.

MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocol

cs.IR · 2026-05-08 · unverdicted · novelty 6.0

MLAIRE is a protocol that evaluates multilingual retrievers on both semantic accuracy and query-language preference using parallel passages and new metrics like LPR and Lang-nDCG, showing that standard metrics hide distinct behavioral differences among retrievers.

JFinTEB: Japanese Financial Text Embedding Benchmark

cs.IR · 2026-04-17 · unverdicted · novelty 6.0

JFinTEB is the first benchmark for evaluating Japanese financial text embeddings across retrieval and classification tasks derived from realistic financial scenarios.
