Multilingual E5 Text Embeddings: A Technical Report

Mixed citation behavior. Most common role is method (43%).

63 Pith papers citing it
Method 43% of classified citations
abstract

This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023. Three embedding models of different sizes (small / base / large) are provided, offering a balance between inference efficiency and embedding quality. The training procedure follows the English E5 model recipe: contrastive pre-training on 1 billion multilingual text pairs, followed by fine-tuning on a combination of labeled datasets. Additionally, we introduce a new instruction-tuned embedding model whose performance is on par with state-of-the-art English-only models of similar sizes. Information regarding the model release can be found at https://github.com/microsoft/unilm/tree/master/e5.
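
The contrastive pre-training step mentioned in the abstract can be illustrated with a minimal sketch of an InfoNCE-style loss over in-batch negatives. This is an assumption about the general form of such an objective, not the report's actual training code: the function name `info_nce_loss`, the toy vectors, and the temperature value are all hypothetical and chosen only for illustration.

```python
import math


def info_nce_loss(query_vecs, passage_vecs, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (illustrative sketch).

    query_vecs[i] and passage_vecs[i] form a positive pair; every other
    passage in the batch acts as a negative for query i.
    The temperature default is an illustrative assumption.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    queries = [normalize(q) for q in query_vecs]
    passages = [normalize(p) for p in passage_vecs]

    total = 0.0
    for i, q in enumerate(queries):
        # Cosine similarity to every passage, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(q, p)) / temperature
                  for p in passages]
        # Numerically stable log-sum-exp over the whole batch.
        peak = max(logits)
        log_denominator = peak + math.log(
            sum(math.exp(logit - peak) for logit in logits))
        # Negative log-probability of the matching passage.
        total += log_denominator - logits[i]
    return total / len(queries)


# Toy batch: two text pairs embedded in two dimensions (made-up values).
toy_queries = [[1.0, 0.0], [0.0, 1.0]]
toy_passages = [[0.9, 0.1], [0.2, 0.8]]
print(info_nce_loss(toy_queries, toy_passages))
```

The loss shrinks as each query embedding moves closer to its paired passage and farther from the other passages in the batch, which is the behavior a contrastive objective of this kind is meant to encourage.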

citation-role summary

method 6 · baseline 4 · background 2 · dataset 1 · other 1

years

2026: 57 · 2025: 6

representative citing papers

LMEB: Long-horizon Memory Embedding Benchmark

cs.CL · 2026-03-13 · unverdicted · novelty 7.0

LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.

MIMO: Multilingual Information Retrieval via Monolingual Objectives

cs.IR · 2026-05-29 · unverdicted · novelty 6.0

MIMO is a two-stage distillation-plus-contrastive framework that anchors multilingual embeddings to a monolingual English space and outperforms prior cross-lingual baselines on MLIR and multi-monolingual benchmarks.

An Annotation Scheme and Classifier for Personal Facts in Dialogue

cs.CL · 2026-05-11 · accept · novelty 6.0

An extended annotation scheme with new categories and attributes plus a Gemma-300M-based multi-head classifier achieves 81.6% macro F1 on personal fact classification, outperforming few-shot LLM baselines by nearly 9 points with lower compute.

MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocol

cs.IR · 2026-05-08 · unverdicted · novelty 6.0

MLAIRE is a protocol that evaluates multilingual retrievers on both semantic accuracy and query-language preference using parallel passages and new metrics like LPR and Lang-nDCG, showing that standard metrics hide distinct behavioral differences among retrievers.

JFinTEB: Japanese Financial Text Embedding Benchmark

cs.IR · 2026-04-17 · unverdicted · novelty 6.0

JFinTEB is the first benchmark for evaluating Japanese financial text embeddings across retrieval and classification tasks derived from realistic financial scenarios.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • Query-Conditioned Knowledge Alignment for Reliable Cross-System Medical Reasoning · cs.AI · 2026-05-18 · conditional · none · ref 30 · internal anchor

    QCEA reformulates entity alignment as a query-conditioned ranking task with semantic encoding, graph learning, and direction-aware transformation to handle context-dependent, asymmetric correspondences in medical knowledge graphs.

  • Human-Inspired Context-Selective Multimodal Memory for Social Robots · cs.AI · 2026-04-13 · unverdicted · none · ref 55 · internal anchor

    A new memory system for social robots selectively stores multimodal memories by emotional salience and novelty, achieving 0.506 Spearman correlation in selectivity and up to 13% better Recall@1 in multimodal retrieval.