Multilingual E5 Text Embeddings: A Technical Report

Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei · 2024 · cs.CL · arXiv 2402.05672

Mixed citation behavior. The most common role is method (6 of 14 classified citations, 43%).

Cited by 79 Pith papers.

abstract

This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023. Three embedding models of different sizes (small / base / large) are provided, offering a balance between the inference efficiency and embedding quality. The training procedure adheres to the English E5 model recipe, involving contrastive pre-training on 1 billion multilingual text pairs, followed by fine-tuning on a combination of labeled datasets. Additionally, we introduce a new instruction-tuned embedding model, whose performance is on par with state-of-the-art, English-only models of similar sizes. Information regarding the model release can be found at https://github.com/microsoft/unilm/tree/master/e5 .
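
The abstract mentions contrastive pre-training on text pairs without spelling out the objective. As a rough illustration only, the sketch below shows one common contrastive loss, InfoNCE with in-batch negatives, on toy vectors. The function names, the cosine scoring, and the temperature of 0.05 are assumptions made for this example and are not taken from the report.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(queries, passages, temperature=0.05):
    """Mean InfoNCE loss over a batch of (query, passage) pairs.

    passages[i] is treated as the positive for queries[i]; every other
    passage in the batch serves as a negative. The temperature value is
    an assumption for illustration, not a setting from the report.
    """
    losses = []
    for i, query in enumerate(queries):
        logits = [cosine(query, p) / temperature for p in passages]
        # Log-sum-exp with the max subtracted for numerical stability.
        max_logit = max(logits)
        log_denom = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: in the aligned batch each query matches its own passage;
# in the swapped batch each query matches the other query's passage.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is small when each query is closest to its paired passage and large when the pairing is broken, which is the signal a contrastive pre-training stage optimizes.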

citation-role summary

method 6 · baseline 4 · background 2 · dataset 1 · other 1

citation-polarity summary

use method 6 · baseline 4 · background 2 · unclear 1 · use dataset 1

representative citing papers

ALEE: Any-Language Evaluation of Embeddings via English-Centric Minimal Pairs

cs.CL · 2026-06-30 · unverdicted · novelty 7.0

ALEE generates AMR-based English minimal pairs with fine-grained semantic shifts, translates them, and evaluates embedding models on 275+ languages to expose cross-lingual gaps linked to training data and tokenization.

HAKARI-Bench: A Lightweight Benchmark for Comparing Retrieval Architectures and Efficiency Settings under Unified Conditions

cs.IR · 2026-06-22 · unverdicted · novelty 7.0

HAKARI-Bench reconstructs 35 benchmarks into 551 tasks across 43 languages, reproducing full MTEB, MMTEB, and BEIR rankings with Spearman correlation above 0.97 while supporting efficiency variant comparisons.

CORE-Bench: A Comprehensive Benchmark for Code Retrieval in the Era of Agentic Coding

cs.IR · 2026-06-10 · accept · novelty 7.0

CORE-Bench is a benchmark for code retrieval in agentic coding settings, built from curated tasks and SWE-bench instances, showing performance drops and gains from fine-tuning.

SEA-Embedding: Open and Reproducible Text Embeddings for Southeast Asia

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

SEA-Embedding is a fully open text embedding pipeline for Southeast Asian languages that achieves state-of-the-art performance on the SEA-BED benchmark by analyzing data composition, training objectives, and base encoder choices.

The Harder Text Embedding Benchmark (HTEB): Beyond One-dimensional Static Robustness

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

HTEB introduces dynamic, multi-axis evaluation of text embedding robustness using LLM transformations, finding decoupled profiles across models and that scaling does not close all robustness gaps.

IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

IdioLink introduces a benchmark dataset and evaluation showing that strong embedding models struggle to retrieve equivalent meanings across idiomatic and literal forms, relying on shallow cues instead.

Temporal Decay of Co-Citation Predictability: A 20-Year Statute Retrieval Benchmark from 396M Ukrainian Court Citations

cs.CL · 2026-05-17 · conditional · novelty 7.0

Co-citation predictability for statute retrieval decays over 20 years in Ukrainian court data, dropping 33-47% in MRR with non-uniform patterns across legal domains.

How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

DAPRO provides the first dynamic, theoretically guaranteed way to allocate interaction budgets across test cases for bounding time-to-event in multi-turn LLM evaluations, achieving tighter coverage than static conformal survival methods.

Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

EPIC trains LLMs to treat continuous embeddings as in-context prompts, yielding state-of-the-art text embedding performance on MTEB with or without prompts at inference and lower compute.

ATIR: Towards Audio-Text Interleaved Contextual Retrieval

cs.SD · 2026-04-22 · unverdicted · novelty 7.0

Defines ATIR task and benchmark for mixed audio-text queries; MLLM model with token compression shows substantial gains over strong baselines.

Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers

cs.IR · 2026-04-19 · unverdicted · novelty 7.0

Code-switching creates a fundamental performance bottleneck for multilingual retrievers, causing drops of up to 27% on new benchmarks CSR-L and CS-MTEB, with embedding divergence as the key cause and vocabulary expansion insufficient to fix it.

Claim2Vec: Embedding Fact-Check Claims for Multilingual Similarity and Clustering

cs.CL · 2026-04-10 · unverdicted · novelty 7.0

Claim2Vec is a contrastively fine-tuned multilingual encoder that improves claim clustering performance and embedding space structure on multilingual fact-check datasets.

LMEB: Long-horizon Memory Embedding Benchmark

cs.CL · 2026-03-13 · unverdicted · novelty 7.0

LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.

SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

cs.IR · 2026-02-13 · unverdicted · novelty 7.0

SQuTR aggregates 37k queries from six text retrieval datasets, synthesizes speech from 200 speakers, adds 17 noise categories at varying SNR, and shows that even large retrieval models degrade sharply under extreme acoustic noise.

MultiSynt/MT: Trillion-Token Multi-Parallel Pre-Training Data Translated Across 36 Languages

cs.CL · 2026-07-01 · unverdicted · novelty 6.0

MultiSynt/MT supplies 4.8 trillion translated tokens in 36 languages from 100B English tokens, letting LLMs match native-data baselines with 72% fewer tokens and beat them by 15% at equal budget.

Toward a Hybrid Digital Twin of Society: Quantifying Cognitive-Spatial Linkages Through Online-Offline Feedback Networks

physics.soc-ph · 2026-06-25 · unverdicted · novelty 6.0

A Feedback Network model is developed showing online semantic exploration is more concentrated than physical mobility, with stable retail-business linkages and greater COVID disruption to spatial than cognitive routines, as a step toward hybrid digital twins of society.

EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory

cs.CL · 2026-06-19 · unverdicted · novelty 6.0

EvoEmbedding generates evolvable embeddings via a latent memory updated during sequential processing, outperforming larger models on long-context retrieval and generalizing to 10x longer contexts in downstream tasks.

Universal Encoders for Modular Relational Deep Learning

cs.LG · 2026-06-19 · unverdicted · novelty 6.0

Proposes a pretrained Universal Row Encoder using transformers and global statistics to generate table-width invariant row embeddings for modular relational graph models, claiming improved transfer, convergence, and memory on RelBench.

ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection

cs.AI · 2026-06-17 · unverdicted · novelty 6.0

ARIADNE routes queries to the best adapter via embedding-space centroid proximity, recovering 97.44% of upper-bound performance on 23 NLP tasks and 89.7% selection accuracy on 44 tasks without training or internal access.

A Unified Framework for Context-Aware and Relation-Aware Graph Retrieval-Augmented Generation

cs.AI · 2026-06-16 · unverdicted · novelty 6.0

HyGRAG is a hierarchical graph RAG framework that constructs LLM summaries over hybrid chunk-entity graphs, retrieves via context and relation awareness across levels, and enables dynamic updates, reporting a 9.7% average accuracy gain on multi-hop reasoning tasks.

Bounding Box Label Propagation for Re-Annotation of Document Layout Analysis Datasets

cs.CV · 2026-06-16 · unverdicted · novelty 6.0

BBLP uses a multi-modal object encoder for label propagation in object detection and reaches 81.6% of fully-supervised mAP on D4LA with only 10% labelled data.

Unlocking Latent Value: Taxonomy-Guided Recovery of High-Performing Data from Low-Tier Web Corpora

cs.CL · 2026-06-05 · unverdicted · novelty 6.0

A multi-dimensional taxonomy filtering approach recovers high-performing data from deprioritized web corpora, with filtered low-tier subsets outperforming unfiltered top-tier data on reasoning and coding benchmarks.

FIGMA: Towards FIne-Grained Music retrievAl

cs.SD · 2026-06-04 · unverdicted · novelty 6.0

FIGMA proposes a multi-view contrastive architecture plus the FGMCaps dataset to retrieve music from fine-grained textual descriptions of musical attributes, reporting up to 73.3% relative gains over CLAP baselines.

StoryVideoQA: Scaling Deep Video Understanding with a Large-Scale, Multi-Genre and Auto-Generated Dataset

cs.CV · 2026-06-04 · unverdicted · novelty 6.0

StoryVideoQA provides the largest auto-generated deep video understanding dataset to date with 363K QAs across TV and movies, paired with the PlotTree agent for hierarchical plot-based reasoning that existing VideoQA models struggle to match.

citing papers explorer

Showing 19 of 19 citing papers after filters.

HAKARI-Bench: A Lightweight Benchmark for Comparing Retrieval Architectures and Efficiency Settings under Unified Conditions

cs.IR · 2026-06-22 · unverdicted · none · ref 136 · internal anchor

HAKARI-Bench reconstructs 35 benchmarks into 551 tasks across 43 languages, reproducing full MTEB, MMTEB, and BEIR rankings with Spearman correlation above 0.97 while supporting efficiency variant comparisons.

CORE-Bench: A Comprehensive Benchmark for Code Retrieval in the Era of Agentic Coding

cs.IR · 2026-06-10 · accept · none · ref 28 · internal anchor

CORE-Bench is a benchmark for code retrieval in agentic coding settings, built from curated tasks and SWE-bench instances, showing performance drops and gains from fine-tuning.

Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers

cs.IR · 2026-04-19 · unverdicted · none · ref 50 · internal anchor

Code-switching creates a fundamental performance bottleneck for multilingual retrievers, causing drops of up to 27% on new benchmarks CSR-L and CS-MTEB, with embedding divergence as the key cause and vocabulary expansion insufficient to fix it.

SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

cs.IR · 2026-02-13 · unverdicted · none · ref 36 · internal anchor

SQuTR aggregates 37k queries from six text retrieval datasets, synthesizes speech from 200 speakers, adds 17 noise categories at varying SNR, and shows that even large retrieval models degrade sharply under extreme acoustic noise.

MIMO: Multilingual Information Retrieval via Monolingual Objectives

cs.IR · 2026-05-29 · unverdicted · none · ref 2 · internal anchor

MIMO is a two-stage distillation-plus-contrastive framework that anchors multilingual embeddings to a monolingual English space and outperforms prior cross-lingual baselines on MLIR and multi-monolingual benchmarks.

MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocal

cs.IR · 2026-05-08 · unverdicted · none · ref 32 · internal anchor

MLAIRE is a protocol that evaluates multilingual retrievers on both semantic accuracy and query-language preference using parallel passages and new metrics like LPR and Lang-nDCG, showing that standard metrics hide distinct behavioral differences among retrievers.

JFinTEB: Japanese Financial Text Embedding Benchmark

cs.IR · 2026-04-17 · unverdicted · none · ref 20 · internal anchor

JFinTEB is the first benchmark for evaluating Japanese financial text embeddings across retrieval and classification tasks derived from realistic financial scenarios.

HIVE: Query, Hypothesize, Verify An LLM Framework for Multimodal Reasoning-Intensive Retrieval

cs.IR · 2026-04-08 · unverdicted · none · ref 35 · internal anchor

HIVE raises multimodal retrieval nDCG@10 to 41.7 on the MM-BRIGHT benchmark by inserting LLM-driven hypothesis generation and verification between retrieval passes, delivering +9.5 over the best text-only baseline and +14.1 over the best multimodal baseline.

Learning to Retrieve from Agent Trajectories

cs.IR · 2026-03-30 · conditional · none · ref 16 · internal anchor

Retrievers trained on agent trajectories via the LRAT framework improve evidence recall, task success, and efficiency in agentic search benchmarks.

Reliable Evaluation Protocol for Low-Precision Retrieval

cs.IR · 2025-08-05 · unverdicted · none · ref 10 · internal anchor

Proposes High-Precision Scoring (HPS) and Tie-aware Retrieval Metrics (TRM) to reduce tie-induced instability in low-precision retrieval evaluation.

Lost in the Evidence? Reproducing Document Position and Context Size Effects in RAG

cs.IR · 2026-05-26 · unverdicted · none · ref 20 · internal anchor

Reproducibility study shows position and context size effects in RAG depend on topic sampling and retrieval quality, proposes calibration for stable trends, and releases code after finding discrepancies with prior industry work.

RAGEAR: Retrieval-Augmented Graph-Enhanced Academic Recommender

cs.IR · 2026-05-26 · unverdicted · none · ref 22 · internal anchor

RAGEAR improves course recommendation ranking by fusing transcript retrieval with symbolic knowledge graph filtering and a custom aggregation function over a transcript-only baseline.

TextClusterLab: An Integrated Framework for Reliable Text Clustering Studies

cs.IR · 2026-05-17 · unverdicted · none · ref 35 · internal anchor

TextClusterLab introduces an LLM-driven generator for synthetic text clustering datasets with tunable attributes and a suitability benchmark for evaluation.

On the Representational Limits of Quantum-Inspired 1024-D Document Embeddings: An Experimental Evaluation Framework

cs.IR · 2026-04-10 · unverdicted · none · ref 25 · internal anchor

Quantum-inspired 1024-D document embeddings exhibit weak, unstable ranking performance and structural geometric limitations, performing better as auxiliary components in hybrid lexical-embedding retrieval systems.

Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms

cs.IR · 2025-12-05 · unverdicted · none · ref 20 · internal anchor

ELERAG integrates Wikidata entity linking with hybrid RRF re-ranking into RAG and outperforms baselines on a custom Italian academic dataset while cross-encoder methods win on the general SQuAD-it dataset.

Improving Korean-English Cross-Lingual Retrieval: A Data-Centric Study of Language Composition and Model Merging

cs.IR · 2025-07-11 · unverdicted · none · ref 36 · internal anchor

Language composition in training data creates opposing effects on CLIR and mono-IR performance for Korean-English retrieval, which model merging can partially resolve.

Granite Embedding Multilingual R2 Models

cs.IR · 2026-05-13 · unverdicted · none · ref 18 · internal anchor

Granite Embedding Multilingual R2 releases 311M and 97M parameter bi-encoder models that achieve state-of-the-art retrieval performance on multilingual text, code, long-document, and reasoning datasets.

Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation

cs.IR · 2026-04-23 · unverdicted · none · ref 34 · internal anchor

A distillation technique embeds LLM-generated textual user profiles into efficient sequential recommenders without runtime LLM inference, architectural changes, or fine-tuning.

HR-Agents: Using Multiple LLM-based Agents to Improve Q&A about Brazilian Labor Legislation

cs.IR · 2026-03-13 · unverdicted · none · ref 26 · internal anchor

A multi-agent LLM system using CrewAI and RAG improves response coherence and correctness over a single-LLM RAG baseline for Brazilian labor law Q&A.
