Multilingual E5 Text Embeddings: A Technical Report

Mixed citation behavior. Most common role is method (43%).

78 Pith papers citing it
Method 43% of classified citations
abstract

This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023. Three embedding models of different sizes (small / base / large) are provided, offering a balance between the inference efficiency and embedding quality. The training procedure adheres to the English E5 model recipe, involving contrastive pre-training on 1 billion multilingual text pairs, followed by fine-tuning on a combination of labeled datasets. Additionally, we introduce a new instruction-tuned embedding model, whose performance is on par with state-of-the-art, English-only models of similar sizes. Information regarding the model release can be found at https://github.com/microsoft/unilm/tree/master/e5 .
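The contrastive pre-training step mentioned in the abstract can be illustrated with a minimal sketch of an InfoNCE-style loss over text pairs with in-batch negatives. This is an illustration under stated assumptions, not the authors' training code: the function name `info_nce_loss`, the temperature value of 0.05, and the use of NumPy on precomputed embeddings are all choices made here for the example and do not come from the abstract.

```python
import numpy as np


def info_nce_loss(query_emb, passage_emb, temperature=0.05):
    """InfoNCE loss with in-batch negatives (illustrative sketch).

    Row i of passage_emb is treated as the positive for row i of query_emb;
    every other row in the batch serves as a negative. The temperature
    default is an assumed value chosen for this example.
    """
    # L2-normalise so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix of shape (batch, batch), scaled by temperature.
    logits = (q @ p.T) / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive for query i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    queries = rng.normal(size=(8, 16))
    # Passages that are noisy copies of their queries should give a low loss.
    passages = queries + 0.01 * rng.normal(size=(8, 16))
    print(info_nce_loss(queries, passages))
```

The loss falls as each query embedding moves closer to its paired passage and further from the other passages in the batch, which is the general idea behind contrastive training on text pairs.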

citation-role summary

method 6 · baseline 4 · background 2 · dataset 1 · other 1

years

2026: 72 · 2025: 6

representative citing papers

LMEB: Long-horizon Memory Embedding Benchmark

cs.CL · 2026-03-13 · unverdicted · novelty 7.0

LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.

Universal Encoders for Modular Relational Deep Learning

cs.LG · 2026-06-19 · unverdicted · novelty 6.0

Proposes a pretrained Universal Row Encoder using transformers and global statistics to generate table-width invariant row embeddings for modular relational graph models, claiming improved transfer, convergence, and memory on RelBench.

FIGMA: Towards FIne-Grained Music retrievAl

cs.SD · 2026-06-04 · unverdicted · novelty 6.0

FIGMA proposes a multi-view contrastive architecture plus the FGMCaps dataset to retrieve music from fine-grained textual descriptions of musical attributes, reporting up to 73.3% relative gains over CLAP baselines.

Boosting Self-Consistency with Ranking

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • ATIR: Towards Audio-Text Interleaved Contextual Retrieval · cs.SD · 2026-04-22 · unverdicted · none · ref 43 · internal anchor

    Defines ATIR task and benchmark for mixed audio-text queries; MLLM model with token compression shows substantial gains over strong baselines.

  • FIGMA: Towards FIne-Grained Music retrievAl · cs.SD · 2026-06-04 · unverdicted · none · ref 12 · internal anchor

    FIGMA proposes a multi-view contrastive architecture plus the FGMCaps dataset to retrieve music from fine-grained textual descriptions of musical attributes, reporting up to 73.3% relative gains over CLAP baselines.