
M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, Zheng Liu · 2024 · cs.CL · arXiv 2402.03216

37 Pith papers cite this work. Polarity classification is still indexing.

abstract

In this paper, we introduce a new embedding model called M3-Embedding, which is distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It provides a uniform support for the semantic retrieval of more than 100 working languages. It can simultaneously accomplish the three common retrieval functionalities: dense retrieval, multi-vector retrieval, and sparse retrieval. Besides, it is also capable of processing inputs of different granularities, spanning from short sentences to long documents of up to 8,192 tokens. The effective training of M3-Embedding presents a series of technical contributions. Notably, we propose a novel self-knowledge distillation approach, where the relevance scores from different retrieval functionalities can be integrated as the teacher signal to enhance the training quality. We also optimize the batching strategy, which enables a large batch size and high training throughput to improve the discriminativeness of embeddings. M3-Embedding exhibits a superior performance in our experiment, leading to new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.
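
The abstract's idea of integrating relevance scores from the three retrieval functionalities into a teacher signal can be illustrated with a small sketch. This is not the paper's implementation: the function names, the equal weights, the plain weighted sum, and the toy scores are all assumptions made for illustration, and the paper's actual integration and loss may differ.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def teacher_distribution(dense, sparse, multi_vec, weights=(1.0, 1.0, 1.0)):
    """Combine per-candidate scores from the three retrieval modes into one
    soft target. The weighted sum and equal weights are assumptions."""
    w_d, w_s, w_m = weights
    combined = [w_d * d + w_s * s + w_m * m
                for d, s, m in zip(dense, sparse, multi_vec)]
    return softmax(combined)

def distillation_loss(student_scores, teacher_probs):
    """Cross-entropy between the teacher distribution and the student's softmax."""
    student_probs = softmax(student_scores)
    return -sum(t * math.log(p) for t, p in zip(teacher_probs, student_probs))

# Hypothetical scores for one query against three candidate passages.
dense = [2.0, 0.5, 0.1]
sparse = [1.5, 0.2, 0.3]
multi_vec = [1.8, 0.4, 0.2]

teacher = teacher_distribution(dense, sparse, multi_vec)
loss_dense = distillation_loss(dense, teacher)  # dense head as the student
```

Each retrieval head can be passed as the student in turn, so that every head is pulled toward the combined ranking. In a real training loop the teacher would normally be treated as a constant (no gradient).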

representative citing papers

Very Efficient Listwise Multimodal Reranking for Long Documents

cs.IR · 2026-05-12 · unverdicted · novelty 7.0

ZipRerank delivers state-of-the-art multimodal listwise reranking accuracy for long documents at up to 10x lower latency via early interaction and single-pass scoring.

Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents

cs.CR · 2026-05-11 · unverdicted · novelty 7.0

Nautilus Compass is a black-box drift detector for production LLM agents that uses weighted cosine similarity on BGE-m3 embeddings of raw text against anchors, achieving 0.83 ROC AUC on real session traces while shipping as plugins and servers with an audit log.

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG

cs.IR · 2026-04-30 · unverdicted · novelty 7.0

FES-RAG reframes multimodal RAG as fragment-level selection using Fragment Information Gain to outperform document-level methods with up to 27% relative CIDEr gains on M2RAG while shortening context.

Prism-Reranker: Beyond Relevance Scoring -- Jointly Producing Contributions and Evidence for Agentic Retrieval

cs.IR · 2026-04-26 · accept · novelty 7.0

Prism-Reranker models output relevance, contribution statements, and evidence passages to support agentic retrieval beyond scalar scoring.

Latent Abstraction for Retrieval-Augmented Generation

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

LAnR unifies retrieval-augmented generation inside a single LLM by deriving dense retrieval vectors from a [PRED] token's hidden states and using entropy to adaptively stop retrieval, outperforming prior RAG on six QA benchmarks with better efficiency.

vstash: Local-First Hybrid Retrieval with Adaptive Fusion for LLM Agents

cs.IR · 2026-04-16 · conditional · novelty 7.0

vstash shows that hybrid retrieval disagreements provide a free training signal to fine-tune 33M-parameter embeddings, yielding NDCG@10 gains up to 19.5% on NFCorpus and matching some larger models on three of five BEIR datasets.

Sell More, Play Less: Benchmarking LLM Realistic Selling Skill

cs.CL · 2026-04-08 · conditional · novelty 7.0

SalesLLM provides an automatic evaluation framework for LLM sales dialogues that correlates 0.98 with human experts and shows top models approaching human performance while weaker ones lag.

Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

RACER routes between reasoning and non-reasoning LLM judges via constrained distributionally robust optimization to achieve better accuracy-cost trade-offs under distribution shift.

MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocal

cs.IR · 2026-05-08 · unverdicted · novelty 6.0

MLAIRE is a protocol that evaluates multilingual retrievers on both semantic accuracy and query-language preference using parallel passages and new metrics like LPR and Lang-nDCG, showing that standard metrics hide distinct behavioral differences among retrievers.

QuIVer: Rethinking ANN Graph Topology via Training-Free Binary Quantization

cs.DB · 2026-05-04 · unverdicted · novelty 6.0

QuIVer constructs ANN graphs using only 2-bit sign-magnitude binary quantization for topology decisions, achieving at least 88% Recall@10 at high throughput with low memory on embedding datasets.

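Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus

cs.CL · 2026-05-01 · unverdicted · novelty 6.0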

Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.

Aligning Dense Retrievers with LLM Utility via Distillation

cs.IR · 2026-04-24 · unverdicted · novelty 6.0

UAE trains bi-encoder retrievers to match LLM utility distributions via Utility-Modulated InfoNCE, yielding over 30% gains in Recall@1 and MAP on QASPER while running 180x faster than re-ranking.

QuantClaw: Precision Where It Matters for OpenClaw

cs.AI · 2026-04-24 · unverdicted · novelty 6.0

QuantClaw dynamically routes precision in agent workflows to cut cost by up to 21.4% and latency by 15.7% while keeping or improving task performance.

From Tokens to Concepts: Leveraging SAE for SPLADE

cs.IR · 2026-04-23 · unverdicted · novelty 6.0

SAE-SPLADE substitutes SPLADE's backbone vocabulary with SAE-derived semantic concepts and matches standard SPLADE performance with better efficiency on in- and out-of-domain tasks.

To Know is to Construct: Schema-Constrained Generation for Agent Memory

cs.CL · 2026-04-22 · unverdicted · novelty 6.0

SCG-MEM reformulates agent memory access as schema-constrained generation within dynamic cognitive schemas, using assimilation and accommodation for updates plus an associative graph for reasoning, and outperforms retrieval baselines on the LoCoMo benchmark.

Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

cs.CV · 2026-04-20 · unverdicted · novelty 6.0 · 2 refs

OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.

Domain-oriented RAG Assessment (DoRA): Synthetic Benchmarking for RAG-based Question Answering on Defense Documents

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

DoRA is a new synthetic benchmark for RAG-based QA on defense documents where fine-tuning Llama3.1-8B-Instruct on it improves task success by up to 26% and cuts hallucination rates by 47%.

MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search

cs.IR · 2026-04-19 · unverdicted · novelty 6.0

MemSearch-o1 mitigates memory dilution in agentic LLM search through reasoning-aligned token-level memory growth, retracing with a contribution function, and path reorganization, improving reasoning activation on benchmarks.

BiCon-Gate: Consistency-Gated De-colloquialisation for Dialogue Fact-Checking

cs.CL · 2026-04-15 · unverdicted · novelty 6.0

BiCon-Gate improves dialogue fact-checking by applying staged de-colloquialisation and gating rewrites based on semantic consistency with context, yielding gains on the DialFact benchmark over baselines including LLM rewrites.

WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

WikiSeeker boosts KB-VQA performance by using VLMs to rewrite image-informed queries for better retrieval and to decide when to route to external LLM or rely on internal VLM knowledge.

LiquiLM: Bridging the Semantic Gap in Liquidity Flaw Audit via DCN and LLMs

cs.CR · 2026-04-04 · unverdicted · novelty 6.0

LiquiLM integrates LLMs and DCN to audit liquidity flaws in blockchain smart contracts, achieving over 90% F1-score and uncovering 238 high-risk contracts plus 10 CVE-certified vulnerabilities in real-world PoL and Ethereum contracts.

Muon is Scalable for LLM Training

cs.LG · 2025-02-24 · unverdicted · novelty 6.0

Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.

Personalized Deep Research: A User-Centric Framework, Dataset, and Hybrid Evaluation for Knowledge Discovery

cs.IR · 2026-05-11 · conditional · novelty 5.0

PDR is a user-context-aware framework for LLM research agents that improves report relevance over static baselines, supported by a new dataset and hybrid evaluation.

citing papers explorer

Showing 37 of 37 citing papers.

Very Efficient Listwise Multimodal Reranking for Long Documents cs.IR · 2026-05-12 · unverdicted · none · ref 50 · internal anchor
ZipRerank delivers state-of-the-art multimodal listwise reranking accuracy for long documents at up to 10x lower latency via early interaction and single-pass scoring.
Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents cs.CR · 2026-05-11 · unverdicted · none · ref 4 · internal anchor
Nautilus Compass is a black-box drift detector for production LLM agents that uses weighted cosine similarity on BGE-m3 embeddings of raw text against anchors, achieving 0.83 ROC AUC on real session traces while shipping as plugins and servers with an audit log.
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory cs.CL · 2026-05-01 · unverdicted · none · ref 2 · internal anchor
MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG cs.IR · 2026-04-30 · unverdicted · none · ref 6 · internal anchor
FES-RAG reframes multimodal RAG as fragment-level selection using Fragment Information Gain to outperform document-level methods with up to 27% relative CIDEr gains on M2RAG while shortening context.
Prism-Reranker: Beyond Relevance Scoring -- Jointly Producing Contributions and Evidence for Agentic Retrieval cs.IR · 2026-04-26 · accept · none · ref 2 · internal anchor
Prism-Reranker models output relevance, contribution statements, and evidence passages to support agentic retrieval beyond scalar scoring.
Latent Abstraction for Retrieval-Augmented Generation cs.CL · 2026-04-20 · unverdicted · none · ref 5 · internal anchor
LAnR unifies retrieval-augmented generation inside a single LLM by deriving dense retrieval vectors from a [PRED] token's hidden states and using entropy to adaptively stop retrieval, outperforming prior RAG on six QA benchmarks with better efficiency.
vstash: Local-First Hybrid Retrieval with Adaptive Fusion for LLM Agents cs.IR · 2026-04-16 · conditional · none · ref 19 · internal anchor
vstash shows that hybrid retrieval disagreements provide a free training signal to fine-tune 33M-parameter embeddings, yielding NDCG@10 gains up to 19.5% on NFCorpus and matching some larger models on three of five BEIR datasets.
Sell More, Play Less: Benchmarking LLM Realistic Selling Skill cs.CL · 2026-04-08 · conditional · none · ref 6 · internal anchor
SalesLLM provides an automatic evaluation framework for LLM sales dialogues that correlates 0.98 with human experts and shows top models approaching human performance while weaker ones lag.
Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge cs.AI · 2026-05-11 · unverdicted · none · ref 4 · internal anchor
RACER routes between reasoning and non-reasoning LLM judges via constrained distributionally robust optimization to achieve better accuracy-cost trade-offs under distribution shift.
MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocal cs.IR · 2026-05-08 · unverdicted · none · ref 33 · internal anchor
MLAIRE is a protocol that evaluates multilingual retrievers on both semantic accuracy and query-language preference using parallel passages and new metrics like LPR and Lang-nDCG, showing that standard metrics hide distinct behavioral differences among retrievers.
QuIVer: Rethinking ANN Graph Topology via Training-Free Binary Quantization cs.DB · 2026-05-04 · unverdicted · none · ref 22 · internal anchor
QuIVer constructs ANN graphs using only 2-bit sign-magnitude binary quantization for topology decisions, achieving at least 88% Recall@10 at high throughput with low memory on embedding datasets.
Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus cs.CL · 2026-05-01 · unverdicted · none · ref 17 · internal anchor
Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.
Aligning Dense Retrievers with LLM Utility via Distillation cs.IR · 2026-04-24 · unverdicted · none · ref 2 · internal anchor
UAE trains bi-encoder retrievers to match LLM utility distributions via Utility-Modulated InfoNCE, yielding over 30% gains in Recall@1 and MAP on QASPER while running 180x faster than re-ranking.
QuantClaw: Precision Where It Matters for OpenClaw cs.AI · 2026-04-24 · unverdicted · none · ref 43 · internal anchor
QuantClaw dynamically routes precision in agent workflows to cut cost by up to 21.4% and latency by 15.7% while keeping or improving task performance.
From Tokens to Concepts: Leveraging SAE for SPLADE cs.IR · 2026-04-23 · unverdicted · none · ref 8 · internal anchor
SAE-SPLADE substitutes SPLADE's backbone vocabulary with SAE-derived semantic concepts and matches standard SPLADE performance with better efficiency on in- and out-of-domain tasks.
To Know is to Construct: Schema-Constrained Generation for Agent Memory cs.CL · 2026-04-22 · unverdicted · none · ref 1 · internal anchor
SCG-MEM reformulates agent memory access as schema-constrained generation within dynamic cognitive schemas, using assimilation and accommodation for updates plus an associative graph for reasoning, and outperforms retrieval baselines on the LoCoMo benchmark.
Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation cs.CV · 2026-04-20 · unverdicted · none · ref 11 · 2 links · internal anchor
OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.
Domain-oriented RAG Assessment (DoRA): Synthetic Benchmarking for RAG-based Question Answering on Defense Documents cs.CL · 2026-04-20 · unverdicted · none · ref 41 · internal anchor
DoRA is a new synthetic benchmark for RAG-based QA on defense documents where fine-tuning Llama3.1-8B-Instruct on it improves task success by up to 26% and cuts hallucination rates by 47%.
MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search cs.IR · 2026-04-19 · unverdicted · none · ref 35 · internal anchor
MemSearch-o1 mitigates memory dilution in agentic LLM search through reasoning-aligned token-level memory growth, retracing with a contribution function, and path reorganization, improving reasoning activation on benchmarks.
BiCon-Gate: Consistency-Gated De-colloquialisation for Dialogue Fact-Checking cs.CL · 2026-04-15 · unverdicted · none · ref 2 · internal anchor
BiCon-Gate improves dialogue fact-checking by applying staged de-colloquialisation and gating rewrites based on semantic consistency with context, yielding gains on the DialFact benchmark over baselines including LLM rewrites.
WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering cs.CV · 2026-04-07 · unverdicted · none · ref 7 · internal anchor
WikiSeeker boosts KB-VQA performance by using VLMs to rewrite image-informed queries for better retrieval and to decide when to route to external LLM or rely on internal VLM knowledge.
LiquiLM: Bridging the Semantic Gap in Liquidity Flaw Audit via DCN and LLMs cs.CR · 2026-04-04 · unverdicted · none · ref 8 · internal anchor
LiquiLM integrates LLMs and DCN to audit liquidity flaws in blockchain smart contracts, achieving over 90% F1-score and uncovering 238 high-risk contracts plus 10 CVE-certified vulnerabilities in real-world PoL and Ethereum contracts.
Muon is Scalable for LLM Training cs.LG · 2025-02-24 · unverdicted · none · ref 47 · internal anchor
Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.
Personalized Deep Research: A User-Centric Framework, Dataset, and Hybrid Evaluation for Knowledge Discovery cs.IR · 2026-05-11 · conditional · none · ref 5 · internal anchor
PDR is a user-context-aware framework for LLM research agents that improves report relevance over static baselines, supported by a new dataset and hybrid evaluation.
Personalizing LLMs with Binary Feedback: A Preference-Corrected Optimization Framework cs.CL · 2026-05-11 · unverdicted · none · ref 35 · internal anchor
C-BPO personalizes LLMs via preference-calibrated binary signals and PU learning theory to isolate inter-user differences from shared task knowledge.
Cross-Lingual Jailbreak Detection via Semantic Codebooks cs.CL · 2026-04-28 · unverdicted · none · ref 4 · internal anchor
Semantic similarity to an English jailbreak codebook detects cross-lingual attacks with high accuracy on curated benchmarks but shows poor separability on diverse unsafe prompts.
Diagnosable ColBERT: Debugging Late-Interaction Retrieval Models Using a Learned Latent Space as Reference cs.IR · 2026-04-21 · unverdicted · none · ref 17 · internal anchor
Diagnosable ColBERT aligns ColBERT embeddings to an expert-grounded clinical latent space to enable direct diagnosis of model misunderstandings and better training data curation.
CPGRec+: A Balance-oriented Framework for Personalized Video Game Recommendations cs.IR · 2026-04-16 · unverdicted · none · ref 9 · internal anchor
CPGRec+ improves game recommendations on Steam data by reweighting player-game edges with signed preference strengths and using LLMs to generate preference-aware descriptions, yielding higher accuracy and diversity than prior models.
Collaboration, Integration, and Thematic Exploration in European Framework Programmes: A Longitudinal Network Analysis physics.soc-ph · 2026-04-13 · unverdicted · none · ref 33 · internal anchor
EU Framework Programmes have increased participation equity and integrated new countries through collaboration, yet research remains concentrated on established trajectories rather than broadly exploratory.
VulTriage: Triple-Path Context Augmentation for LLM-Based Vulnerability Detection cs.AI · 2026-05-10 · conditional · none · ref 11 · 2 links · internal anchor
VulTriage combines control dependency extraction, CWE knowledge retrieval, and semantic summarization to improve LLM accuracy on vulnerability detection, reaching SOTA on PrimeVul and generalizing to Kotlin.
Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering cs.CL · 2026-04-27 · unverdicted · none · ref 7 · internal anchor
Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
Enhancing Online Recruitment with Category-Aware MoE and LLM-based Data Augmentation cs.AI · 2026-04-23 · unverdicted · none · ref 30 · internal anchor
LLM chain-of-thought rewriting of job postings plus category-aware MoE improves person-job fit AUC by 2.4%, GAUC by 7.5%, and live click-through conversion by 19.4%.
Mira-Embeddings-V1: Domain-Adapted Semantic Reranking for Recruitment via LLM-Synthesized Data cs.CL · 2026-04-20 · conditional · none · ref 4 · internal anchor
Mira-Embeddings-V1 adapts embeddings for recruitment reranking by synthesizing positive and hard-negative samples with LLMs, then applies JD-JD contrastive and JD-CV triplet training plus a BoundaryHead MLP, lifting Recall@50 from 68.89% to 77.55% and Recall@200 from 0.5969 to 0.7047.
Comparison of Modern Multilingual Text Embedding Techniques for Hate Speech Detection Task cs.CL · 2026-04-16 · unverdicted · none · ref 31 · internal anchor
Supervised models using embeddings like jina and e5 reach up to 92% accuracy on multilingual hate speech detection, substantially outperforming anomaly detection, while PCA to 64 dimensions preserves most performance in the supervised case.
Continual Learning with Multilingual Foundation Model cs.CL · 2026-05-13 · unverdicted · none · ref 11 · internal anchor
Framework using XLM-RoBERTa, back-translation augmentation, and language-specific thresholds detects reclaimed slurs with 2-5% F1 score gains.
A Case-Driven Multi-Agent Framework for E-Commerce Search Relevance cs.IR · 2026-05-07 · unverdicted · none · ref 35 · internal anchor
A case-driven multi-agent system automates the full pipeline of bad-case detection, annotation, and resolution for e-commerce search relevance using Annotator, Optimizer, and User agents plus supporting components.
A Reproducibility Study of Metacognitive Retrieval-Augmented Generation cs.IR · 2026-04-21 · unverdicted · none · ref 5 · internal anchor
MetaRAG is only partially reproducible with lower absolute scores than originally reported, gains substantially from reranking, and shows greater robustness than SIM-RAG under extended retrieval features.
