M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

Defu Lian, Jianlv Chen, Kun Luo, Peitian Zhang, Shitao Xiao, Zheng Liu · 2024 · cs.CL · arXiv 2402.03216

Mixed citation behavior. Most common role is background (39%).

108 Pith papers citing it

abstract

In this paper, we introduce a new embedding model called M3-Embedding, which is distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It provides a uniform support for the semantic retrieval of more than 100 working languages. It can simultaneously accomplish the three common retrieval functionalities: dense retrieval, multi-vector retrieval, and sparse retrieval. Besides, it is also capable of processing inputs of different granularities, spanning from short sentences to long documents of up to 8,192 tokens. The effective training of M3-Embedding presents a series of technical contributions. Notably, we propose a novel self-knowledge distillation approach, where the relevance scores from different retrieval functionalities can be integrated as the teacher signal to enhance the training quality. We also optimize the batching strategy, which enables a large batch size and high training throughput to improve the discriminativeness of embeddings. M3-Embedding exhibits a superior performance in our experiment, leading to new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.
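The abstract names three retrieval functionalities and says their relevance scores can be integrated into a teacher signal, but it gives no formulas. The following is a minimal sketch of how such scores are commonly computed, assuming cosine similarity for dense retrieval, shared-term weight products for sparse retrieval, and a late-interaction average of best matches for multi-vector retrieval. The function names, the equal weights in `integrated_score`, and the toy inputs are all hypothetical and are not taken from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_score(q_vec, d_vec):
    """Dense retrieval: one vector per text, compared by cosine similarity."""
    return dot(normalize(q_vec), normalize(d_vec))


def sparse_score(q_weights, d_weights):
    """Sparse retrieval: sum of weight products over terms both texts share."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)


def multi_vector_score(q_vecs, d_vecs):
    """Multi-vector retrieval (late interaction): average, over query
    vectors, of each one's best cosine match among the document vectors."""
    q_unit = [normalize(v) for v in q_vecs]
    d_unit = [normalize(v) for v in d_vecs]
    return sum(max(dot(q, d) for d in d_unit) for q in q_unit) / len(q_unit)


def integrated_score(scores, weights=(1.0, 1.0, 1.0)):
    """Assumed teacher signal: a weighted sum of the three scores.
    Equal weights are an illustrative choice, not a reported setting."""
    return sum(w * s for w, s in zip(weights, scores))


# Toy inputs, invented for illustration only.
s_dense = dense_score([1.0, 0.0], [1.0, 0.0])
s_sparse = sparse_score({"embedding": 0.5, "model": 0.2},
                        {"embedding": 0.4, "retrieval": 0.9})
s_multi = multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
teacher = integrated_score((s_dense, s_sparse, s_multi))
print(s_dense, s_sparse, s_multi, teacher)
```

In a distillation setup of this kind, the combined score would serve as a soft target for each individual scoring function during training; the sketch stops at computing the scores and does not model the training loss.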

citation-role summary

background 8 · method 5 · baseline 3 · dataset 2

citation-polarity summary

background 7 · use method 5 · baseline 3 · use dataset 2 · unclear 1

authors

Defu Lian, Jianlv Chen, Kun Luo, Peitian Zhang, Shitao Xiao, Zheng Liu

representative citing papers

CORTEX: High-Quality Cross-Domain Organization of Web-Scale Corpora through Ontological Corpus Graph

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

Cortex uses an Ontological Corpus Graph to structure web-scale corpora, creating a refined 24.14B-token corpus and a new benchmark validated on eight LLMs.

Diagnosing and Mitigating Retrieval Bottlenecks in LLM-Based Cold-Start Recommendation

cs.IR · 2026-06-29 · conditional · novelty 7.0

Retrieval coverage limits LLM rerankers in cold-start recommendation; a learned hybrid fusion improves pool quality but LLM reranking often degrades end-to-end performance while simpler rankers exploit the pool.

Beyond the Reranker: Do RAG Retrieval Enhancements Help Once a Strong Reranker Is Present?

cs.IR · 2026-06-14 · conditional · novelty 7.0

On heterogeneous document collections, only query expansion and a newly introduced per-source calibrated corrector (SSCC) deliver reliable gains beyond a strong cross-encoder reranker; other common retrieval enhancements do not.

LEDGER: A Long-Context Benchmark of Corporate Annual Reports for Grounded Financial Retrieval and Extraction

cs.CL · 2026-06-11 · unverdicted · novelty 7.0

LEDGER provides a corpus of 4,999 annual reports with 31 labeled KPIs and three benchmarks for page-level retrieval, needle-in-haystack lookup, and full KPI extraction from long documents.

Towards Cost-effective LLMs Routing with Batch Prompting

cs.DB · 2026-05-27 · unverdicted · novelty 7.0

RoBatch is a two-stage framework that formulates and solves the joint Route with Batching Problem via a batch-aware proxy utility model and greedy scheduling, outperforming separate routing or batching baselines on six benchmarks.

Very Efficient Listwise Multimodal Reranking for Long Documents

cs.IR · 2026-05-12 · unverdicted · novelty 7.0

ZipRerank delivers state-of-the-art multimodal listwise reranking accuracy for long documents at up to 10x lower latency via early interaction and single-pass scoring.

Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents

cs.CR · 2026-05-11 · unverdicted · novelty 7.0

Nautilus Compass is a black-box drift detector for production LLM agents that uses weighted cosine similarity on BGE-m3 embeddings of raw text against anchors, achieving 0.83 ROC AUC on real session traces while shipping as plugins and servers with an audit log.

QuIVer: Rethinking ANN Graph Topology via Training-Free Binary Quantization

cs.DB · 2026-05-04 · unverdicted · novelty 7.0 · 2 refs

QuIVer performs Vamana-style graph construction entirely inside a 2-bit Sign-Magnitude BQ space, achieving >=88% Recall@10 on contrastive-learning embeddings and 2.5-5.5x higher throughput than DiskANN/HNSW at matched recall with 4.7x less hot memory.

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG

cs.IR · 2026-04-30 · unverdicted · novelty 7.0

FES-RAG reframes multimodal RAG as fragment-level selection using Fragment Information Gain to outperform document-level methods with up to 27% relative CIDEr gains on M2RAG while shortening context.

Prism-Reranker: Beyond Relevance Scoring -- Jointly Producing Contributions and Evidence for Agentic Retrieval

cs.IR · 2026-04-26 · accept · novelty 7.0

Prism-Reranker models output relevance, contribution statements, and evidence passages to support agentic retrieval beyond scalar scoring.

Latent Abstraction for Retrieval-Augmented Generation

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

LAnR unifies retrieval-augmented generation inside a single LLM by deriving dense retrieval vectors from a [PRED] token's hidden states and using entropy to adaptively stop retrieval, outperforming prior RAG on six QA benchmarks with better efficiency.

vstash: Local-First Hybrid Retrieval with Adaptive Fusion for LLM Agents

cs.IR · 2026-04-16 · conditional · novelty 7.0

vstash shows that hybrid retrieval disagreements provide a free training signal to fine-tune 33M-parameter embeddings, yielding NDCG@10 gains up to 19.5% on NFCorpus and matching some larger models on three of five BEIR datasets.

Sell More, Play Less: Benchmarking LLM Realistic Selling Skill

cs.CL · 2026-04-08 · conditional · novelty 7.0

SalesLLM provides an automatic evaluation framework for LLM sales dialogues that correlates 0.98 with human experts and shows top models approaching human performance while weaker ones lag.

LMEB: Long-horizon Memory Embedding Benchmark

cs.CL · 2026-03-13 · unverdicted · novelty 7.0

LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.

SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

cs.IR · 2026-02-13 · unverdicted · novelty 7.0

SQuTR aggregates 37k queries from six text retrieval datasets, synthesizes speech from 200 speakers, adds 17 noise categories at varying SNR, and shows that even large retrieval models degrade sharply under extreme acoustic noise.

Pattern-Calibrated Multimodal Prediction under Blockwise Missingness

stat.ME · 2026-07-02 · unverdicted · novelty 6.0

MOSAIC learns overlap-aware shared-specific representations, fits a first-stage predictor on overlapping data, and calibrates the gap using target-pattern samples, with non-asymptotic error bounds decomposing overlap size, calibration gap, and representation error.

Identifying and Resolving Pitfalls of Knowledge-Based VQA Benchmarks: Auditing, Repairing, and Augmenting

cs.CL · 2026-06-30 · unverdicted · novelty 6.0

Audit of KB-VQA benchmarks reveals systematic violations of answer derivability, question clarity, and visual disambiguation assumptions, with new repair and multi-entity augmentation protocols producing different model performance trends.

SHARD: cell-keyed residual splitting for alignment-resistant private dense retrieval

cs.CR · 2026-06-26 · unverdicted · novelty 6.0 · 2 refs

SHARD introduces cell-keyed residual splitting that turns dense retrieval embeddings into revocable, renewable, unlinkable templates resistant to alignment attacks while preserving exact utility under CKKS reranking.

Uncertainty-Aware Hybrid Retrieval for Long-Document RAG

cs.AI · 2026-06-11 · unverdicted · novelty 6.0

UMG-RAG improves long-document RAG by uncertainty-aware fusion of multi-granularity retrievals from complementary dense and sparse retrievers, plus a parent-promotion variant.

When Does Mixing Help? Analyzing Query Embedding Interpolation in Multilingual Dense Retrieval

cs.CL · 2026-06-11 · conditional · novelty 6.0

Optimal interpolation of query embeddings from parallel translations outperforms the best monolingual query in 88/105 cases on mMARCO, showing English-driven asymmetry and negative correlation with typological distance.

DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?

cs.RO · 2026-06-10 · unverdicted · novelty 6.0

DIRECT is a multimodal-context router that allocates test-time compute across chain-of-thought depth, model size, and memory history for VLM embodied planners, improving the success-cost Pareto frontier and matching stronger models at up to 65% lower latency on benchmarks and a physical Franka arm.

REAL: A Reasoning-Enhanced Graph Framework for Long-Term Memory Management of LLMs

cs.CL · 2026-06-09 · unverdicted · novelty 6.0

REAL represents long-term LLM memory as a temporal confidence-aware directed property graph with non-destructive updates and uses evaluator-guided beam search plus counterfactual inference for retrieval, reporting 22.72% average gains over baselines.

A Multi-modal Agentic Co-pilot for Evidence Grounded Computational Pathology

cs.AI · 2026-06-06 · unverdicted · novelty 6.0

PathPocket constructs a 4.55M-entity pathology hypergraph from 110k graded documents and deploys a multi-agent framework that outperforms prior systems on 200k cases while raising pathologist accuracy in user studies.

citing papers explorer

Showing 8 of 108 citing papers.

Multimodal Contextualized Support for Enhancing Video Retrieval System

cs.CV · 2024-12-10 · unverdicted · none · ref 2 · internal anchor

Proposes a multimodal pipeline for video retrieval that incorporates information from multiple frames to enable higher-level abstraction beyond single-image object detection.

5ting at SemEval-2026 Task 8: Strong End-to-End Multi-Turn RAG via LLM-Based Reranking and Faithfulness Control

cs.CL · 2026-06-27 · unverdicted · none · ref 21 · internal anchor

5ting achieves nDCG@5 of 0.4719 on Task A and harmonic score 0.5597 with RL_F 0.7692 on Task C for multi-turn RAG via standard dense retrieval plus LLM reranking and faithfulness constraints.

Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices

cs.DC · 2025-03-11 · unverdicted · none · ref 147 · internal anchor

Position paper claiming that distributed training across massive edge devices can overcome data depletion and centralized compute monopolies in LLM scaling.

Overview of the EReL@MIR 2025 Multimodal Document Retrieval Challenge (Track 1)

cs.CV · 2026-06-02 · unverdicted · none · ref 1 · internal anchor

The EReL@MIR 2025 Track 1 challenge evaluates single systems on two multimodal retrieval tasks and finds that Qwen2-VL decoder-based embedders dominate, with a training-free entry within 0.1 points of the fine-tuned winner.

SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering

cs.CL · 2026-05-30 · unreviewed · ref 10 · internal anchor

Aligning Dense Retrievers with LLM Utility via Distillation

cs.IR · 2026-04-24 · unreviewed · ref 2 · internal anchor

From Tokens to Concepts: Leveraging SAE for SPLADE

cs.IR · 2026-04-23 · unreviewed · ref 8 · internal anchor

A Benchmark Construction and Evaluation Framework for Specialist Domains: Case Study on Defense-related Documents

cs.CL · 2026-04-20 · unreviewed · ref 41 · internal anchor
