
Improving text embeddings with large language models

2023 · arXiv 2401.00368

20 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

MathAtlas: A Benchmark for Autoformalization in the Wild

cs.AI · 2026-05-13 · accept · novelty 8.0

MathAtlas is the first large-scale benchmark for autoformalizing graduate mathematics, where even strong models reach only 9.8% correctness on theorem statements and drop to 2.6% on the hardest dependency-deep subset.

Fine-grained Claim-level RAG Benchmark for Law

cs.CL · 2026-05-20 · unverdicted · novelty 7.0 · 3 refs

ClaimRAG-LAW is a French-English legal RAG benchmark with claim-level granularity for experts and non-experts that reveals limitations in current retrieval and generation performance.

Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

RISE is an inference-time semantic reranking framework that refines low-confidence predictions in rhetorical role labeling using contrastively learned label representations, delivering an average +9.15 macro-F1 gain on hard examples across eight datasets and seven models.

IntrAgent: An LLM Agent for Content-Grounded Information Retrieval through Literature Review

cs.IR · 2026-04-23 · unverdicted · novelty 7.0

IntrAgent uses a two-stage pipeline of section ranking and iterative reading to perform content-grounded literature information retrieval, achieving 13.2% higher accuracy than RAG and agent baselines on the new IntraBench benchmark.

Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker

cs.CL · 2025-11-11 · unverdicted · novelty 7.0

UWE is a task-agnostic bi-encoder that uses many-to-many InfoNCE and token-level soft late interaction to achieve zero-shot ranking across unseen work-related target spaces while using far fewer parameters than Qwen3-8B and improving MAP by 4.4 points.

Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

cs.CL · 2024-04-29 · conditional · novelty 7.0

A panel of smaller diverse LLMs outperforms a single large model as an evaluator of generations, showing less intra-model bias and over 7x lower cost.

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

cs.CL · 2024-02-05 · unverdicted · novelty 7.0

M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.

Structure Retention in Embedding Spaces as a Predictor of Benchmark Performance

cs.CL · 2026-05-21 · unverdicted · novelty 6.0

Embedding model performance on MTEB tasks correlates strongly with nearest-neighbor overlap and ICA magnitude differences in their embedding spaces.

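Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus

cs.CL · 2026-05-01 · unverdicted · novelty 6.0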

Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.

Regime-Conditional Retrieval: Theory and a Transferable Router for Two-Hop QA

cs.IR · 2026-04-10 · conditional · novelty 6.0

Two-hop QA retrieval performance depends on whether the hop-2 entity is in the question or bridge passage, and a simple predicate-based router trained on one dataset transfers to improve R@5 on others.

E5-V: Universal Embeddings with Multimodal Large Language Models

cs.CL · 2024-07-17 · unverdicted · novelty 6.0

E5-V produces strong universal multimodal embeddings from MLLMs trained solely on text pairs, often surpassing prior methods across retrieval and related tasks without multimodal fine-tuning.

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

cs.CL · 2024-05-27 · accept · novelty 6.0

NV-Embed achieves first place on the MTEB leaderboard across 56 tasks by combining a latent attention layer, causal-mask removal, two-stage contrastive training, and data curation for LLM-based embedding models.

Not All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Tasks

cs.SE · 2026-05-14 · unverdicted · novelty 5.0

Retriever-side choices, particularly the retrieval algorithm, exert more influence on RAG performance than generator selection across code generation, summarization, and repair tasks.

Resolving the Robustness-Precision Trade-off in Financial RAG through Hybrid Document-Routed Retrieval

cs.CL · 2026-03-26 · unverdicted · novelty 5.0

HDRR combines document-level semantic routing with scoped chunk retrieval to outperform both pure chunk-based retrieval and semantic file routing on the FinDER benchmark, delivering higher average scores, lower failure rates, and more perfect answers.

Legal Retrieval for Public Defenders

cs.IR · 2026-01-20 · conditional · novelty 5.0

NJ BriefBank is a domain-adapted legal retrieval tool for public defenders that improves on standard benchmarks by incorporating legal reasoning, domain data, and synthetic examples, with a new released taxonomy and annotated evaluation dataset.

Retrofitting Small Multilingual Models for Retrieval: Matching 7B Performance with 300M Parameters

cs.CL · 2025-10-16 · conditional · novelty 5.0

A 300M multilingual embedding model matches or exceeds 7B retrieval performance via optimized data scale, hard negatives, and task diversity over language diversity.

Improving Korean-English Cross-Lingual Retrieval: A Data-Centric Study of Language Composition and Model Merging

cs.IR · 2025-07-11 · unverdicted · novelty 5.0

Language composition in training data creates opposing effects on CLIR and mono-IR performance for Korean-English retrieval, which model merging can partially resolve.

VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

cs.CV · 2025-07-07 · unverdicted · novelty 5.0

VLM2Vec-V2 is a multimodal embedding model trained on an extended MMEB-V2 benchmark that adds video and visual document tasks and reports gains on both new and prior image benchmarks.

Multilingual E5 Text Embeddings: A Technical Report

cs.CL · 2024-02-08 · unverdicted · novelty 5.0

Open-source multilingual E5 embedding models are trained via contrastive pre-training on 1 billion text pairs and fine-tuning, with an instruction-tuned model matching English SOTA performance.

Are Decoder-Only Large Language Models the Silver Bullet for Code Search?

cs.SE · 2024-10-29 · unverdicted · novelty 4.0

Fine-tuned decoder-only LLMs achieve up to 40.4% higher MAP than UniXcoder on CoSQA+ for code search, with non-monotonic size scaling and data composition sensitivity.

citing papers explorer


MathAtlas: A Benchmark for Autoformalization in the Wild · cs.AI · 2026-05-13 · accept · none · ref 36
MathAtlas is the first large-scale benchmark for autoformalizing graduate mathematics, where even strong models reach only 9.8% correctness on theorem statements and drop to 2.6% on the hardest dependency-deep subset.
Fine-grained Claim-level RAG Benchmark for Law · cs.CL · 2026-05-20 · unverdicted · none · ref 49 · 3 links
ClaimRAG-LAW is a French-English legal RAG benchmark with claim-level granularity for experts and non-experts that reveals limitations in current retrieval and generation performance.
Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling · cs.CL · 2026-05-18 · unverdicted · none · ref 98
RISE is an inference-time semantic reranking framework that refines low-confidence predictions in rhetorical role labeling using contrastively learned label representations, delivering an average +9.15 macro-F1 gain on hard examples across eight datasets and seven models.
IntrAgent: An LLM Agent for Content-Grounded Information Retrieval through Literature Review · cs.IR · 2026-04-23 · unverdicted · none · ref 8
IntrAgent uses a two-stage pipeline of section ranking and iterative reading to perform content-grounded literature information retrieval, achieving 13.2% higher accuracy than RAG and agent baselines on the new IntraBench benchmark.
Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker · cs.CL · 2025-11-11 · unverdicted · none · ref 40
UWE is a task-agnostic bi-encoder that uses many-to-many InfoNCE and token-level soft late interaction to achieve zero-shot ranking across unseen work-related target spaces while using far fewer parameters than Qwen3-8B and improving MAP by 4.4 points.
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models · cs.CL · 2024-04-29 · conditional · none · ref 27
A panel of smaller diverse LLMs outperforms a single large model as an evaluator of generations, showing less intra-model bias and over 7x lower cost.
M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation · cs.CL · 2024-02-05 · unverdicted · none · ref 15
M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.
Structure Retention in Embedding Spaces as a Predictor of Benchmark Performance · cs.CL · 2026-05-21 · unverdicted · none · ref 130
Embedding model performance on MTEB tasks correlates strongly with nearest-neighbor overlap and ICA magnitude differences in their embedding spaces.
Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus · cs.CL · 2026-05-01 · unverdicted · none · ref 75
Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.
Regime-Conditional Retrieval: Theory and a Transferable Router for Two-Hop QA · cs.IR · 2026-04-10 · conditional · none · ref 12
Two-hop QA retrieval performance depends on whether the hop-2 entity is in the question or bridge passage, and a simple predicate-based router trained on one dataset transfers to improve R@5 on others.
E5-V: Universal Embeddings with Multimodal Large Language Models · cs.CL · 2024-07-17 · unverdicted · none · ref 13
E5-V produces strong universal multimodal embeddings from MLLMs trained solely on text pairs, often surpassing prior methods across retrieval and related tasks without multimodal fine-tuning.
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models · cs.CL · 2024-05-27 · accept · none · ref 99
NV-Embed achieves first place on the MTEB leaderboard across 56 tasks by combining a latent attention layer, causal-mask removal, two-stage contrastive training, and data curation for LLM-based embedding models.
Not All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Tasks · cs.SE · 2026-05-14 · unverdicted · none · ref 47
Retriever-side choices, particularly the retrieval algorithm, exert more influence on RAG performance than generator selection across code generation, summarization, and repair tasks.
Resolving the Robustness-Precision Trade-off in Financial RAG through Hybrid Document-Routed Retrieval · cs.CL · 2026-03-26 · unverdicted · none · ref 16
HDRR combines document-level semantic routing with scoped chunk retrieval to outperform both pure chunk-based retrieval and semantic file routing on the FinDER benchmark, delivering higher average scores, lower failure rates, and more perfect answers.
Legal Retrieval for Public Defenders · cs.IR · 2026-01-20 · conditional · none · ref 40
NJ BriefBank is a domain-adapted legal retrieval tool for public defenders that improves on standard benchmarks by incorporating legal reasoning, domain data, and synthetic examples, with a new released taxonomy and annotated evaluation dataset.
Retrofitting Small Multilingual Models for Retrieval: Matching 7B Performance with 300M Parameters · cs.CL · 2025-10-16 · conditional · none · ref 12
A 300M multilingual embedding model matches or exceeds 7B retrieval performance via optimized data scale, hard negatives, and task diversity over language diversity.
Improving Korean-English Cross-Lingual Retrieval: A Data-Centric Study of Language Composition and Model Merging · cs.IR · 2025-07-11 · unverdicted · none · ref 35
Language composition in training data creates opposing effects on CLIR and mono-IR performance for Korean-English retrieval, which model merging can partially resolve.
VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents · cs.CV · 2025-07-07 · unverdicted · none · ref 24
VLM2Vec-V2 is a multimodal embedding model trained on an extended MMEB-V2 benchmark that adds video and visual document tasks and reports gains on both new and prior image benchmarks.
Multilingual E5 Text Embeddings: A Technical Report · cs.CL · 2024-02-08 · unverdicted · none · ref 50
Open-source multilingual E5 embedding models are trained via contrastive pre-training on 1 billion text pairs and fine-tuning, with an instruction-tuned model matching English SOTA performance.
Are Decoder-Only Large Language Models the Silver Bullet for Code Search? · cs.SE · 2024-10-29 · unverdicted · none · ref 38
Fine-tuned decoder-only LLMs achieve up to 40.4% higher MAP than UniXcoder on CoSQA+ for code search, with non-monotonic size scaling and data composition sensitivity.
