Arctic-embed: Scalable, efficient, and accurate text embedding models

Luke Merrick, Danmei Xu, Gaurav Nuti, Daniel Campos · 2024 · arXiv 2405.05374

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

SkillRet benchmark shows fine-tuned retrievers improve NDCG@10 by 13+ points over prior models on large-scale skill retrieval for LLM agents.

On the Robustness of Multilingual Text Embedding Rankings Across Learning Tasks, Languages, and Benchmark Datasets

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

Meta-study of MTEB rankings introduces dataset-composition and ranking-scheme robustness indicators and finds only a small subset of models stay consistently strong across tasks, languages, and evaluation variations.

Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

LRD framework with Frenet, NRS, and GFMI metrics shows layer-wise structure in 31 models provides usable signal for model selection and pruning on MTEB tasks.

LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations

cs.IR · 2025-09-16 · conditional · novelty 6.0

LEAF distills teacher-aligned student embedding models that achieve new SOTA results on BEIR and MTEB for their size class while requiring only modest data and compute.

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

cs.CL · 2024-06-25 · unverdicted · novelty 6.0

FineWeb is a curated 15T-token web dataset that produces stronger LLMs than prior open collections, while its educational subset sharply improves performance on MMLU and ARC benchmarks.

MimirRAG: A Multi-Agent RAG Framework for Financial Data Retrieval with Metadata Integration

cs.LG · 2026-05-24 · unverdicted · novelty 4.0

MimirRAG, a multi-agent RAG framework with metadata integration and table-aware chunking, reaches 89.3% accuracy on FinanceBench and outperforms prior baselines for financial document retrieval.

citing papers explorer

Showing 6 of 6 citing papers.

SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents
cs.AI · 2026-05-07 · unverdicted · none · ref 20

On the Robustness of Multilingual Text Embedding Rankings Across Learning Tasks, Languages, and Benchmark Datasets
cs.CL · 2026-05-29 · unverdicted · none · ref 28

Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs
cs.LG · 2026-05-12 · unverdicted · none · ref 46

LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations
cs.IR · 2025-09-16 · conditional · none · ref 22

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
cs.CL · 2024-06-25 · unverdicted · none · ref 66

MimirRAG: A Multi-Agent RAG Framework for Financial Data Retrieval with Metadata Integration
cs.LG · 2026-05-24 · unverdicted · none · ref 35
