pith. sign in

super hub Mixed citations

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Mixed citation behavior. Most common role is background (46%).

213 Pith papers citing it
Background 46% of classified citations
abstract

In this work, we introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series, in text embedding and reranking capabilities, built upon the Qwen3 foundation models. Leveraging the Qwen3 LLMs' robust capabilities in multilingual text understanding and generation, our innovative multi-stage training pipeline combines large-scale unsupervised pre-training with supervised fine-tuning on high-quality datasets. Effective model merging strategies further ensure the robustness and adaptability of the Qwen3 Embedding series. During the training process, the Qwen3 LLMs serve not only as backbone models but also play a crucial role in synthesizing high-quality, rich, and diverse training data across multiple domains and languages, thus enhancing the training pipeline. The Qwen3 Embedding series offers a spectrum of model sizes (0.6B, 4B, 8B) for both embedding and reranking tasks, addressing diverse deployment scenarios where users can optimize for either efficiency or effectiveness. Empirical evaluations demonstrate that the Qwen3 Embedding series achieves state-of-the-art results across diverse benchmarks. Notably, it excels on the multilingual evaluation benchmark MTEB for text embedding, as well as in various retrieval tasks, including code retrieval, cross-lingual retrieval and multilingual retrieval. To facilitate reproducibility and promote community-driven research and development, the Qwen3 Embedding models are publicly available under the Apache 2.0 license.

hub tools

citation-role summary

background 14 method 6 baseline 3 dataset 1

citation-polarity summary

claims ledger

  • abstract In this work, we introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series, in text embedding and reranking capabilities, built upon the Qwen3 foundation models. Leveraging the Qwen3 LLMs' robust capabilities in multilingual text understanding and generation, our innovative multi-stage training pipeline combines large-scale unsupervised pre-training with supervised fine-tuning on high-quality datasets. Effective model merging strategies further ensure the robustness and adaptability of the Qwen3 Embedding series. During the training process, the

authors

co-cited works

representative citing papers

STRABLE: Benchmarking Tabular Machine Learning with Strings

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.

FollowTable: A Benchmark for Instruction-Following Table Retrieval

cs.IR · 2026-05-01 · unverdicted · novelty 8.0

FollowTable is the first large-scale benchmark for instruction-following table retrieval, paired with an Instruction Responsiveness Score, showing that existing models fail to adapt to fine-grained constraints beyond topical similarity.

HEART-Bench: Do LLM Agents Exhibit Human-like Psychology?

cs.CL · 2026-05-28 · unverdicted · novelty 7.0

HEART-Bench evaluates LLM agents on psychological consistency using 11 Big-Five-grounded characters with 1,000 episodic memories each and 64 DIAMONDS-based decision scenarios, yielding 673 validated MCQs.

Decision-Aware Quadratic ReLU Replacement for HE-Friendly Inference

cs.CR · 2026-05-21 · unverdicted · novelty 7.0 · 2 refs

Formulates quadratic ReLU replacement as a linear separation problem in lifted space, with exact conditions for calibration-lossless replacement and convex relaxations for approximate cases, achieving plaintext accuracy at lower cost under CKKS.

BOOKMARKS: Efficient Active Storyline Memory for Role-playing

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.

citing papers explorer

Showing 50 of 213 citing papers.