Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Baosong Yang, Dingkun Long, Huan Lin, Mingxin Li, Xin Zhang, Yanzhao Zhang · 2025 · cs.CL · arXiv 2506.05176

Mixed citation behavior. Most common role is background (46%).

324 Pith papers citing it

Background 46% of classified citations

abstract

In this work, we introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series, in text embedding and reranking capabilities, built upon the Qwen3 foundation models. Leveraging the Qwen3 LLMs' robust capabilities in multilingual text understanding and generation, our innovative multi-stage training pipeline combines large-scale unsupervised pre-training with supervised fine-tuning on high-quality datasets. Effective model merging strategies further ensure the robustness and adaptability of the Qwen3 Embedding series. During the training process, the Qwen3 LLMs serve not only as backbone models but also play a crucial role in synthesizing high-quality, rich, and diverse training data across multiple domains and languages, thus enhancing the training pipeline. The Qwen3 Embedding series offers a spectrum of model sizes (0.6B, 4B, 8B) for both embedding and reranking tasks, addressing diverse deployment scenarios where users can optimize for either efficiency or effectiveness. Empirical evaluations demonstrate that the Qwen3 Embedding series achieves state-of-the-art results across diverse benchmarks. Notably, it excels on the multilingual evaluation benchmark MTEB for text embedding, as well as in various retrieval tasks, including code retrieval, cross-lingual retrieval and multilingual retrieval. To facilitate reproducibility and promote community-driven research and development, the Qwen3 Embedding models are publicly available under the Apache 2.0 license.

citation-role summary

background 14 · method 6 · baseline 3 · dataset 1

citation-polarity summary

background 11 · use method 6 · baseline 3 · unclear 2 · support 1 · use dataset 1
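
The page does not state how the 46% background figure is derived. One plausible reading, offered here only as an assumption, is that it is the background share of the polarity counts above (11 of 24). The sketch below recomputes each label's share from those counts; the names `polarity_counts` and `shares` are illustrative and not part of the page.

```python
# Counts copied from the citation-polarity summary above.
polarity_counts = {
    "background": 11,
    "use method": 6,
    "baseline": 3,
    "unclear": 2,
    "support": 1,
    "use dataset": 1,
}


def shares(counts):
    """Return each label's share of the total, as a percentage with one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


polarity_shares = shares(polarity_counts)
for label, pct in polarity_shares.items():
    print(f"{label}: {pct}%")
```

Under this assumption, background comes to 45.8%, which rounds to the 46% reported at the top of the page.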

representative citing papers

CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges

cs.CL · 2026-06-18 · unverdicted · novelty 8.0

Presents a new expert-curated dataset of multi-turn counterspeech dialogues in five languages targeting hate against seven groups, with span annotations linking to verified external knowledge for RAG applications.

SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation

cs.CL · 2026-06-11 · unverdicted · novelty 8.0

SkMTEB is the first comprehensive text embedding benchmark for Slovak, and vocabulary-trimmed E5 adaptations achieve competitive performance with much smaller models.

DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation

cs.CL · 2026-05-31 · unverdicted · novelty 8.0

DiscourseFlip is a graph-guided attack allocating limited poisoning budget to induce targeted opinion shifts over semantic query networks in black-box RAG.

STRABLE: Benchmarking Tabular Machine Learning with Strings

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.

SLAM: Structural Linguistic Activation Marking for Language Models

cs.CL · 2026-05-06 · unverdicted · novelty 8.0 · 2 refs

SLAM achieves 100% detection on Gemma-2 models with only 1-2 point quality cost by causally steering SAE-identified residual-stream directions for linguistic structure.

ReasonAudio: A Benchmark for Evaluating Reasoning Beyond Matching in Text-Audio Retrieval

cs.AI · 2026-05-05 · unverdicted · novelty 8.0 · 2 refs

ReasonAudio benchmark reveals that state-of-the-art text-audio retrieval models struggle with reasoning tasks like negation and duration, and multimodal LLMs lose reasoning ability after contrastive fine-tuning.

FollowTable: A Benchmark for Instruction-Following Table Retrieval

cs.IR · 2026-05-01 · unverdicted · novelty 8.0

FollowTable is the first large-scale benchmark for instruction-following table retrieval, paired with an Instruction Responsiveness Score, showing that existing models fail to adapt to fine-grained constraints beyond topical similarity.

Can Language Models Actually Retrieve In-Context? Drowning in Documents at Million Token Scale

cs.CL · 2026-07-01 · unverdicted · novelty 7.0

A 0.6B LM with length-aware attention adjustments performs competitive in-context retrieval at million-token scale on MS MARCO, NQ, and LIMIT benchmarks.

MoHallBench: A Benchmark for Motion Hallucination in Video Large Language Models

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

MoHallBench is a new benchmark evaluating motion hallucination in VideoLLMs from co-occurrence priors, sequential inference, and similarity confusion, revealing decoupling from action recognition performance.

Embedding Inference Attack

cs.CR · 2026-07-01 · unverdicted · novelty 7.0

Tailored queries enable identification of the embedding model used by a black-box IR system from the unordered set of retrieved documents, even when a reranker is present.

STEB: Style Text Embedding Benchmark

cs.CL · 2026-06-30 · unverdicted · novelty 7.0

STEB is a new benchmark of 96 datasets in 7 languages for evaluating style text embeddings on authorship, detection, and linguistic probing tasks.

Beyond IID: How General Are Tabular Foundation Models, Really?

cs.LG · 2026-06-29 · unverdicted · novelty 7.0

Tabular foundation models excel on tiny- to medium-sized IID data but are outperformed by traditional tree-based and deep learning models on non-IID, large, and high-dimensional datasets, based on evaluations across 11 models and 142 datasets in the new BeyondArena benchmark.

Turn-Averaged SAEs for Feature Discovery and Long-Context Attribution

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

Turn-averaged SAEs reconstruct average activations over conversation turns to represent high-level turn characteristics with a fixed number of features, simplifying long-context interpretability compared to per-token SAEs.

OctoSense: Self-Supervised Learning for Multimodal Robot Perception

cs.CV · 2026-06-25 · unverdicted · novelty 7.0

OctoSense supplies a large multimodal robotics dataset and a late-fusion masked autoencoder that runs fast and outperforms image-only models on optical flow, depth, segmentation, and ego-motion tasks while remaining robust under sensor degradation.

TheoremGraph: Bridging Formal and Informal Mathematics

cs.IR · 2026-06-24 · unverdicted · novelty 7.0

TheoremGraph builds a unified statement-level dependency graph across informal arXiv math and formal Lean code via parsing, embeddings, and LLM validation, releasing the data and APIs for search and retrieval.

DREAM: Dense Retrieval Embeddings via Autoregressive Modeling

cs.CL · 2026-06-23 · unverdicted · novelty 7.0

DREAM enables training of dense retrieval embeddings using autoregressive next-token prediction from LLMs by modulating attention with retriever scores.

ChartWalker: Benchmarking the Cross-Chart RAG Task with Hierarchical Knowledge Graphs

cs.IR · 2026-06-22 · unverdicted · novelty 7.0

ChartWalker provides a hierarchical knowledge graph construction method and structure-aware sampling to generate cross-chart RAG benchmarks, releasing ChartWalker-Bench that exposes performance gaps across RAG paradigms.

SemCEB: A Cardinality Estimation Benchmark for Semantic Operators

cs.DB · 2026-06-22 · unverdicted · novelty 7.0

SemCEB is the first benchmark for cardinality estimation over semantic operators, evaluating sampling methods and Semantic Histograms on accuracy, cost, latency, and memory using 102 queries on a real-world dataset.

HAKARI-Bench: A Lightweight Benchmark for Comparing Retrieval Architectures and Efficiency Settings under Unified Conditions

cs.IR · 2026-06-22 · unverdicted · novelty 7.0

HAKARI-Bench reconstructs 35 benchmarks into 551 tasks across 43 languages, reproducing full MTEB, MMTEB, and BEIR rankings with Spearman correlation above 0.97 while supporting efficiency variant comparisons.

Measuring Semantic Progress in Multi-turn Dialogue via Information Gain

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

A Gaussian information-gain metric in embedding space quantifies semantic progress in dialogues via uncertainty reduction and shows competitive agreement with human judgments on MT-Bench and UltraFeedback.

Agreement in Representation Space for Open-Ended Self-Consistency

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

EBA clusters sampled LLM generations in representation space to estimate agreement, outperforming random selection with stable scaling and showing that central positions correlate with higher generation quality.

Tail-Aware Adaptive-k: Query-Adaptive Context Selection for Retrieval-Augmented Generation

cs.IR · 2026-06-10 · unverdicted · novelty 7.0

TAA-k finds query-adaptive retrieval cutoffs by first using knee detection to isolate a candidate window around the relevance-to-noise transition, then applying EVT goodness-of-fit tests inside that window.

CORE-Bench: A Comprehensive Benchmark for Code Retrieval in the Era of Agentic Coding

cs.IR · 2026-06-10 · accept · novelty 7.0

CORE-Bench is a benchmark for code retrieval in agentic coding settings, built from curated tasks and SWE-bench instances, showing performance drops and gains from fine-tuning.

ActProbe: Action-Space Probe for Early Failure Detection of Generative Robot Policies

cs.RO · 2026-06-07 · unverdicted · novelty 7.0

ActProbe is an action-space detector that uses temporal consistency error and action chunk magnitude from policy outputs, mapped via LSTM-MLP, to predict failures earlier than baselines across policies and real-robot tasks.

citing papers explorer

Showing 28 of 28 citing papers after filters.

MoHallBench: A Benchmark for Motion Hallucination in Video Large Language Models

cs.CV · 2026-07-01 · unverdicted · none · ref 47 · internal anchor

MoHallBench is a new benchmark evaluating motion hallucination in VideoLLMs from co-occurrence priors, sequential inference, and similarity confusion, revealing decoupling from action recognition performance.

OctoSense: Self-Supervised Learning for Multimodal Robot Perception

cs.CV · 2026-06-25 · unverdicted · none · ref 70 · internal anchor

OctoSense supplies a large multimodal robotics dataset and a late-fusion masked autoencoder that runs fast and outperforms image-only models on optical flow, depth, segmentation, and ego-motion tasks while remaining robust under sensor degradation.

DermAgent: A Self-Reflective Agentic System for Dermatological Image Analysis with Multi-Tool Reasoning and Traceable Decision-Making

cs.CV · 2026-05-14 · unverdicted · none · ref 32 · internal anchor

DermAgent orchestrates seven vision-language tools in a Plan-Execute-Reflect loop with dual-modality retrieval from 413k cases and a critic module to outperform GPT-4o by 17.6% in zero-shot dermatological diagnosis accuracy.

ReTool-Video: Recursive Tool-Using Video Agents with Meta-Augmented Tool Grounding

cs.CV · 2026-05-13 · unverdicted · none · ref 51 · internal anchor

ReTool-Video uses a 134-tool meta-augmented library and recursive grounding to translate abstract video intents into fine-grained multimodal operations, outperforming baselines on MVBench, MLVU, and Video-MME.

AssemblyBench: Physics-Aware Assembly of Complex Industrial Objects

cs.CV · 2026-05-13 · unverdicted · none · ref 45 · internal anchor

AssemblyBench dataset and AssemblyDyno transformer model enable physics-aware prediction of assembly sequences and trajectories for complex industrial objects from multimodal instructions and 3D shapes.

OASIS: On-Demand Hierarchical Event Memory for Streaming Video Reasoning

cs.CV · 2026-04-18 · unverdicted · none · ref 66 · internal anchor

OASIS organizes streaming video into hierarchical events and retrieves memory on-demand via intent-driven refinement to improve long-horizon accuracy and compositional reasoning with bounded token costs.

Jolia: Concept-Level Vision-Language Alignment for 3D CT Contrastive Learning

cs.CV · 2026-06-23 · unverdicted · none · ref 29 · internal anchor

ConQuer augments global CLIP alignment with independent per-concept contrastive losses on anatomical regions extracted from reports, producing Jolia which outperforms CLIP baselines on classification, report generation, and transfer.

StoryVideoQA: Scaling Deep Video Understanding with a Large-Scale, Multi-Genre and Auto-Generated Dataset

cs.CV · 2026-06-04 · unverdicted · none · ref 109 · internal anchor

StoryVideoQA provides the largest auto-generated deep video understanding dataset to date with 363K QAs across TV and movies, paired with the PlotTree agent for hierarchical plot-based reasoning that existing VideoQA models struggle to match.

MM-Matryoshka: Towards Budget-Elastic Visual Document Retrieval via a 2D Multimodal Matryoshka Training Framework

cs.CV · 2026-06-03 · unverdicted · none · ref 31 · internal anchor

MM-Matryoshka is a 2D Matryoshka training framework enabling budget-elastic ColPali-style multi-vector visual document retrieval along dimension and layer without separate models per budget.

Astra: a generalizable report generation foundation model for 3D computed tomography

cs.CV · 2026-05-29 · unverdicted · none · ref 39 · internal anchor

Astra is a 3D CT vision-language foundation model trained on 90,678 thoracoabdominal scans that claims 44.1% better diagnostic metrics on internal and six external cohorts plus 29.6% faster chest reporting in real workflows.

AnE: Pushing the Reasoning Frontier of Multimodal LLMs via Anchor Evolution

cs.CV · 2026-05-25 · unverdicted · none · ref 31 · internal anchor

AnE combines Truth Anchor Expansion and Scaffold-Stripping to deliver 10.3% gains on eight multimodal reasoning benchmarks for MLLMs.

AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild

cs.CV · 2026-05-21 · unverdicted · none · ref 82 · 2 links · internal anchor

AnyMo pre-trains a graph encoder on physics-simulated multi-placement IMU data and aligns full-body motion tokens with LLMs to enable zero-shot activity recognition, retrieval, and captioning across unseen datasets and setups.

TextTeacher: What Can Language Teach About Images?

cs.CV · 2026-05-21 · unverdicted · none · ref 76 · internal anchor

TextTeacher uses frozen text embeddings from captions as semantic anchors to guide vision model training, improving ImageNet accuracy by up to 2.7 p.p. and transfer performance by 1.0 p.p. on average.

Iterative Definition Refinement for Zero-Shot Classification via LLM-Based Semantic Prototype Optimization

cs.CV · 2026-04-30 · unverdicted · none · ref 36 · internal anchor

Iterative LLM-based refinement of category definitions improves zero-shot classification performance across 13 embedding models on a new 10-category web URL benchmark.

MiMIC: Mitigating Visual Modality Collapse in Universal Multimodal Retrieval While Avoiding Semantic Misalignment

cs.CV · 2026-04-23 · unverdicted · none · ref 110 · internal anchor

MiMIC mitigates visual modality collapse and semantic misalignment in universal multimodal retrieval via fusion-in-decoder architecture and robust single-modality training.

RaTA-Tool: Retrieval-based Tool Selection with Multimodal Large Language Models

cs.CV · 2026-04-16 · unverdicted · none · ref 42 · internal anchor

RaTA-Tool retrieves suitable external tools for multimodal queries by matching generated task descriptions against tool metadata, supported by a new Hugging Face-derived dataset and DPO optimization.

WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering

cs.CV · 2026-04-07 · unverdicted · none · ref 41 · internal anchor

WikiSeeker boosts KB-VQA performance by using VLMs to rewrite image-informed queries for better retrieval and to decide when to route to external LLM or rely on internal VLM knowledge.

VitaTouch: Property-Aware Vision-Tactile-Language Model for Robotic Quality Inspection in Manufacturing

cs.CV · 2026-04-02 · unverdicted · none · ref 37 · internal anchor

VitaTouch combines vision-tactile encoders with a dual Q-Former and contrastive alignment to an LLM, achieving 88.89% hardness and 75.13% roughness accuracy on a new 186-object dataset plus 94% success in robotic sorting trials.

Mitigating Batch Effects in Histopathology via Language-Mediated Robust Embedding Generation

cs.CV · 2026-06-27 · unverdicted · none · ref 84 · internal anchor

GLMP generates robust pathology embeddings by routing histology images through an intermediate textual representation produced by general-purpose MLLMs to mitigate batch effects.

Traits Run Deeper: Trait-Specific Asymmetric Fusion for Personality Assessment

cs.CV · 2026-06-09 · unverdicted · none · ref 48 · internal anchor

Traits Run Deeper proposes MFR, TSMF asymmetric fusion, and DCPR modules to improve multimodal personality assessment, claiming 25% MSE reduction and first place on AVI Challenge 2026.

From 3D Perception to Safety Reasoning: A Graph-Based Framework for Real-Time Underground Mine Monitoring

cs.CV · 2026-06-02 · unverdicted · none · ref 63 · internal anchor

A graph-structured framework fuses 3D perception with rule-based, LLM, and memory reasoning to raise hazard coverage from 57% to 93% across 115 simulated underground mine scenarios.

DocRetriever: A Plug-and-Play Framework for Multimodal Document Retrieval with Comprehensive Benchmark

cs.CV · 2026-05-28 · unverdicted · none · ref 74 · internal anchor

DocRetriever introduces a framework using layout-aware sparse embeddings for hybrid encoding without OCR and a generalizable reasoning-augmented reranker for few-shot settings, plus the MultiDocR benchmark for evaluation.

Universal CT Representations from Anatomy to Disease Phenotype through Agglomerative Pretraining

cs.CV · 2026-05-21 · unverdicted · none · ref 17 · 2 links · internal anchor

FlexiCT provides CT foundation models via agglomerative pretraining on 266227 volumes from 56 datasets that match or exceed task-specific models on five task families while organizing embeddings along tumor-stage gradients.

HOG-Layout: Hierarchical 3D Scene Generation, Optimization and Editing via Vision-Language Models

cs.CV · 2026-04-12 · unverdicted · none · ref 53 · internal anchor

HOG-Layout enables text-driven hierarchical 3D scene generation, optimization, and real-time editing using LLMs, VLMs, RAG for semantic consistency, and an optimization module for physical plausibility.

Latent-CURE for Breast Cancer Diagnosis

cs.CV · 2026-06-29 · unverdicted · none · ref 27 · internal anchor

Latent-CURE introduces latent-space chain-of-thought reasoning and dual-asymmetric optimization to produce transparent, robust breast cancer diagnoses in imbalanced cohorts.

Robusto-2: Benchmarking Humans & VLMs for Autonomous Driving in Lima & New York City

cs.CV · 2026-06-18 · unverdicted · none · ref 47 · internal anchor

Humans and VLMs diverge in VQA responses on driving footage, with human answers consistent across origins and no strong geography modulation observed, likely due to high OOD nature.

Zero-Shot Semantic Re-Identification for Autonomous Driving: A VLM Baseline Study

cs.CV · 2026-06-08 · unverdicted · none · ref 21 · internal anchor

Zero-shot VLM semantic descriptions achieve re-identification retrieval performance comparable to a supervised CNN baseline in autonomous driving but encounter attribute inconsistency across viewpoints.

OmniFysics: Towards Physical Intelligence Evolution via Omni-Modal Signal Processing and Network Optimization

cs.CV · 2026-02-05 · unverdicted · none · ref 33 · internal anchor

OmniFysics is an omni-modal network using a dynamic physical data engine and evolutive tuning to improve performance on multimodal benchmarks and physics-oriented tasks.
