super hub Mixed citations

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Baosong Yang, Dingkun Long, Huan Lin, Mingxin Li, Xin Zhang, Yanzhao Zhang · 2025 · cs.CL · arXiv 2506.05176

Mixed citation behavior. Most common role is background (46%).

197 Pith papers citing it

Background 46% of classified citations

open full Pith review browse 197 citing papers more from Baosong Yang arXiv PDF

abstract

In this work, we introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series, in text embedding and reranking capabilities, built upon the Qwen3 foundation models. Leveraging the Qwen3 LLMs' robust capabilities in multilingual text understanding and generation, our innovative multi-stage training pipeline combines large-scale unsupervised pre-training with supervised fine-tuning on high-quality datasets. Effective model merging strategies further ensure the robustness and adaptability of the Qwen3 Embedding series. During the training process, the Qwen3 LLMs serve not only as backbone models but also play a crucial role in synthesizing high-quality, rich, and diverse training data across multiple domains and languages, thus enhancing the training pipeline. The Qwen3 Embedding series offers a spectrum of model sizes (0.6B, 4B, 8B) for both embedding and reranking tasks, addressing diverse deployment scenarios where users can optimize for either efficiency or effectiveness. Empirical evaluations demonstrate that the Qwen3 Embedding series achieves state-of-the-art results across diverse benchmarks. Notably, it excels on the multilingual evaluation benchmark MTEB for text embedding, as well as in various retrieval tasks, including code retrieval, cross-lingual retrieval and multilingual retrieval. To facilitate reproducibility and promote community-driven research and development, the Qwen3 Embedding models are publicly available under the Apache 2.0 license.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 14 method 6 baseline 3 dataset 1

citation-polarity summary

background 11 use method 6 baseline 3 unclear 2 support 1 use dataset 1

claims ledger

abstract In this work, we introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series, in text embedding and reranking capabilities, built upon the Qwen3 foundation models. Leveraging the Qwen3 LLMs' robust capabilities in multilingual text understanding and generation, our innovative multi-stage training pipeline combines large-scale unsupervised pre-training with supervised fine-tuning on high-quality datasets. Effective model merging strategies further ensure the robustness and adaptability of the Qwen3 Embedding series. During the training process, the

authors

Baosong Yang Dingkun Long Huan Lin Mingxin Li Xin Zhang Yanzhao Zhang

co-cited works

representative citing papers

STRABLE: Benchmarking Tabular Machine Learning with Strings

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.

SLAM: Structural Linguistic Activation Marking for Language Models

cs.CL · 2026-05-06 · unverdicted · novelty 8.0 · 2 refs

SLAM achieves 100% detection on Gemma-2 models with only 1-2 point quality cost by causally steering SAE-identified residual-stream directions for linguistic structure.

ReasonAudio: A Benchmark for Evaluating Reasoning Beyond Matching in Text-Audio Retrieval

cs.AI · 2026-05-05 · unverdicted · novelty 8.0 · 2 refs

ReasonAudio benchmark reveals that state-of-the-art text-audio retrieval models struggle with reasoning tasks like negation and duration, and multimodal LLMs lose reasoning ability after contrastive fine-tuning.

FollowTable: A Benchmark for Instruction-Following Table Retrieval

cs.IR · 2026-05-01 · unverdicted · novelty 8.0

FollowTable is the first large-scale benchmark for instruction-following table retrieval, paired with an Instruction Responsiveness Score, showing that existing models fail to adapt to fine-grained constraints beyond topical similarity.

IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

IdioLink introduces a benchmark dataset and evaluation showing that strong embedding models struggle to retrieve equivalent meanings across idiomatic and literal forms, relying on shallow cues instead.

Decision-Aware Quadratic ReLU Replacement for HE-Friendly Inference

cs.CR · 2026-05-21 · unverdicted · novelty 7.0 · 2 refs

Formulates quadratic ReLU replacement as a linear separation problem in lifted space, with exact conditions for calibration-lossless replacement and convex relaxations for approximate cases, achieving plaintext accuracy at lower cost under CKKS.

DermAgent: A Self-Reflective Agentic System for Dermatological Image Analysis with Multi-Tool Reasoning and Traceable Decision-Making

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

DermAgent orchestrates seven vision-language tools in a Plan-Execute-Reflect loop with dual-modality retrieval from 413k cases and a critic module to outperform GPT-4o by 17.6% in zero-shot dermatological diagnosis accuracy.

BOOKMARKS: Efficient Active Storyline Memory for Role-playing

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.

ReTool-Video: Recursive Tool-Using Video Agents with Meta-Augmented Tool Grounding

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

ReTool-Video uses a 134-tool meta-augmented library and recursive grounding to translate abstract video intents into fine-grained multimodal operations, outperforming baselines on MVBench, MLVU, and Video-MME.

AcquisitionSynthesis: Targeted Data Generation using Acquisition Functions

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

AcquisitionSynthesis uses acquisition functions as rewards to train generators that produce higher-quality synthetic data, delivering 2-7% gains on math, medical QA, and coding tasks with improved robustness to forgetting.

LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving

cs.IR · 2026-05-13 · conditional · novelty 7.0 · 2 refs

LeanSearch v2 recovers 46.1% of ground-truth premise groups for research-level Lean 4 theorems within 10 candidates and raises fixed-loop proof success to 20%.

AssemblyBench: Physics-Aware Assembly of Complex Industrial Objects

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

AssemblyBench dataset and AssemblyDyno transformer model enable physics-aware prediction of assembly sequences and trajectories for complex industrial objects from multimodal instructions and 3D shapes.

Very Efficient Listwise Multimodal Reranking for Long Documents

cs.IR · 2026-05-12 · unverdicted · novelty 7.0

ZipRerank delivers state-of-the-art multimodal listwise reranking accuracy for long documents at up to 10x lower latency via early interaction and single-pass scoring.

Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

Evolving-RL jointly optimizes experience extraction and utilization in LLM agents via RL with separate evaluation signals, delivering up to 98.7% relative gains on out-of-distribution tasks in ALFWorld and Mind2Web.

Skill Description Deception Attack against Task Routing in Internet of Agents

cs.MA · 2026-05-11 · conditional · novelty 7.0

Malicious agents can deceive LLM-based task routers in Internet of Agents systems by generating fake skill descriptions, achieving up to 98% success rate across nine domains.

CHASM: Online Changepoint Detection in Temporal and Cross-Variable Dependence

stat.ME · 2026-05-08 · unverdicted · novelty 7.0

CHASM detects changes in temporal and cross-variable dependence in multivariate time series by monitoring the truncated eigenvalue sequence of a recursively estimated DMD operator, using optimal assignment and augmented monitoring for complex values.

Toward Privileged Foundation Models:LUPI for Accelerated and Improved Learning

cs.LG · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

PIQL integrates privileged information to accelerate convergence, lower loss, and improve generalization in tabular foundation models.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

LatentRAG performs agentic RAG by generating latent tokens for thoughts and subqueries in one forward pass, matching explicit methods' accuracy on seven benchmarks while reducing latency by ~90%.

SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

SkillRet benchmark shows fine-tuned retrievers improve NDCG@10 by 13+ points over prior models on large-scale skill retrieval for LLM agents.

TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

TabEmbed is the first generalist embedding model for tabular data that unifies classification and retrieval in one space via contrastive learning and outperforms text embedding models on the new TabBench benchmark.

Rational Communication Shapes Morphological Composition

cs.CL · 2026-05-05 · unverdicted · novelty 7.0

Using historical corpora and the Rational Speech Act framework, attested English morphological compositions are ranked higher than plausible alternatives from the same time period when both semantic recoverability and production cost are considered.

Is It Novel and Why? Fine-Grained Patent Novelty Prediction Based on Passage Retrieval

cs.CL · 2026-05-04 · unverdicted · novelty 7.0

Introduces a feature-level annotated patent dataset and LLM retrieval-reasoning workflows that outperform embedding baselines on passage retrieval and novel feature identification while avoiding spurious correlations in novelty prediction.

Prosa: Rubric-Based Evaluation of LLMs on Real User Chats in Brazilian Portuguese

cs.CL · 2026-05-02 · conditional · novelty 7.0

Prosa demonstrates that rubric-based binary scoring with multi-judge filtering yields full agreement on 16 LLM rankings across judges on Brazilian Portuguese chats, compared to only 7/16 under holistic scoring, while widening score gaps by 47%.

citing papers explorer

Showing 50 of 65 citing papers after filters.

SLAM: Structural Linguistic Activation Marking for Language Models cs.CL · 2026-05-06 · unverdicted · none · ref 28 · 2 links · internal anchor
SLAM achieves 100% detection on Gemma-2 models with only 1-2 point quality cost by causally steering SAE-identified residual-stream directions for linguistic structure.
IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions cs.CL · 2026-05-21 · unverdicted · none · ref 71 · internal anchor
IdioLink introduces a benchmark dataset and evaluation showing that strong embedding models struggle to retrieve equivalent meanings across idiomatic and literal forms, relying on shallow cues instead.
BOOKMARKS: Efficient Active Storyline Memory for Role-playing cs.CL · 2026-05-13 · unverdicted · none · ref 8 · internal anchor
BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.
AcquisitionSynthesis: Targeted Data Generation using Acquisition Functions cs.CL · 2026-05-13 · unverdicted · none · ref 21 · internal anchor
AcquisitionSynthesis uses acquisition functions as rewards to train generators that produce higher-quality synthetic data, delivering 2-7% gains on math, medical QA, and coding tasks with improved robustness to forgetting.
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment cs.CL · 2026-05-08 · unverdicted · none · ref 241 · internal anchor
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG cs.CL · 2026-05-07 · unverdicted · none · ref 33 · internal anchor
LatentRAG performs agentic RAG by generating latent tokens for thoughts and subqueries in one forward pass, matching explicit methods' accuracy on seven benchmarks while reducing latency by ~90%.
TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding cs.CL · 2026-05-06 · unverdicted · none · ref 45 · internal anchor
TabEmbed is the first generalist embedding model for tabular data that unifies classification and retrieval in one space via contrastive learning and outperforms text embedding models on the new TabBench benchmark.
Rational Communication Shapes Morphological Composition cs.CL · 2026-05-05 · unverdicted · none · ref 25 · internal anchor
Using historical corpora and the Rational Speech Act framework, attested English morphological compositions are ranked higher than plausible alternatives from the same time period when both semantic recoverability and production cost are considered.
Is It Novel and Why? Fine-Grained Patent Novelty Prediction Based on Passage Retrieval cs.CL · 2026-05-04 · unverdicted · none · ref 71 · internal anchor
Introduces a feature-level annotated patent dataset and LLM retrieval-reasoning workflows that outperform embedding baselines on passage retrieval and novel feature identification while avoiding spurious correlations in novelty prediction.
Prosa: Rubric-Based Evaluation of LLMs on Real User Chats in Brazilian Portuguese cs.CL · 2026-05-02 · conditional · none · ref 22 · internal anchor
Prosa demonstrates that rubric-based binary scoring with multi-judge filtering yields full agreement on 16 LLM rankings across judges on Brazilian Portuguese chats, compared to only 7/16 under holistic scoring, while widening score gaps by 47%.
Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders cs.CL · 2026-05-02 · unverdicted · none · ref 49 · internal anchor
EPIC trains LLMs to treat continuous embeddings as in-context prompts, yielding state-of-the-art text embedding performance on MTEB with or without prompts at inference and lower compute.
Crowded in B-Space: Calibrating Shared Directions for LoRA Merging cs.CL · 2026-04-18 · unverdicted · none · ref 32 · internal anchor
Pico reduces LoRA merge interference by calibrating over-shared directions in the B matrix before merging, yielding 3.4-8.3 point accuracy gains and sometimes beating joint training.
Psychological Steering of Large Language Models cs.CL · 2026-04-15 · unverdicted · none · ref 73 · internal anchor
Mean-difference residual stream injections outperform personality prompting for OCEAN trait steering in most LLMs, with hybrids performing best and showing approximate linearity but non-human trait covariances.
WorkRB: A Community-Driven Evaluation Framework for AI in the Work Domain cs.CL · 2026-03-17 · unverdicted · none · ref 30 · internal anchor
WorkRB is the first open community-driven benchmark for AI in the work domain, organizing 13 tasks from 7 groups with dynamic multilingual ontology loading and modular design for proprietary task integration.
LMEB: Long-horizon Memory Embedding Benchmark cs.CL · 2026-03-13 · unverdicted · none · ref 40 · internal anchor
LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.
Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker cs.CL · 2025-11-11 · unverdicted · none · ref 42 · internal anchor
UWE is a task-agnostic bi-encoder that uses many-to-many InfoNCE and token-level soft late interaction to achieve zero-shot ranking across unseen work-related target spaces while using far fewer parameters than Qwen3-8B and improving MAP by 4.4 points.
IPQA: A Benchmark for Core Intent Identification in Personalized Question Answering cs.CL · 2025-10-27 · conditional · none · ref 37 · internal anchor
IPQA is a new benchmark that measures how well models identify core user intents from history in personalized question answering, finding that performance is poor and declines with greater question complexity.
Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization cs.CL · 2025-10-06 · unverdicted · none · ref 37 · internal anchor
GQR is a test-time optimization technique that refines primary retriever query embeddings using complementary retriever scores to achieve high performance with smaller representations in multimodal visual document retrieval.
Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions cs.CL · 2025-07-07 · unverdicted · none · ref 48 · internal anchor
MemoryAgentBench is a new multi-turn benchmark assessing four memory competencies in LLM agents—accurate retrieval, test-time learning, long-range understanding, and selective forgetting—showing that existing methods fall short.
One prompt is not enough: Instruction Sensitivity Undermines Embedding Model Evaluation cs.CL · 2026-05-21 · accept · none · ref 16 · internal anchor
Single-prompt evaluations of instruction-tuned embedding models misrepresent performance and allow any model to be ranked first by favorable prompt choice.
Structure Retention in Embedding Spaces as a Predictor of Benchmark Performance cs.CL · 2026-05-21 · unverdicted · none · ref 140 · internal anchor
Embedding model performance on MTEB tasks correlates strongly with nearest-neighbor overlap and ICA magnitude differences in their embedding spaces.
Hypergraph as Language cs.CL · 2026-05-21 · unverdicted · none · ref 44 · internal anchor
Hyper-Align is a hypergraph-native framework that serializes high-order relations into LLM-compatible tokens via HIDT-O templates and a HIP projector, outperforming graph-centric methods on HyperAlign-Bench.
KoRe: Compact Knowledge Representations for Large Language Models cs.CL · 2026-05-19 · unverdicted · none · ref 22 · internal anchor
KoRe encodes 1-hop knowledge graph subgraphs as compact discrete tokens for injection into LLMs, achieving competitive benchmark performance with up to 10x token reduction.
EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective cs.CL · 2026-05-18 · unverdicted · none · ref 43 · internal anchor
EvoMemBench evaluates 15 memory methods for LLM agents and finds long-context baselines competitive with no single memory approach working consistently across settings.
Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models cs.CL · 2026-05-17 · unverdicted · none · ref 40 · internal anchor
PUMA detects reasoning-level semantic redundancy to enable early exit in chains of thought, achieving 26.2% average token reduction across five LRMs and five benchmarks while preserving accuracy and CoT quality.
JSPG: Dynamic Dictionary Filtering via Joint Semantic-Pinyin-Glyph Retrieval for Chinese Contextual ASR cs.CL · 2026-05-16 · unverdicted · none · ref 25 · internal anchor
JSPG jointly combines semantic, pinyin, and glyph retrieval with an extended Smith-Waterman algorithm to dynamically filter keyword dictionaries and improve accuracy in Chinese contextual ASR.
H-Mem: A Novel Memory Mechanism for Evolving and Retrieving Agent Memory via a Hybrid Structure cs.CL · 2026-05-15 · unverdicted · none · ref 30 · internal anchor
H-Mem introduces a hybrid tree-plus-graph memory mechanism that evolves short-term agent memories into long-term summaries and enables efficient retrieval, reporting state-of-the-art QA results on three benchmarks.
Task-Adaptive Embedding Refinement via Test-time LLM Guidance cs.CL · 2026-05-12 · unverdicted · none · ref 46 · internal anchor
Test-time LLM feedback refines query embeddings to deliver up to 25% relative gains on zero-shot literature search, intent detection, and related benchmarks.
Reproducing Complex Set-Compositional Information Retrieval cs.CL · 2026-05-05 · unverdicted · none · ref 31 · internal anchor
Neural retrievers that double BM25 performance on QUEST collapse below 0.02 Recall@100 on the new LIMIT+ benchmark while lexical methods reach 0.96, with all methods degrading as compositional depth increases.
MultiBreak: A Scalable and Diverse Multi-turn Jailbreak Benchmark for Evaluating LLM Safety cs.CL · 2026-05-03 · unverdicted · none · ref 31 · internal anchor
MultiBreak is a large diverse multi-turn jailbreak benchmark that achieves substantially higher attack success rates on LLMs than prior datasets and reveals topic-specific vulnerabilities in multi-turn settings.
Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning cs.CL · 2026-05-02 · unverdicted · none · ref 55 · internal anchor
Verbal-R3 uses a verbal reranker to generate analytic narratives that guide retrieval and reasoning in LLMs, achieving SOTA results on complex QA benchmarks.
Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding cs.CL · 2026-04-30 · unverdicted · none · ref 39 · internal anchor
TeCoD improves Text-to-SQL execution accuracy by up to 36% over in-context learning and cuts latency 2.2x on matched queries by extracting templates from historical pairs and enforcing them with constrained decoding.
REZE: Representation Regularization for Domain-adaptive Text Embedding Pre-finetuning cs.CL · 2026-04-19 · unverdicted · none · ref 69 · internal anchor
REZE controls representation shifts in contrastive pre-finetuning of text embeddings via eigenspace decomposition of anchor-positive pairs and adaptive soft-shrinkage on task-variant directions.
NOSE: Neural Olfactory-Semantic Embedding with Tri-Modal Orthogonal Contrastive Learning cs.CL · 2026-04-12 · unverdicted · none · ref 69 · internal anchor
NOSE aligns molecular, receptor, and linguistic modalities in a shared embedding space via tri-modal orthogonal contrastive learning and weak positive samples, achieving SOTA performance and zero-shot generalization on olfactory tasks.
TEC: A Collection of Human Trial-and-error Trajectories for Problem Solving cs.CL · 2026-04-08 · unverdicted · none · ref 43 · internal anchor
TEC is a new public dataset of detailed human trial-and-error trajectories and reflections on web tasks, with humans showing substantially higher accuracy than LLMs.
CLEAR: Cross-Lingual Enhancement in Alignment via Reverse-training cs.CL · 2026-04-07 · unverdicted · none · ref 4 · internal anchor
CLEAR is a reverse-training loss that improves cross-lingual retrieval performance by up to 15% in low-resource languages while minimizing degradation in English by using English as an alignment bridge.
Differences in Text Generated by Diffusion and Autoregressive Language Models cs.CL · 2026-04-04 · unverdicted · none · ref 39 · internal anchor
DLMs exhibit lower n-gram entropy, higher semantic coherence, and higher semantic diversity than ARMs, primarily due to bidirectional context and remasking decoding strategies.
RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards cs.CL · 2025-09-25 · unverdicted · none · ref 50 · internal anchor
RLBFF extracts binary principles from human feedback to train reward models that outperform Bradley-Terry models on RM-Bench and JudgeBench and enable customizable inference-time focus for LLM alignment.
EpiCache: Episodic KV Cache Management for Long-Term Conversation on Resource-Constrained Environments cs.CL · 2025-09-22 · unverdicted · none · ref 41 · internal anchor
EpiCache clusters long conversation history into coherent episodes for per-episode KV cache eviction, delivering up to 30% accuracy gains and 3.7x peak memory reduction on LongConvQA tasks under fixed budgets.
Should We Still Pretrain Encoders with Masked Language Modeling? cs.CL · 2025-07-01 · accept · none · ref 47 · internal anchor
Controlled ablations of 38 models find MLM superior to CLM on representation benchmarks while CLM offers better data efficiency and stability; a biphasic CLM-then-MLM schedule is optimal under fixed compute and improves when initialized from pretrained CLM models.
BeLink: Biomedical Entity Linking Meets Generative Re-Ranking cs.CL · 2026-05-21 · unverdicted · none · ref 33 · internal anchor
BeLink applies set-wise instruction-tuning to generative LLMs at the re-ranking stage of biomedical entity linking, reporting 3-24% accuracy gains and reduced inference time versus prior methods.
GUT-IS: A Data-Driven Approach to Integrating Constructs and Their Relations in Information Systems cs.CL · 2026-05-18 · unverdicted · none · ref 23 · internal anchor
A clustering method with an explicit purity-parsimony loss integrates structural equation models by grouping IS constructs via task-adapted text embeddings.
OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models cs.CL · 2026-05-12 · unverdicted · none · ref 32 · internal anchor
OmniThoughtVis curates 1.8M multimodal CoT samples via teacher distillation, difficulty annotation, and tag-based sampling, yielding consistent gains on nine reasoning benchmarks and allowing 4B models to match or beat undistilled 8B baselines.
GRC: Unifying Reasoning-Driven Generation, Retrieval and Compression cs.CL · 2026-05-09 · unverdicted · none · ref 5 · 2 links · internal anchor
GRC unifies generation, retrieval, and compression in LLMs via meta latent tokens for single-pass execution with modular flexibility.
MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval cs.CL · 2026-05-07 · unverdicted · none · ref 19 · 2 links · internal anchor
MemReranker applies multi-stage distillation to Qwen3-Reranker to produce reasoning-aware rerankers that outperform baselines on memory tasks with temporal and causal constraints.
Automatic Reflection Level Classification in Hungarian Student Essays cs.CL · 2026-05-04 · unverdicted · none · ref 57 · internal anchor
Classical machine learning models outperform Hungarian transformers slightly in overall performance (71% vs 68% average score) for classifying reflection levels in student essays, though transformers handle rare classes better.
Retrieval-Augmented Reasoning for Chartered Accountancy cs.CL · 2026-04-30 · unverdicted · none · ref 26 · internal anchor
CA-ThinkFlow reaches 68.75% of top proprietary model performance on CA-Ben using basic RAG plus built-in CoT with a 14B 4-bit model and Docling extraction.
STEM: Structure-Tracing Evidence Mining for Knowledge Graphs-Driven Retrieval-Augmented Generation cs.CL · 2026-04-24 · unverdicted · none · ref 3 · 2 links · internal anchor
STEM reframes multi-hop KGQA as schema-guided graph search with semantic-to-structural projection and Triple-GNN guidance, claiming SOTA accuracy and evidence completeness on multi-hop benchmarks.
All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG cs.CL · 2026-04-22 · unverdicted · none · ref 51 · internal anchor
Multilingual RAG rerankers exhibit language bias that limits cross-lingual evidence use, and the proposed LAURA method aligns ranking with downstream generation utility to reduce the bias and improve performance.
Understanding the Prompt Sensitivity cs.CL · 2026-04-20 · unverdicted · none · ref 9 · internal anchor
LLMs disperse meaning-preserving prompts internally instead of clustering them, which produces an excessively high upper bound on output log-probability differences via Taylor expansion and Cauchy-Schwarz.

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer