HTEB introduces dynamic, multi-axis evaluation of text embedding robustness using LLM transformations, finding decoupled profiles across models and that scaling does not close all robustness gaps.
hub
Llama-embed-nemotron-8b: A universal text embedding model for multilingual and cross-lingual tasks
11 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
years
2026 11verdicts
UNVERDICTED 11representative citing papers
PAMELA provides a multi-user rating dataset and personalized reward model that predicts individual image preferences more accurately than prior population-level aesthetic models.
A graph-based MIS prompt selection method on embedding similarity graphs yields reduced benchmark subsets with highly consistent LLM rankings (Kendall's W ≥ 0.90 in 99.2% of cases) and 25-48% size reduction at higher thresholds.
Prompt-free self-training on self-generated text improves language models only under a compatibility condition between source and student, decoupling benchmark gains from verbatim memorization without explicit unlearning.
Embedding model performance on MTEB tasks correlates strongly with nearest-neighbor overlap and ICA magnitude differences in their embedding spaces.
Test-time LLM feedback refines query embeddings to deliver up to 25% relative gains on zero-shot literature search, intent detection, and related benchmarks.
The authors introduce aspect-aware datasets GoldRiM and SilverRiM for math papers and AchGNN, a heterogeneous GNN that outperforms prior methods by jointly modeling textual semantics, citations, and author lineage across aspects.
A clustering method with an explicit purity-parsimony loss integrates structural equation models by grouping IS constructs via task-adapted text embeddings.
LLM embeddings from clinical records, fused with tabular data via gradient-boosted trees, predict post-traumatic epilepsy at AUC-ROC 0.892 and AUPRC 0.798.
HARNESS-LM uses teacher fine-tuning, L2 query alignment, and contrastive refinement to distill large SLM retrievers into compact models that recover 98% precision with up to 27x lower latency on Bing Ads benchmarks.
Qwen3-VL-Embedding-8B achieves state-of-the-art performance with a 77.8 overall score on the MMEB-V2 multimodal embedding benchmark.
citing papers explorer
-
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking
Qwen3-VL-Embedding-8B achieves state-of-the-art performance with a 77.8 overall score on the MMEB-V2 multimodal embedding benchmark.