hub

Llama-embed-nemotron-8b: A universal text embedding model for multilingual and cross-lingual tasks

Yauhen Babakhin, Radek Osmulski, Ronay Ak, Gabriel Moreira, Mengyao Xu, Benedikt Schifferer, Bo Liu, Even Oldridge · 2025 · arXiv 2511.07025

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

baseline 2 dataset 1 method 1

citation-polarity summary

baseline 2 use dataset 1 use method 1

representative citing papers

The Harder Text Embedding Benchmark (HTEB): Beyond One-dimensional Static Robustness

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

HTEB introduces dynamic, multi-axis evaluation of text embedding robustness using LLM transformations, finding decoupled profiles across models and that scaling does not close all robustness gaps.

Personalizing Text-to-Image Generation to Individual Taste

cs.CV · 2026-04-08 · unverdicted · novelty 7.0

PAMELA provides a multi-user rating dataset and personalized reward model that predicts individual image preferences more accurately than prior population-level aesthetic models.

Consistent and Distinctive: LLM Benchmark Efficiency via Maximum Independent Set Prompt Selection on Similarity Graphs

cs.CL · 2026-05-31 · unverdicted · novelty 6.0

A graph-based MIS prompt selection method on embedding similarity graphs yields reduced benchmark subsets with highly consistent LLM rankings (Kendall's W ≥ 0.90 in 99.2% of cases) and 25-48% size reduction at higher thresholds.

Not All Synthetic Data Is Yours to Learn From

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

Prompt-free self-training on self-generated text improves language models only under a compatibility condition between source and student, decoupling benchmark gains from verbatim memorization without explicit unlearning.

Structure Retention in Embedding Spaces as a Predictor of Benchmark Performance

cs.CL · 2026-05-21 · unverdicted · novelty 6.0

Embedding model performance on MTEB tasks correlates strongly with nearest-neighbor overlap and ICA magnitude differences in their embedding spaces.

Task-Adaptive Embedding Refinement via Test-time LLM Guidance

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

Test-time LLM feedback refines query embeddings to deliver up to 25% relative gains on zero-shot literature search, intent detection, and related benchmarks.

Aspect-Aware Content-Based Recommendations for Mathematical Research Papers

cs.IR · 2026-05-05 · unverdicted · novelty 6.0

The authors introduce aspect-aware datasets GoldRiM and SilverRiM for math papers and AchGNN, a heterogeneous GNN that outperforms prior methods by jointly modeling textual semantics, citations, and author lineage across aspects.

GUT-IS: A Data-Driven Approach to Integrating Constructs and Their Relations in Information Systems

cs.CL · 2026-05-18 · unverdicted · novelty 5.0

A clustering method with an explicit purity-parsimony loss integrates structural equation models by grouping IS constructs via task-adapted text embeddings.

Predicting Post-Traumatic Epilepsy from Clinical Records using Large Language Model Embeddings

cs.LG · 2026-04-16 · unverdicted · novelty 5.0

LLM embeddings from clinical records, fused with tabular data via gradient-boosted trees, predict post-traumatic epilepsy at AUC-ROC 0.892 and AUPRC 0.798.

HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval

cs.IR · 2026-05-22 · unverdicted · novelty 4.0

HARNESS-LM uses teacher fine-tuning, L2 query alignment, and contrastive refinement to distill large SLM retrievers into compact models that recover 98% precision with up to 27x lower latency on Bing Ads benchmarks.

Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

cs.CL · 2026-01-08 · unverdicted · novelty 4.0

Qwen3-VL-Embedding-8B achieves state-of-the-art performance with a 77.8 overall score on the MMEB-V2 multimodal embedding benchmark.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking cs.CL · 2026-01-08 · unverdicted · none · ref 1
Qwen3-VL-Embedding-8B achieves state-of-the-art performance with a 77.8 overall score on the MMEB-V2 multimodal embedding benchmark.

Llama-embed-nemotron-8b: A universal text embedding model for multilingual and cross-lingual tasks

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer