AcquisitionSynthesis uses acquisition functions as rewards to train generators that produce higher-quality synthetic data, delivering 2-7% gains on math, medical QA, and coding tasks with improved robustness to forgetting.
hub
Mteb: Massive text embedding benchmark
18 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
dataset 1polarities
use dataset 1representative citing papers
Analysis of 1.01 million unfiltered Bing queries identifies 18% as geospatial, dominated by transactional categories like costs (15.3%) that exceed traditional GIS scope.
CRAFT is a supervised LLM framework using retrieval-augmented generation, self-refinement, fine-tuning, and preference optimization to create fluent adversarial content that boosts target ranks in neural ranking models, outperforming baselines on MS MARCO and TREC benchmarks with cross-architecture
MMEB-V3 benchmark shows omni-modality embedding models fail to enforce instruction-specified modality constraints and exhibit asymmetric, query-biased retrieval.
mEOL creates aligned embeddings for text, images, and SVGs using instruction-guided MLLM one-word summaries and semantic SVG rewriting, outperforming baselines on a new text-to-SVG retrieval benchmark.
C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.
A sliced IGW distance is introduced with closed-form 1D expressions, rotational invariance, and studied structural and computational properties for efficient data alignment.
Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.
MIPIC trains nested Matryoshka representations via self-distilled intra-relational alignment with top-k CKA and progressive information chaining across depths, yielding competitive performance especially at extreme low dimensions.
JU'A is a new heterogeneous benchmark for Brazilian legal IR that distinguishes retrieval methods and shows domain-adapted models excel on aligned subsets while BM25 stays competitive elsewhere.
HoldUp uses LLM-guided clustering to provide holistic dataset context for semantic operators, yielding up to 33% higher classification accuracy and 30% higher scoring accuracy than row-by-row LLM processing across 15 datasets.
StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.
BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
Embeddings reliably capture authorial stylistic features in French literary texts, and these signals persist after LLM rewriting while showing model-specific patterns.
Function-based chunking underperforms other strategies in RAG code completion by 3.57-5.64 points, with context length as the dominant factor.
STAF applies sentence embeddings from transformers to classify SCA findings, reaching 89% F1 and beating prior filters by 11% within projects and 6% across projects.
E5 text embeddings trained with weakly-supervised contrastive pre-training on CCPairs outperform BM25 on BEIR zero-shot and achieve top results on MTEB, beating much larger models.
Mixed training of Qwen3-Embedding-4B on legal data plus SQuAD-pt yields higher average NDCG@10 (0.447), MRR@10 (0.595), and MAP@10 (0.308) across six Portuguese retrieval datasets than legal-only or base models, with largest gains on out-of-domain question-based search.
citing papers explorer
-
How Does Chunking Affect Retrieval-Augmented Code Completion? A Controlled Empirical Study
Function-based chunking underperforms other strategies in RAG code completion by 3.57-5.64 points, with context length as the dominant factor.
-
Towards Better Static Code Analysis Reports: Sentence Transformer-based Filtering of Non-Actionable Alerts
STAF applies sentence embeddings from transformers to classify SCA findings, reaching 89% F1 and beating prior filters by 11% within projects and 6% across projects.