{"total":18,"items":[{"citing_arxiv_id":"2607.00171","ref_index":109,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ALEE: Any-Language Evaluation of Embeddings via English-Centric Minimal Pairs","primary_cat":"cs.CL","submitted_at":"2026-06-30T20:45:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ALEE generates AMR-based English minimal pairs with fine-grained semantic shifts, translates them, and evaluates embedding models on 275+ languages to expose cross-lingual gaps linked to training data and tokenization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.28737","ref_index":33,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"5ting at SemEval-2026 Task 8: Strong End-to-End Multi-Turn RAG via LLM-Based Reranking and Faithfulness Control","primary_cat":"cs.CL","submitted_at":"2026-06-27T05:13:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"5ting achieves nDCG@5 of 0.4719 on Task A and harmonic score 0.5597 with RL_F 0.7692 on Task C for multi-turn RAG via standard dense retrieval plus LLM reranking and faithfulness constraints.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.25674","ref_index":64,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"BitNet Text Embeddings","primary_cat":"cs.CL","submitted_at":"2026-06-24T10:37:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BITEMBED converts LLM backbones to ternary BitNet-style encoders, adapts them with contrastive pre-training and teacher distillation, and produces text embeddings at multiple precisions that perform comparably to full-precision baselines on MMTEB.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.24346","ref_index":116,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"PETRA: Transforming Web Text for Petroleum-Engineering Domain Adaptation","primary_cat":"cs.IR","submitted_at":"2026-06-23T09:37:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PETRA is a curated 1.36M-chunk petroleum-engineering retrieval dataset and pipeline that raises in-domain nDCG from 0.703 to 0.763 via score fusion and delivers 44% relative gain on an Earth Science benchmark through reranker adaptation on synthetic supervision.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.03027","ref_index":45,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"SEA-Embedding: Open and Reproducible Text Embeddings for Southeast Asia","primary_cat":"cs.CL","submitted_at":"2026-06-02T02:05:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SEA-Embedding is a fully open text embedding pipeline for Southeast Asian languages that achieves state-of-the-art performance on the SEA-BED benchmark by analyzing data composition, training objectives, and base encoder choices.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28190","ref_index":61,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"The Harder Text Embedding Benchmark (HTEB): Beyond One-dimensional Static Robustness","primary_cat":"cs.CL","submitted_at":"2026-05-27T09:11:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"HTEB introduces dynamic, multi-axis evaluation of text embedding robustness using LLM transformations, finding decoupled profiles across models and that scaling does not close all robustness gaps.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22247","ref_index":60,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions","primary_cat":"cs.CL","submitted_at":"2026-05-21T09:53:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"IdioLink introduces a benchmark dataset and evaluation showing that strong embedding models struggle to retrieve equivalent meanings across idiomatic and literal forms, relying on shallow cues instead.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19645","ref_index":50,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"K-Quantization and its Impact on Output Performance","primary_cat":"cs.CL","submitted_at":"2026-05-19T10:31:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Empirical evaluation of quantization effects on eight LLMs across bit widths, showing performance generally declines at lower precision but with model-size-dependent resilience and acceptable accuracy at 2 bits for many cases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16608","ref_index":46,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"To MRL or not to MRL: Text Embeddings are Robust to Truncation Without Matryoshka Learning, Except In Heavy Truncation Scenarios","primary_cat":"cs.LG","submitted_at":"2026-05-15T20:17:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Truncated embeddings from non-MRL models perform comparably to or better than MRL-trained models for most truncation levels, except heavy truncation of 80% or more.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12487","ref_index":43,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Task-Adaptive Embedding Refinement via Test-time LLM Guidance","primary_cat":"cs.CL","submitted_at":"2026-05-12T17:58:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Test-time LLM feedback refines query embeddings to deliver up to 25% relative gains on zero-shot literature search, intent detection, and related benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"3482124. URLhttps://doi.org/10.1145/3459637.3482124. [42] Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. Qwen3 embedding: Advancing text embedding and reranking through foundation models.arXiv:2506.05176, 2025. URLhttps://arxiv.org/abs/2506.05176. [43] Zhisong Zhang, Emma Strubell, and Eduard Hovy. A survey of active learning for natural language processing. In Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang, editors,Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6166- 6190, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational"},{"citing_arxiv_id":"2605.08421","ref_index":50,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Beyond Bag-of-Patches: Learning Global Layout via Textual Supervision for Late-Interaction Visual Document Retrieval","primary_cat":"cs.CV","submitted_at":"2026-05-08T19:28:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A text-supervised global layout embedding augments local patch representations in late-interaction VDR, yielding +2.4 nDCG@5 and +2.3 MAP@5 gains over ColPali/ColQwen baselines on ViDoRe-v2.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03824","ref_index":22,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Reproducing Complex Set-Compositional Information Retrieval","primary_cat":"cs.CL","submitted_at":"2026-05-05T14:51:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Neural retrievers that double BM25 performance on QUEST collapse below 0.02 Recall@100 on the new LIMIT+ benchmark while lexical methods reach 0.96, with all methods degrading as compositional depth increases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.27037","ref_index":63,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Hypencoder Revisited: Reproducibility and Analysis of Non-Linear Scoring for First-Stage Retrieval","primary_cat":"cs.IR","submitted_at":"2026-04-29T17:05:53+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Reproducibility study confirms Hypencoder's non-linear query-specific scoring improves retrieval over bi-encoders on standard benchmarks but standard methods remain faster and hard-task results are mixed due to implementation issues.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. InProceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun'ichi Tsujii (Eds.). Association for Computational Linguistics, 2369-2380. doi:10.18653/V1/D18-1259 [63] Chris Zhang, Mengye Ren, and Raquel Urtasun. 2019. Graph HyperNetworks for Neural Architecture Search. In7th International Conference on Learning Rep- resentations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net. https://openreview.net/forum?id=rkgW0oA9FX [64] Zexuan Zhong, Ziqing Huang, Alexander Wettig, and Danqi Chen. 2023. Poi-"},{"citing_arxiv_id":"2604.23336","ref_index":26,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA","primary_cat":"cs.IR","submitted_at":"2026-04-25T14:45:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Rabtriever distills a generative reranker into an efficient bi-encoder using on-policy JEPA to achieve near-reranker accuracy with linear complexity on rationale-based retrieval.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18199","ref_index":25,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Linear-Time and Constant-Memory Text Embeddings Based on Recurrent Language Models","primary_cat":"cs.CL","submitted_at":"2026-04-20T12:50:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Fine-tuned recurrent models like Mamba2 produce competitive text embeddings with linear-time constant-memory inference via vertical chunking, outperforming transformers in memory use.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.14907","ref_index":46,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Comparison of Modern Multilingual Text Embedding Techniques for Hate Speech Detection Task","primary_cat":"cs.CL","submitted_at":"2026-04-16T11:49:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Supervised models using embeddings like jina and e5 reach up to 92% accuracy on multilingual hate speech detection, substantially outperforming anomaly detection, while PCA to 64 dimensions preserves most performance in the supervised case.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"3 Hate speech datasets Diversity of hate speech datasets enables us to assess whether the same embedding methods and downstream machine learning models are effective across various languages and dataset sizes. Lithuanian corpora - LtHateLtHate [ 45] is a new hate speech corpus for the Lithuanian language. It consists of public media comments taken from Litis [46] corpus and other public media sources. Comments from Litis corpus are sourced from two of the biggest Lithuanian online news portals in years 2010 to 2014. Comments from other media sources are spanning years 2021 to 2024 and were sourced from various social media platforms in the Lithuanian language and Lithuanian news portals. The topical composition of the corpus was inspired by the methodology described in [47]."},{"citing_arxiv_id":"2604.06771","ref_index":27,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Multi-Faceted Self-Consistent Preference Alignment for Query Rewriting in Conversational Search","primary_cat":"cs.CL","submitted_at":"2026-04-08T07:38:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MSPA-CQR improves conversational query rewriting by constructing self-consistent preference data across rewriting, retrieval, and response dimensions and training with prefix-guided multi-faceted direct preference optimization, showing effectiveness in both in- and out-of-distribution settings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2412.13663","ref_index":8,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference","primary_cat":"cs.CL","submitted_at":"2024-12-18T09:39:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ModernBERT is a new bidirectional encoder model achieving SOTA performance on diverse classification and retrieval benchmarks while offering superior speed and memory efficiency for long-context inference.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}