{"total":11,"items":[{"citing_arxiv_id":"2606.01400","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Consistent and Distinctive: LLM Benchmark Efficiency via Maximum Independent Set Prompt Selection on Similarity Graphs","primary_cat":"cs.CL","submitted_at":"2026-05-31T18:45:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A graph-based MIS prompt selection method on embedding similarity graphs yields reduced benchmark subsets with highly consistent LLM rankings (Kendall's W ≥ 0.90 in 99.2% of cases) and 25-48% size reduction at higher thresholds.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.31126","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Not All Synthetic Data Is Yours to Learn From","primary_cat":"cs.CL","submitted_at":"2026-05-29T10:34:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Prompt-free self-training on self-generated text improves language models only under a compatibility condition between source and student, decoupling benchmark gains from verbatim memorization without explicit unlearning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28190","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Harder Text Embedding Benchmark (HTEB): Beyond One-dimensional Static Robustness","primary_cat":"cs.CL","submitted_at":"2026-05-27T09:11:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"HTEB introduces dynamic, multi-axis evaluation of text embedding robustness using LLM transformations, finding decoupled profiles across models and that scaling does not close 
all robustness gaps.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23572","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval","primary_cat":"cs.IR","submitted_at":"2026-05-22T12:39:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"HARNESS-LM uses teacher fine-tuning, L2 query alignment, and contrastive refinement to distill large SLM retrievers into compact models that recover 98% precision with up to 27x lower latency on Bing Ads benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22202","ref_index":79,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Structure Retention in Embedding Spaces as a Predictor of Benchmark Performance","primary_cat":"cs.CL","submitted_at":"2026-05-21T09:05:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Embedding model performance on MTEB tasks correlates strongly with nearest-neighbor overlap and ICA magnitude differences in their embedding spaces.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18567","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"GUT-IS: A Data-Driven Approach to Integrating Constructs and Their Relations in Information Systems","primary_cat":"cs.CL","submitted_at":"2026-05-18T15:44:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A clustering method with an explicit purity-parsimony loss integrates structural equation models by grouping IS constructs via task-adapted text 
embeddings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12487","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Task-Adaptive Embedding Refinement via Test-time LLM Guidance","primary_cat":"cs.CL","submitted_at":"2026-05-12T17:58:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Test-time LLM feedback refines query embeddings to deliver up to 25% relative gains on zero-shot literature search, intent detection, and related benchmarks.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"Intuitively, this increases the likelihood of encountering some positive pairs in DK(q), which may make the feedback signal more informative. We leave to future work the question of how to choose the optimal document collection for feedback. Models: We examine a diverse set of leading embedding models: Qwen3-Embedding-0.6B, Qwen3-Embedding-8B [42], Llama-Embed-Nemotron-8B [2], Linq-Embed-Mistral [8] and E5-Mistral-7B [39]. In our main results the LLM feedback scores are provided by Mistral-Small-3.2-24B-Instruct-2506; we test some additional LLMs in Appendix C. 
Inference Details: To obtain feedback scores for a specific document, we send a query-document pair with a brief instruction that asks the LLM to judge whether this pair is a match."},{"citing_arxiv_id":"2605.03861","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Aspect-Aware Content-Based Recommendations for Mathematical Research Papers","primary_cat":"cs.IR","submitted_at":"2026-05-05T15:23:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The authors introduce aspect-aware datasets GoldRiM and SilverRiM for math papers and AchGNN, a heterogeneous GNN that outperforms prior methods by jointly modeling textual semantics, citations, and author lineage across aspects.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"The model operates on a heterogeneous graph with paper and author nodes and integrates semantic textual similarity, aspect information, and authorship relations within a unified framework. We evaluate AchGNN on GoldRiM and SilverRiM against several baselines, including fine-tuned LLM CbRPR [37], a heterogeneous GNN [57], and state-of-the-art LLM-embeddings [1, 59]. AchGNN consistently outperforms all competing baselines on GoldRiM and SilverRiM. We further evaluate AchGNN on the Papers with Code (PwC) dataset [21] to assess its applicability beyond heavily mathematics oriented benchmarks, where it demonstrates competitive performance. 
The recommendations generated by AchGNN have been integrated into the Mathematical Research Data Initiative plat-"},{"citing_arxiv_id":"2604.14547","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Predicting Post-Traumatic Epilepsy from Clinical Records using Large Language Model Embeddings","primary_cat":"cs.LG","submitted_at":"2026-04-16T02:24:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM embeddings from clinical records, fused with tabular data via gradient-boosted trees, predict post-traumatic epilepsy at AUC-ROC 0.892 and AUPRC 0.798.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.07427","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Personalizing Text-to-Image Generation to Individual Taste","primary_cat":"cs.CV","submitted_at":"2026-04-08T17:35:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PAMELA provides a multi-user rating dataset and personalized reward model that predicts individual image preferences more accurately than prior population-level aesthetic models.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"the-art architectures utilize graph neural networks and interaction matrices to model complex relationships between learned image attributes and demographic profiles [22,51,64]. In this work, we propose a large-scale dataset of individual user preferences that allows us to train a personalized preference predictor using large-scale pre-trained visual and language backbones [2,52] to achieve strong generalization towards unseen users, enabling effective steering of generative models. Fig. 
2: Visual diversity in the PAM∃LA benchmark. The dataset spans two primary domains: Art and Photography. It comprises 21 distinct thematic categories as shown with examples. This structure isolates a model's ability to judge stylized artistic compositions from its ability to evaluate real-world, photographic subjects."},{"citing_arxiv_id":"2601.04720","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking","primary_cat":"cs.CL","submitted_at":"2026-01-08T08:36:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Qwen3-VL-Embedding-8B achieves state-of-the-art performance with a 77.8 overall score on the MMEB-V2 multimodal embedding benchmark.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"Loss for Retrieval Data: This category includes data from various multimodal and cross-modal retrieval tasks, such as Text-to-Text (T2T), Text-to-Image (T2I), and Image+Text-to-Image+Text (IT2IT) retrieval. In Stage 1, we use the same InfoNCE loss (Oord et al., 2018) formulation as in the Qwen3-Embedding: $L_{\\mathrm{retrieval}} = -\\frac{1}{N} \\sum_{i}^{N} \\log \\frac{e^{s(q_i, d_i^+)/\\tau}}{Z_i}$ (1), where s(·, ·) is a similarity function (we use cosine similarity), τ is a temperature parameter, and $Z_i$ aggregates scores from the positive pair and various types of negative pairs: $Z_i = e^{s(q_i, d_i^+)/\\tau} + \\sum_{k}^{K} m_{ik} e^{s(q_i, d_{i,k}^-)/\\tau} + \\sum_{j \\neq i} m_{ij} e^{s(q_i, q_j)/\\tau} + \\sum_{j \\neq i} m_{ij} e^{s(d_i^+, d_j)/\\tau} + \\sum_{j \\neq i} m_{ij} e^{s(q_i, d_j)/\\tau}$ corresponding to similarities with (1) the positive document d+"}],"limit":50,"offset":0}