Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Iryna Gurevych, Nils Reimers · 2019 · Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) · DOI 10.18653/v1/d19-1410

Mixed citation behavior. Most common role is background (54%).

179 Pith papers citing it

7,637 external citations · Crossref

citation-role summary

background 14 · method 4 · baseline 3 · dataset 3

citation-polarity summary

background 13 · use method 4 · baseline 3 · use dataset 3 · support 1

authors

Iryna Gurevych, Nils Reimers

representative citing papers

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

SimCSE: Simple Contrastive Learning of Sentence Embeddings

cs.CL · 2021-04-18 · conditional · novelty 8.0

SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving prior best results by 4.2% and 2.2% via simple contrastive learning.

ALEE: Any-Language Evaluation of Embeddings via English-Centric Minimal Pairs

cs.CL · 2026-06-30 · unverdicted · novelty 7.0

ALEE generates AMR-based English minimal pairs with fine-grained semantic shifts, translates them, and evaluates embedding models on 275+ languages to expose cross-lingual gaps linked to training data and tokenization.

Diagnosing and Mitigating Retrieval Bottlenecks in LLM-Based Cold-Start Recommendation

cs.IR · 2026-06-29 · conditional · novelty 7.0

Retrieval coverage limits LLM rerankers in cold-start recommendation; a learned hybrid fusion improves pool quality but LLM reranking often degrades end-to-end performance while simpler rankers exploit the pool.

CrypFormBench: Benchmarking Formal Analysis Capability of Large Language Models for Cryptographic Schemes

cs.CR · 2026-06-24 · unverdicted · novelty 7.0

CrypFormBench is a new benchmark jointly covering symbolic and computational security to evaluate LLMs on five formal analysis capabilities; results show that the top model, Claude-3.5, scores 48.7/100 and that most models struggle on generation, transformation, and correction.

Closing the Calibration Gap in Semantic Caching

cs.IR · 2026-06-18 · unverdicted · novelty 7.0

Introduces P-CHR AUC and CRR metrics to demonstrate that semantic caching model selection is limited by calibration quality rather than ranking performance.

Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation

cs.CL · 2026-06-17 · unverdicted · novelty 7.0

DICE aggregates independently encoded document chunks into a single vector to reduce evidence dilution in long-document dense retrieval, reporting gains on LongEmbed especially beyond 4k tokens.

BELLS-O: Evaluating the Operational Trade-offs of LLM Supervision Systems

cs.CR · 2026-06-12 · unverdicted · novelty 7.0

BELLS-O is the first vendor-neutral operational benchmark comparing specialized guardrails and repurposed frontier LLMs on accuracy, false-positive rates, speed, and monetary cost across 11 harm categories and 13 jailbreak techniques.

Unsupervised Style Representation Learning for AI-Text Detection via Paraphrase Inversion

cs.LG · 2026-06-08 · unverdicted · novelty 7.0

Unsupervised style representations learned via paraphrase inversion enable competitive few-shot and zero-shot AI-text detection with better generalization to unseen LLMs than supervised baselines.

Continuous Language Diffusion as a Decoder-Interface Problem

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder systems.

Beyond Matching: Category-Guided Latent Intent Reasoning for Generative Retrieval in E-Commerce

cs.IR · 2026-06-05 · unverdicted · novelty 7.0

CaLIR learns continuous latent intent states guided by product category hierarchies for generative retrieval, combining hierarchical reasoning and dynamic prefix tries to balance effectiveness and low-latency inference on multilingual e-commerce data.

LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

LazyAttention kernelizes deferred positional encoding to enable zero-copy, position-agnostic KV cache reuse, delivering 1.37× lower TTFT and 1.40× higher throughput than Block-Attention under skewed document distributions while preserving output quality.

SEA-Embedding: Open and Reproducible Text Embeddings for Southeast Asia

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

SEA-Embedding is a fully open text embedding pipeline for Southeast Asian languages that achieves state-of-the-art performance on the SEA-BED benchmark by analyzing data composition, training objectives, and base encoder choices.

EvoPool: Evolutionary Programmatic Annotation for Label-Efficient Specialized Supervision

cs.CL · 2026-06-01 · unverdicted · novelty 7.0

EvoPool evolves pools of programmatic annotators that outperform LLM annotation by 0.141 average macro-F1 on 7 of 8 specialized tasks while running thousands of times faster.

Low-Resource Safety Failures Are Action Failures, Not Representation Failures

cs.CL · 2026-05-31 · conditional · novelty 7.0

Low-resource safety failures are action failures because the harmfulness representation transfers but the decision calibration does not; this is fixed by recalibrating a high-resource gate with 1-4 target-language examples.

DirectorBench: Diagnosing Long-Form Video Generation with Personalized Multi-Agent Evaluation

cs.CL · 2026-05-28 · unverdicted · novelty 7.0

DirectorBench is a profile-aware diagnostic benchmark that localizes bottlenecks in long-form video generation workflows using structured checkpoints and multi-agent evaluation.

Continual Model Routing in Evolving Model Hubs

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

Formalizes continual model routing (CMR), releases CMRBench with over 2000 models, and presents CARvE which outperforms retrieval, fine-tuning and adapter-merging baselines on model/family/domain accuracy.

The Harder Text Embedding Benchmark (HTEB): Beyond One-dimensional Static Robustness

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

HTEB introduces dynamic, multi-axis evaluation of text embedding robustness using LLM transformations, finding decoupled profiles across models and that scaling does not close all robustness gaps.

RLVR Datasets and Where to Find Them: Tracing Data Lineage for Better Training Data

cs.LG · 2026-05-26 · unverdicted · novelty 7.0

ATLAS traces RLVR data to 20 atomic sources and finds that most datasets are variants; DAPO++ curated with SCA improves RLVR performance, while Q predicts training effectiveness.

IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

IdioLink introduces a benchmark dataset and evaluation showing that strong embedding models struggle to retrieve equivalent meanings across idiomatic and literal forms, relying on shallow cues instead.

SCARA: A Semantics-Constrained Autonomous Remediation Agent for Opaque Industrial Software Vulnerabilities

cs.CR · 2026-05-19 · unverdicted · novelty 7.0

SCARA introduces a four-stage pipeline using state-aware verification and constrained synthesis to remediate vulnerabilities in source-unavailable industrial software, reporting 100% precision and 88.9% success on a 15-case benchmark.

From Table to Cell: Attention for Better Reasoning with TABALIGN

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

TABALIGN pairs a diffusion language model planner emitting binary cell masks with a trained attention verifier, raising average accuracy 15.76 points over strong baselines on eight table benchmarks while speeding execution 44.64%.

Much of Geospatial Web Search Is Beyond Traditional GIS

cs.IR · 2026-05-11 · unverdicted · novelty 7.0 · 2 refs

Large-scale analysis of unfiltered search queries shows geospatial intent at 18% of total, dominated by transactional categories outside traditional GIS scope.

Matching Meaning at Scale: Evaluating Semantic Search for 18th-Century Intellectual History through the Case of Locke

cs.CL · 2026-05-10 · unverdicted · novelty 7.0 · 2 refs

Semantic search retrieves substantially more implicit receptions of Locke's work than lexical baselines in 18th-century corpora, yet remains constrained by lexical gatekeeping.

citing papers explorer

Led to Mislead: Adversarial Content Injection for Attacks on Neural Ranking Models

cs.IR · 2026-05-02 · unverdicted · none · ref 42

CRAFT is a supervised LLM framework using retrieval-augmented generation, self-refinement, fine-tuning, and preference optimization to create fluent adversarial content that boosts target ranks in neural ranking models, outperforming baselines on MS MARCO and TREC benchmarks with cross-architecture

Safety Context Injection: Inference-Time Safety Alignment via Static Filtering and Agentic Analysis

cs.CR · 2026-05-12 · unverdicted · none · ref 15

Safety Context Injection prepends structured external risk reports via static or agentic analysis to lower attack success rates and toxicity in reasoning models on AdvBench and GPTFuzz benchmarks.

SkillRAE: Agent Skill-Based Context Compilation for Retrieval-Augmented Execution

cs.CL · 2026-05-11 · unverdicted · none · ref 49

SkillRAE organizes skills into a graph and compiles compact, grounded contexts for LLM agents, yielding 11.7% gains on SkillsBench over prior RAE methods.

Retina-RAG: Retrieval-Augmented Vision-Language Modeling for Joint Retinal Diagnosis and Clinical Report Generation

cs.CV · 2026-05-07 · unverdicted · none · ref 37 · 2 links

Retina-RAG combines a retinal classifier, LoRA-tuned Qwen2.5-VL, and RAG to jointly grade DR, detect ME, and generate reports, reaching F1 scores of 0.731 and 0.948 while exceeding baselines on ROUGE-L and SBERT metrics.
