TabEmbed is the first generalist embedding model for tabular data that unifies classification and retrieval in one space via contrastive learning and outperforms text embedding models on the new TabBench benchmark.
SimCSE: Simple Contrastive Learning of Sentence Embeddings
18 Pith papers cite this work. Polarity classification is still indexing.
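For context on the hub's paper: SimCSE trains sentence embeddings with an in-batch contrastive (InfoNCE) objective in which two dropout-perturbed encodings of the same sentence serve as the positive pair and the other sentences in the batch serve as negatives. A minimal PyTorch-style sketch of that loss; the encoder interface and all tensor names are assumptions for illustration, not this page's code:

```python
import torch
import torch.nn.functional as F

def simcse_style_infonce(encoder, sentences, temperature=0.05):
    """In-batch contrastive (InfoNCE) loss in the SimCSE style.

    `encoder` is assumed to map a list of sentences to a (batch, dim) tensor
    and must be in train mode so the two passes use different dropout masks.
    """
    z1 = encoder(sentences)              # first view  (dropout mask A)
    z2 = encoder(sentences)              # second view (dropout mask B)
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature        # (batch, batch) cosine-similarity logits
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)  # diagonal entries are the positive pairs
```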
hub tools: citation-role summary, citation-polarity summary
roles: background 1
polarities: background 1
representative citing papers
Semantic Recall is a new evaluation metric for approximate nearest neighbor search that focuses only on semantically relevant results, with Tolerant Recall as a proxy when relevance labels are unavailable.
mEOL creates aligned embeddings for text, images, and SVGs using instruction-guided MLLM one-word summaries and semantic SVG rewriting, outperforming baselines on a new text-to-SVG retrieval benchmark.
M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.
MIPIC trains nested Matryoshka representations via self-distilled intra-relational alignment with top-k CKA and progressive information chaining across depths, yielding competitive performance especially at extreme low dimensions.
RePrompT uses recurrent prompt tuning to inject prior-visit latent states and cohort-derived population prompt tokens into LLMs, yielding better performance than pure EHR or pure LLM baselines on MIMIC clinical prediction tasks.
UniCon unifies contrastive alignment across encoders and alignment types using kernels to enable exact closed-form updates instead of stochastic optimization.
Parameter-efficient fine-tuning lets MLLMs serve as effective retrievers for natural-language-guided cross-view geo-localization, beating dual-encoder baselines on GeoText-1652 and CVG-Text while using far fewer trainable parameters.
Bias toward LLM texts in neural retrievers arises from artifact imbalances between positive and negative documents in training data that are absorbed during contrastive learning.
A governed LLM routing system for lab tutoring raises challenge-alignment from 0.90 to 0.98, boosts productive-struggle time, and cuts token costs by two-thirds while preserving answer accuracy.
Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.
SimReg regularization accelerates LLM pretraining convergence by over 30% and raises average zero-shot performance by over 1% across benchmarks.
ACSE estimates LLM prompt uncertainty via adaptive clustering of semantic entropy across multiple responses and uses conformal prediction to bound error rates on accepted answers with distribution-free guarantees; a generic sketch of the semantic-entropy step appears after this list.
G-Loss builds a document-similarity graph and uses semi-supervised label propagation to guide fine-tuning of language models, yielding higher accuracy than standard losses on five classification benchmarks.
A new pre-training task that maps languages bidirectionally in embedding space improves machine translation by up to 11.9 BLEU, cross-lingual QA by 6.72 BERTScore points, and understanding accuracy by over 5% over strong baselines.
The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.
StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.
Fine-tuned LLaMA3 with LoRA reaches 81.24% F1 on 18-category fine-grained medical entity recognition, beating zero-shot by 63.11% and few-shot by 35.63%.
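The ACSE entry above builds on semantic entropy, a standard uncertainty measure for sampled LLM responses: group the samples into semantic-equivalence clusters and take the entropy of the cluster distribution. A minimal sketch of only that step; the clustering rule and ACSE's adaptive and conformal components are not shown, and all names here are placeholders:

```python
import math
from collections import Counter

def semantic_entropy(cluster_ids):
    """Entropy over semantic clusters of sampled responses.

    `cluster_ids[i]` is the cluster assigned to the i-th sampled response,
    e.g. by grouping responses that mutually entail each other.
    """
    n = len(cluster_ids)
    counts = Counter(cluster_ids)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Example: 5 samples, 4 of which agree semantically -> fairly low entropy.
print(semantic_entropy(["A", "A", "A", "A", "B"]))
```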
citing papers explorer
Beyond the Basics: Leveraging Large Language Model for Fine-Grained Medical Entity Recognition
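The entity-recognition paper selected above fine-tunes LLaMA3 with LoRA, whose core mechanism is a frozen pretrained weight plus a trainable low-rank update. A minimal from-scratch sketch under that general recipe; the rank, scaling, and class names are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter: the frozen base weight is augmented with a
    low-rank update B @ A, which is the only part trained during fine-tuning."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weight
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # base output plus the scaled low-rank correction
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```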