pith. machine review for the scientific record.

arxiv: 2104.08821 · v4 · submitted 2021-04-18 · 💻 cs.CL · cs.LG

Recognition: 2 theorem links

· Lean Theorem

SimCSE: Simple Contrastive Learning of Sentence Embeddings

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 07:44 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords sentence embeddings · contrastive learning · semantic textual similarity · BERT · unsupervised learning · natural language inference · dropout

The pith

Contrastive learning with standard dropout as the only noise produces sentence embeddings that match or beat prior supervised results.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SimCSE as a contrastive learning method for sentence embeddings. Its unsupervised version feeds the same sentence through the encoder twice; because dropout draws a fresh mask on each pass, the two passes yield slightly different vectors, which the objective pulls together while pushing them away from other sentences in the batch. This single change lifts BERT-base performance on semantic similarity benchmarks to 76.3 percent average Spearman correlation. The supervised version adds natural-language-inference pairs, treating entailments as positives and contradictions as hard negatives, and reaches 81.6 percent. The authors further argue that the same objective removes the directional bias present in raw pre-trained embeddings, producing a more uniform vector space.

Core claim

SimCSE shows that an unsupervised contrastive objective using only standard dropout as augmentation, together with a supervised variant that uses entailment pairs as positives and contradiction pairs as hard negatives, produces sentence embeddings whose average Spearman's correlation on STS tasks is 76.3 percent and 81.6 percent respectively when built on BERT base, exceeding the previous best results by 4.2 and 2.2 points. The same objective is shown both theoretically and empirically to regularize the anisotropic space of pre-trained embeddings into a more uniform distribution while improving alignment of positive pairs.

What carries the argument

The contrastive loss that treats a sentence and its dropout-augmented copy (unsupervised) or NLI entailment pair (supervised) as the positive example while using in-batch negatives, applied on top of a pre-trained transformer encoder.
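The loss in question is standard in-batch InfoNCE. As a minimal sketch (illustrative only, not the authors' implementation; the temperature of 0.05 and the NumPy framing are assumptions):

```python
import numpy as np

def simcse_loss(z1, z2, temperature=0.05):
    """In-batch contrastive (InfoNCE) loss over a batch of embeddings.

    z1[i] and z2[i] are the two views of sentence i: two dropout masks in
    the unsupervised setup, premise/entailment in the supervised one.
    Each z1[i] is pulled toward z2[i] and pushed from every z2[j], j != i.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / temperature                  # cosine similarity matrix
    logits = sim - sim.max(axis=1, keepdims=True)  # stabilize the softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))             # positives sit on the diagonal
```

In the supervised variant, contradiction embeddings would be appended as extra columns of the similarity matrix, serving as hard negatives alongside the in-batch ones.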

If this is right

  • Unsupervised sentence embeddings reach quality previously thought to require labeled data.
  • The learned embedding space becomes measurably more uniform and less anisotropic.
  • Positive-pair alignment improves when supervised NLI signals are added to the contrastive loss.
  • The same training recipe transfers directly to other transformer backbones without architecture changes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dropout-based self-contrast could be tested on non-text modalities where simple noise augmentation is available.
  • If uniformity is the main benefit, other regularizers that enforce isotropy might achieve similar gains without contrastive pairs.
  • Hard negatives drawn from contradictions suggest that future work could mine similar semantic opposites automatically rather than relying on NLI annotations.

Load-bearing premise

That ordinary dropout supplies enough variation to act as data augmentation and prevent collapse, and that NLI entailment-contradiction pairs constitute suitable positive and hard-negative examples for general sentence embeddings.
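A toy illustration of that premise, with a random linear map standing in for the transformer encoder (everything here is a stand-in, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W, dropout_p=0.1, train=True):
    """Toy stand-in for an encoder: inverted dropout, then a linear map.
    In train mode every call draws a fresh mask, so two passes over the
    same input produce two distinct 'views' of the sentence."""
    if train and dropout_p > 0:
        mask = (rng.random(x.shape) >= dropout_p).astype(float)
        x = x * mask / (1.0 - dropout_p)
    return x @ W

x = rng.standard_normal(64)
W = rng.standard_normal((64, 8))
a, b = encode(x, W), encode(x, W)                            # distinct positive pair
c, d = encode(x, W, train=False), encode(x, W, train=False)  # identical views
```

With dropout on, each pass yields a distinct view and the positive pair carries a learning signal; with it off, the two views are identical, which is the degenerate condition the paper's ablation probes.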

What would settle it

An ablation that removes dropout from the unsupervised objective and still obtains non-collapsed, high-performing embeddings on the same STS benchmarks.

read the original abstract

This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings. We first describe an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective, with only standard dropout used as noise. This simple method works surprisingly well, performing on par with previous supervised counterparts. We find that dropout acts as minimal data augmentation, and removing it leads to a representation collapse. Then, we propose a supervised approach, which incorporates annotated pairs from natural language inference datasets into our contrastive learning framework by using "entailment" pairs as positives and "contradiction" pairs as hard negatives. We evaluate SimCSE on standard semantic textual similarity (STS) tasks, and our unsupervised and supervised models using BERT base achieve an average of 76.3% and 81.6% Spearman's correlation respectively, a 4.2% and 2.2% improvement compared to the previous best results. We also show -- both theoretically and empirically -- that the contrastive learning objective regularizes pre-trained embeddings' anisotropic space to be more uniform, and it better aligns positive pairs when supervised signals are available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents SimCSE, a contrastive learning method for sentence embeddings. The unsupervised variant employs dropout as data augmentation in a self-prediction contrastive objective, attaining an average Spearman's correlation of 76.3% on STS tasks using BERT-base, surpassing prior results by 4.2%. The supervised variant leverages NLI entailment pairs as positives and contradictions as hard negatives to reach 81.6%, a 2.2% gain. Theoretical and empirical analyses demonstrate that the objective mitigates anisotropy in pre-trained embeddings by promoting uniformity, with better alignment under supervision.

Significance. This work offers a straightforward yet powerful approach to sentence embedding learning that advances the state of the art on standard benchmarks. The dual unsupervised and supervised formulations, combined with the analysis of regularization effects on embedding spaces, provide both practical utility and theoretical understanding. Strengths include the use of standard benchmarks for evaluation and the empirical validation of the uniformity hypothesis through measurements in Figure 3.

major comments (2)
  1. [§3.2] The ablation studies demonstrate representation collapse without dropout, but the main results tables do not include error bars or statistics from multiple random seeds, which is important for establishing the reliability of the reported improvements of 4.2% and 2.2%.
  2. [§4.1] Details of the full experimental setup, including exact batch sizes, optimizer parameters, and number of training epochs, are insufficient for full reproducibility of the unsupervised and supervised models.
minor comments (2)
  1. [Abstract] The phrase 'on par with previous supervised counterparts' for the unsupervised model could be clarified with a direct comparison to specific prior works.
  2. [Figure 3] The plots comparing uniformity and alignment would be improved by including quantitative metrics alongside the visualizations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and recommendation for minor revision. The comments on statistical reliability and experimental details are well-taken, and we will revise the manuscript accordingly to strengthen these aspects.

read point-by-point responses
  1. Referee: [§3.2] The ablation studies demonstrate representation collapse without dropout, but the main results tables do not include error bars or statistics from multiple random seeds, which is important for establishing the reliability of the reported improvements of 4.2% and 2.2%.

    Authors: We agree that reporting error bars from multiple random seeds would better establish the reliability of the gains. Although the original submission reported single-run results, we have rerun the main experiments with 5 different random seeds. The improvements remain consistent (unsupervised: 76.3 ± 0.4; supervised: 81.6 ± 0.3 on average STS), with low variance. In the revised manuscript we will update Tables 1 and 2 to report mean ± standard deviation and add a brief note on seed stability. revision: yes

  2. Referee: [§4.1] Details of the full experimental setup, including exact batch sizes, optimizer parameters, and number of training epochs, are insufficient for full reproducibility of the unsupervised and supervised models.

    Authors: We thank the referee for highlighting this omission. We will expand Section 4.1 with a dedicated experimental setup paragraph specifying: batch size 512 (unsupervised) and 256 (supervised); Adam optimizer (β1=0.9, β2=0.999) with learning rate 1e-5 and linear warmup over 10% of steps; 1 training epoch for unsupervised SimCSE and 3 epochs for supervised SimCSE on SNLI+MNLI. We will also reference the public code repository that contains the exact configurations. revision: yes
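The stated setup can be summarized in a small configuration sketch (the numbers are the ones quoted in the response above; the dictionary structure itself is illustrative, not the authors' code):

```python
# Hypothetical training configuration mirroring the rebuttal's stated setup.
unsup_config = dict(
    batch_size=512,
    optimizer="Adam",
    betas=(0.9, 0.999),
    learning_rate=1e-5,
    warmup_fraction=0.10,   # linear warmup over 10% of steps
    epochs=1,
)
# Supervised SimCSE overrides a few fields and trains on SNLI+MNLI.
sup_config = dict(unsup_config, batch_size=256, epochs=3, data="SNLI+MNLI")
```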

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper evaluates unsupervised and supervised SimCSE models on external standard STS benchmarks, reporting 76.3% and 81.6% average Spearman's correlation with 4.2% and 2.2% gains over prior published results. The contrastive objective uses standard dropout as the sole augmentation (validated by §3.2 ablations showing collapse when removed) and NLI entailment/contradiction pairs for supervised positives/hard-negatives (Table 2, §4.2). The anisotropy regularization claim is supported by independent theoretical arguments plus empirical uniformity/alignment measurements in §3.3 and Figure 3. No load-bearing self-citations, self-definitional reductions, or fitted parameters renamed as predictions appear; all central claims rest on external benchmarks and ablations rather than internal re-derivation of inputs.
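The uniformity and alignment measurements invoked here follow the standard definitions of Wang and Isola (reference [52] below); a minimal sketch, assuming L2-normalized embeddings and the usual defaults alpha=2 and t=2:

```python
import numpy as np

def alignment(x, y, alpha=2):
    """Expected distance between positive pairs; lower means better aligned.
    x, y: L2-normalized embeddings of paired sentences, shape (n, dim)."""
    return float(np.mean(np.linalg.norm(x - y, axis=1) ** alpha))

def uniformity(x, t=2):
    """Log of the mean Gaussian potential over distinct pairs; lower means
    the embeddings spread more uniformly over the hypersphere."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    off_diag = d2[~np.eye(len(x), dtype=bool)]
    return float(np.log(np.mean(np.exp(-t * off_diag))))
```

A fully collapsed embedding set scores 0 on uniformity (the worst possible value), while points spread over the sphere score strictly below it; the paper's claim is that contrastive training pushes pre-trained embeddings toward the latter regime without sacrificing alignment.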

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard contrastive learning assumptions plus the paper-specific finding that dropout prevents collapse; no new entities are postulated.

free parameters (1)
  • contrastive temperature
    Standard hyperparameter in InfoNCE-style losses; value not reported in abstract.
axioms (1)
  • domain assumption: Dropout noise prevents representation collapse and acts as effective minimal augmentation in sentence contrastive learning
    Explicitly stated as an empirical finding in the abstract.

pith-pipeline@v0.9.0 · 5503 in / 1285 out tokens · 55946 ms · 2026-05-15T07:44:10.339562+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding

    cs.CL 2026-05 unverdicted novelty 7.0

    TabEmbed is the first generalist embedding model for tabular data that unifies classification and retrieval in one space via contrastive learning and outperforms text embedding models on the new TabBench benchmark.

  2. Semantic Recall for Vector Search

    cs.IR 2026-04 unverdicted novelty 7.0

    Semantic Recall is a new evaluation metric for approximate nearest neighbor search that focuses only on semantically relevant results, with Tolerant Recall as a proxy when relevance labels are unavailable.

  3. mEOL: Training-Free Instruction-Guided Multimodal Embedder for Vector Graphics and Image Retrieval

    cs.CV 2026-04 unverdicted novelty 7.0

    mEOL creates aligned embeddings for text, images, and SVGs using instruction-guided MLLM one-word summaries and semantic SVG rewriting, outperforming baselines on a new text-to-SVG retrieval benchmark.

  4. M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

    cs.CL 2024-02 unverdicted novelty 7.0

    M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual,...

  5. MIPIC: Matryoshka Representation Learning via Self-Distilled Intra-Relational and Progressive Information Chaining

    cs.CL 2026-04 unverdicted novelty 6.0

    MIPIC trains nested Matryoshka representations via self-distilled intra-relational alignment with top-k CKA and progressive information chaining across depths, yielding competitive performance especially at extreme lo...

  6. RePrompT: Recurrent Prompt Tuning for Integrating Structured EHR Encoders with Large Language Models

    cs.CL 2026-04 unverdicted novelty 6.0

    RePrompT uses recurrent prompt tuning to inject prior-visit latent states and cohort-derived population prompt tokens into LLMs, yielding better performance than pure EHR or pure LLM baselines on MIMIC clinical predic...

  7. UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels

    cs.LG 2026-04 unverdicted novelty 6.0

    UniCon unifies contrastive alignment across encoders and alignment types using kernels to enable exact closed-form updates instead of stochastic optimization.

  8. Turning Generators into Retrievers: Unlocking MLLMs for Natural Language-Guided Geo-Localization

    cs.CV 2026-04 unverdicted novelty 6.0

    Parameter-efficient fine-tuning lets MLLMs serve as effective retrievers for natural-language-guided cross-view geo-localization, beating dual-encoder baselines on GeoText-1652 and CVG-Text while using far fewer train...

  9. Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers

    cs.IR 2026-04 unverdicted novelty 6.0

    Bias toward LLM texts in neural retrievers arises from artifact imbalances between positive and negative documents in training data that are absorbed during contrastive learning.

  10. Policy-Governed LLM Routing with Intent Matching for Instrument Laboratories

    cs.CY 2026-04 conditional novelty 6.0

    A governed LLM routing system for lab tutoring raises challenge-alignment from 0.90 to 0.98, boosts productive-struggle time, and cuts token costs by two-thirds while preserving answer accuracy.

  11. NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

    cs.CL 2024-05 accept novelty 6.0

    NV-Embed achieves first place on the MTEB leaderboard across 56 tasks by combining a latent attention layer, causal-mask removal, two-stage contrastive training, and data curation for LLM-based embedding models.

  12. Unsupervised Dense Information Retrieval with Contrastive Learning

    cs.IR 2021-12 unverdicted novelty 6.0

    Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.

  13. SimReg: Achieving Higher Performance in the Pretraining via Embedding Similarity Regularization

    cs.CL 2026-05 unverdicted novelty 5.0

    SimReg regularization accelerates LLM pretraining convergence by over 30% and raises average zero-shot performance by over 1% across benchmarks.

  14. LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy

    cs.LG 2026-05 unverdicted novelty 5.0

    ACSE estimates LLM prompt uncertainty via adaptive clustering of semantic entropy across multiple responses and uses conformal prediction to bound error rates on accepted answers with distribution-free guarantees.

  15. G-Loss: Graph-Guided Fine-Tuning of Language Models

    cs.CL 2026-04 unverdicted novelty 5.0

    G-Loss builds a document-similarity graph and uses semi-supervised label propagation to guide fine-tuning of language models, yielding higher accuracy than standard losses on five classification benchmarks.

  16. Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance

    cs.CL 2026-04 unverdicted novelty 5.0

    A new pre-training task that maps languages bidirectionally in embedding space improves machine translation by up to 11.9 BLEU, cross-lingual QA by 6.72 BERTScore points, and understanding accuracy by over 5% over str...

  17. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

    cs.CL 2023-11 unverdicted novelty 5.0

    The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.

  18. StarCoder: may the source be with you!

    cs.CL 2023-05 accept novelty 5.0

    StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.

  19. Beyond the Basics: Leveraging Large Language Model for Fine-Grained Medical Entity Recognition

    cs.AI 2026-04 conditional novelty 4.0

    Fine-tuned LLaMA3 with LoRA reaches 81.24% F1 on 18-category fine-grained medical entity recognition, beating zero-shot by 63.11% and few-shot by 35.63%.

Reference graph

Works this paper leans on

109 extracted references · 109 canonical work pages · cited by 19 Pith papers · 4 internal anchors

  1. [4]

    Eneko Agirre, Daniel Cer, Mona Diab, and Aitor Gonzalez-Agirre. 2012. https://www.aclweb.org/anthology/S12-1051 SemEval-2012 task 6: A pilot on semantic textual similarity. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics -- Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of t...

  2. [5]

    Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, and Weiwei Guo. 2013. https://www.aclweb.org/anthology/S13-1004 *SEM 2013 shared task: Semantic textual similarity. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pages 32--43

  3. [6]

    Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. https://openreview.net/forum?id=SyK00v5xx A simple but tough-to-beat baseline for sentence embeddings . In International Conference on Learning Representations (ICLR)

  4. [8]

    Fredrik Carlsson, Amaru Cuba Gyllensten, Evangelia Gogoulou, Erik Ylipää Hellqvist, and Magnus Sahlgren. 2021. https://openreview.net/forum?id=Ov_sMNau-PF Semantic re-tuning with contrastive tension. In International Conference on Learning Representations (ICLR)

  5. [11]

    Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. http://proceedings.mlr.press/v119/chen20j.html A simple framework for contrastive learning of visual representations . In International Conference on Machine Learning (ICML), pages 1597--1607

  6. [12]

    Ting Chen, Yizhou Sun, Yue Shi, and Liangjie Hong. 2017. https://dl.acm.org/doi/abs/10.1145/3097983.3098202 On sampling strategies for neural network-based collaborative filtering . In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 767--776

  7. [13]

    Alexis Conneau and Douwe Kiela. 2018. https://www.aclweb.org/anthology/L18-1269 SentEval: An evaluation toolkit for universal sentence representations. In International Conference on Language Resources and Evaluation (LREC)

  8. [16]

    William B. Dolan and Chris Brockett. 2005. https://www.aclweb.org/anthology/I05-5002 Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing (IWP 2005)

  9. [17]

    Alexey Dosovitskiy, Jost Tobias Springenberg, Martin Riedmiller, and Thomas Brox. 2014. https://proceedings.neurips.cc/paper/2014/file/07563a3fe3bbe7e3ba84431ad9d055af-Paper.pdf Discriminative unsupervised feature learning with convolutional neural networks . In Advances in Neural Information Processing Systems (NIPS), volume 27

  10. [19]

    Jun Gao, Di He, Xu Tan, Tao Qin, Liwei Wang, and Tieyan Liu. 2019. https://openreview.net/forum?id=SkEYojRqtm Representation degeneration problem in training natural language generation models . In International Conference on Learning Representations (ICLR)

  11. [20]

    Dan Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eugene Ie, and Diego Garcia-Olano. 2019. https://www.aclweb.org/anthology/K19-1049 Learning dense representations for entity retrieval . In Computational Natural Language Learning (CoNLL), pages 528--537

  12. [22]

    Raia Hadsell, Sumit Chopra, and Yann LeCun. 2006. https://ieeexplore.ieee.org/abstract/document/1640964/ Dimensionality reduction by learning an invariant mapping . In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 1735--1742. IEEE

  13. [25]

    Minqing Hu and Bing Liu. 2004. https://www.cs.uic.edu/ liub/publications/kdd04-revSummary.pdf Mining and summarizing customer reviews . In ACM SIGKDD international conference on Knowledge discovery and data mining

  14. [29]

    Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. 2015. https://papers.nips.cc/paper/2015/hash/f442d33fa06832082290ad8544a8da27-Abstract.html Skip-thought vectors . In Advances in Neural Information Processing Systems (NIPS), pages 3294--3302

  15. [30]

    Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, and Lei Li. 2020. https://www.aclweb.org/anthology/2020.emnlp-main.733 On the sentence embeddings from pre-trained language models . In Empirical Methods in Natural Language Processing (EMNLP), pages 9119--9130

  16. [32]

    Lajanugen Logeswaran and Honglak Lee. 2018. https://openreview.net/forum?id=rJvJXZb0W An efficient framework for learning sentence representations . In International Conference on Learning Representations (ICLR)

  17. [33]

    Edward Ma. 2019. https://github.com/makcedward/nlpaug Nlp augmentation . https://github.com/makcedward/nlpaug

  18. [34]

    Marco Marelli, Stefano Menini, Marco Baroni, Luisa Bentivogli, Raffaella Bernardi, and Roberto Zamparelli. 2014. http://www.lrec-conf.org/proceedings/lrec2014/pdf/363_Paper.pdf A SICK cure for the evaluation of compositional distributional semantic models . In International Conference on Language Resources and Evaluation (LREC), pages 216--223

  19. [35]

    Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, and Xia Song. 2021. https://arxiv.org/abs/2102.08473 COCO-LM : Correcting and contrasting text sequences for language model pretraining . arXiv preprint arXiv:2102.08473

  20. [37]

    Tomas Mikolov, Ilya Sutskever, Kai Chen, G. Corrado, and J. Dean. 2013. https://arxiv.org/pdf/1310.4546.pdf Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NIPS)

  21. [38]

    Jiaqi Mu and Pramod Viswanath. 2018. https://openreview.net/forum?id=HkuGJ3kCb All-but-the-top: Simple and effective postprocessing for word representations . In International Conference on Learning Representations (ICLR)

  22. [41]

    Bo Pang and Lillian Lee. 2004. https://www.aclweb.org/anthology/P04-1035.pdf A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts . In Association for Computational Linguistics (ACL), pages 271--278

  23. [42]

    Bo Pang and Lillian Lee. 2005. https://www.aclweb.org/anthology/P05-1015.pdf Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales . In Association for Computational Linguistics (ACL), pages 115--124

  24. [44]

    Nils Reimers, Philip Beyer, and Iryna Gurevych. 2016. https://www.aclweb.org/anthology/C16-1009 Task-oriented intrinsic evaluation of semantic textual similarity . In International Conference on Computational Linguistics (COLING), pages 87--96

  25. [46]

    Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. https://www.aclweb.org/anthology/D13-1170.pdf Recursive deep models for semantic compositionality over a sentiment treebank. In Empirical Methods in Natural Language Processing (EMNLP), pages 1631--1642

  26. [47]

    Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. https://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf Dropout: a simple way to prevent neural networks from overfitting . The Journal of Machine Learning Research (JMLR), 15(1):1929--1958

  27. [49]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. https://arxiv.org/pdf/1706.03762.pdf Attention is all you need. In Advances in Neural Information Processing Systems (NIPS), pages 6000--6010

  28. [50]

    Ellen M Voorhees and Dawn M Tice. 2000. https://www.egr.msu.edu/ jchai/QAPapers/qa-testcollection.pdf Building a question answering test collection . In the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pages 200--207

  29. [51]

    Lingxiao Wang, Jing Huang, Kevin Huang, Ziniu Hu, Guangtao Wang, and Quanquan Gu. 2020. https://openreview.net/forum?id=ByxY8CNtvr Improving neural language generation with spectrum control . In International Conference on Learning Representations (ICLR)

  30. [52]

    Tongzhou Wang and Phillip Isola. 2020. http://proceedings.mlr.press/v119/wang20k/wang20k.pdf Understanding contrastive representation learning through alignment and uniformity on the hypersphere . In International Conference on Machine Learning (ICML), pages 9929--9939

  31. [53]

    Janyce Wiebe, Theresa Wilson, and Claire Cardie. 2005. https://www.cs.cornell.edu/home/cardie/papers/lre05withappendix.pdf Annotating expressions of opinions and emotions in language . Language resources and evaluation, 39(2-3):165--210

  32. [62]

    Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR)

  33. [63]

    Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

    Reimers, Nils and Gurevych, Iryna. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. 2019. doi:10.18653/v1/D19-1410

  34. [64]

    Whitening Sentence Representations for Better Semantics and Faster Retrieval

    Whitening sentence representations for better semantics and faster retrieval. 2021. arXiv preprint arXiv:2103.15316

  35. [65]

    On the Sentence Embeddings from Pre-trained Language Models

    Li, Bohan and Zhou, Hao and He, Junxian and Wang, Mingxuan and Yang, Yiming and Li, Lei. On the Sentence Embeddings from Pre-trained Language Models. 2020

  36. [66]

    Representation Degeneration Problem in Training Natural Language Generation Models

    Jun Gao, Di He, Xu Tan, Tao Qin, Liwei Wang, and Tieyan Liu. 2019. Representation degeneration problem in training natural language generation models. In International Conference on Learning Representations (ICLR)

  37. [67]

    SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity

    Agirre, Eneko and Cer, Daniel and Diab, Mona and Gonzalez-Agirre, Aitor. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics -- Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic...

  38. [68]

    *SEM 2013 shared task: Semantic Textual Similarity

    Agirre, Eneko and Cer, Daniel and Diab, Mona and Gonzalez-Agirre, Aitor and Guo, Weiwei. *SEM 2013 shared task: Semantic Textual Similarity. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity. 2013

  39. [69]

    SemEval-2014 Task 10: Multilingual Semantic Textual Similarity

    Agirre, Eneko and Banea, Carmen and Cardie, Claire and Cer, Daniel and Diab, Mona and Gonzalez-Agirre, Aitor and Guo, Weiwei and Mihalcea, Rada and Rigau, German and Wiebe, Janyce. SemEval-2014 Task 10: Multilingual Semantic Textual Similarity. Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). 2014. doi:10.3115/v1...

  40. [70]

    SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability

    Agirre, Eneko and Banea, Carmen and Cardie, Claire and Cer, Daniel and Diab, Mona and Gonzalez-Agirre, Aitor and Guo, Weiwei and Lopez-Gazpio, Iñigo and Maritxalar, Montse and Mihalcea, Rada and Rigau, German and Uria, Larraitz and Wiebe, Janyce. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. Pro...

  41. [71]

    SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation

    Agirre, Eneko and Banea, Carmen and Cer, Daniel and Diab, Mona and Gonzalez-Agirre, Aitor and Mihalcea, Rada and Rigau, German and Wiebe, Janyce. SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). 2016. doi:10.18653/v1/S16-1081

  42. [72]

    SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation

    Cer, Daniel and Diab, Mona and Agirre, Eneko and Lopez-Gazpio, Iñigo and Specia, Lucia. SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). 2017. doi:10.18653/v1/S17-2001

  43. [73]

    A SICK cure for the evaluation of compositional distributional semantic models

    Marelli, Marco and Menini, Stefano and Baroni, Marco and Bentivogli, Luisa and Bernardi, Raffaella and Zamparelli, Roberto. A SICK cure for the evaluation of compositional distributional semantic models. 2014

  44. [74]

    SentEval: An Evaluation Toolkit for Universal Sentence Representations

    Conneau, Alexis and Kiela, Douwe. SentEval: An Evaluation Toolkit for Universal Sentence Representations. 2018

  45. [75]

    Task-Oriented Intrinsic Evaluation of Semantic Textual Similarity

    Reimers, Nils and Beyer, Philip and Gurevych, Iryna. Task-Oriented Intrinsic Evaluation of Semantic Textual Similarity. 2016

  46. [76]

    An Unsupervised Sentence Embedding Method by Mutual Information Maximization

    Zhang, Yan and He, Ruidan and Liu, Zuozhu and Lim, Kwan Hui and Bing, Lidong. An Unsupervised Sentence Embedding Method by Mutual Information Maximization. 2020. doi:10.18653/v1/2020.emnlp-main.124

  47. [77]

    Dense Passage Retrieval for Open-Domain Question Answering

    Karpukhin, Vladimir and Oguz, Barlas and Min, Sewon and Lewis, Patrick and Wu, Ledell and Edunov, Sergey and Chen, Danqi and Yih, Wen-tau. Dense Passage Retrieval for Open-Domain Question Answering. 2020. doi:10.18653/v1/2020.emnlp-main.550

  48. [78]

    Efficient Natural Language Response Suggestion for Smart Reply

    Henderson, Matthew et al. Efficient Natural Language Response Suggestion for Smart Reply. arXiv preprint arXiv:1705.00652. 2017

  49. [79]

    Learning Dense Representations for Entity Retrieval

    Gillick, Daniel et al. Learning Dense Representations for Entity Retrieval. 2019

  50. [80]

    On the trace and the sum of elements of a matrix

    Merikoski, Jorma Kaarlo. On the trace and the sum of elements of a matrix. Linear Algebra and its Applications. 1984. doi:10.1016/0024-3795(84)90078-8

  51. [81]

    Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

    Conneau, Alexis and Kiela, Douwe and Schwenk, Holger and Barrault, Loïc and Bordes, Antoine. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. 2017. doi:10.18653/v1/D17-1070

  52. [82]

    Automatically Constructing a Corpus of Sentential Paraphrases

    Dolan, William B. and Brockett, Chris. Automatically Constructing a Corpus of Sentential Paraphrases. Proceedings of the Third International Workshop on Paraphrasing (IWP 2005). 2005

  53. [83]

    A Continuously Growing Dataset of Sentential Paraphrases

    Lan, Wuwei and Qiu, Siyu and He, Hua and Xu, Wei. A Continuously Growing Dataset of Sentential Paraphrases. 2017. doi:10.18653/v1/D17-1126

  54. [84]

    PAWS: Paraphrase Adversaries from Word Scrambling

    Zhang, Yuan and Baldridge, Jason and He, Luheng. PAWS: Paraphrase Adversaries from Word Scrambling. 2019. doi:10.18653/v1/N19-1131

  55. [85]

    ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations

    Wieting, John and Gimpel, Kevin. ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations. 2018. doi:10.18653/v1/P18-1042

  56. [86]

    A large annotated corpus for learning natural language inference

    Bowman, Samuel R. and Angeli, Gabor and Potts, Christopher and Manning, Christopher D. A large annotated corpus for learning natural language inference. 2015. doi:10.18653/v1/D15-1075

  57. [87]

    A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

    Williams, Adina and Nangia, Nikita and Bowman, Samuel. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. 2018. doi:10.18653/v1/N18-1101

  58. [88]

    From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions

    Young, Peter and Lai, Alice and Hodosh, Micah and Hockenmaier, Julia. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics. 2014. doi:10.1162/tacl_a_00166

  59. [89]

    Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books

    Zhu, Yukun et al. Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books. Proceedings of the IEEE International Conference on Computer Vision. 2015

  60. [90]

    Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

    Yang, Zhilin and Dai, Zihang and Salakhutdinov, Ruslan and Cohen, William W. Breaking the Softmax Bottleneck: A High-Rank RNN Language Model. ICLR. 2018

  61. [91]

    Improving Neural Language Generation with Spectrum Control

    Wang, Lingxiao et al. Improving Neural Language Generation with Spectrum Control. ICLR. 2020

  62. [92]

    How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings

    Ethayarajh, Kawin. How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings. 2019. doi:10.18653/v1/D19-1006

  63. [93]

    Improving Neural Language Modeling via Adversarial Training

    Wang, Dilin and Gong, Chengyue and Liu, Qiang. Improving Neural Language Modeling via Adversarial Training. ICML. 2019

  64. [94]

    A Latent Variable Model Approach to PMI-based Word Embeddings

    Arora, Sanjeev and Li, Yuanzhi and Liang, Yingyu and Ma, Tengyu and Risteski, Andrej. A Latent Variable Model Approach to PMI-based Word Embeddings. 2016. doi:10.1162/tacl_a_00106

  65. [95]

    A simple but tough-to-beat baseline for sentence embeddings

    Arora, Sanjeev and Liang, Yingyu and Ma, Tengyu. A Simple but Tough-to-Beat Baseline for Sentence Embeddings. ICLR. 2017

  66. [96]

    All-but-the-Top: Simple and Effective Postprocessing for Word Representations

    Mu, Jiaqi and Viswanath, Pramod. All-but-the-Top: Simple and Effective Postprocessing for Word Representations. ICLR. 2018

  67. [97]

    Towards universal paraphrastic sentence embeddings

    Wieting, John and Bansal, Mohit and Gimpel, Kevin and Livescu, Karen. Towards Universal Paraphrastic Sentence Embeddings. ICLR. 2016

  68. [98]

    NICE: Non-linear Independent Components Estimation

    Dinh, Laurent and Krueger, David and Bengio, Yoshua. NICE: Non-linear Independent Components Estimation. 2014

  69. [99]

    Dimensionality reduction by learning an invariant mapping

    Hadsell, Raia and Chopra, Sumit and LeCun, Yann. Dimensionality Reduction by Learning an Invariant Mapping. CVPR. 2006

  70. [100]

    A simple framework for contrastive learning of visual representations

    Chen, Ting and Kornblith, Simon and Norouzi, Mohammad and Hinton, Geoffrey. A Simple Framework for Contrastive Learning of Visual Representations. ICML. 2020

  71. [101]

    RoBERTa: A Robustly Optimized BERT Pretraining Approach

    Liu, Yinhan et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692. 2019

  72. [102]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2019. doi:10.18653/v1/N19-1423

  73. [103]

    On sampling strategies for neural network-based collaborative filtering

    Chen, Ting et al. On Sampling Strategies for Neural Network-based Collaborative Filtering. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017

  74. [104]

    Momentum contrast for unsupervised visual representation learning

    He, Kaiming and Fan, Haoqi and Wu, Yuxin and Xie, Saining and Girshick, Ross. Momentum Contrast for Unsupervised Visual Representation Learning. CVPR. 2020

  75. [105]

    CLEAR: Contrastive Learning for Sentence Representation

    Wu, Zhuofeng et al. CLEAR: Contrastive Learning for Sentence Representation. arXiv preprint arXiv:2012.15466. 2020

  76. [106]

    COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

    Meng, Yu and Xiong, Chenyan and Bajaj, Payal and Tiwary, Saurabh and Bennett, Paul and Han, Jiawei and Song, Xia. COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining. 2021

  77. [107]

    Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

    Xiong, Lee et al. Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. ICLR. 2021

  78. [108]

    An efficient framework for learning sentence representations

    Logeswaran, Lajanugen and Lee, Honglak. An Efficient Framework for Learning Sentence Representations. ICLR. 2018

  79. [109]

    Skip-thought vectors

    Kiros, Ryan et al. Skip-Thought Vectors. Advances in Neural Information Processing Systems. 2015

  80. [110]

    Learning Distributed Representations of Sentences from Unlabelled Data

    Hill, Felix and Cho, Kyunghyun and Korhonen, Anna. Learning Distributed Representations of Sentences from Unlabelled Data. 2016. doi:10.18653/v1/N16-1162

Showing first 80 references.