Universal Sentence Encoder

Brian Strope; Chris Tar; Daniel Cer; Mario Guajardo-Cespedes; Nan Hua; Nicole Limtiaco; Noah Constant; Ray Kurzweil; Rhomni St. John; Sheng-yi Kong

arxiv: 1803.11175 · v2 · pith:P2EICT7Tnew · submitted 2018-03-29 · 💻 cs.CL

Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-hsuan Sung, Brian Strope, Ray Kurzweil

classification 💻 cs.CL

keywords transfer, learning, models, sentence, word, embeddings, encoding, performance

We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word-level transfer learning via pretrained word embeddings as well as baselines that do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word-level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pre-trained sentence encoding models are made freely available for download and on TF Hub.
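
As a rough illustration of what "encoding sentences into embedding vectors" enables downstream, the sketch below maps sentences to fixed-length vectors and compares them with cosine similarity. It is not the paper's model: `toy_encode` is a hypothetical stand-in (a hashed bag of words), and `DIM` and `cosine` are names chosen only for this example. A real sentence encoder would take the place of `toy_encode` and leave the rest unchanged.

```python
import hashlib
import math
from typing import List

DIM = 64  # hypothetical embedding size, chosen only for this sketch


def toy_encode(sentence: str) -> List[float]:
    """Hypothetical stand-in for a sentence encoder.

    Hashes each lowercased word into one of DIM buckets and
    L2-normalizes the resulting count vector.
    """
    vec = [0.0] * DIM
    for word in sentence.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    query = toy_encode("how old are you")
    for candidate in ["what is your age", "how old are you today"]:
        print(candidate, round(cosine(query, toy_encode(candidate)), 3))
```

In a transfer setting, the same fixed-length vectors would be fed as features to a small task-specific classifier; that is where a learned encoder, unlike this word-overlap toy, is expected to help when labeled data is scarce.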

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
cs.CL 2026-05 unverdicted novelty 8.0

REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reason...
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
cs.CL 2019-08 unverdicted novelty 8.0

Sentence-BERT adapts BERT with siamese and triplet networks to produce sentence embeddings for efficient cosine-similarity comparisons, cutting computation time from hours to seconds on similarity search while matchin...
Enhancing Healthcare Search Intent Recognition with Query Representation Learning and Session Context
cs.IR 2026-05 unverdicted novelty 7.0

Clustering-based query representations with a novel multi-intent loss and a concordance rate metric improve healthcare search intent classification on two real-world log datasets.
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
cs.RO 2022-04 accept novelty 7.0

SayCan combines an LLM's high-level semantic knowledge with robot skill value functions to select only feasible actions, enabling completion of abstract natural-language instructions on a real mobile manipulator.
Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation
cs.LG 2026-05 unverdicted novelty 6.0

Counterfactual metrics on semi-simulated benchmarks fail to identify the treatment effect estimators preferred by observable metrics on real datasets, with simple meta-learners outperforming specialized causal models.
Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus
cs.CL 2026-05 unverdicted novelty 6.0

Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.
Semantics-Aware Hierarchical Token Communication: Clustering, Bit Mapping, and Power Allocation
eess.SP 2026-04 unverdicted novelty 6.0

H-TokCom groups tokens by semantic similarity and protects cluster-level bits with higher power, raising semantic similarity from 0.206 to 0.279 at 3 dB SNR on COCO data.
REZE: Representation Regularization for Domain-adaptive Text Embedding Pre-finetuning
cs.CL 2026-04 unverdicted novelty 6.0

REZE controls representation shifts in contrastive pre-finetuning of text embeddings via eigenspace decomposition of anchor-positive pairs and adaptive soft-shrinkage on task-variant directions.
TR-EduVSum: A Turkish-Focused Dataset and Consensus Framework for Educational Video Summarization
cs.CL 2026-04 unverdicted novelty 6.0

Presents TR-EduVSum dataset and AutoMUP consensus framework for generating gold-standard summaries from multiple human annotations of Turkish educational videos.
Transformer-Empowered Actor-Critic Reinforcement Learning for Sequence-Aware Service Function Chain Partitioning
cs.NI 2025-04 unverdicted novelty 6.0

A Transformer-empowered actor-critic RL framework with ε-LoPe exploration and asymptotic return normalization improves long-term service acceptance, resource utilization, and scalability for sequence-aware SFC partiti...
Vision-Language Foundation Models as Effective Robot Imitators
cs.RO 2023-11 conditional novelty 6.0

RoboFlamingo adapts open-source vision-language models for robot manipulation tasks via single-step comprehension plus an explicit policy head, outperforming prior methods on benchmarks with only light fine-tuning.
Generic Intent Representation in Web Search
cs.IR 2019-07 unverdicted novelty 6.0

GEN Encoder learns query intent embeddings from click logs as weak supervision and multi-task paraphrase training, outperforming prior methods on intent similarity and using nearest-neighbor search to cover half of un...
Leveraging Linguistic Characteristics for Bipolar Disorder Recognition with Gender Differences
cs.IR 2019-07 unverdicted novelty 6.0

Gender-enriched syntactic pattern features from Twitter data recognize bipolar disorder with F1 scores above 91%, outperforming TF-IDF, LIWC, ELMo, and BERT baselines.
Measuring Embedding Sensitivity to Authorial Style in French: Comparing Literary Texts with Language Model Rewritings
cs.CL 2026-05 unverdicted novelty 5.0

Embeddings reliably capture authorial stylistic features in French literary texts, and these signals persist after LLM rewriting while showing model-specific patterns.
LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy
cs.LG 2026-05 unverdicted novelty 5.0

ACSE estimates LLM prompt uncertainty via adaptive clustering of semantic entropy across multiple responses and uses conformal prediction to bound error rates on accepted answers with distribution-free guarantees.
Detecting Malicious Intents in Smart Contracts with Pre-trained Programming Language Models
cs.SE 2025-08 unverdicted novelty 5.0

SmartIntentV2 uses a pre-trained BERT model on smart contracts to achieve an F1 score of 0.9279 for detecting malicious intents, outperforming previous models and GPT-4.1.
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
cs.RO 2025-07 unverdicted novelty 5.0

The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
Usenix'23 Extended Version: Smart Learning to Find Dumb Contracts
cs.CR 2023-04 unverdicted novelty 5.0

DLVA trains neural networks on bytecode to match Slither source labels at 92.7% accuracy and 0.2 seconds per contract while outperforming nine other tools at 99.7% average accuracy.
Learning Compressed Sentence Representations for On-Device Text Processing
cs.CL 2019-06 unverdicted novelty 5.0

Four binarization strategies turn continuous sentence embeddings into binary form, cutting storage by over 98% with only about 2% performance drop on downstream tasks.
Exploring Ethical Concerns of Mobile Applications from App Reviews: A Literature Survey
cs.SE 2026-04 accept novelty 4.0

A systematic literature survey of 37 studies synthesizes methods for identifying ethical concerns from app reviews and proposes a research agenda focused on automated detection.
Transfer Learning for Risk Classification of Social Media Posts: Model Evaluation Study
cs.CL 2019-07 unverdicted novelty 4.0

Finetuning GPT-1 on 150000 unlabeled Reachout.com posts then feeding the features into AutoML yields a new state-of-the-art macro F1 of 0.572 for triaging risk in 1588 labeled CLPsych 2017 posts without metadata or history.
Survey on reinforcement learning for language processing
cs.CL 2021-04 unverdicted novelty 2.0

This survey reviews reinforcement learning applications to natural language processing problems, especially conversational systems, including problem descriptions, suitability of RL, advantages, limitations, and promi...