WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations

Mohammad Taher Pilehvar, Jose Camacho-Collados · 2018 · cs.CL · arXiv 1808.09121

8 Pith papers cite this work. Polarity classification is still indexing.

abstract

By design, word embeddings are unable to model the dynamic nature of words' semantics, i.e., the property of words to correspond to potentially different meanings. To address this limitation, dozens of specialized meaning representation techniques such as sense or contextualized embeddings have been proposed. However, despite the popularity of research on this topic, very few evaluation benchmarks exist that specifically focus on the dynamic semantics of words. In this paper we show that existing models have surpassed the performance ceiling of the standard evaluation dataset for the purpose, i.e., Stanford Contextual Word Similarity, and highlight its shortcomings. To address the lack of a suitable benchmark, we put forward a large-scale Word in Context dataset, called WiC, based on annotations curated by experts, for generic evaluation of context-sensitive representations. WiC is released at https://pilehvar.github.io/wic/.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Language Models are Few-Shot Learners

cs.CL · 2020-05-28 · accept · novelty 8.0

GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.

OLMo: Accelerating the Science of Language Models

cs.CL · 2024-02-01 · accept · novelty 7.0

OLMo delivers a fully open competitive language model with training data, code, and evaluations to enable community-driven scientific research on LMs.

The Power of Scale for Parameter-Efficient Prompt Tuning

cs.CL · 2021-04-18 · unverdicted · novelty 7.0

Prompt tuning matches full model tuning performance on large language models while tuning only a small fraction of parameters and improves robustness to domain shifts.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

cs.LG · 2019-10-23 · unverdicted · novelty 7.0

T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.

PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts

cs.CL · 2026-05-13 · unverdicted · novelty 6.0

PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

cs.CL · 2019-05-02 · accept · novelty 6.0

SuperGLUE is a new benchmark with more difficult language understanding tasks, a toolkit, and leaderboard to drive further progress beyond GLUE.

SMoE: An Algorithm-System Co-Design for Pushing MoE to the Edge via Expert Substitution

cs.AI · 2025-08-26 · unverdicted · novelty 5.0

SMoE substitutes low-importance experts with cached similar ones in MoE inference on edge devices to achieve 48% lower decoding latency and over 60% cache hit rate with nearly lossless accuracy.

Lessons from the Trenches on Reproducible Evaluation of Language Models

cs.CL · 2024-05-23

citing papers explorer

Showing 8 of 8 citing papers.

Language Models are Few-Shot Learners
cs.CL · 2020-05-28 · accept · none · ref 58
GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.

OLMo: Accelerating the Science of Language Models
cs.CL · 2024-02-01 · accept · none · ref 7 · internal anchor
OLMo delivers a fully open competitive language model with training data, code, and evaluations to enable community-driven scientific research on LMs.

The Power of Scale for Parameter-Efficient Prompt Tuning
cs.CL · 2021-04-18 · unverdicted · none · ref 35
Prompt tuning matches full model tuning performance on large language models while tuning only a small fraction of parameters and improves robustness to domain shifts.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
cs.LG · 2019-10-23 · unverdicted · none · ref 55
T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.

PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts
cs.CL · 2026-05-13 · unverdicted · none · ref 45 · internal anchor
PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
cs.CL · 2019-05-02 · accept · none · ref 133 · internal anchor
SuperGLUE is a new benchmark with more difficult language understanding tasks, a toolkit, and leaderboard to drive further progress beyond GLUE.

SMoE: An Algorithm-System Co-Design for Pushing MoE to the Edge via Expert Substitution
cs.AI · 2025-08-26 · unverdicted · none · ref 16 · internal anchor
SMoE substitutes low-importance experts with cached similar ones in MoE inference on edge devices to achieve 48% lower decoding latency and over 60% cache hit rate with nearly lossless accuracy.

Lessons from the Trenches on Reproducible Evaluation of Language Models
cs.CL · 2024-05-23 · unreviewed · ref 210 · internal anchor
