What do you learn from context? Probing for sentence structure in contextualized word representations

Adam Poliak; Alex Wang; Benjamin Van Durme; Berlin Chen; Dipanjan Das; Ellie Pavlick; Ian Tenney; Najoung Kim; Patrick Xia; R Thomas McCoy

arXiv: 1905.06316 · v1 · pith:UFNLAAWDnew · submitted 2019-05-15 · cs.CL

Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, Ellie Pavlick

keywords: models, probing, representations, tasks, contextualized, phenomena, recent, semantic

Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of downstream NLP tasks. Building on recent token-level probing work, we introduce a novel edge probing task design and construct a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline. We probe word-level contextual representations from four recent models and investigate how they encode sentence structure across a range of syntactic, semantic, local, and long-range phenomena. We find that existing models trained on language modeling and translation produce strong representations for syntactic phenomena, but only offer comparably small improvements on semantic tasks over a non-contextual baseline.
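
The abstract does not describe the probing architecture, so the following is only a rough illustration of what an edge-probing-style classifier over word-level representations might look like. It assumes frozen per-token vectors, mean pooling over each span, and a single linear layer with an independent sigmoid per label; the function names `mean_pool` and `edge_probe_scores`, the pooling choice, and the toy dimensions are all hypothetical and are not taken from the paper.

```python
import math
import random


def mean_pool(token_vectors, span):
    """Average the frozen token vectors inside a half-open span [start, end)."""
    start, end = span
    rows = token_vectors[start:end]
    dim = len(rows[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]


def edge_probe_scores(token_vectors, span1, span2, weights, bias):
    """Score every label for a pair of spans from their pooled vectors only.

    The probe sees nothing but the two span representations, so any
    information about the rest of the sentence has to come from the
    contextual vectors themselves (assumed setup, for illustration).
    """
    # List concatenation: features = [pooled span1 ; pooled span2].
    features = mean_pool(token_vectors, span1) + mean_pool(token_vectors, span2)
    scores = []
    for w_row, b in zip(weights, bias):
        logit = sum(w * x for w, x in zip(w_row, features)) + b
        scores.append(1.0 / (1.0 + math.exp(-logit)))  # independent sigmoid per label
    return scores


# Toy inputs: 6 tokens with 4-dimensional vectors and 3 candidate labels.
rng = random.Random(0)
dim, num_labels = 4, 3
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]
weights = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(num_labels)]
bias = [0.0] * num_labels

scores = edge_probe_scores(tokens, (0, 2), (3, 6), weights, bias)
```

In a setup like this, only `weights` and `bias` would be trained while the token vectors stay fixed, so swapping in vectors from a contextual model versus a non-contextual baseline gives a like-for-like comparison of the kind the abstract reports.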

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
cs.CL · 2026-05 · unverdicted · novelty 8.0
REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reason...

Mixing Times of Glauber Dynamics on Masked Language Models
cs.LG · 2026-05 · unverdicted · novelty 6.0
Analysis of Glauber dynamics on masked language models shows O(n log n) mixing under bounded cross-token influence and metastability with exponential escape times at low temperatures, plus empirical phase transitions.

On the Blessing of Pre-training in Weak-to-Strong Generalization
cs.LG · 2026-05 · unverdicted · novelty 6.0
Pre-training provides a geometric warm start in a single-index model that enables weak-to-strong generalization up to a supervisor-limited bound, with empirical phase-transition evidence in LLMs.

From Words to Amino Acids: Does the Curse of Depth Persist?
cs.LG · 2026-02 · unverdicted · novelty 6.0
Protein language models exhibit consistent depth inefficiency where most task-relevant computation occurs in a subset of layers, mirroring patterns in large language models.

Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction
cs.CL · 2026-04 · unverdicted · novelty 5.0
BERT embeddings encode narrative dimensions of time, space, causality, and character at the token level, as a linear probe achieves 94% accuracy versus 47% on variance-matched random embeddings, though unsupervised cl...

Enriching and Controlling Global Semantics for Text Summarization
cs.CL · 2021-09 · unverdicted · novelty 5.0
A normalizing-flow neural topic model plus control mechanism are added to Transformer summarizers to supply and regulate global semantics, with reported gains over prior models on five benchmarks.