arxiv: 1808.08949 · v2 · pith:5G3EPDVT · submitted 2018-08-27 · cs.CL

Dissecting Contextual Word Embeddings: Architecture and Representation

Matthew E. Peters, Mark Neumann, Luke Zettlemoyer, Wen-tau Yih

classification cs.CL

keywords contextual, representations, word, architecture, accuracy, architectures, bilms, embeddings


Abstract

Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. However, many questions remain as to how and why these models are so effective. In this paper, we present a detailed empirical study of how the choice of neural architecture (e.g. LSTM, CNN, or self attention) influences both end task accuracy and qualitative properties of the representations that are learned. We show there is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks. Additionally, all architectures learn representations that vary with network depth, from exclusively morphological based at the word embedding layer through local syntax based in the lower contextual layers to longer range semantics such as coreference at the upper layers. Together, these results suggest that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SEER: Spectral Entropy Encoding of Roles for Context-Aware Attention-Based Design Pattern Detection
cs.SE · 2026-01 · conditional · novelty 5.0

SEER adds spectral-entropy role encoding from Laplacian spectra and empirically calibrated time-weighted calling contexts to raise macro-F1 from 92.47% to 93.20% and accuracy from 92.52% to 93.98% on PyDesignNet for 2...

Query Expansion in the Age of Pre-trained and Large Language Models: A Comprehensive Survey
cs.IR · 2025-09 · unverdicted · novelty 5.0

A comprehensive survey that organizes query expansion methods in the PLM/LLM era along four design dimensions, synthesizes application patterns, and outlines future directions.