Learning to Generate Reviews and Discovering Sentiment
We explore the properties of byte-level recurrent language models. When given sufficient amounts of capacity, training data, and compute time, the representations learned by these models include disentangled features corresponding to high-level concepts. Specifically, we find a single unit which performs sentiment analysis. These representations, learned in an unsupervised manner, achieve state of the art on the binary subset of the Stanford Sentiment Treebank. They are also very data efficient. When using only a handful of labeled examples, our approach matches the performance of strong baselines trained on full datasets. We also demonstrate the sentiment unit has a direct influence on the generative process of the model. Simply fixing its value to be positive or negative generates samples with the corresponding positive or negative sentiment.
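The abstract's "single unit which performs sentiment analysis" finding can be illustrated with a toy sketch. The data below is synthetic and the selection heuristic is a simplification, not the paper's method (the authors fit an L1-regularized logistic regression on the model's byte-level representations); here one hypothetical feature dimension carries a class-dependent shift, and a per-unit threshold score recovers it:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 16

# Hypothetical frozen representations: unit 5 carries sentiment, the rest are noise.
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, d))
X[:, 5] += np.where(y == 1, 2.0, -2.0)

# Score each unit by how well a simple threshold at zero separates the labels.
acc_per_unit = [
    max(np.mean((X[:, j] > 0) == y), np.mean((X[:, j] < 0) == y))
    for j in range(d)
]
sentiment_unit = int(np.argmax(acc_per_unit))
print(sentiment_unit)  # the planted unit, 5
```

In the paper the analogous effect is far stronger: the discovered unit alone nearly matches the full representation on binary sentiment classification, which is what makes clamping its value during generation steer sample sentiment.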
This paper has not been read by Pith yet.
Forward citations
Cited by 11 Pith papers
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
  GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.
- Toy Models of Superposition
  Toy models demonstrate that polysemanticity arises when neural networks store more sparse features than neurons via superposition, producing a phase transition tied to polytope geometry and increased adversarial vulnerability.
- Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models
  Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
  Imagen achieves state-of-the-art photorealistic text-to-image generation by scaling a text-only pretrained T5 language model within a diffusion framework, reaching FID 7.27 on COCO without training on it.
- Fine-Tuning Language Models from Human Preferences
  Language models fine-tuned via RL on 5k-60k human preference comparisons produce stylistically better text continuations and human-preferred summaries that sometimes copy input sentences.
- Vision Transformers Need Registers
  Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
- The Platonic Representation Hypothesis
  Representations learned by large AI models are converging toward a shared statistical model of reality.
- DINOv2: Learning Robust Visual Features without Supervision
  Pith review generated a malformed one-line summary.
- From Weights to Activations: Is Steering the Next Frontier of Adaptation?
  Steering is positioned as a distinct adaptation paradigm that uses targeted activation interventions for local, reversible behavioral changes without parameter updates.
- The Rise and Potential of Large Language Model Based Agents: A Survey
  The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
- A Survey of Large Language Models
  This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.