Semi-supervised Sequence Learning

2015 · cs.LG · arXiv 1511.01432

4 Pith papers cite this work. Polarity classification is still in progress.

abstract

We present two approaches that use unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a conventional language model in natural language processing. The second approach is to use a sequence autoencoder, which reads the input sequence into a vector and predicts the input sequence again. These two algorithms can be used as a "pretraining" step for a later supervised sequence learning algorithm. In other words, the parameters obtained from the unsupervised step can be used as a starting point for other supervised training models. In our experiments, we find that long short term memory recurrent networks after being pretrained with the two approaches are more stable and generalize better. With pretraining, we are able to train long short term memory recurrent networks up to a few hundred timesteps, thereby achieving strong performance in many text classification tasks, such as IMDB, DBpedia and 20 Newsgroups.
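
The two pretraining objectives described in the abstract differ mainly in how inputs and targets are formed from an unlabeled token sequence. The following is a minimal sketch of that data preparation only, not the authors' implementation; the function names, the `<eos>` marker, and the toy sentence are assumptions made for illustration.

```python
from typing import List, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence marker, not taken from the paper


def language_model_pairs(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Next-step prediction: at each position the target is the following token."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form a prediction pair")
    return tokens[:-1], tokens[1:]


def sequence_autoencoder_pairs(
    tokens: List[str],
) -> Tuple[List[str], List[str], List[str]]:
    """Autoencoding: the encoder reads the sequence, the decoder reproduces it.

    Returns (encoder_input, decoder_input, decoder_target). The decoder input
    is the target shifted right by one step (teacher forcing), which is a
    common convention assumed here rather than a detail from the abstract.
    """
    encoder_input = list(tokens)
    decoder_input = [EOS] + list(tokens)
    decoder_target = list(tokens) + [EOS]
    return encoder_input, decoder_input, decoder_target


if __name__ == "__main__":
    sentence = "the movie was great".split()  # toy unlabeled example
    print(language_model_pairs(sentence))
    print(sequence_autoencoder_pairs(sentence))
```

In either case, the recurrent network parameters learned on such pairs would then serve as the starting point for the supervised model, as the abstract describes.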

representative citing papers

Scaling Laws for Transfer

cs.LG · 2021-02-02 · unverdicted · novelty 6.0

Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

A General Language Assistant as a Laboratory for Alignment

cs.CL · 2021-12-01 · conditional · novelty 6.0

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

Release Strategies and the Social Impacts of Language Models

cs.CL · 2019-08-24 · accept · novelty 4.0

OpenAI describes using staged releases for GPT-2 to balance beneficial uses against misuse risks and offers recommendations for AI publication.

citing papers explorer

Showing 4 of 4 citing papers.

Scaling Laws for Transfer · cs.LG · 2021-02-02 · unverdicted · none · ref 169 · internal anchor
Language Models (Mostly) Know What They Know · cs.CL · 2022-07-11 · unverdicted · none · ref 99
A General Language Assistant as a Laboratory for Alignment · cs.CL · 2021-12-01 · conditional · none · ref 44
Release Strategies and the Social Impacts of Language Models · cs.CL · 2019-08-24 · accept · none · ref 1 · internal anchor
