pith. machine review for the scientific record.

arxiv: 1811.01088 · v2 · submitted 2018-11-02 · 💻 cs.CL

Recognition: unknown

Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks

Authors on Pith: no claims yet
classification 💻 cs.CL
keywords: training · language · tasks · supplementary · bert · encoders · glue · improvements
read the original abstract

Pretraining sentence encoders with language modeling and related unsupervised tasks has recently been shown to be very effective for language understanding tasks. By supplementing language model-style pretraining with further training on data-rich supervised tasks, such as natural language inference, we obtain additional performance improvements on the GLUE benchmark. Applying supplementary training on BERT (Devlin et al., 2018), we attain a GLUE score of 81.8---the state of the art (as of 02/24/2019) and a 1.4 point improvement over BERT. We also observe reduced variance across random restarts in this setting. Our approach yields similar improvements when applied to ELMo (Peters et al., 2018a) and Radford et al. (2018)'s model. In addition, the benefits of supplementary training are particularly pronounced in data-constrained regimes, as we show in experiments with artificially limited training data.
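The recipe the abstract describes is a three-stage pipeline: unsupervised pretraining, supplementary fine-tuning on a data-rich intermediate task (e.g. MNLI), then fine-tuning on the target GLUE task. A minimal sketch of that staging, where `Encoder`, `finetune`, and the task names are illustrative stand-ins rather than the paper's actual code:

```python
# Sketch of the STILTs training order, assuming a generic fine-tunable
# sentence encoder. This only models the *sequence* of stages, not training.

class Encoder:
    """Toy stand-in for a pretrained sentence encoder (e.g. BERT)."""
    def __init__(self):
        # Stage 1: language-model-style unsupervised pretraining.
        self.history = ["lm_pretraining"]

def finetune(encoder, task):
    """Stand-in for fine-tuning all encoder parameters on a labeled task."""
    encoder.history.append(task)
    return encoder

encoder = Encoder()
encoder = finetune(encoder, "mnli")  # Stage 2: supplementary intermediate task
encoder = finetune(encoder, "rte")   # Stage 3: the actual target task

print(encoder.history)  # ['lm_pretraining', 'mnli', 'rte']
```

The abstract's claim is that inserting stage 2 before the target task both raises GLUE scores and reduces variance across random restarts, with the largest gains when target-task data is scarce.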

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Language Models are Few-Shot Learners

    cs.CL 2020-05 accept novelty 8.0

    GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.

  2. OPT: Open Pre-trained Transformer Language Models

    cs.CL 2022-05 unverdicted novelty 7.0

    OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

  3. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

    cs.LG 2019-10 unverdicted novelty 7.0

    T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colo...

  4. BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

    cs.CL 2019-05 accept novelty 7.0

    BoolQ introduces naturally occurring yes/no questions as a challenging benchmark where BERT fine-tuned on MultiNLI reaches 80.4% accuracy against 90% human performance.

  5. SocialIQA: Commonsense Reasoning about Social Interactions

    cs.CL 2019-04 unverdicted novelty 7.0

    SocialIQA is the first large-scale benchmark with 38k crowdsourced questions testing commonsense about social interactions, where pretrained language models trail humans by over 20% but transfer to improve performance...

  6. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

    cs.CL 2019-05 accept novelty 6.0

    SuperGLUE is a new benchmark with more difficult language understanding tasks, a toolkit, and leaderboard to drive further progress beyond GLUE.