arxiv: 1707.05589 · v2 · pith:AVFLXBK6new · submitted 2017-07-18 · 💻 cs.CL

On the State of the Art of Evaluation in Neural Language Models

G\'abor Melis , Chris Dyer , Phil Blunsom This is my paper

classification 💻 cs.CL

keywords architectureslanguagemodelsneuralstateapparentlyarriveautomatic

0 comments p. Extension

Add this Pith Number to your LaTeX paper

\usepackage{pith}
\pithnumber{AVFLXBK6}

Prints a linked pith:AVFLXBK6 badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

read the original abstract

Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing code bases and limited computational resources, which represent uncontrolled sources of experimental variation. We reevaluate several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrive at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models. We establish a new state of the art on the Penn Treebank and Wikitext-2 corpora, as well as strong baselines on the Hutter Prize dataset.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

ChemCrow: Augmenting large-language models with chemistry tools
physics.chem-ph 2023-04 conditional novelty 6.0

ChemCrow augments LLMs with 18 expert chemistry tools to autonomously plan and execute syntheses and guide molecular discoveries in organic synthesis, drug discovery, and materials design.
Compressive Transformers for Long-Range Sequence Modelling
cs.LG 2019-11 unverdicted novelty 6.0

Compressive Transformer sets new records on WikiText-103 (17.1 ppl) and Enwik8 (0.97 bpc) via memory compression and introduces the PG-19 long-range language benchmark.