Generating Sequences With Recurrent Neural Networks

Alex Graves

classification: cs.NE, cs.CL

keywords: data, handwriting, generate, networks, neural, recurrent, sequences, text

This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwriting (where the data are real-valued). It is then extended to handwriting synthesis by allowing the network to condition its predictions on a text sequence. The resulting system is able to generate highly realistic cursive handwriting in a wide variety of styles.
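The idea of generating a sequence "by predicting one data point at a time" can be illustrated with a minimal sketch. This is not the paper's model: the LSTM is swapped for a simple bigram count table, and the names `fit_bigram`, `generate`, and the toy `corpus` are hypothetical, chosen only for illustration. What carries over is the sampling loop, in which each sampled point is fed back in as the input for the next prediction.

```python
import random
from collections import Counter, defaultdict


def fit_bigram(text):
    """Count next-character frequencies per character.

    Assumption: a bigram table stands in for the LSTM predictor.
    """
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, seed_char, length, rng):
    """Sample one character at a time, feeding each sample back as the next input."""
    out = [seed_char]
    for _ in range(length - 1):
        dist = counts.get(out[-1])
        if not dist:  # no observed successor for this character: stop early
            break
        chars = list(dist.keys())
        weights = list(dist.values())
        # Draw the next character from the predicted distribution.
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


# Toy corpus (hypothetical), used only to make the sketch runnable.
corpus = "the network predicts the next data point and then the next"
model = fit_bigram(corpus)
sample = generate(model, "t", 40, random.Random(0))
print(sample)
```

For the real-valued handwriting data mentioned in the abstract, the same loop would apply, with the categorical draw replaced by sampling from a continuous predictive distribution.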

This paper has not been read by Pith yet.

Forward citations

Cited by 16 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Adaptive Computation Time for Recurrent Neural Networks
cs.NE 2016-03 accept novelty 8.0

ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.
Neural Turing Machines
cs.NE 2014-10 unverdicted novelty 8.0

Neural Turing Machines augment neural networks with differentiable external memory to learn algorithmic tasks such as copying, sorting, and associative recall from examples.
Neural Machine Translation by Jointly Learning to Align and Translate
cs.CL 2014-09 accept novelty 8.0

An attention-based encoder-decoder model achieves English-to-French translation performance comparable to phrase-based systems by automatically learning soft alignments.
Adam: A Method for Stochastic Optimization
cs.LG 2014-12 accept novelty 7.5

A first-order stochastic optimizer that maintains bias-corrected exponential moving averages of the gradient and its square, dividing the former by the square root of the latter to set per-parameter step sizes.
Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models
cs.CL 2026-05 conditional novelty 7.0

Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.
Online Reasoning Video Object Segmentation
cs.CV 2026-04 unverdicted novelty 7.0

The work introduces the ORVOS task, the ORVOSB benchmark with causal annotations across 210 videos, and a baseline using updated prompts plus a temporal token reservoir.
Unified Vector Floorplan Generation via Markup Representation
cs.CV 2026-04 unverdicted novelty 7.0

A single transformer model using a new markup representation generates functional floorplans from diverse conditions and outperforms prior task-specific methods on the RPLAN dataset.
Flamingo: a Visual Language Model for Few-Shot Learning
cs.CV 2022-04 unverdicted novelty 7.0

Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
cs.LG 2019-10 unverdicted novelty 7.0

T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colo...
Anon: Extrapolating Adaptivity Beyond SGD and Adam
cs.AI 2026-05 unverdicted novelty 6.0

Anon optimizer uses tunable adaptivity and incremental delay update to achieve convergence guarantees and outperform existing methods on image classification, diffusion, and language modeling tasks.
CASHG: Context-Aware Stylized Online Handwriting Generation
cs.CV 2026-04 conditional novelty 6.0

CASHG explicitly models inter-character connectivity with a Character Context Encoder and bigram-aware Transformer decoder to produce style-consistent sentence trajectories, plus a new CSM evaluation metric.
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
cs.CL 2022-11 unverdicted novelty 6.0

BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
cs.LG 2021-04 accept novelty 6.0

Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
Universal Transformers
cs.CL 2018-07 unverdicted novelty 6.0

Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.
Attention Is All You Need
cs.CL 2017-06 unverdicted novelty 5.0

Pith review generated a malformed one-line summary.
Large Language Models: A Survey
cs.CL 2024-02 accept novelty 3.0

The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.