hub Canonical reference

Generating Sequences With Recurrent Neural Networks

Graves, A · 2013 · cs.NE · arXiv 1308.0850

Canonical reference. 89% of citing Pith papers cite this work as background.

44 Pith papers citing it

Background 89% of classified citations

open full Pith review browse 44 citing papers arXiv PDF

abstract

This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwriting (where the data are real-valued). It is then extended to handwriting synthesis by allowing the network to condition its predictions on a text sequence. The resulting system is able to generate highly realistic cursive handwriting in a wide variety of styles.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 9

citation-polarity summary

background 8 unclear 1

representative citing papers

Adaptive Computation Time for Recurrent Neural Networks

cs.NE · 2016-03-29 · accept · novelty 8.0

ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.

Neural Turing Machines

cs.NE · 2014-10-20 · unverdicted · novelty 8.0

Neural Turing Machines augment neural networks with differentiable external memory to learn algorithmic tasks such as copying, sorting, and associative recall from examples.

Neural Machine Translation by Jointly Learning to Align and Translate

cs.CL · 2014-09-01 · accept · novelty 8.0

An attention-based encoder-decoder model achieves English-to-French translation performance comparable to phrase-based systems by automatically learning soft alignments.

Adam: A Method for Stochastic Optimization

cs.LG · 2014-12-22 · accept · novelty 7.5

A first-order stochastic optimizer that maintains bias-corrected exponential moving averages of the gradient and its square, dividing the former by the square root of the latter to set per-parameter step sizes.

Gradient Transformer: Learning to Generate Updates for LLMs

cs.LG · 2026-05-26 · unverdicted · novelty 7.0

Gradient Transformer learns to map TinyLM update vectors to LLM update vectors for data-free knowledge distillation using correlations from shadow datasets.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

FLEXITOKENS: Flexible Tokenization for Evolving Language Models

cs.CL · 2025-07-17 · unverdicted · novelty 7.0

FLEXITOKENS replaces rigid subword tokenizers and fixed-compression auxiliary losses with a simplified boundary-prediction objective in byte-level models, yielding lower over-fragmentation and up to 10-point gains on multilingual and domain-adaptation tasks.

Improving language models by retrieving from trillions of tokens

cs.CL · 2021-12-08 · unverdicted · novelty 7.0

RETRO matches GPT-3 and Jurassic-1 performance on the Pile benchmark using 25 times fewer parameters by conditioning on retrieved chunks from a 2-trillion-token database.

Perceiver IO: A General Architecture for Structured Inputs & Outputs

cs.LG · 2021-07-30 · unverdicted · novelty 7.0

Perceiver IO is a general architecture that processes arbitrary structured inputs and outputs with linear scaling and achieves strong results on GLUE, Sintel optical flow, multi-task reasoning, and StarCraft II without task-specific components.

Connectivity-Optimized Representation Learning via Persistent Homology

cs.LG · 2019-06-21 · unverdicted · novelty 7.0

A persistent homology loss enforces controllable connectivity in autoencoder latent spaces, improving one-class classification via kernel density estimation on the learned representations.

Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models

cs.CL · 2026-05-10 · conditional · novelty 7.0

Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.

Online Reasoning Video Object Segmentation

cs.CV · 2026-04-13 · unverdicted · novelty 7.0

The work introduces the ORVOS task, the ORVOSB benchmark with causal annotations across 210 videos, and a baseline using updated prompts plus a temporal token reservoir.

Unified Vector Floorplan Generation via Markup Representation

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

A single transformer model using a new markup representation generates functional floorplans from diverse conditions and outperforms prior task-specific methods on the RPLAN dataset.

Flamingo: a Visual Language Model for Few-Shot Learning

cs.CV · 2022-04-29 · unverdicted · novelty 7.0

Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

cs.LG · 2019-10-23 · unverdicted · novelty 7.0

T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.

Stochastic dynamics learning with state-space systems

stat.ML · 2025-08-11 · unverdicted · novelty 6.0

Establishes that fading memory and solution stability hold generically in state-space systems for reservoir computing even without the echo state property, with a distributional attractor perspective for stochastic cases.

Compressive Transformers for Long-Range Sequence Modelling

cs.LG · 2019-11-13 · unverdicted · novelty 6.0

Compressive Transformer sets new records on WikiText-103 (17.1 ppl) and Enwik8 (0.97 bpc) via memory compression and introduces the PG-19 long-range language benchmark.

Separable Convolutional LSTMs for Faster Video Segmentation

cs.CV · 2019-07-16 · unverdicted · novelty 6.0

Separable convLSTMs cut parameters and FLOPs in video segmentation, delivering up to 15% faster GPU inference with similar or slightly lower accuracy.

Generative Modeling by Estimating Gradients of the Data Distribution

cs.LG · 2019-07-12 · unverdicted · novelty 6.0

Score-based generative modeling via multi-noise-level score matching and annealed Langevin dynamics produces samples on par with GANs and sets a new inception score record on CIFAR-10.

A Unified Framework of Online Learning Algorithms for Training Recurrent Neural Networks

cs.LG · 2019-07-05 · accept · novelty 6.0

A framework unifies recent online RNN training algorithms along four axes and demonstrates performance clustering on synthetic tasks, indicating that gradient alignment is insufficient to explain success especially for stochastic methods.

Recurrent Adversarial Service Times

stat.ML · 2019-06-24 · unverdicted · novelty 6.0

RNN for arrivals paired with recurrent GAN for service times to model queuing dynamics without assuming specific inter-event distributions.

Anon: Extrapolating Adaptivity Beyond SGD and Adam

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

Anon optimizer uses tunable adaptivity and incremental delay update to achieve convergence guarantees and outperform existing methods on image classification, diffusion, and language modeling tasks.

CASHG: Context-Aware Stylized Online Handwriting Generation

cs.CV · 2026-04-02 · conditional · novelty 6.0

CASHG explicitly models inter-character connectivity with a Character Context Encoder and bigram-aware Transformer decoder to produce style-consistent sentence trajectories, plus a new CSM evaluation metric.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

cs.CL · 2022-11-09 · unverdicted · novelty 6.0

BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Anon: Extrapolating Adaptivity Beyond SGD and Adam cs.AI · 2026-05-04 · unverdicted · none · ref 5
Anon optimizer uses tunable adaptivity and incremental delay update to achieve convergence guarantees and outperform existing methods on image classification, diffusion, and language modeling tasks.

Generating Sequences With Recurrent Neural Networks

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer