Generating Sequences With Recurrent Neural Networks

Canonical reference. 89% of citing Pith papers cite this work as background.

38 Pith papers citing it
Background 89% of classified citations
abstract

This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwriting (where the data are real-valued). It is then extended to handwriting synthesis by allowing the network to condition its predictions on a text sequence. The resulting system is able to generate highly realistic cursive handwriting in a wide variety of styles.
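
The abstract's core idea is to generate a sequence by predicting one data point at a time and feeding each sampled point back in as the next input. This can be sketched as a generic autoregressive sampling loop. It is a minimal sketch, not the paper's implementation. The hypothetical `next_char_probs` function stands in for the LSTM's predictive distribution and is backed by a toy character bigram table invented for illustration. The network's hidden state, which in an LSTM summarizes the whole history, is omitted.

```python
import random

# Hypothetical stand-in for a trained network's output distribution:
# maps the previous character to probabilities over the next character.
# A real LSTM would also carry a hidden state summarizing the full history.
BIGRAM = {
    "a": {"b": 0.7, "a": 0.3},
    "b": {"a": 0.5, "c": 0.5},
    "c": {"a": 1.0},
}


def next_char_probs(prev: str) -> dict[str, float]:
    """Return P(next | prev) from the toy bigram table."""
    return BIGRAM[prev]


def generate(seed: str, length: int, rng: random.Random) -> str:
    """Autoregressively sample `length` characters after a one-character seed."""
    out = [seed]
    for _ in range(length):
        probs = next_char_probs(out[-1])
        chars = list(probs)
        weights = [probs[c] for c in chars]
        # Sample one data point, then feed it back as the next input.
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


print(generate("a", 10, random.Random(0)))
```

For real-valued data such as pen trajectories, the same loop applies. Only the output distribution changes from a discrete table to a continuous density.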

citation-role summary

background 9

citation-polarity summary

background 8 · unclear 1

representative citing papers

Adaptive Computation Time for Recurrent Neural Networks

cs.NE · 2016-03-29 · accept · novelty 8.0

ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.

Neural Turing Machines

cs.NE · 2014-10-20 · unverdicted · novelty 8.0

Neural Turing Machines augment neural networks with differentiable external memory to learn algorithmic tasks such as copying, sorting, and associative recall from examples.

Adam: A Method for Stochastic Optimization

cs.LG · 2014-12-22 · accept · novelty 7.5

A first-order stochastic optimizer that maintains bias-corrected exponential moving averages of the gradient and its square, dividing the former by the square root of the latter to set per-parameter step sizes.

FLEXITOKENS: Flexible Tokenization for Evolving Language Models

cs.CL · 2025-07-17 · unverdicted · novelty 7.0

FLEXITOKENS replaces rigid subword tokenizers and fixed-compression auxiliary losses with a simplified boundary-prediction objective in byte-level models, yielding lower over-fragmentation and up to 10-point gains on multilingual and domain-adaptation tasks.

Perceiver IO: A General Architecture for Structured Inputs & Outputs

cs.LG · 2021-07-30 · unverdicted · novelty 7.0

Perceiver IO is a general architecture that processes arbitrary structured inputs and outputs with linear scaling and achieves strong results on GLUE, Sintel optical flow, multi-task reasoning, and StarCraft II without task-specific components.

Online Reasoning Video Object Segmentation

cs.CV · 2026-04-13 · unverdicted · novelty 7.0

The work introduces the ORVOS task, the ORVOSB benchmark with causal annotations across 210 videos, and a baseline using updated prompts plus a temporal token reservoir.

Flamingo: a Visual Language Model for Few-Shot Learning

cs.CV · 2022-04-29 · unverdicted · novelty 7.0

Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.

Stochastic dynamics learning with state-space systems

stat.ML · 2025-08-11 · unverdicted · novelty 6.0

Establishes that fading memory and solution stability hold generically in state-space systems for reservoir computing even without the echo state property, with a distributional attractor perspective for stochastic cases.

Recurrent Adversarial Service Times

stat.ML · 2019-06-24 · unverdicted · novelty 6.0

Pairs an RNN for arrivals with a recurrent GAN for service times to model queuing dynamics without assuming specific inter-event distributions.

Anon: Extrapolating Adaptivity Beyond SGD and Adam

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

Anon optimizer uses tunable adaptivity and incremental delay update to achieve convergence guarantees and outperform existing methods on image classification, diffusion, and language modeling tasks.

CASHG: Context-Aware Stylized Online Handwriting Generation

cs.CV · 2026-04-02 · conditional · novelty 6.0

CASHG explicitly models inter-character connectivity with a Character Context Encoder and bigram-aware Transformer decoder to produce style-consistent sentence trajectories, plus a new CSM evaluation metric.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

cs.CL · 2022-11-09 · unverdicted · novelty 6.0

BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
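
The Adaptive Computation Time entry above mentions a differentiable halting unit. The halting rule described in the ACT paper works as follows. Ponder steps accumulate halting probabilities until their sum reaches 1 - ε. The final step receives the leftover mass, called the remainder R. The output state is the probability-weighted mean of the intermediate states, and N + R serves as the ponder cost. The sketch below is minimal and makes two simplifications. First, the halting probabilities and states are passed in as plain lists; this is a hypothetical shortcut, since in ACT the RNN produces them at each ponder step. Second, running out of supplied steps stands in for the maximum-step cap. The function name `act_halting` is illustrative.

```python
def act_halting(halt_probs, states, eps=0.01):
    """Combine intermediate states with ACT-style halting weights.

    halt_probs: non-empty list of halting-unit outputs h^1, h^2, ... in (0, 1).
    states: the matching intermediate state vectors s^1, s^2, ...
    Returns (mixed_state, N, R, ponder_cost) where ponder_cost = N + R.
    """
    total = 0.0
    weights = []
    for n, h in enumerate(halt_probs, start=1):
        # Halt once cumulative probability reaches 1 - eps, or when no
        # more ponder steps are available (stand-in for the step cap).
        if total + h >= 1 - eps or n == len(halt_probs):
            remainder = 1.0 - total  # R: leftover mass given to the last step
            weights.append(remainder)
            break
        weights.append(h)
        total += h
    n_steps = len(weights)
    dim = len(states[0])
    # Mean-field output: probability-weighted average of the states.
    mixed = [sum(w * s[i] for w, s in zip(weights, states[:n_steps])) for i in range(dim)]
    return mixed, n_steps, remainder, n_steps + remainder


mixed, n_steps, remainder, cost = act_halting(
    [0.3, 0.4, 0.5], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
)
# n_steps == 3, remainder ≈ 0.3, mixed ≈ [0.6, 0.7]
```

Because the weights always sum to one, the output remains a proper average however many ponder steps are taken. The N + R cost gives a gradient that discourages pondering longer than necessary.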

citing papers explorer

Showing 38 of 38 citing papers.

  • Adaptive Computation Time for Recurrent Neural Networks cs.NE · 2016-03-29 · accept · none · ref 8

    ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.

  • Neural Turing Machines cs.NE · 2014-10-20 · unverdicted · none · ref 13

    Neural Turing Machines augment neural networks with differentiable external memory to learn algorithmic tasks such as copying, sorting, and associative recall from examples.

  • Neural Machine Translation by Jointly Learning to Align and Translate cs.CL · 2014-09-01 · accept · none · ref 13

    An attention-based encoder-decoder model achieves English-to-French translation performance comparable to phrase-based systems by automatically learning soft alignments.

  • Adam: A Method for Stochastic Optimization cs.LG · 2014-12-22 · accept · none · ref 4

    A first-order stochastic optimizer that maintains bias-corrected exponential moving averages of the gradient and its square, dividing the former by the square root of the latter to set per-parameter step sizes.

  • PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media cs.CL · 2026-05-16 · unverdicted · none · ref 248 · internal anchor

    PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

  • FLEXITOKENS: Flexible Tokenization for Evolving Language Models cs.CL · 2025-07-17 · unverdicted · none · ref 32 · internal anchor

    FLEXITOKENS replaces rigid subword tokenizers and fixed-compression auxiliary losses with a simplified boundary-prediction objective in byte-level models, yielding lower over-fragmentation and up to 10-point gains on multilingual and domain-adaptation tasks.

  • Improving language models by retrieving from trillions of tokens cs.CL · 2021-12-08 · unverdicted · none · ref 18 · internal anchor

    RETRO matches GPT-3 and Jurassic-1 performance on the Pile benchmark using 25 times fewer parameters by conditioning on retrieved chunks from a 2-trillion-token database.

  • Perceiver IO: A General Architecture for Structured Inputs & Outputs cs.LG · 2021-07-30 · unverdicted · none · ref 29 · internal anchor

    Perceiver IO is a general architecture that processes arbitrary structured inputs and outputs with linear scaling and achieves strong results on GLUE, Sintel optical flow, multi-task reasoning, and StarCraft II without task-specific components.

  • Connectivity-Optimized Representation Learning via Persistent Homology cs.LG · 2019-06-21 · unverdicted · none · ref 12 · internal anchor

    A persistent homology loss enforces controllable connectivity in autoencoder latent spaces, improving one-class classification via kernel density estimation on the learned representations.

  • Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models cs.CL · 2026-05-10 · conditional · none · ref 35

    Scratchpad Patching decouples compute from patch size in byte-level language models by inserting entropy-triggered scratchpads to update patch context dynamically.

  • Online Reasoning Video Object Segmentation cs.CV · 2026-04-13 · unverdicted · none · ref 16

    The work introduces the ORVOS task, the ORVOSB benchmark with causal annotations across 210 videos, and a baseline using updated prompts plus a temporal token reservoir.

  • Unified Vector Floorplan Generation via Markup Representation cs.CV · 2026-04-06 · unverdicted · none · ref 7

    A single transformer model using a new markup representation generates functional floorplans from diverse conditions and outperforms prior task-specific methods on the RPLAN dataset.

  • Flamingo: a Visual Language Model for Few-Shot Learning cs.CV · 2022-04-29 · unverdicted · none · ref 32

    Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.

  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer cs.LG · 2019-10-23 · unverdicted · none · ref 21

    T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.

  • Stochastic dynamics learning with state-space systems stat.ML · 2025-08-11 · unverdicted · none · ref 28 · internal anchor

    Establishes that fading memory and solution stability hold generically in state-space systems for reservoir computing even without the echo state property, with a distributional attractor perspective for stochastic cases.

  • Compressive Transformers for Long-Range Sequence Modelling cs.LG · 2019-11-13 · unverdicted · none · ref 117 · internal anchor

    Compressive Transformer sets new records on WikiText-103 (17.1 ppl) and Enwik8 (0.97 bpc) via memory compression and introduces the PG-19 long-range language benchmark.

  • Separable Convolutional LSTMs for Faster Video Segmentation cs.CV · 2019-07-16 · unverdicted · none · ref 7 · internal anchor

    Separable convLSTMs cut parameters and FLOPs in video segmentation, delivering up to 15% faster GPU inference with similar or slightly lower accuracy.

  • Generative Modeling by Estimating Gradients of the Data Distribution cs.LG · 2019-07-12 · unverdicted · none · ref 17 · internal anchor

    Score-based generative modeling via multi-noise-level score matching and annealed Langevin dynamics produces samples on par with GANs and sets a new inception score record on CIFAR-10.

  • A Unified Framework of Online Learning Algorithms for Training Recurrent Neural Networks cs.LG · 2019-07-05 · accept · none · ref 6 · internal anchor

    A framework unifies recent online RNN training algorithms along four axes and demonstrates performance clustering on synthetic tasks, indicating that gradient alignment is insufficient to explain success especially for stochastic methods.

  • Recurrent Adversarial Service Times stat.ML · 2019-06-24 · unverdicted · none · ref 9 · internal anchor

    Pairs an RNN for arrivals with a recurrent GAN for service times to model queuing dynamics without assuming specific inter-event distributions.

  • Anon: Extrapolating Adaptivity Beyond SGD and Adam cs.AI · 2026-05-04 · unverdicted · none · ref 5

    Anon optimizer uses tunable adaptivity and incremental delay update to achieve convergence guarantees and outperform existing methods on image classification, diffusion, and language modeling tasks.

  • CASHG: Context-Aware Stylized Online Handwriting Generation cs.CV · 2026-04-02 · conditional · none · ref 13

    CASHG explicitly models inter-character connectivity with a Character Context Encoder and bigram-aware Transformer decoder to produce style-consistent sentence trajectories, plus a new CSM evaluation metric.

  • BLOOM: A 176B-Parameter Open-Access Multilingual Language Model cs.CL · 2022-11-09 · unverdicted · none · ref 243

    BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.

  • Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges cs.LG · 2021-04-27 · accept · none · ref 33

    Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.

  • Universal Transformers cs.CL · 2018-07-10 · unverdicted · none · ref 11

    Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.

  • LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching cs.CL · 2026-06-15 · unverdicted · none · ref 69 · internal anchor

    LectūraAgents proposes a hierarchical multi-agent system with adaptive embodied teaching and the TASA algorithm for personalized AI-assisted learning, reporting gains in content quality, teaching actions, and personalization over baselines via expert educator validation on sample courses.

  • A Dialogue between Causal and Traditional Representation Learning: Toward Mutual Benefits in a Unified Formulation cs.LG · 2026-05-20 · unverdicted · none · ref 18 · internal anchor

    The paper introduces a unified formulation for representation learning with task and constraint components, arguing for mutual benefits between causal and traditional approaches and showing via experiments that causal constraint effectiveness depends on paired tasks.

  • Towards Migrating Neural Network Implementations cs.LG · 2025-11-04 · unverdicted · none · ref 18 · internal anchor

    A pivot-model abstraction method enables automatic migration of neural network implementations between frameworks such as PyTorch and TensorFlow while preserving functional equivalence.

  • Attention Is All You Need cs.CL · 2017-06-12 · unverdicted · none · ref 10

  • FruitEnsemble: MLLM-Guided Arbitration for Heterogeneous ensemble in Fine-Grained Fruit Recognition cs.CV · 2026-05-20 · unverdicted · none · ref 15 · internal anchor

    FruitEnsemble uses a weighted ensemble of backbones for top-3 candidates followed by MLLM arbitration on low-confidence samples to reach 70.49% accuracy on a new 306-class fruit dataset.

  • AIDOVECL: AI-generated Dataset of Outpainted Vehicles for Eye-level Classification and Localization cs.CV · 2024-10-31 · unverdicted · none · ref 3 · internal anchor

    AIDOVECL generates annotated synthetic vehicle images via outpainting to augment real training data and improve object detection performance by up to 10% overall and higher in diverse contexts.

  • RobustTP: End-to-End Trajectory Prediction for Heterogeneous Road-Agents in Dense Traffic with Noisy Sensor Inputs cs.RO · 2019-07-20 · unverdicted · none · ref 14 · internal anchor

    RobustTP uses a non-linear motion model plus instance segmentation to create noisy trajectories, then an LSTM-CNN to predict 5-second future positions of heterogeneous agents in dense traffic, claiming up to 18% ADE and 35.5% FDE gains over prior methods.

  • Convolutional Reservoir Computing for World Models cs.LG · 2019-07-18 · unverdicted · none · ref 14 · internal anchor

    RCRC uses untrained random CNNs and reservoir computing plus evolution strategies to reach claimed state-of-the-art scores in reinforcement learning tasks while avoiding data storage and heavy training.

  • Accurate Robotic Pouring for Serving Drinks cs.RO · 2019-06-21 · unverdicted · none · ref 6 · internal anchor

    An RNN controller trained on human demonstrations achieves 4 ml pouring error on unseen containers for water and comparable accuracy for oil and syrup.

  • Recurrent Neural Networks with Long Term Temporal Dependencies in Machine Tool Wear Diagnosis and Prognosis eess.SP · 2019-07-27 · unverdicted · none · ref 7 · internal anchor

    LSTM RNNs model tool wear transition and observation functions from vibration data to enable one- and two-step ahead predictions and RUL estimation, outperforming simple RNNs.

  • Multiplicative Models for Recurrent Language Modeling cs.LG · 2019-06-30 · unverdicted · none · ref 7 · internal anchor

    New multiplicative RNN models are tested on char-level LM tasks to demonstrate the relevance of shared parametrization for the intermediate state.

  • Large Language Models: A Survey cs.CL · 2024-02-09 · accept · none · ref 16

    The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.

  • Bridging Language Models and Financial Analysis q-fin.ST · 2025-03-14 · unverdicted · none · ref 35 · internal anchor

    A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.
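
The Adam entry, which appears in both lists above, summarizes its update rule in one sentence. The sketch below is a minimal pure-Python version of that rule. It uses the commonly cited defaults β1 = 0.9, β2 = 0.999 and ε = 1e-8. The function name `adam_minimize`, the learning rate of 0.1 (larger than the usual 0.001 default, so the toy example converges quickly), and the quadratic objective are illustrative assumptions, not details from this page.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a function given its gradient, using the Adam update rule."""
    x = list(x0)
    m = [0.0] * len(x)  # exponential moving average of gradients
    v = [0.0] * len(x)  # exponential moving average of squared gradients
    for t in range(1, steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            # Bias correction: both averages start at zero, so early values
            # are shrunk toward zero unless rescaled.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            # Per-parameter step: first moment divided by the root of the second.
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x, y) = (x - 3)^2 + (y + 1)^2, minimized at (3, -1).
print(adam_minimize(lambda p: [2 * (p[0] - 3), 2 * (p[1] + 1)], [0.0, 0.0]))
```

Dividing by the square root of the second-moment estimate makes each parameter's step size roughly invariant to the gradient's scale. This is the per-parameter adaptivity that the summary describes.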