Neural programmer-interpreters

Scott Reed, Nando de Freitas · 2015 · cs.LG · arXiv 1511.06279

10 Pith papers cite this work. Polarity classification is still indexing.

abstract

We propose the neural programmer-interpreter (NPI): a recurrent and compositional neural network that learns to represent and execute programs. NPI has three learnable components: a task-agnostic recurrent core, a persistent key-value program memory, and domain-specific encoders that enable a single NPI to operate in multiple perceptually diverse environments with distinct affordances. By learning to compose lower-level programs to express higher-level programs, NPI reduces sample complexity and increases generalization ability compared to sequence-to-sequence LSTMs. The program memory allows efficient learning of additional tasks by building on existing programs. NPI can also harness the environment (e.g. a scratch pad with read-write pointers) to cache intermediate results of computation, lessening the long-term memory burden on recurrent hidden units. In this work we train the NPI with fully-supervised execution traces; each program has example sequences of calls to the immediate subprograms conditioned on the input. Rather than training on a huge number of relatively weak labels, NPI learns from a small number of rich examples. We demonstrate the capability of our model to learn several types of compositional programs: addition, sorting, and canonicalizing 3D models. Furthermore, a single NPI learns to execute these programs and all 21 associated subprograms.
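The abstract's notion of a fully supervised execution trace can be made concrete with a small hand-written sketch. The Python below is not the NPI model and contains no learning: it is an ordinary program that performs multi-digit addition on a scratch pad and records the hierarchy of subprogram calls it makes, which is the kind of sequence such supervision provides. The subprogram names (`ADD`, `ADD1`, `WRITE`, `CARRY`, `LSHIFT`), the fixed column count, and the function name `add_with_trace` are assumptions chosen for illustration and may differ from the decomposition used in the paper.

```python
from typing import List, Tuple

# One trace entry: (call depth, program name with any arguments).
Trace = List[Tuple[int, str]]


def add_with_trace(a: int, b: int) -> Tuple[int, Trace]:
    """Add two non-negative integers column by column, logging each call.

    Illustrative only: program names and structure are assumptions,
    not the exact subprograms defined in the paper.
    """
    xs = [int(d) for d in str(a)][::-1]  # least-significant digit first
    ys = [int(d) for d in str(b)][::-1]
    width = max(len(xs), len(ys)) + 1    # one extra column for a final carry
    xs += [0] * (width - len(xs))
    ys += [0] * (width - len(ys))

    carry = [0] * (width + 1)            # scratch-pad row holding carries
    out = [0] * width                    # scratch-pad row holding the result

    trace: Trace = [(0, "ADD")]          # top-level program
    for col in range(width):
        trace.append((1, "ADD1"))        # handle a single column
        total = xs[col] + ys[col] + carry[col]
        out[col] = total % 10
        trace.append((2, f"WRITE OUT {total % 10}"))
        if total >= 10:
            carry[col + 1] = 1           # cache the carry on the scratch pad
            trace.append((2, "CARRY"))
        trace.append((1, "LSHIFT"))      # move all pointers one column left

    result = int("".join(str(d) for d in reversed(out)))
    return result, trace


if __name__ == "__main__":
    result, trace = add_with_trace(96, 125)
    print(result)
    for depth, name in trace:
        print("  " * depth + name)
```

Because the carries live in the `carry` row of the scratch pad rather than in any hidden state, each `ADD1` step only needs to look at the current column, which mirrors the abstract's point about lessening the long-term memory burden on recurrent hidden units.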

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Gradient-Based Program Synthesis with Neurally Interpreted Languages

cs.LG · 2026-04-20 · unverdicted · novelty 8.0

NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prior methods on combinatorial generalization tasks.

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

cs.LG · 2022-01-06 · unverdicted · novelty 8.0

Neural networks exhibit grokking on small algorithmic datasets, achieving perfect generalization well after overfitting.

Show Your Work: Scratchpads for Intermediate Computation with Language Models

cs.LG · 2021-11-30 · unverdicted · novelty 8.0

Training language models to generate intermediate computation steps on a scratchpad enables them to perform multi-step tasks such as long addition and arbitrary program execution that they otherwise fail at.

Adaptive Computation Time for Recurrent Neural Networks

cs.NE · 2016-03-29 · accept · novelty 8.0

ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.

Training Transformers as a Universal Computer

cs.AI · 2026-04-28 · unverdicted · novelty 7.0

A transformer trained on random meaningless MicroPy programs generalizes to execute diverse human-written programs, providing empirical evidence it can act as a universal computer.

Solving math word problems with process- and outcome-based feedback

cs.LG · 2022-11-25 · unverdicted · novelty 6.0

On GSM8K, outcome-based supervision achieves similar final-answer error rates to process-based with less labeling, but process-based or learned reward models are needed to reach 3.4% reasoning error among correct solutions.

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

cs.SE · 2021-02-09 · unverdicted · novelty 6.0

CodeXGLUE supplies a standardized collection of 10 code-related tasks, 14 datasets, an evaluation platform, and BERT-, GPT-, and encoder-decoder-style baselines.

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

cs.LG · 2021-04-27 · accept · novelty 6.0

Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.

Neural Computers

cs.LG · 2026-04-07 · unverdicted · novelty 5.0

Neural Computers are introduced as a new machine form where computation, memory, and I/O are unified in a learned runtime state, with initial video-model experiments showing acquisition of basic interface primitives from traces.

Why Build an Assistant in Minecraft?

cs.AI · 2019-07-22 · unverdicted · novelty 4.0

A rationale is presented for developing an assistant in Minecraft to advance natural language understanding and dialogue learning.

citing papers explorer

Showing 10 of 10 citing papers.

Gradient-Based Program Synthesis with Neurally Interpreted Languages · cs.LG · 2026-04-20 · unverdicted · none · ref 109
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets · cs.LG · 2022-01-06 · unverdicted · none · ref 12
Show Your Work: Scratchpads for Intermediate Computation with Language Models · cs.LG · 2021-11-30 · unverdicted · none · ref 15
Adaptive Computation Time for Recurrent Neural Networks · cs.NE · 2016-03-29 · accept · none · ref 25
Training Transformers as a Universal Computer · cs.AI · 2026-04-28 · unverdicted · none · ref 17
Solving math word problems with process- and outcome-based feedback · cs.LG · 2022-11-25 · unverdicted · none · ref 33 · internal anchor
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation · cs.SE · 2021-02-09 · unverdicted · none · ref 67 · internal anchor
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges · cs.LG · 2021-04-27 · accept · none · ref 67
Neural Computers · cs.LG · 2026-04-07 · unverdicted · none · ref 24
Why Build an Assistant in Minecraft? · cs.AI · 2019-07-22 · unverdicted · none · ref 70 · internal anchor
