NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prior methods on combinatorial generalization tasks.
hub
Neural programmer-interpreters
10 Pith papers cite this work. Polarity classification is still indexing.
abstract
We propose the neural programmer-interpreter (NPI): a recurrent and compositional neural network that learns to represent and execute programs. NPI has three learnable components: a task-agnostic recurrent core, a persistent key-value program memory, and domain-specific encoders that enable a single NPI to operate in multiple perceptually diverse environments with distinct affordances. By learning to compose lower-level programs to express higher-level programs, NPI reduces sample complexity and increases generalization ability compared to sequence-to-sequence LSTMs. The program memory allows efficient learning of additional tasks by building on existing programs. NPI can also harness the environment (e.g. a scratch pad with read-write pointers) to cache intermediate results of computation, lessening the long-term memory burden on recurrent hidden units. In this work we train the NPI with fully-supervised execution traces; each program has example sequences of calls to the immediate subprograms conditioned on the input. Rather than training on a huge number of relatively weak labels, NPI learns from a small number of rich examples. We demonstrate the capability of our model to learn several types of compositional programs: addition, sorting, and canonicalizing 3D models. Furthermore, a single NPI learns to execute these programs and all 21 associated subprograms.
hub tools
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Neural networks exhibit grokking on small algorithmic datasets, achieving perfect generalization well after overfitting.
Training language models to generate intermediate computation steps on a scratchpad enables them to perform multi-step tasks such as long addition and arbitrary program execution that they otherwise fail at.
ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.
A transformer trained on random meaningless MicroPy programs generalizes to execute diverse human-written programs, providing empirical evidence it can act as a universal computer.
On GSM8K, outcome-based supervision achieves similar final-answer error rates to process-based with less labeling, but process-based or learned reward models are needed to reach 3.4% reasoning error among correct solutions.
CodeXGLUE supplies a standardized collection of 10 code-related tasks, 14 datasets, an evaluation platform, and BERT-, GPT-, and encoder-decoder-style baselines.
Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
Neural Computers are introduced as a new machine form where computation, memory, and I/O are unified in a learned runtime state, with initial video-model experiments showing acquisition of basic interface primitives from traces.
A rationale is presented for developing an assistant in Minecraft to advance natural language understanding and dialogue learning.
citing papers explorer
-
Gradient-Based Program Synthesis with Neurally Interpreted Languages
NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prior methods on combinatorial generalization tasks.
-
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
Neural networks exhibit grokking on small algorithmic datasets, achieving perfect generalization well after overfitting.
-
Show Your Work: Scratchpads for Intermediate Computation with Language Models
Training language models to generate intermediate computation steps on a scratchpad enables them to perform multi-step tasks such as long addition and arbitrary program execution that they otherwise fail at.
-
Adaptive Computation Time for Recurrent Neural Networks
ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.
-
Training Transformers as a Universal Computer
A transformer trained on random meaningless MicroPy programs generalizes to execute diverse human-written programs, providing empirical evidence it can act as a universal computer.
-
Solving math word problems with process- and outcome-based feedback
On GSM8K, outcome-based supervision achieves similar final-answer error rates to process-based with less labeling, but process-based or learned reward models are needed to reach 3.4% reasoning error among correct solutions.
-
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
CodeXGLUE supplies a standardized collection of 10 code-related tasks, 14 datasets, an evaluation platform, and BERT-, GPT-, and encoder-decoder-style baselines.
-
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
-
Neural Computers
Neural Computers are introduced as a new machine form where computation, memory, and I/O are unified in a learned runtime state, with initial video-model experiments showing acquisition of basic interface primitives from traces.
-
Why Build an Assistant in Minecraft?
A rationale is presented for developing an assistant in Minecraft to advance natural language understanding and dialogue learning.