pith. machine review for the scientific record.

Neural Turing Machines

28 Pith papers cite this work. Polarity classification is still indexing.

abstract

We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.
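
As a minimal illustration of the attentional interaction the abstract describes, the sketch below implements a content-based read in numpy: cosine similarity between a query key and each memory row, sharpened by a key strength beta and normalized into attention weights. The addressing scheme follows the NTM paper; the shapes and the content_read helper are illustrative, not the authors' code.

    import numpy as np

    def content_read(memory, key, beta):
        # cosine similarity between the key and every memory row
        sims = memory @ key / (
            np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
        w = np.exp(beta * sims)            # key strength sharpens the focus
        w /= w.sum()                       # attention weights over slots
        return w @ memory                  # blended, fully differentiable read

    M = np.random.randn(128, 20)           # 128 memory slots of width 20
    r = content_read(M, key=M[3], beta=50.0)   # reads back roughly row 3

Because every step is differentiable, gradients flow through the read, which is what lets the combined system train end-to-end with gradient descent.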

hub tools

citation-role summary

roles: background 1

citation-polarity summary

polarities: background 1

representative citing papers

Gradient-Based Program Synthesis with Neurally Interpreted Languages

cs.LG · 2026-04-20 · unverdicted · novelty 8.0

NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prior methods on combinatorial generalization tasks.

Adaptive Computation Time for Recurrent Neural Networks

cs.NE · 2016-03-29 · accept · novelty 8.0

ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.
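
A compact sketch of the halting mechanism this summary refers to, assuming the formulation in Graves (2016): a sigmoid unit emits a halting probability at each internal step, pondering stops once the accumulated probability reaches 1 - eps, and the emitted state is the probability-weighted mean of the intermediate states. step_fn and halt_fn are illustrative stand-ins for the RNN update and the halting unit; the ponder-cost penalty that pressures the model to halt early is omitted.

    import numpy as np

    def act_ponder(step_fn, halt_fn, state, eps=0.01, max_steps=100):
        total = 0.0
        mean_state = np.zeros_like(state)
        for _ in range(max_steps):
            state = step_fn(state)         # one internal RNN update
            h = halt_fn(state)             # sigmoid halting unit, in (0, 1)
            # spend the remaining probability mass on the final step
            p = h if total + h < 1.0 - eps else 1.0 - total
            mean_state += p * state        # halting-weighted mean of states
            total += p
            if total >= 1.0 - eps:
                break
        return mean_state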

Intrinsic Vicarious Conditioning for Deep Reinforcement Learning

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Vicarious conditioning is proposed as a new intrinsic reward for RL that implements the attention, retention, reproduction, and reinforcement stages of observational learning via memory mechanisms, enabling low-shot learning from other agents without access to their policies or rewards and yielding longer episodes in the tested environments.

Neural Information Causality

quant-ph · 2026-05-10 · unverdicted · novelty 7.0

Neural-IC separates embedding inequalities from capacity bounds in query-separated computations, with one-bit RAC benchmarks and CHSH-layer stability selecting the Tsirelson threshold for quantum enhancements.

Screening Is Enough

cs.LG · 2026-04-01 · unverdicted · novelty 7.0

Multiscreen replaces softmax attention with screening to provide absolute query-key relevance, resulting in models with 30% fewer parameters that maintain stable performance at long contexts.

Concrete Problems in AI Safety

cs.AI · 2016-06-21 · accept · novelty 7.0

The paper categorizes five concrete AI safety problems arising from flawed objectives, costly evaluation, and learning dynamics.

Universal Transformers

cs.CL · 2018-07-10 · unverdicted · novelty 6.0

Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.
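
The recurrent update this summary mentions amounts to applying one weight-shared block repeatedly in depth instead of stacking distinct layers; a bare numpy sketch follows, with the per-position dynamic halting (ACT, sketched above) and the timestep embedding omitted for brevity. All names and sizes are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 16
    Wq, Wk, Wv, W1, W2 = (rng.standard_normal((d, d)) / np.sqrt(d)
                          for _ in range(5))

    def shared_block(x):
        # self-attention whose weights are reused at every depth step
        a = x @ Wq @ (x @ Wk).T / np.sqrt(d)
        a = np.exp(a - a.max(-1, keepdims=True))   # stable softmax
        a /= a.sum(-1, keepdims=True)
        x = x + a @ (x @ Wv)
        # position-wise transition function, also shared across steps
        return x + np.maximum(x @ W1, 0.0) @ W2

    x = rng.standard_normal((10, d))       # states for 10 tokens
    for _ in range(6):                     # recurrence in depth
        x = shared_block(x)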

TIDE: Every Layer Knows the Token Beneath the Context

cs.CL · 2026-05-07 · unverdicted · novelty 5.0

TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and contextual collapse.

citing papers explorer

Showing 28 of 28 citing papers.

  • Gradient-Based Program Synthesis with Neurally Interpreted Languages cs.LG · 2026-04-20 · unverdicted · none · ref 88 · internal anchor

    NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prior methods on combinatorial generalization tasks.

  • On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication cs.LG · 2026-03-30 · unverdicted · none · ref 35 · internal anchor

    Long-range dependency in integer multiplication is a mirage created by the 1D representation; a 2D grid reduces it to local 3x3 operations, letting a 321-parameter neural cellular automaton generalize perfectly to inputs 683 times longer than those seen in training, where Transformers fail (see the hedged sketch after this list).

  • RULER: What's the Real Context Size of Your Long-Context Language Models? cs.CL · 2024-04-09 · accept · none · ref 15 · internal anchor

    RULER shows most long-context LMs drop sharply in performance on complex tasks as length and difficulty increase, with only half maintaining results at 32K tokens.

  • Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets cs.LG · 2022-01-06 · unverdicted · none · ref 5 · internal anchor

    Neural networks exhibit grokking on small algorithmic datasets, achieving perfect generalization well after overfitting.

  • Show Your Work: Scratchpads for Intermediate Computation with Language Models cs.LG · 2021-11-30 · unverdicted · none · ref 9 · internal anchor

    Training language models to generate intermediate computation steps on a scratchpad enables them to perform multi-step tasks such as long addition and arbitrary program execution that they otherwise fail at.

  • Categorical Reparameterization with Gumbel-Softmax stat.ML · 2016-11-03 · unverdicted · none · ref 4 · internal anchor

    Gumbel-Softmax provides a continuous relaxation of categorical sampling that anneals to discrete samples for gradient-based optimization (see the sketch after this list).

  • Adaptive Computation Time for Recurrent Neural Networks cs.NE · 2016-03-29 · accept · none · ref 10 · internal anchor

    ACT lets RNNs dynamically adapt computation depth per input via a differentiable halting unit, yielding large gains on synthetic tasks and structural insights on language data.

  • Does Engram Do Memory Retrieval in Autoregressive Image Generation? cs.CV · 2026-05-13 · accept · none · ref 5 · internal anchor

    Engram in AR image generation saves backbone FLOPs but trails pure AR baselines in FID and behaves as a gated side-pathway rather than a content-addressed retriever.

  • Intrinsic Vicarious Conditioning for Deep Reinforcement Learning cs.LG · 2026-05-12 · unverdicted · none · ref 27 · internal anchor

    Vicarious conditioning is proposed as a new intrinsic reward for RL that implements the attention, retention, reproduction, and reinforcement stages of observational learning via memory mechanisms, enabling low-shot learning from other agents without access to their policies or rewards and yielding longer episodes in the tested environments.

  • On the Importance of Multistability for Horizon Generalization in Reinforcement Learning cs.LG · 2026-05-12 · unverdicted · none · ref 33 · internal anchor

    Multistability is necessary for temporal horizon generalization in POMDPs; it is sufficient on simple tasks and, together with transient dynamics, on complex ones, while monostable parallelizable RNNs such as SSMs and gated linear RNNs fail by construction.

  • Neural Information Causality quant-ph · 2026-05-10 · unverdicted · none · ref 19 · internal anchor

    Neural-IC separates embedding inequalities from capacity bounds in query-separated computations, with one-bit RAC benchmarks and CHSH-layer stability selecting the Tsirelson threshold for quantum enhancements.

  • Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval stat.ML · 2026-05-06 · unverdicted · none · ref 32 · internal anchor

    Winner-take-all linear memory capacity scales as d² ~ n log n due to extreme values; listwise retrieval via Tail-Average Margin yields d² ~ n with exact asymptotic theory.

  • Screening Is Enough cs.LG · 2026-04-01 · unverdicted · none · ref 25 · internal anchor

    Multiscreen replaces softmax attention with screening to provide absolute query-key relevance, resulting in models with 30% fewer parameters that maintain stable performance at long contexts.

  • Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach cs.LG · 2025-02-07 · unverdicted · none · ref 65 · internal anchor

    A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.

  • Concrete Problems in AI Safety cs.AI · 2016-06-21 · accept · none · ref 65 · internal anchor

    The paper categorizes five concrete AI safety problems arising from flawed objectives, costly evaluation, and learning dynamics.

  • Phasor Memory Networks: Stable Backpropagation Through Time for Scalable Explicit Memory cs.LG · 2026-05-13 · unverdicted · none · ref 6 · internal anchor

    PMNet uses unitary phasor dynamics and hierarchical anchors to make explicit memory stable for long sequences, matching a 3x-larger Mamba model on long-context robustness with a 119M-parameter network.

  • The Position Curse: LLMs Struggle to Locate the Last Few Items in a List cs.LG · 2026-05-08 · unverdicted · none · ref 7 · internal anchor

    LLMs exhibit the Position Curse, with backward position retrieval in lists lagging far behind forward retrieval, showing only partial gains from PosBench fine-tuning.

  • Borrowed Geometry: Computational Reuse of Frozen Text-Pretrained Transformer Weights Across Modalities cs.LG · 2026-05-01 · unverdicted · none · ref 8 · internal anchor

    Frozen text-pretrained transformer weights transfer across modalities through a thin interface, achieving SOTA on a robotic task and parity on decision-making with far fewer trainable parameters.

  • Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning cs.LG · 2026-04-23 · conditional · none · ref 19 · internal anchor

    Memory tokens are required for non-trivial performance in adaptive Universal Transformers on Sudoku-Extreme, with 8-32 tokens yielding stable 57% exact-match accuracy while trading off against ponder depth.

  • Ask Only When Needed: Proactive Retrieval from Memory and Skills for Experience-Driven Lifelong Agents cs.CL · 2026-04-22 · unverdicted · none · ref 18 · internal anchor

    ProactAgent learns a proactive retrieval policy via reinforcement learning on paired task continuations, improving lifelong agent performance and cutting retrieval overhead on SciWorld, AlfWorld, and StuLife.

  • Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges cs.LG · 2021-04-27 · accept · none · ref 34 · internal anchor

    Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.

  • Universal Transformers cs.CL · 2018-07-10 · unverdicted · none · ref 13 · internal anchor

    Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.

  • TIDE: Every Layer Knows the Token Beneath the Context cs.CL · 2026-05-07 · unverdicted · none · ref 81 · internal anchor

    TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and contextual collapse.

  • FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation cs.LG · 2026-05-06 · unverdicted · none · ref 6 · 2 links · internal anchor

    FAAST performs test-time supervised adaptation by analytically deriving fast weights from examples in one forward pass, matching backprop performance with over 90% less adaptation time and up to 95% memory savings versus memory-based methods (see the hedged sketch after this list).

  • Graph Memory Transformer (GMT) cs.LG · 2026-04-26 · unverdicted · none · ref 15 · internal anchor

    Graph Memory Transformer (GMT) swaps dense FFN sublayers for a graph of 128 centroids and a learned 128x128 transition matrix per block, yielding an 82M-parameter decoder-only LM that trains stably but trails a 103M dense baseline in perplexity.

  • Neural Computers cs.LG · 2026-04-07 · unverdicted · none · ref 11 · internal anchor

    Neural Computers are introduced as a new machine form where computation, memory, and I/O are unified in a learned runtime state, with initial video-model experiments showing acquisition of basic interface primitives from traces.

  • Event-Centric World Modeling with Memory-Augmented Retrieval for Embodied Decision-Making cs.LG · 2026-04-08 · unverdicted · none · ref 8 · internal anchor

    An event-centric framework encodes environments as semantic events and retrieves weighted prior maneuvers from a knowledge bank to enable interpretable, physics-aware decision-making for UAVs.

  • A PyTorch Library of Turing-Complete Neural Networks cs.LG · 2026-05-03 · unverdicted · none · ref 2 · internal anchor

    A PyTorch package constructs neural networks that exactly simulate given Turing machines using transformer and recurrent architectures derived from prior theoretical results.
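
illustrative sketches

For the integer-multiplication entry above: one reading of its claim is the classic lattice layout, in which a 2D grid holds the digit products and carries only ever move to an adjacent cell. The encoding below is an assumption on my part (the paper's exact grid may differ); digits are stored least-significant-first.

    def grid_multiply(a_digits, b_digits, base=10):
        # cell (i, j) of the implicit grid holds a_digits[i] * b_digits[j];
        # summing along anti-diagonals gives each output column
        col = [0] * (len(a_digits) + len(b_digits))
        for i, da in enumerate(a_digits):
            for j, db in enumerate(b_digits):
                col[i + j] += da * db      # strictly local contribution
        out, carry = [], 0
        for s in col:                      # carries move one cell at a time
            s += carry
            out.append(s % base)
            carry = s // base
        return out                         # least-significant digit first

    assert grid_multiply([3, 2, 1], [6, 5, 4]) == [8, 8, 0, 6, 5, 0]  # 123 * 456 = 56088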
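
For the Gumbel-Softmax entry: the relaxation itself is standard (Jang et al., 2016; Maddison et al., 2016), so the sketch below should be faithful, though the names are mine. Logits are perturbed with Gumbel noise and pushed through a temperature-controlled softmax; as tau approaches 0 the samples approach one-hot.

    import numpy as np

    def gumbel_softmax(logits, tau, rng):
        g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
        y = (logits + g) / tau             # lower tau -> closer to one-hot
        y = np.exp(y - y.max())
        return y / y.sum()                 # differentiable "sample"

    rng = np.random.default_rng(0)
    sample = gumbel_softmax(np.log(np.array([0.7, 0.2, 0.1])), tau=0.5, rng=rng)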
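
For the FAAST entry: the summary only states that fast weights are derived analytically from examples in one forward pass. One closed-form solver consistent with that description is ridge regression, solved directly rather than by backprop; whether FAAST uses this exact construction is an assumption.

    import numpy as np

    def closed_form_fast_weights(X, Y, lam=1e-2):
        # W minimizes ||X @ W - Y||^2 + lam * ||W||^2 in closed form
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

    X = np.random.randn(32, 8)             # 32 adaptation examples, 8 features
    Y = np.random.randn(32, 4)             # their 4-dim targets
    W_fast = closed_form_fast_weights(X, Y)    # no gradient steps required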