hub Mixed citations

Memory Networks

Mixed citation behavior. The most common role is background, at 60% of classified citations.

23 Pith papers cite it.

abstract

We describe a new class of learning models called memory networks. Memory networks reason with inference components combined with a long-term memory component; they learn how to use these jointly. The long-term memory can be read and written to, with the goal of using it for prediction. We investigate these models in the context of question answering (QA) where the long-term memory effectively acts as a (dynamic) knowledge base, and the output is a textual response. We evaluate them on a large-scale QA task, and a smaller, but more complex, toy task generated from a simulated world. In the latter, we show the reasoning power of such models by chaining multiple supporting sentences to answer questions that require understanding the intension of verbs.
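The read, write, and chaining behavior described in the abstract can be illustrated with a toy sketch. This is not the paper's model: the learned inference components are replaced here by a plain word-overlap score, the "textual response" is just the last word of the final supporting sentence, and the names `ToyMemory`, `bag`, `overlap`, and `answer` are hypothetical, chosen only for this example.

```python
from collections import Counter


def bag(text):
    """Lowercased bag of words; an assumed stand-in for a learned embedding."""
    return Counter(tok.strip("?.,") for tok in text.lower().split())


def overlap(a, b):
    """Toy matching score: number of word occurrences shared by two bags."""
    return sum((a & b).values())


class ToyMemory:
    """Long-term memory as an append-only list of sentences."""

    def __init__(self):
        self.slots = []

    def write(self, sentence):
        self.slots.append(sentence)

    def read(self, query_bag, exclude=()):
        """Return the index of the best-scoring slot not in `exclude`."""
        best, best_score = None, -1
        for i, sentence in enumerate(self.slots):
            if i in exclude:
                continue
            score = overlap(query_bag, bag(sentence))
            if score > best_score:  # earliest slot wins ties
                best, best_score = i, score
        return best


def answer(memory, question, hops=2):
    """Chain `hops` reads; each retrieved sentence is added to the query."""
    query = bag(question)
    used = []
    for _ in range(hops):
        idx = memory.read(query, exclude=used)
        if idx is None:
            break
        used.append(idx)
        query = query + bag(memory.slots[idx])
    # Toy response: last word of the final supporting sentence.
    response = memory.slots[used[-1]].split()[-1] if used else None
    return response, used


mem = ToyMemory()
for s in ["John went to the kitchen",
          "Mary went to the garden",
          "John picked up the milk"]:
    mem.write(s)

print(answer(mem, "Where is the milk"))
```

In this sketch the first read retrieves the sentence mentioning the milk, and the second read, now conditioned on that sentence, retrieves the one about John's location. That two-step lookup is the sense in which supporting sentences are chained.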

citation-role summary

background 4 · baseline 1

representative citing papers

REALM: Retrieval-Augmented Language Model Pre-Training

cs.CL · 2020-02-10 · accept · novelty 8.0

REALM augments language-model pre-training with an unsupervised retriever over Wikipedia documents and reports 4-16% absolute gains on open-domain QA benchmarks over prior implicit and explicit knowledge methods.

Reformer: The Efficient Transformer

cs.LG · 2020-01-13 · accept · novelty 8.0

Reformer matches standard Transformer accuracy on long sequences while using far less memory and running faster via LSH attention and reversible residual layers.

Graph Retention Networks for Dynamic Graphs

cs.LG · 2024-11-18 · unverdicted · novelty 7.0

Graph Retention Networks extend retention to dynamic graphs to enable parallelizable training, O(1) inference, and chunkwise long-term training while delivering competitive performance with major efficiency gains.

Graph Attention Networks

stat.ML · 2017-10-30 · accept · novelty 7.0

Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.

Titans: Learning to Memorize at Test Time

cs.LG · 2024-12-31 · unverdicted · novelty 6.0

Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.

3D Reconstruction with Spatial Memory

cs.CV · 2024-08-28 · unverdicted · novelty 6.0

Spann3R uses a learned spatial memory to regress per-image pointmaps directly in a shared global coordinate system, removing the need for optimization-based alignment after per-pair predictions.

Cognitive Architectures for Language Agents

cs.AI · 2023-09-05 · accept · novelty 6.0

CoALA is a modular cognitive architecture for language agents that organizes memory components, action spaces for internal and external interaction, and a generalized decision-making loop to support more systematic development of capable agents.

TIDE: Every Layer Knows the Token Beneath the Context

cs.CL · 2026-05-07 · unverdicted · novelty 5.0

TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and contextual collapse.

citing papers explorer

Showing 23 of 23 citing papers.

  • REALM: Retrieval-Augmented Language Model Pre-Training cs.CL · 2020-02-10 · accept · none · ref 18 · internal anchor

    REALM augments language-model pre-training with an unsupervised retriever over Wikipedia documents and reports 4-16% absolute gains on open-domain QA benchmarks over prior implicit and explicit knowledge methods.

  • Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets cs.LG · 2022-01-06 · unverdicted · none · ref 16

    Neural networks exhibit grokking on small algorithmic datasets, achieving perfect generalization well after overfitting.

  • Reformer: The Efficient Transformer cs.LG · 2020-01-13 · accept · none · ref 21

    Reformer matches standard Transformer accuracy on long sequences while using far less memory and running faster via LSH attention and reversible residual layers.

  • Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models cs.CV · 2026-05-18 · unverdicted · none · ref 43 · internal anchor

    Incantation is the first video world model to use per-frame natural language conditioning for simultaneous multi-entity control and concept-level cross-entity transfer in interactive video generation.

  • Beyond Detection: A Structure-Aware Framework for Scene Text Tracking cs.CV · 2026-05-17 · unverdicted · none · ref 135 · internal anchor

    SymTrack is the first systematic detection-free framework for scene text tracking that constructs benchmarks from video text spotting datasets and reports up to 11.97% AUC gains over prior trackers.

  • Graph Retention Networks for Dynamic Graphs cs.LG · 2024-11-18 · unverdicted · none · ref 41 · internal anchor

    Graph Retention Networks extend retention to dynamic graphs to enable parallelizable training, O(1) inference, and chunkwise long-term training while delivering competitive performance with major efficiency gains.

  • Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval stat.ML · 2026-05-06 · unverdicted · none · ref 33

    Winner-take-all linear memory capacity scales as d² ~ n log n due to extreme values; listwise retrieval via Tail-Average Margin yields d² ~ n with exact asymptotic theory.

  • LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory cs.CL · 2024-10-14 · unverdicted · none · ref 96

    LongMemEval benchmarks long-term memory in chat assistants, revealing 30% accuracy drops across sustained interactions and proposing indexing-retrieval-reading optimizations that boost performance.

  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks cs.CL · 2020-05-22 · accept · none · ref 68

    RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric baselines.

  • Graph Attention Networks stat.ML · 2017-10-30 · accept · none · ref 19

    Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.

  • Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling cs.LG · 2025-08-22 · unverdicted · none · ref 71 · internal anchor

    In a cellular automata rule-inference task designed to block memorization, neural models achieve high next-step accuracy but accuracy falls sharply with longer reasoning chains; depth, recurrence, memory, and test-time compute extend the reachable depth but do not remove the bound.

  • MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent cs.CL · 2025-07-03 · unverdicted · none · ref 31 · internal anchor

    MemAgent uses multi-conversation RL to train a memory agent that reads text in segments and overwrites memory, extrapolating from 8K training to 3.5M token QA with under 5% loss and 95%+ on 512K RULER.

  • Titans: Learning to Memorize at Test Time cs.LG · 2024-12-31 · unverdicted · none · ref 115 · internal anchor

    Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.

  • 3D Reconstruction with Spatial Memory cs.CV · 2024-08-28 · unverdicted · none · ref 83 · internal anchor

    Spann3R uses a learned spatial memory to regress per-image pointmaps directly in a shared global coordinate system, removing the need for optimization-based alignment after per-pair predictions.

  • Cognitive Architectures for Language Agents cs.AI · 2023-09-05 · accept · none · ref 83 · internal anchor

    CoALA is a modular cognitive architecture for language agents that organizes memory components, action spaces for internal and external interaction, and a generalized decision-making loop to support more systematic development of capable agents.

  • Compressive Transformers for Long-Range Sequence Modelling cs.LG · 2019-11-13 · unverdicted · none · ref 131 · internal anchor

    Compressive Transformer sets new records on WikiText-103 (17.1 ppl) and Enwik8 (0.97 bpc) via memory compression and introduces the PG-19 long-range language benchmark.

  • The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents cs.CL · 2026-05-08 · unverdicted · none · ref 10

    Expanded recall in LLM agents erodes cooperative intent in multi-agent social dilemmas, observed in 18 of 28 model-game settings.

  • Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts cs.CL · 2026-04-09 · conditional · none · ref 91

    Loss-based pruning of training data to limit facts and flatten their frequency distribution enables a 110M-parameter GPT-2 model to memorize 1.3 times more entity facts than standard training, matching a 1.3B-parameter model on the full dataset.

  • Mela: Test-Time Memory Consolidation based on Transformation Hypothesis cs.CL · 2026-05-11 · unverdicted · none · ref 23

    Mela is a Transformer variant with a dual-frequency Hierarchical Memory Module and MemStack that performs test-time memory consolidation, outperforming baselines on long contexts.

  • TIDE: Every Layer Knows the Token Beneath the Context cs.CL · 2026-05-07 · unverdicted · none · ref 79

    TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and contextual collapse.

  • FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation cs.LG · 2026-05-06 · unverdicted · none · ref 19 · 2 links

    FAAST performs test-time supervised adaptation by analytically deriving fast weights from examples in one forward pass, matching backprop performance with over 90% less adaptation time and up to 95% memory savings versus memory-based methods.

  • Evolutionary Algorithm for Sinhala to English Translation cs.CL · 2019-07-06 · unverdicted · none · ref 15 · internal anchor

    An evolutionary algorithm identifies meanings in Sinhala sentences to produce English translations that are then grammatically corrected, reported to yield accurate results.

  • Machine Reading Comprehension: a Literature Review cs.CL · 2019-06-30 · unverdicted · none · ref 64 · internal anchor

    A 2019 survey of machine reading comprehension corpora and methods.