Model-Free Episodic Control

Charles Blundell, Benigno Uria, Alexander Pritzel, Yazhe Li, Avraham Ruderman, Joel Z Leibo, Jack Rae, Daan Wierstra, Demis Hassabis

classification: stat.ML, cs.LG, q-bio.NC

keywords: episodic, learning, algorithms, control, deep, highly, reinforcement, rewarding

Abstract

State of the art deep reinforcement learning algorithms take many millions of interactions to attain human-level performance. Humans, on the other hand, can very quickly exploit highly rewarding nuances of an environment upon first discovery. In the brain, such rapid learning is thought to depend on the hippocampus and its capacity for episodic memory. Here we investigate whether a simple model of hippocampal episodic control can learn to solve difficult sequential decision-making tasks. We demonstrate that it not only attains a highly rewarding strategy significantly faster than state-of-the-art deep reinforcement learning algorithms, but also achieves a higher overall reward on some of the more challenging domains.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation
cs.LG 2026-05 unverdicted novelty 6.0

FAAST analytically compiles labeled examples into fast weights via a single forward pass, matching backprop adaptation performance with over 90% less time and up to 95% less memory than memory-based methods.

Information as Structural Alignment: A Dynamical Theory of Continual Learning
cs.LG 2026-04 unverdicted novelty 6.0

IBF achieves near-zero forgetting and positive backward transfer in continual learning by driving configurations toward coherence through motion and modification dynamics without storing raw data.

BrainMem: Brain-Inspired Evolving Memory for Embodied Agent Task Planning
cs.RO 2026-03 unverdicted novelty 6.0

BrainMem equips LLM-based embodied planners with working, episodic, and semantic memory that evolves interaction histories into retrievable knowledge graphs and guidelines, raising success rates on long-horizon 3D benchmarks.

FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation
cs.LG 2026-05 unverdicted novelty 5.0

FAAST performs test-time supervised adaptation by analytically deriving fast weights from examples in one forward pass, matching backprop performance with over 90% less adaptation time and up to 95% memory savings ver...

Artifacts as Memory Beyond the Agent Boundary
cs.AI 2026-04 unverdicted novelty 5.0

Artifacts in the environment can reduce the memory an RL agent needs to represent its history, as shown by a mathematical proof and experiments with spatial paths.