Human-level reinforcement learning through theory-based modeling, exploration, and planning

Pedro A Tsividis, Joao Loula, Jake Burga, Nathan Foss, Andres Campero, Thomas Pouncy, Samuel J Gershman, Joshua B Tenenbaum · 2021 · arXiv 2107.12544

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Learning POMDP World Models from Observations with Language-Model Priors

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Pinductor leverages language-model priors to learn POMDP world models from limited trajectories, matching privileged-access methods in performance and exceeding tabular baselines in sample efficiency.

Brain alignment of reasoning and action representations from vision-language and action models during naturalistic gameplay

q-bio.NC · 2026-05-19 · unverdicted · novelty 6.0

VLMs and LAMs outperform RL baselines in voxel-wise brain encoding during gameplay, with LAMs showing prompt-asymmetric organization (27% unique action vs -5% unique reasoning) strongest in frontal-motor cortex.

Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

Frontier LRMs match human game-learning behavior and predict fMRI signals an order of magnitude better than RL or Bayesian agents because of their in-context game-state representations.

citing papers explorer

Showing 3 of 3 citing papers.

Learning POMDP World Models from Observations with Language-Model Priors cs.LG · 2026-05-13 · unverdicted · none · ref 30
Pinductor leverages language-model priors to learn POMDP world models from limited trajectories, matching privileged-access methods in performance and exceeding tabular baselines in sample efficiency.
Brain alignment of reasoning and action representations from vision-language and action models during naturalistic gameplay q-bio.NC · 2026-05-19 · unverdicted · none · ref 8
VLMs and LAMs outperform RL baselines in voxel-wise brain encoding during gameplay, with LAMs showing prompt-asymmetric organization (27% unique action vs -5% unique reasoning) strongest in frontal-motor cortex.
Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners cs.AI · 2026-05-08 · unverdicted · none · ref 4
Frontier LRMs match human game-learning behavior and predict fMRI signals an order of magnitude better than RL or Bayesian agents because of their in-context game-state representations.

Human-level reinforcement learning through theory-based modeling, exploration, and planning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer