pith. machine review for the scientific record.

arxiv: 2605.08019 · v1 · submitted 2026-05-08 · 💻 cs.AI · q-bio.NC

Recognition: no theorem link

Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:50 UTC · model grok-4.3

classification 💻 cs.AI q-bio.NC
keywords: large reasoning models · human learning · fMRI · game discovery · brain alignment · reinforcement learning · in-context representation · behavioral matching

The pith

Frontier large reasoning models match human game learning behavior and predict brain activity an order of magnitude better than reinforcement learning agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether frontier large reasoning models can discover rules, revise hypotheses, and plan multi-step actions in novel video games the way humans do. It evaluates these models against reinforcement learning agents and a Bayesian theory-based agent using human gameplay data paired with simultaneous fMRI recordings. The LRMs come closest to human behavioral patterns during discovery phases, and their internal states predict brain activity across cortical and subcortical regions far more accurately than the alternatives. Targeted tests show that this brain alignment stems from the models' in-context tracking of game states rather than from planning outputs. A reader would care because the results position current AI systems as potential working models of human learning in complex, naturalistic settings.
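Concretely, the brain-prediction side of this comparison is a voxelwise encoding analysis. The sketch below shows the general shape of such a fit, assuming ridge regression from time-aligned model activations to BOLD and held-out Pearson r as the score; the paper's actual regularization, HRF handling, and layer selection are not specified here, and `layer_acts` and `bold` are illustrative names.

```python
# Hedged sketch of a voxelwise encoding fit: model activations -> BOLD.
# Assumes features are already aligned to fMRI timepoints; the paper's
# exact pipeline (HRF convolution, banded ridge, CV scheme) may differ.
import numpy as np
from sklearn.linear_model import RidgeCV

def encoding_accuracy(layer_acts: np.ndarray,  # (timepoints, features)
                      bold: np.ndarray,        # (timepoints, voxels)
                      n_train: int) -> np.ndarray:
    """Fit ridge on an initial split and return per-voxel Pearson r
    between predicted and observed BOLD on the held-out remainder."""
    model = RidgeCV(alphas=np.logspace(-2, 4, 7))
    model.fit(layer_acts[:n_train], bold[:n_train])
    pred = model.predict(layer_acts[n_train:])
    true = bold[n_train:]
    # z-score both, then Pearson r per voxel is the mean of the products
    pred_z = (pred - pred.mean(0)) / (pred.std(0) + 1e-8)
    true_z = (true - true.mean(0)) / (true.std(0) + 1e-8)
    return (pred_z * true_z).mean(0)
```

Per subject, the best layer's mean r over voxels would then be the encoding accuracy compared across model classes, which is how a gap between LRM and RL features would surface.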

Core claim

Frontier LRMs most closely match human behavioral patterns during game discovery and predict brain activity an order of magnitude better than both reinforcement learning alternatives (model-free and model-based) across cortical and subcortical regions, with effects robust to permutation controls. Targeted manipulations further show that brain alignment reflects the model's in-context representation of the game state rather than its downstream planning or reasoning.

What carries the argument

The in-context representation of the game state inside frontier LRMs, which produces both behavioral similarity to humans and superior prediction of concurrent fMRI signals compared with model-free and model-based RL agents.

If this is right

  • LRMs can serve as computational accounts of human learning and decision making in complex naturalistic environments.
  • In-context game-state representation drives human-like hypothesis revision and multi-step planning.
  • Brain alignment occurs across both cortical and subcortical regions and survives permutation controls.
  • The advantage is specific to state representation rather than to downstream reasoning or action selection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These models might be used to generate simulated human learning trajectories for testing educational game designs before human trials.
  • If the alignment generalizes beyond games, LRMs could become practical proxies for studying human decision processes in other sequential learning domains.
  • Disrupting state representations inside an LRM should selectively impair brain alignment without necessarily harming raw task performance.

Load-bearing premise

That the superior brain-activity prediction arises specifically from the models' in-context tracking of game states rather than from other unmeasured model properties or from the particular games selected for study.

What would settle it

Demonstrating that a non-LRM model lacking strong in-context state tracking achieves comparable brain-alignment accuracy on the same fMRI dataset, or that randomly scrambling the game-state inputs inside an LRM eliminates its brain-prediction advantage while leaving other capabilities intact.
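The second of these tests can be made concrete in a few lines: permute which game-state input accompanies each timestep before extracting embeddings, then rerun the identical encoding fit. A minimal sketch, with `extract_embeddings` as a placeholder for whatever maps a state sequence to per-timestep activations (not the paper's API):

```python
import numpy as np

def scrambled_embeddings(states, extract_embeddings, seed=0):
    """Control condition: shuffle game-state inputs across timesteps so the
    model can no longer track the true state sequence, keeping everything
    else (model, prompt format, number of steps) fixed. Any encoding
    accuracy surviving this shuffle cannot reflect in-context state tracking."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(states))
    return extract_embeddings([states[i] for i in order]), order
```

Comparing encoding accuracy on intact versus scrambled embeddings would isolate the contribution of correct state tracking from generic representational capacity.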

Figures

Figures reproduced from arXiv: 2605.08019 by Austin Tudor David Andrews, Botos Csaba, Chris Summerfield, Joshua B. Tenenbaum, Laurence Hunt, Marcelo G. Mattar, Momchil Tomov, Rui Ponte Costa, Sreejan Kumar.

Figure 1: VGDL game paradigm. (A) Games are defined by combining game rules with map layouts to produce interactive environments. (B) Example trial structure of the VGDL-fMRI dataset. Color denotes game names (BAIT, CHASE, HELPER, LEMMINGS, PLAQUE ATTACK, ZELDA). All participants played the same level progression structure with randomized game order. The subsequent levels reveal new rules incrementally. The Inte… view at source ↗
Figure 2: Multi-turn dialogue format for generating gameplay. At each step, the model receives the current observation as a user message and responds with an action. In the copied-reasoning condition, the model’s hidden reasoning trace is copied into the context as the stated rationale on the next turn. The full conversation history accumulates in the context window, giving the model access to all past observations,… view at source ↗
Figure 3: Performance of LRM and baseline models. Throughout the paper, cross-paradigm comparisons are anchored to two representative frontier LRMs: DeepSeek V4-Pro, the strongest behavioural model (jointly best on the discovery KDE-EMD and on the level-progression curve in this figure), and Qwen3.5-35B-A3B, the strongest brain-encoding model (Figure 5B). The left column here and panel A of… view at source ↗
Figure 4: Example reasoning traces across game steps. DeepSeek V3.2 on BAIT (first four levels, up to first win on L3). Bar height shows reasoning trace length (characters) at each step. Alternating white/gray shading indicates levels; dashed vertical lines mark episode boundaries (green = win, red = loss). Green arrows mark first occurrences of novel game interactions, with curated excerpts from the model’s reactio… view at source ↗
Figure 5: Predicting brain activity with frontier LRM embeddings. (A) Encoding accuracy (per-subject best-layer Pearson r, averaged across subjects ± SEM) for RL baselines (DDQN, EfficientZero, HRR) compared to the two anchor frontier LRMs introduced in Figure 3. view at source ↗
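The multi-turn format in Figure 2 amounts to a simple accumulate-and-act loop. A sketch under stated assumptions: `client.chat`, `reply.action`, `reply.reasoning`, and the `game` interface are all illustrative placeholders, not the paper's code.

```python
# Illustrative gameplay loop matching Figure 2's dialogue format.
def play_episode(game, client, copied_reasoning=False, max_steps=200):
    messages = [{"role": "system", "content": game.instructions}]
    obs = game.reset()
    for _ in range(max_steps):
        messages.append({"role": "user", "content": obs})
        reply = client.chat(messages)  # placeholder chat-completion call
        content = reply.action
        if copied_reasoning and reply.reasoning:
            # Copied-reasoning condition: the hidden trace is written back
            # into the context as the stated rationale for the next turn.
            content = f"Reasoning: {reply.reasoning}\nAction: {reply.action}"
        messages.append({"role": "assistant", "content": content})
        obs, done = game.step(reply.action)
        if done:
            break
    # The full history stays in context, as the caption notes.
    return messages
```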
read the original abstract

Humans rapidly learn abstract knowledge when encountering novel environments and flexibly deploy this knowledge to guide efficient and intelligent action. Can modern AI systems learn and plan in a similar way? We study this question using a dataset of complex human gameplay with concurrent fMRI recordings, in which participants learn novel video games that require rule discovery, hypothesis revision, and multi-step planning. We jointly evaluate models by their ability to play the games, match human learning behavior, and predict brain activity during the same task, comparing a suite of frontier Large Reasoning Models (LRMs) against model-free and model-based deep reinforcement learning agents and a Bayesian theory-based agent. We find that frontier LRMs most closely match human behavioral patterns during game discovery and predict brain activity an order of magnitude better than both reinforcement learning alternatives across cortical and subcortical regions, with effects robust to permutation controls. Through targeted manipulations, we further show that brain alignment reflects the model's in-context representation of the game state rather than its downstream planning or reasoning. Our results establish LRMs as compelling computational accounts of human learning and decision making in complex, naturalistic environments. Project page with interactive replays: https://botcs.github.io/reason-to-play/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that frontier Large Reasoning Models (LRMs) most closely match human behavioral patterns during discovery in novel video games requiring rule learning and planning, and predict concurrent fMRI brain activity an order of magnitude better than model-free/model-based deep RL agents and a Bayesian theory-based agent across cortical and subcortical regions. Effects are robust to permutation controls, and targeted manipulations are presented to show that brain alignment specifically reflects the LRM's in-context representation of the game state rather than downstream planning or reasoning. The work concludes that LRMs provide compelling computational accounts of human learning and decision-making in naturalistic settings.

Significance. If the results hold, this would be a significant contribution by providing multi-metric (behavioral, neural, and task-performance) evidence that frontier AI systems can model human rule discovery and hypothesis revision. The inclusion of permutation controls and the attempt to isolate representational factors via manipulations strengthen the empirical case for using LRMs in cognitive neuroscience, potentially informing both AI development and theories of human learning.

major comments (2)
  1. [Targeted Manipulations] Targeted Manipulations section: the claim that brain alignment specifically reflects in-context game-state representation (rather than general model properties such as scale, embedding dimensionality, or pre-training overlap) is load-bearing for the central interpretation. The description does not detail how the manipulations fully orthogonalize these factors from game-state content, leaving open the possibility that residual differences in representational capacity explain the gap versus RL baselines without being diagnostic of human-like in-context rule learning.
  2. [Results (brain alignment)] Brain prediction results: the 'order of magnitude better' prediction advantage requires explicit reporting of the alignment metric values (e.g., correlation or R² per region), exact statistical tests against baselines, and the permutation-control outcomes with effect sizes to substantiate the magnitude and rule out that the advantage arises from unmeasured model differences.
minor comments (2)
  1. [Abstract] Abstract: specify the exact LRMs, number of participants, games, and trials to improve reproducibility and context for the claims.
  2. [Figures and Methods] Figure legends and methods: clarify the precise definition of the brain-alignment metric and how permutation controls were implemented (e.g., what was permuted and how many iterations).

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the potential significance of our multi-metric evaluation of frontier LRMs against human learning and brain activity. We address each major comment below and will incorporate revisions to improve clarity and quantitative reporting in the manuscript.

read point-by-point responses
  1. Referee: [Targeted Manipulations] Targeted Manipulations section: the claim that brain alignment specifically reflects in-context game-state representation (rather than general model properties such as scale, embedding dimensionality, or pre-training overlap) is load-bearing for the central interpretation. The description does not detail how the manipulations fully orthogonalize these factors from game-state content, leaving open the possibility that residual differences in representational capacity explain the gap versus RL baselines without being diagnostic of human-like in-context rule learning.

    Authors: We agree that the Targeted Manipulations section would benefit from greater detail on how the controls isolate in-context game-state representations. In the revision we will expand this section to explicitly describe the model variants used, including matched comparisons on scale, embedding dimensionality, and pre-training corpus overlap. We will add ablation tables showing that the brain-alignment advantage is abolished when in-context state representations are disrupted (e.g., via state-shuffling or context-ablation) while holding other model properties constant, and we will report the residual variance explained by capacity differences alone. These additions will make the orthogonalization procedure transparent and strengthen the link to human-like rule learning. revision: yes

  2. Referee: [Results (brain alignment)] Brain prediction results: the 'order of magnitude better' prediction advantage requires explicit reporting of the alignment metric values (e.g., correlation or R² per region), exact statistical tests against baselines, and the permutation-control outcomes with effect sizes to substantiate the magnitude and rule out that the advantage arises from unmeasured model differences.

    Authors: We accept that the current text relies on a qualitative description of the advantage. In the revised manuscript we will add a new table (and supplementary figures) reporting mean Pearson r and R² values per cortical and subcortical ROI for each model class, together with the results of paired t-tests (or Wilcoxon tests where appropriate) against the RL and Bayesian baselines, including exact p-values, Cohen’s d effect sizes, and 95% confidence intervals. We will also include the full permutation distributions (10,000 shuffles) with the observed LRM advantage expressed as a percentile and standardized effect size. These quantitative details will allow readers to evaluate the magnitude and robustness of the reported differences directly. revision: yes
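For reference, the quantities promised here can be computed along the following lines; a sketch assuming per-subject encoding accuracies for two model classes (`lrm_r`, `rl_r`), not the authors' analysis code.

```python
import numpy as np
from scipy import stats

def compare_models(lrm_r: np.ndarray, rl_r: np.ndarray, n_perm: int = 10_000):
    """Paired comparison of per-subject encoding accuracies: t-test,
    Wilcoxon, Cohen's d on the paired difference, and a sign-flip
    permutation null with the observed mean expressed as a percentile."""
    diff = lrm_r - rl_r
    t_stat, p_t = stats.ttest_rel(lrm_r, rl_r)
    _, p_w = stats.wilcoxon(lrm_r, rl_r)
    cohens_d = diff.mean() / diff.std(ddof=1)
    # Sign-flip permutation: under the null, each subject's difference
    # is equally likely to have either sign.
    rng = np.random.default_rng(0)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = (signs * diff).mean(axis=1)
    percentile = 100.0 * (null < diff.mean()).mean()
    return {"t": t_stat, "p_t": p_t, "p_wilcoxon": p_w,
            "cohens_d": cohens_d, "null_percentile": percentile}
```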

Circularity Check

0 steps flagged

No circularity: empirical model comparison with independent controls

full rationale

The paper reports an empirical study comparing frontier LRMs to RL and Bayesian agents on human gameplay data and fMRI recordings. Key results (behavioral matching, brain activity prediction, order-of-magnitude advantage, robustness to permutation controls) are presented as observed outcomes from model evaluations and targeted manipulations rather than derived via equations or self-citations that reduce to the inputs by construction. No load-bearing steps match the enumerated circularity patterns; the in-context representation claim rests on experimental manipulations, not definitional or fitted-input reductions. The analysis is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The claim rests on the assumptions that fMRI signals can be meaningfully compared to model internal states via some linear or correlational mapping and that the chosen games elicit the same cognitive processes in humans and models.

axioms (1)
  • domain assumption: fMRI BOLD signals can be aligned with model activations through a shared representational space
    The brain-alignment analysis presupposes a method for mapping model hidden states to cortical and subcortical regions.

pith-pipeline@v0.9.0 · 5546 in / 1169 out tokens · 50629 ms · 2026-05-11T02:50:36.801856+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 6 internal anchors

  1. [1] Joshua B Tenenbaum, Charles Kemp, Thomas L Griffiths, and Noah D Goodman. How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022):1279–1285, 2011.

  2. [2] Kelsey Allen, Franziska Brändle, Matthew Botvinick, Judith E Fan, Samuel J Gershman, Alison Gopnik, Thomas L Griffiths, Joshua K Hartshorne, Tobias U Hauser, Mark K Ho, et al. Using games to understand the mind. Nature Human Behaviour, 8(6):1035–1043, 2024.

  3. [3] Tom Schaul. A video game description language for model-based or interactive learning. In 2013 IEEE Conference on Computational Intelligence in Games (CIG), pages 1–8. IEEE, 2013.

  4. [4] Pedro A Tsividis, Joao Loula, Jake Burga, Nathan Foss, Andres Campero, Thomas Pouncy, Samuel J Gershman, and Joshua B Tenenbaum. Human-level reinforcement learning through theory-based modeling, exploration, and planning. arXiv preprint arXiv:2107.12544, 2021.

  5. [5] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.

  6. [6] David Ha and Jürgen Schmidhuber. World models. arXiv preprint arXiv:1803.10122, 2018.

  7. [7] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023.

  8. [8] Shengjie Wang, Shaohuai Liu, Weirui Ye, Jiacheng You, and Yang Gao. EfficientZero V2: Mastering discrete and continuous control with limited data. arXiv preprint arXiv:2403.00564, 2024.

  9. [9] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.

  10. [10] Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, and Subbarao Kambhampati. On the planning abilities of large language models: a critical investigation. Advances in Neural Information Processing Systems, 36:75993–76005, 2023.

  11. [11] Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. DeepSeek-V3 technical report. arXiv preprint arXiv:2412.19437, 2024.

  12. [12] DeepSeek-AI. DeepSeek-V3.2: Pushing the frontier of open large language models, 2025.

  13. [13] Qwen Team. Qwen3.5 technical report, 2026.

  14. [14] Mariya Hendriksen, Tabish Rashid, David Bignell, Raluca Georgescu, Abdelhak Lemkhenter, Katja Hofmann, Sam Devlin, and Sarah Parisot. Adapting vision-language models for evaluating world models. arXiv preprint arXiv:2506.17967, 2025.

  15. [15] Chang Yang, Xinrun Wang, Junzhe Jiang, Qinggang Zhang, and Xiao Huang. Evaluating world models with LLM for decision making. arXiv preprint arXiv:2411.08794, 2024.

  16. [16] ARC Foundation. ARC-AGI-3: A new challenge for frontier agentic intelligence. arXiv preprint arXiv:2603.24621, 2026.

  17. [17] Zhen Ying et al. AI Gamestore: A benchmark for evaluating AI agents in interactive games. arXiv preprint, 2026.

  18. [18] Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, and Shan Carter. Zoom in: An introduction to circuits. Distill, 5(3):e00024–001, 2020.

  19. [19] Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, et al. Toy models of superposition. arXiv preprint arXiv:2209.10652, 2022.

  20. [20] Patrick J Mineault, Thomas L Griffiths, and Sean Escola. Cognitive dark matter: Measuring what AI misses. arXiv preprint arXiv:2603.03414, 2026.

  21. [21] Momchil S Tomov, Pedro A Tsividis, Thomas Pouncy, Joshua B Tenenbaum, and Samuel J Gershman. The neural architecture of theory-based reinforcement learning. Neuron, 111(8):1331–1344, 2023.

  22. [22] Daniel LK Yamins and James J DiCarlo. Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3):356–365, 2016.

  23. [23] Martin Schrimpf, Idan Asher Blank, Greta Tuckute, Carina Kauf, Eghbal A Hosseini, Nancy Kanwisher, Joshua B Tenenbaum, and Evelina Fedorenko. The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118(45):e2105646118, 2021.

  24. [24] Alexander JE Kell, Daniel LK Yamins, Erica N Shook, Sam V Norman-Haignere, and Josh H McDermott. A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98(3):630–644, 2018.

  25. [25] Flora Paugam et al. Training neural networks from scratch in a videogame leads to brittle brain encoding. bioRxiv, 2025. doi:10.1101/2025.11.28.691119.

  26. [26] DeepSeek-AI. DeepSeek-V4 technical report, 2026.

  27. [27] Cédric Colas et al. Language and experience: A computational model of social learning in complex tasks, 2026.

  28. [28] Hado Van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30, 2016.

  29. [29] Anwar O Nunez-Elizalde, Alexander G Huth, and Jack L Gallant. Voxelwise encoding models with non-spherical multivariate normal priors. NeuroImage, 197:482–492, 2019.

  30. [30] Jonas Kubilius, Martin Schrimpf, Kohitij Kar, Rishi Rajalingham, Ha Hong, Najib Majaj, Elias Issa, Pouya Bashivan, Jonathan Prescott-Roy, Kailyn Schmidt, et al. Brain-like object recognition with high-performing shallow recurrent ANNs. Advances in Neural Information Processing Systems, 32, 2019.

  31. [31] Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C Love, Erin Grant, Iris Groen, Jascha Achterberg, et al. Getting aligned on representational alignment. arXiv preprint arXiv:2310.13018, 2023.

  32. [32] Raja Marjieh, Pol Van Rijn, Ilia Sucholutsky, Theodore R Sumers, Harin Lee, Thomas L Griffiths, and Nori Jacoby. Words are all you need? Language as an approximation for human similarity judgments. arXiv preprint arXiv:2206.04105, 2022.

  33. [33] Alexander G Huth, Wendy A De Heer, Thomas L Griffiths, Frédéric E Theunissen, and Jack L Gallant. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532(7600):453–458, 2016.

  34. [34] Fatma Deniz, Anwar O Nunez-Elizalde, Alexander G Huth, and Jack L Gallant. The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. Journal of Neuroscience, 39(39):7722–7736, 2019.

  35. [35] Jerry Tang and Alexander G Huth. Semantic language decoding across participants and stimulus modalities. Current Biology, 35(5):1023–1032, 2025.

  36. [36] Evelina Fedorenko, Steven T Piantadosi, and Edward AF Gibson. Language is primarily a tool for communication rather than thought. Nature, 630(8017):575–586, 2024.

  37. [37] Theodore Sumers, Shunyu Yao, Karthik R Narasimhan, and Thomas L Griffiths. Cognitive architectures for language agents. Transactions on Machine Learning Research, 2023.

  38. [38] Oscar Esteban, Christopher J Markiewicz, Ross W Blair, Craig A Moodie, A Ilkay Isik, Asier Erramuzpe, James D Kent, Mathias Goncalves, Elizabeth DuPre, Madeleine Snyder, et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nature Methods, 16(1):111–116, 2019.

  39. [39] Kendrick N Kay, Stephen V David, Ryan J Prenger, Kathleen A Hansen, and Jack L Gallant. Modeling low-frequency fluctuation and hemodynamic response timecourse in event-related fMRI. Technical report, Wiley Online Library, 2008.

  40. [40] Wendy A De Heer, Alexander G Huth, Thomas L Griffiths, Jack L Gallant, and Frédéric E Theunissen. The hierarchical cortical organization of human speech processing. Journal of Neuroscience, 37(27):6539–6557, 2017.

  41. [41] Tom Dupré la Tour, Matteo Visconti di Oleggio Castello, and Jack L Gallant. The voxelwise encoding model framework: a tutorial introduction to fitting encoding models to fMRI data. Imaging Neuroscience, 3:imag_a_00575, 2025.

  42. [42] Edward L Kaplan and Paul Meier. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282):457–481, 1958.

  43. [43] Agustin Lage-Castellanos, Giancarlo Valente, Elia Formisano, and Federico De Martino. Methods for computing the maximum performance of computational models of fMRI responses. PLoS Computational Biology, 15(3):e1006397, 2019.

  44. [44] Sreejan Kumar, Theodore R Sumers, Takateru Yamakoshi, Ariel Goldstein, Uri Hasson, Kenneth A Norman, Thomas L Griffiths, Robert D Hawkins, and Samuel A Nastase. Shared functional specialization in transformer-based language models and the human brain. Nature Communications, 15(1):5523, 2024.

Entries 45–52 are internal anchors: text extracted from the paper itself (appendix notes on DDQN code fixes and a results excerpt) rather than external works. A consolidated sketch of several of the corrected DDQN behaviours follows this list.

  45. [45] Broken action selection (player.py:155). Random exploration actions were hardcoded to choose between actions 0 and 1 only (np.random.choice([0, 1])), ignoring the actual action-space size self.n_actions. In games with 6 available actions (4 directions, the action button, and NO_OP), the agent could never explore four of them.

  46. [46] Incorrect state representation (player.py:266, 316). The state was computed as a frame difference (current_screen - last_screen) rather than using stacked frames or raw frames. Frame differences are unreliable and discard absolute position information, degrading the quality of the state signal available to the network.

  47. [47] Undiscriminating reward clipping (player.py:306). All rewards were clamped to [−1, 1], collapsing the reward structure. Different reward magnitudes, which carry distinct semantic meaning in VGDL games, became indistinguishable.

  48. [48] Inconsistent model loading (player.py:98–99). The checkpoint-loading code loaded the same saved weights into the target network twice in consecutive lines, never updating the policy network. As a result, the policy network retained its random initialization after loading a checkpoint.

  49. [49] Flawed model-update logic (player.py:140). The save condition used or instead of and: if self.episode_reward > self.best_reward or self.steps % 50000. This saved the model every 50k steps regardless of performance, overwriting good checkpoints with potentially worse ones.

  50. [50] Wrong variable reference for target updates (player.py:136). Target-network updates were gated on total cumulative self.steps rather than an episode-specific counter, causing target updates at inconsistent intervals and breaking learning stability.

  51. [51] Seed-setting bug (player.py:251). The seed was assigned to the function object rather than called: torch.manual_seed = (self.config.random_seed) instead of torch.manual_seed(self.config.random_seed). The random seed was never actually set, eliminating reproducibility. "Beyond these code-level fixes, we made two methodological changes to the DDQN training…"

  52. [52] Results excerpt on "theory-coding regions": the anchor argues that EMPA’s theory-induction posterior carries more brain-aligned structure than DDQN’s fc1 representation in "theory-coding regions" (IFG, FFG, etc.). [The remainder is plot residue from a behavioural-similarity vs. encoding-accuracy figure: axes "Behavioural similarity to humans, 1/(1+EMD)" and "fMRI Encoding Accuracy (Pearson r)", All ROIs, for DDQN, EfficientZero, EMPA/HRR, V3.2, V4-Flash, V4-Pro, Q-9B, Q-27B, Q-35B, Q-122B.]
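The appendix anchors above (entries 45–51) describe concrete DDQN implementation bugs and their fixes. A minimal consolidated sketch of several of the corrected behaviours, with the self.* attributes flattened into function arguments and the surrounding class structure assumed rather than taken from the paper's code:

```python
import numpy as np
import torch

def random_action(n_actions: int) -> int:
    # Anchor 45 fix: explore the full action space, not just actions 0 and 1.
    return int(np.random.choice(n_actions))

def make_state(frames: list, k: int = 4) -> torch.Tensor:
    # Anchor 46 fix: stack the last k raw frames instead of a frame
    # difference, preserving absolute position information.
    return torch.cat(frames[-k:], dim=0)

def load_checkpoint(path: str, policy_net, target_net) -> None:
    # Anchor 48 fix: load saved weights into the policy network and copy
    # them to the target network (the original loaded the target twice).
    state = torch.load(path)
    policy_net.load_state_dict(state)
    target_net.load_state_dict(state)

def maybe_save(episode_reward: float, best_reward: float, save_fn) -> float:
    # Anchor 49 fix: save only on improvement, instead of the original
    # "or" condition, which saved regardless of performance and
    # overwrote good checkpoints.
    if episode_reward > best_reward:
        save_fn()
        return episode_reward
    return best_reward

def set_seed(seed: int) -> None:
    # Anchor 51 fix: call torch.manual_seed rather than assigning to it.
    torch.manual_seed(seed)
```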