Exploratory Experience Shapes the Geometry of Predictive Representations
Pith reviewed 2026-06-29 09:36 UTC · model grok-4.3
The pith
Exploratory behavior produces more spatially organized predictive representations that preserve maze transitions in both agents and mice.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Exploratory agents develop representations that are more spatially organized and better preserve the structure of maze transitions in latent space. In contrast, exploitative agents learn less organized representations. More exploratory mice show representational geometries that closely match those of exploratory agents.
What carries the argument
A predictive-coding perception model, updated from the agent's own trajectories, predicts future maze states and reward probability. A controllable parameter sets whether actions are selected by expected information gain (exploration) or by predicted reward (exploitation).
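The action-selection rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not say how the balance parameter enters the decision, so the sketch assumes it is a per-step probability of acting exploratorily. The names `select_action`, `info_gain`, `reward_pred`, and `balance` are hypothetical, and the per-action scores are assumed to come from the perception model.

```python
import random


def select_action(info_gain, reward_pred, balance, rng=random):
    """Pick an action index from per-action scores.

    info_gain[a]   : expected information gain of action a (assumed given)
    reward_pred[a] : predicted reward probability of action a (assumed given)
    balance        : ASSUMPTION, probability in [0, 1] of an exploratory step
    """
    if len(info_gain) != len(reward_pred):
        raise ValueError("score lists must have one entry per action")
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must lie in [0, 1]")

    # Exploratory step: rank actions by information gain.
    # Exploitative step: rank actions by predicted reward.
    scores = info_gain if rng.random() < balance else reward_pred

    # Break ties uniformly at random among the best-scoring actions.
    best = max(scores)
    return rng.choice([a for a, s in enumerate(scores) if s == best])
```

With `balance=1.0` the rule always follows information gain, and with `balance=0.0` it always follows predicted reward. Intermediate values interleave the two regimes within a single run.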
Load-bearing premise
That training the same predictive-coding model on mouse trajectories produces representations directly comparable to those from the artificial agent, with differences in mouse visitation patterns reflecting the same exploratory versus exploitative regimes as the agent's parameter.
What would settle it
The claim would fail if training the model on trajectories from the most exploratory mice produced latent geometries that do not match the spatial organization and transition preservation seen in exploratory agents.
Original abstract
Active sensing links behavior and learning through an action-perception loop: actions determine the observations used to update internal predictive models of perception, which subsequently guide the next actions. Predictive-coding frameworks provide a natural way to model this process, since internal representations are continuously updated to predict future observations. Here, we ask how exploratory and exploitative behavioral strategies shape these internal predictive representations. We build an online learning agent in a tree-like maze with a controllable parameter regulating the balance between exploratory and exploitative regimes. The agent updates a predictive-coding-based perception model from experience generated by its own behavior. The model predicts both future maze states and reward probability, allowing the agent to select actions either by expected information gain during exploration or by predicted reward during exploitation. We show that the resulting internal predictive representations depend strongly on the agent's behavioral regime. Exploratory agents develop representations that are more spatially organized and better preserve the structure of maze transitions in latent space. In contrast, exploitative agents learn less organized representations. We then train this predictive model on natural trajectories of water-deprived mice navigating the same maze and compare the resulting representations with those learned from agent trajectories. More exploratory mice show representational geometries that closely match those of exploratory agents, whereas mice with more restricted visitation patterns resemble reward-driven, exploitative agents. Together, these findings suggest that exploration enables predictive models to form generalized internal representations by organizing latent space around both spatial location and transition context in artificial agents and animals.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a predictive-coding agent in a tree-like maze develops more spatially organized latent representations, which better preserve transition structure, when a controllable balance parameter biases its behavior toward exploration rather than exploitation. The same model trained on water-deprived mouse trajectories yields geometries that align with the exploratory-agent regime for mice with broader visitation and with the exploitative regime for mice with restricted visitation. The authors take this to suggest that exploratory experience shapes generalized internal predictive representations in both artificial agents and animals.
Significance. If the central comparison holds after controlling for experienced transition statistics, the result would link behavioral regime directly to the geometry of predictive representations and provide a concrete, testable bridge between controllable artificial agents and biological data.
major comments (1)
- [Results on mouse–agent representational comparison] The central mouse–agent alignment claim (abstract and Results on mouse trajectories) rests on the assumption that post-hoc partitioning of mice by visitation metric isolates the same exploratory/exploitative regime as the agent’s controllable balance parameter. The abstract gives no indication that training data volume, visit counts, or empirical transition matrices were equalized across conditions before geometry comparison; if the reported metrics (spatial organization, transition preservation) are sensitive to the distribution of experienced transitions rather than the regime per se, the alignment could be an artifact of unequal coverage.
minor comments (2)
- [Methods] Clarify the exact definition and units of the “balance parameter” and how it is held fixed versus varied across agent runs.
- [Results] Specify the precise geometry metrics (e.g., which distance or correlation measure quantifies “spatial organization” and “transition preservation”) and report effect sizes with confidence intervals.
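To make the second minor comment concrete, one candidate definition of "spatial organization" is the correlation between pairwise latent distances and pairwise graph distances in the maze. The sketch below is an assumption for illustration only, since the reviewed text does not state which measure the paper uses. It assumes heap-style node indexing (root 0, children of node i at 2i+1 and 2i+2), Euclidean latent distance, and Pearson correlation. The function names are hypothetical.

```python
import math
from itertools import combinations


def tree_distance(a, b):
    """Hop count between two nodes of a heap-indexed binary tree."""
    hops = 0
    while a != b:
        # The larger index is never shallower, so move it to its parent.
        if a > b:
            a = (a - 1) // 2
        else:
            b = (b - 1) // 2
        hops += 1
    return hops


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spatial_organization(latents):
    """Correlate graph distance with latent distance over all node pairs.

    latents: dict mapping node index -> latent vector (sequence of floats).
    Returns a value in [-1, 1]; higher means latent geometry tracks the maze.
    """
    graph_d, latent_d = [], []
    for a, b in combinations(sorted(latents), 2):
        graph_d.append(tree_distance(a, b))
        latent_d.append(math.dist(latents[a], latents[b]))
    return pearson(graph_d, latent_d)
```

A rank correlation or a geodesic latent distance would be equally defensible choices, which is why the referee asks for the exact definition alongside effect sizes.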
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight an important methodological consideration for the mouse–agent comparison. We address the concern point-by-point below and will incorporate revisions to strengthen the manuscript.
Referee: [Results on mouse–agent representational comparison] The central mouse–agent alignment claim (abstract and Results on mouse trajectories) rests on the assumption that post-hoc partitioning of mice by visitation metric isolates the same exploratory/exploitative regime as the agent’s controllable balance parameter. The abstract gives no indication that training data volume, visit counts, or empirical transition matrices were equalized across conditions before geometry comparison; if the reported metrics (spatial organization, transition preservation) are sensitive to the distribution of experienced transitions rather than the regime per se, the alignment could be an artifact of unequal coverage.
Authors: We agree that equalizing experienced transition statistics is essential to isolate the effect of behavioral regime from coverage differences. The visitation metric used for partitioning directly operationalizes the regime (broader vs. restricted exploration), and the agent’s controllable parameter produces analogous differences in coverage. However, the referee correctly notes that the abstract does not explicitly state controls for data volume or transition matrices. In the revision we will: (1) report visit counts, trajectory numbers, and transition-matrix statistics for each mouse group in the abstract and Results; (2) add a supplementary analysis that subsamples mouse trajectories to match the empirical transition distributions and total visit counts of the agent conditions as closely as possible, then recompute the geometry metrics (spatial organization and transition preservation); and (3) verify whether the mouse–agent alignment persists under these matched conditions. If the alignment remains, it supports that regime per se shapes the representations; if not, we will qualify the claim accordingly. This control directly addresses the potential artifact raised.
Revision: yes
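The matching control proposed in the rebuttal could take roughly the following form. This is a hedged sketch rather than the authors' procedure: it subsamples individual transitions, whereas the rebuttal speaks of subsampling trajectories, and a sequence model may need contiguous segments instead. The function name `match_transition_counts` and the `(state, next_state)` tuple encoding are assumptions.

```python
import random
from collections import defaultdict


def match_transition_counts(transitions, target_counts, seed=0):
    """Subsample transitions so no type exceeds its target count.

    transitions   : list of (state, next_state) pairs from one condition
    target_counts : dict mapping (state, next_state) -> count to match,
                    e.g. the counts observed in the agent condition
    Transition types absent from target_counts are dropped entirely.
    """
    rng = random.Random(seed)

    # Group the positions of each transition type.
    positions = defaultdict(list)
    for idx, pair in enumerate(transitions):
        positions[pair].append(idx)

    # Keep at most the target number of occurrences per type.
    keep = []
    for pair, idxs in positions.items():
        k = min(len(idxs), target_counts.get(pair, 0))
        keep.extend(rng.sample(idxs, k))

    # Preserve the original temporal order of the retained samples.
    return [transitions[i] for i in sorted(keep)]
```

Because the retained count per type is the minimum of the two conditions, applying the same function in both directions yields identical empirical transition counts, at the cost of discarding data from the richer condition.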
Circularity Check
No circularity: representations shaped by regime-specific trajectories; mouse comparison is external validation
The paper constructs an agent whose behavior parameter controls trajectory statistics, then trains the same predictive model on those trajectories and measures geometry metrics on the resulting latents. This is an empirical demonstration that different input distributions produce different geometries, not a self-definitional loop or a fitted parameter renamed as prediction. The mouse analysis partitions real trajectories post-hoc by a visitation metric and applies the identical model; no equations or self-citations are shown that would make the geometry metrics reduce to the regime parameter by construction. The derivation chain is self-contained against the external mouse data and does not rely on load-bearing self-citation.
Axiom & Free-Parameter Ledger
free parameters (1)
- balance parameter between exploration and exploitation
axioms (1)
- domain assumption: Predictive-coding frameworks provide a natural way to model the action-perception loop
A Methods
A.1 Maze environment
The maze is represented as a full binary tree with N= 127...
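The maze description above is truncated, so the following is only a sketch under a stated assumption: that N counts the nodes of the tree, in which case a full binary tree with levels 0 through 6 has 2^7 - 1 = 127 nodes. The function name `build_binary_tree_maze` and the heap-style indexing are illustrative choices, not taken from the paper.

```python
def build_binary_tree_maze(depth):
    """Adjacency lists for a full binary tree with levels 0..depth.

    Nodes are heap-indexed: root is 0, children of node i are 2i+1 and 2i+2.
    ASSUMPTION: depth=6 is used below because it yields 127 nodes.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    n_nodes = 2 ** (depth + 1) - 1
    adjacency = {node: [] for node in range(n_nodes)}
    for child in range(1, n_nodes):
        parent = (child - 1) // 2
        # Corridors are bidirectional, so record the edge both ways.
        adjacency[parent].append(child)
        adjacency[child].append(parent)
    return adjacency


maze = build_binary_tree_maze(depth=6)
leaves = [node for node, nbrs in maze.items() if len(nbrs) == 1]
```

Under this assumption the maze has 64 leaves (dead ends), and every other node except the root is a three-way junction.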