A neurally plausible model learns successor representations in partially observable environments

Eszter Vertes; Maneesh Sahani

arXiv: 1906.09480 · v1 · pith:RII6RVZ7 · submitted 2019-06-22 · stat.ML · cs.LG · cs.NE · q-bio.NC

Pith reviewed 2026-05-25 17:53 UTC · model grok-4.3

Keywords: successor representations, partially observable environments, distributional codes, reinforcement learning, neural plausibility, value function, uncertainty representation, noisy observations

The pith

A model extends the distributed distributional code to successor features, enabling neurally plausible reinforcement learning from noisy partial observations where direct policy learning fails.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a model that learns successor representations in partially observable, noisy environments by extending the distributed distributional code (DDC), a population code for representing uncertainty. Successor representations offer a middle ground between model-based and model-free reinforcement learning, supporting fast value computation and adaptation to changes in reward. A sympathetic reader would care because many real-world tasks, such as navigation or predator avoidance, require inferring hidden states from noisy sensory input rather than observing them directly. The paper's simulations indicate that such representations can yield effective policies even where direct policy learning fails. The mechanism is grounded in a framework intended to match neural computation.
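What places the SR between the two families is that it can be learned from experienced transitions, as model-free values are, yet it still supports quick re-evaluation of values. The sketch below is minimal and rests on our own assumptions, not the paper's: a fully observed 5-state ring, an unbiased random-walk policy, and one-hot state features. The paper itself works with noisy partial observations and DDC features. The sketch learns the tabular SR with TD(0) from a single sampled trajectory and checks it against the closed form M = (I - γP)^(-1). That closed form requires knowing P. The function names `ring_random_walk` and `td_learn_sr` are hypothetical.

```python
import numpy as np


def ring_random_walk(n_states):
    """Transition matrix of an unbiased random walk on a ring (step left or right with p = 0.5)."""
    P = np.zeros((n_states, n_states))
    for s in range(n_states):
        P[s, (s - 1) % n_states] += 0.5
        P[s, (s + 1) % n_states] += 0.5
    return P


def td_learn_sr(n_states, gamma=0.9, alpha=0.01, n_steps=100_000, seed=0):
    """TD(0) learning of the tabular SR from one sampled random-walk trajectory.

    Update for the visited state s:
        M[s] <- M[s] + alpha * (onehot(s) + gamma * M[s_next] - M[s])
    """
    rng = np.random.default_rng(seed)
    M = np.zeros((n_states, n_states))
    onehot = np.eye(n_states)
    s = 0
    for _ in range(n_steps):
        s_next = (s + (1 if rng.random() < 0.5 else -1)) % n_states
        M[s] += alpha * (onehot[s] + gamma * M[s_next] - M[s])
        s = s_next
    return M


gamma = 0.9
M_hat = td_learn_sr(5, gamma=gamma)
# Closed form for comparison; unlike TD learning, this needs the transition matrix.
M_true = np.linalg.inv(np.eye(5) - gamma * ring_random_walk(5))
rel_err = np.linalg.norm(M_hat - M_true) / np.linalg.norm(M_true)
print(f"relative error of TD-learned SR: {rel_err:.3f}")
```

Each row of the SR sums to 1/(1 - γ) = 10 here, because it counts discounted future visits to every state. The TD estimate fluctuates around the closed form by an amount set by the learning rate `alpha`.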

Core claim

We introduce a neurally plausible model using distributional successor features, which builds on the distributed distributional code for the representation and computation of uncertainty, and which allows for efficient value function computation in partially observed environments via the successor representation. We show that distributional successor features can support reinforcement learning in noisy environments in which direct learning of successful policies is infeasible.
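For reference, the standard successor-feature equations (Dayan 1993; Barreto et al. 2017, both in the reference list below) are sketched here, together with the belief-averaged value that follows from linearity of expectation. The last line is our illustration of how a belief over hidden states can enter the computation. The paper's exact definition of distributional successor features may differ.

```latex
% Successor features of policy \pi for state features \phi and discount \gamma \in [0, 1):
\psi^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, \phi(s_{t+k}) \,\middle|\, s_t = s\right]
              = \phi(s) + \gamma\, \mathbb{E}_{\pi}\!\left[\psi^{\pi}(s_{t+1}) \,\middle|\, s_t = s\right]
% (with one-hot \phi this is the tabular SR, M = (I - \gamma P^{\pi})^{-1})

% If rewards are linear in the features, r(s) = w^{\top}\phi(s), values are a linear readout:
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r(s_{t+k}) \,\middle|\, s_t = s\right] = w^{\top}\psi^{\pi}(s)

% With a belief b_t(s) = p(s_t = s \mid o_{1:t}), linearity of expectation gives
\mathbb{E}_{b_t}\!\left[V^{\pi}(s_t)\right] = w^{\top}\, \mathbb{E}_{b_t}\!\left[\psi^{\pi}(s_t)\right]
```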

What carries the argument

Distributional successor features: an extension of the distributed distributional code (DDC), a neural code for uncertainty, to successor features, so that expectations over future states, and hence values, can be computed under partial observability.
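A toy illustration follows, under assumptions that are ours rather than the paper's: a 20-state ring, a random-walk policy, Gaussian tuning curves standing in for DDC feature neurons, and a known transition matrix. The paper learns these quantities from noisy observations. In a DDC, a belief is encoded by its expected feature activations `mu`. The sketch shows that the belief-averaged successor features give the expected value under uncertainty through the same linear reward readout `w`. The reward weights `w` and the linear map `U` from DDC to successor features are hypothetical. `U` is a least-squares fit included only to show that such a map can be approximately linear.

```python
import numpy as np

# Illustrative assumptions (not from the paper): 20 positions on a ring,
# a random-walk policy, and K = 8 Gaussian tuning curves as DDC features.
N, K, gamma = 20, 8, 0.9
positions = np.arange(N)
centres = np.linspace(0, N, K, endpoint=False)


def ring_distance(a, b, n=N):
    """Shortest distance between positions a and b on a ring of length n."""
    d = np.abs(a - b) % n
    return np.minimum(d, n - d)


# Phi[s, k] = phi_k(s): response of feature "neuron" k when the hidden state is s.
Phi = np.exp(-ring_distance(positions[:, None], centres[None, :]) ** 2 / (2 * 2.0**2))

# Random-walk transitions and closed-form successor features Psi = (I - gamma P)^{-1} Phi.
P = np.zeros((N, N))
for s in range(N):
    P[s, (s - 1) % N] += 0.5
    P[s, (s + 1) % N] += 0.5
Psi = np.linalg.solve(np.eye(N) - gamma * P, Phi)

# A belief over the hidden position, concentrated around position 5.
belief = np.exp(-ring_distance(positions, 5) ** 2 / (2 * 1.5**2))
belief /= belief.sum()

# DDC of the belief: expected feature activations mu = E_b[phi(s)].
mu = belief @ Phi
# Belief-averaged successor features E_b[psi(s)].
psi_b = belief @ Psi

# With rewards linear in the features (r = Phi w), the expected value under the
# belief is the same linear readout applied to the belief-averaged SFs.
w = np.zeros(K)
w[0] = 1.0  # hypothetical reward weights: reward near feature 0's centre
value_from_sf = psi_b @ w
value_direct = belief @ np.linalg.solve(np.eye(N) - gamma * P, Phi @ w)

# Hypothetical linear map U from DDC to SFs, fitted on point-mass beliefs
# (mu = phi(s) -> psi(s)); with K < N features the fit is only approximate.
U, *_ = np.linalg.lstsq(Phi, Psi, rcond=None)
psi_from_ddc = mu @ U
```

The two value computations agree up to floating point, by linearity of expectation. How well `psi_from_ddc` recovers `psi_b` depends on how well the chosen features span the successor features. That is a modelling choice made here, not one taken from the paper.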

If this is right

Enables efficient value function computation without requiring full state observability.
Supports rapid adaptation when the reward function or goal locations change.
Yields successful policies in noisy settings where direct learning of policies is infeasible.
Produces representations whose features are consistent with patterns seen in neural responses.
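The first two consequences, efficient value computation and rapid adaptation to a moved goal, follow from values being linear in the reward once the SR is known. The sketch below makes that concrete in a fully observed 5-state ring under a random-walk policy. These are our assumptions for illustration, not the paper's noisy setting, and the variable names are hypothetical.

```python
import numpy as np

gamma, n = 0.9, 5
# Unbiased random walk on a 5-state ring: the policy being evaluated.
P = np.zeros((n, n))
for s in range(n):
    P[s, (s - 1) % n] += 0.5
    P[s, (s + 1) % n] += 0.5

# Successor representation of this policy: M = (I - gamma P)^{-1}.
M = np.linalg.inv(np.eye(n) - gamma * P)

# Value with the goal at state 0 is a single matrix-vector product.
r_goal0 = np.eye(n)[0]
V_goal0 = M @ r_goal0

# The goal moves to state 2: M is reused unchanged, only the reward vector differs.
r_goal2 = np.eye(n)[2]
V_goal2 = M @ r_goal2
```

The same `M` serves any reward vector, which is the sense in which the SR supports rapid adaptation. The standard caveat is that `M` depends on the policy, so a change of policy requires relearning it.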

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same code could be tested for its ability to handle multi-step planning under sensory noise in larger state spaces.
If neural recordings show population codes matching the distributional successor features during partial-observation tasks, that would align with the model's predictions.
The framework suggests a route to combine successor representations with other forms of uncertainty propagation in sequential decision problems.

Load-bearing premise

The distributed distributional code for uncertainty can be extended to successor features while preserving neural plausibility and enabling efficient computation in partially observed settings.

What would settle it

A simulation of a noisy, partially observable task in which the model learns no better policies than direct policy learning, which the paper reports as infeasible in such settings.

Figures

Figures reproduced from arXiv: 1906.09480 by Eszter Vertes, Maneesh Sahani.

**Figure 1.** Learning and inference in the DDC state-space model.

**Figure 2.** Value functions computed using successor features under a random walk policy.

**Figure 3.** Value functions computed by SFs under the learned policy. Top row shows reward and …

Original abstract

Animals need to devise strategies to maximize returns while interacting with their environment based on incoming noisy sensory observations. Task-relevant states, such as the agent's location within an environment or the presence of a predator, are often not directly observable but must be inferred using available sensory information. Successor representations (SR) have been proposed as a middle-ground between model-based and model-free reinforcement learning strategies, allowing for fast value computation and rapid adaptation to changes in the reward function or goal locations. Indeed, recent studies suggest that features of neural responses are consistent with the SR framework. However, it is not clear how such representations might be learned and computed in partially observed, noisy environments. Here, we introduce a neurally plausible model using distributional successor features, which builds on the distributed distributional code for the representation and computation of uncertainty, and which allows for efficient value function computation in partially observed environments via the successor representation. We show that distributional successor features can support reinforcement learning in noisy environments in which direct learning of successful policies is infeasible.
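The abstract starts from the fact that the agent never sees its position directly and must infer it. As a point of reference only, the sketch below uses exact Bayesian filtering on a discretised ring with Gaussian position noise. The assumptions are ours: 20 states, random-walk dynamics, and `noise_sd = 3`. The paper's model instead represents and updates this belief with a DDC. Exact filtering is shown because it defines the posterior p(s_t | o_{1:t}) that such a code is meant to encode. The helper names are hypothetical.

```python
import numpy as np

# Illustrative assumptions (not from the paper): 20 positions on a ring,
# random-walk dynamics, and Gaussian noise on position readings.
N, noise_sd = 20, 3.0
positions = np.arange(N)

P = np.zeros((N, N))
for s in range(N):
    P[s, (s - 1) % N] += 0.5
    P[s, (s + 1) % N] += 0.5


def ring_distance(a, b, n=N):
    """Shortest distance between positions a and b on a ring of length n."""
    d = np.abs(a - b) % n
    return np.minimum(d, n - d)


def obs_likelihood(o):
    """Approximate p(o | s) for every hidden s: unnormalised Gaussian in ring distance."""
    return np.exp(-ring_distance(positions, o) ** 2 / (2 * noise_sd**2))


def bayes_filter_step(belief, P, likelihood):
    """Exact discrete Bayes filter: predict with the dynamics, then weight by the likelihood."""
    predicted = belief @ P               # p(s_t | o_{1:t-1})
    posterior = predicted * likelihood   # proportional to p(s_t | o_{1:t})
    return posterior / posterior.sum()


rng = np.random.default_rng(1)
s = 0                                    # true hidden position, never seen directly
belief = np.full(N, 1.0 / N)             # start fully uncertain
for t in range(30):
    s = (s + (1 if rng.random() < 0.5 else -1)) % N
    o = (s + rng.normal(0.0, noise_sd)) % N   # noisy, wrapped position reading
    belief = bayes_filter_step(belief, P, obs_likelihood(o))
```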

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper extends the authors' own distributed distributional code to successor features for SR learning in POMDPs, and the main claim hinges on simulations whose details are not visible in the abstract.

The core idea is a model that uses distributional successor features to learn successor representations from noisy observations in partially observable settings. This lets the agent compute values efficiently and adapt to reward changes without needing a full model. It also avoids direct policy learning, which apparently fails in their tests. It builds directly on the authors' prior distributional-code work on uncertainty, so the novelty lies in the combination for POMDPs rather than in a wholly new mechanism. The framing of why SRs matter for fast adaptation and neural plausibility is clear and connects to existing ideas in RL and neuroscience. The soft spot is that the abstract gives no equations, no simulation setup, no baselines, and no numbers on how much better this does than alternatives or on what counts as infeasible direct learning. Without those, it's impossible to tell whether the extension actually preserves the claimed advantages or whether the results depend on particular hyperparameter choices. The paper is aimed at people working on successor representations, distributional codes, or hippocampal models of RL. A reader already following that literature would see the proposal as a logical next step, but its value depends on whether the full methods hold up. It should go to peer review so that the simulations and any formal properties can be checked properly.

Referee Report

0 major / 3 minor

Summary. The paper proposes a neurally plausible model that extends the distributed distributional code to successor features, enabling learning of successor representations (SR) in partially observable environments. It claims this approach supports efficient value function computation and reinforcement learning in noisy POMDPs where direct policy learning is infeasible, building on prior work on distributional codes for uncertainty representation.

Significance. If the simulations and derivations hold, the work provides a concrete mechanism linking neural uncertainty representations to SR-based RL, offering potential explanations for biological value computation in uncertain settings and a practical algorithm for POMDPs. The modeling proposal is grounded in existing frameworks and addresses a clear gap in applying SR to partial observability.

minor comments (3)

The abstract and introduction would benefit from a brief explicit statement of the key equations defining the distributional successor features (e.g., how the code for successor distributions is updated) to allow readers to assess the extension from the base distributed distributional code without immediately consulting the methods.
Figure captions should include more detail on simulation parameters, such as noise levels in observations and number of trials, to make the results in the POMDP experiments reproducible from the figures alone.
Notation for the successor features and value computation should be unified across sections; currently the transition from the standard SR to the distributional version is not always signposted with equation references.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of the manuscript, the accurate summary of our contribution, and the recommendation for minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

Minor self-citation to prior distributional code work; central model extension remains independent

The paper introduces a new model for distributional successor features in POMDPs by extending the distributed distributional code. It explicitly references building on that prior framework, but the abstract presents the extension itself (neural plausibility, efficient value computation via SR, support for RL where direct policies fail) as the novel contribution without any shown equations, fitted parameters, or self-defined terms that reduce the claimed result to its inputs by construction. No load-bearing uniqueness theorems, ansatzes, or renaming of known results are evident from the provided text. This is a standard modeling proposal whose support would be evaluated externally; the self-citation is not circular.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Abstract-only review limits visibility into parameters and assumptions; the model relies on extending a prior distributional code framework whose details are not re-derived here.

free parameters (1)

model hyperparameters for feature learning and distribution parameters
Likely present for training the successor features but unspecified in the abstract

axioms (1)

domain assumption: The distributed distributional code provides a neurally plausible representation of uncertainty that can be extended to successor features.
The model is described as building directly on this prior framework.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 2 internal anchors

[1] André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado P. van Hasselt, and David Silver. Successor features for transfer in reinforcement learning. In Advances in Neural Information Processing Systems, pages 4055-4065, 2017.
[2] André Barreto, Diana Borsa, John Quan, Tom Schaul, David Silver, Matteo Hessel, Daniel Mankowitz, Augustin Zídek, and Rémi Munos. Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement. arXiv:1901.10964 [cs], January 2019. URL http://arxiv.org/abs/1901.10964.
[3] Nathaniel D. Daw, Yael Niv, and Peter Dayan. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8(12):1704, December 2005. ISSN 1546-1726. doi:10.1038/nn1560. URL https://www.nature.com/articles/nn1560
[4] Nathaniel D. Daw, Samuel J. Gershman, Ben Seymour, Peter Dayan, and Raymond J. Dolan. Model-Based Influences on Humans' Choices and Striatal Prediction Errors. Neuron, 69(6):1204-1215, March 2011. ISSN 0896-6273. doi:10.1016/j.neuron.2011.02.027. URL http://www.sciencedirect.com/science/article/pii/S0896627311001255
[5] Peter Dayan. Improving Generalization for Temporal Difference Learning: The Successor Representation. Neural Computation, 5(4):613-624, July 1993. ISSN 0899-7667. doi:10.1162/neco.1993.5.4.613. URL https://doi.org/10.1162/neco.1993.5.4.613
[6] Peter Dayan and Nathaniel D. Daw. Decision theory, reinforcement learning, and the brain. Cognitive, Affective, & Behavioral Neuroscience, 8(4):429-453, December 2008. ISSN 1531-135X. doi:10.3758/CABN.8.4.429. URL https://doi.org/10.3758/CABN.8.4.429
[7] Samuel J. Gershman. The Successor Representation: Its Computational Logic and Neural Substrates. J. Neurosci., 38(33):7193-7200, August 2018. ISSN 0270-6474, 1529-2401. doi:10.1523/JNEUROSCI.0151-18.2018. URL http://www.jneurosci.org/content/38/33/7193
[8] Jan Gläscher, Nathaniel Daw, Peter Dayan, and John P. O'Doherty. States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning. Neuron, 66(4):585-595, May 2010. ISSN 0896-6273. doi:10.1016/j.neuron.2010.04.016. URL http://www.sciencedirect.com/science/article/pii/S0896627310002874
[9] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A Kernel Two-Sample Test. Journal of Machine Learning Research, 13:723-773, March 2012. URL http://jmlr.csail.mit.edu/papers/v13/gretton12a.html
[10] G. E. Hinton, P. Dayan, B. J. Frey, and R. M. Neal. The "wake-sleep" algorithm for unsupervised neural networks. Science, 268(5214):1158-1161, May 1995. ISSN 0036-8075.
[11] Tejas D. Kulkarni, Ardavan Saeedi, Simanta Gautam, and Samuel J. Gershman. Deep Successor Reinforcement Learning. arXiv:1606.02396 [cs, stat], June 2016. URL http://arxiv.org/abs/1606.02396.
[12] Marcelo G. Mattar and Nathaniel D. Daw. Prioritized memory access explains planning and hippocampal replay. Nature Neuroscience, 21(11):1609, November 2018. ISSN 1546-1726. doi:10.1038/s41593-018-0232-z. URL https://www.nature.com/articles/s41593-018-0232-z
[13] Kevin J. Miller, Matthew M. Botvinick, and Carlos D. Brody. Dorsal hippocampus contributes to model-based planning. Nature Neuroscience, 20(9):1269-1276, September 2017. ISSN 1097-6256, 1546-1726. doi:10.1038/nn.4613. URL http://www.nature.com/articles/nn.4613
[14] I. Momennejad, E. M. Russek, J. H. Cheong, M. M. Botvinick, N. D. Daw, and S. J. Gershman. The successor representation in human reinforcement learning. Nature Human Behaviour, 1(9):680, September 2017. ISSN 2397-3374. doi:10.1038/s41562-017-0180-8. URL https://www.nature.com/articles/s41562-017-0180-8
[15] Brad E. Pfeiffer and David J. Foster. Hippocampal place-cell sequences depict future paths to remembered goals. Nature, 497(7447):74-79, May 2013. ISSN 1476-4687. doi:10.1038/nature12112. URL https://www.nature.com/articles/nature12112
[16] Evan M. Russek, Ida Momennejad, Matthew M. Botvinick, Samuel J. Gershman, and Nathaniel D. Daw. Predictive representations can link model-based reinforcement learning to model-free mechanisms. PLOS Computational Biology, 13(9):e1005768, September 2017. ISSN 1553-7358. doi:10.1371/journal.pcbi.1005768. URL https://journals.plos.org/ploscompbiol/artic...
[17] Maneesh Sahani and Peter Dayan. Doubly Distributional Population Codes: Simultaneous Representation of Uncertainty and Multiplicity. Neural Computation, 15(10):2255-2279, October 2003. ISSN 0899-7667. doi:10.1162/089976603322362356. URL http://dx.doi.org/10.1162/089976603322362356
[18] Kimberly L. Stachenfeld, Matthew M. Botvinick, and Samuel J. Gershman. The hippocampus as a predictive map. Nature Neuroscience, 20(11):1643-1653, November 2017. ISSN 1546-1726. doi:10.1038/nn.4650. URL https://www.nature.com/articles/nn.4650
[19] Clara Kwon Starkweather, Benedicte M. Babayan, Naoshige Uchida, and Samuel J. Gershman. Dopamine reward prediction errors reflect hidden-state inference across time. Nat. Neurosci., 20(4):581-589, April 2017. ISSN 1546-1726. doi:10.1038/nn.4520
[20] Federico Stella, Peter Baracskay, Joseph O'Neill, and Jozsef Csicsvari. Hippocampal Reactivation of Random Trajectories Resembling Brownian Diffusion. Neuron, February 2019. ISSN 0896-6273. doi:10.1016/j.neuron.2019.01.052. URL http://www.sciencedirect.com/science/article/pii/S0896627319300790
[21] Richard S. Sutton and Andrew G. Barto. Introduction to reinforcement learning, volume 135. MIT Press, Cambridge, 1998.
[22] Eszter Vértes and Maneesh Sahani. Flexible and accurate inference and learning for deep generative models. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 4166-4175. Curran Associates, Inc., 2018. URL http://papers.nips.cc/paper/7671-flexible-a...
[23] Martin J. Wainwright and Michael I. Jordan. Graphical Models, Exponential Families, and Variational Inference. Found. Trends Mach. Learn., 1(1-2):1-305, January 2008. ISSN 1935-8237. doi:10.1561/2200000001. URL http://dx.doi.org/10.1561/2200000001
[24] Richard S. Zemel, Peter Dayan, and Alexandre Pouget. Probabilistic interpretation of population codes. Neural Computation, 10(2):403-430, 1998.
