pith. machine review for the scientific record.

arxiv: 2603.27134 · v5 · submitted 2026-03-28 · 💻 cs.LG

Recognition: 2 theorem links

· Lean Theorem

Factorization Regret mediates compositional generalization in latent space


Pith reviewed 2026-05-14 23:11 UTC · model grok-4.3

classification 💻 cs.LG
keywords compositional generalization · factorization regret · representation classification chains · latent variable interactions · POMDP · variational inference · Cognitive Gridworld

The pith

Representation Classification Chains learn parametric interactions between latent variables to enable compositional generalization in POMDPs where feedback covers only one goal variable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper frames compositional generalization as a variational inference problem over latent variables whose interactions must be recovered from data. It introduces the Cognitive Gridworld, a stationary POMDP in which multiple latent variables jointly generate observations but reward is provided only for a single goal variable, and defines Factorization Regret as the information-theoretic cost imposed by those interactions. Experiments first show that RNNs given the interactions explicitly still suffer performance gaps explained by Factorization Regret, including a predicted confidence-accuracy decoupling. The authors then introduce Representation Classification Chains that separate value inference from interaction parameter estimation, demonstrating improved generalization to unseen variable combinations and offline learning in new action spaces.
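To make the joint-versus-factorized distinction concrete, the toy sketch below compares exact Bayesian inference over two interacting binary latents with a naive variant that marginalizes the non-goal variable at every step. The likelihood table, the function names, and the two-realization setup are illustrative assumptions, not the paper's Cognitive Gridworld. The sketch only shows that observations can be uninformative about the goal variable marginally yet informative jointly; it does not compute Factorization Regret itself.

```python
# Toy illustration (assumed setup, not the paper's environment):
# two binary latents (r1 = goal, r2 = context) and one binary observation per step.
# P_ONE[(r1, r2)] is the assumed probability that the observation equals 1.
P_ONE = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.9, (1, 1): 0.1}


def likelihood(o, r1, r2):
    """Return P(o | r1, r2) for a binary observation o."""
    p = P_ONE[(r1, r2)]
    return p if o == 1 else 1.0 - p


def normalize(weights):
    """Rescale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def joint_goal_belief(observations):
    """Posterior over r1 from exact inference over the joint state (r1, r2)."""
    states = [(r1, r2) for r1 in (0, 1) for r2 in (0, 1)]
    post = [0.25] * 4  # uniform prior over joint states
    for o in observations:
        post = normalize([p * likelihood(o, *s) for p, s in zip(post, states)])
    # Marginalize r2 only at the end, after evidence was accumulated jointly.
    return [sum(p for p, s in zip(post, states) if s[0] == r1) for r1 in (0, 1)]


def naive_goal_belief(observations):
    """Posterior over r1 when r2 is marginalized (uniformly) at every step."""
    belief = [0.5, 0.5]
    for o in observations:
        marginal = [0.5 * (likelihood(o, r1, 0) + likelihood(o, r1, 1))
                    for r1 in (0, 1)]
        belief = normalize([b * m for b, m in zip(belief, marginal)])
    return belief


observations = [1] * 10  # a run of identical observations
joint = joint_goal_belief(observations)
naive = naive_goal_belief(observations)
print(f"joint P(r1=1) = {joint[1]:.3f}, naive P(r1=1) = {naive[1]:.3f}")
```

With this table, a run of identical observations drives the joint belief in r1 = 1 above 0.99, while the naive belief stays at 0.5 because each marginalized likelihood is flat.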

Core claim

Factorization Regret measures how much task performance depends on recovering the parametric interactions among latent variables; once these interactions are learned by an embedding model, Representation Classification Chains disentangle inference of variable values from estimation of their interaction parameters, allowing the model to compose known variables in novel ways and to learn offline in previously unseen action spaces.

What carries the argument

Representation Classification Chains (RCCs), an architecture that separates latent-variable inference from estimation of their parametric interactions inside a variational inference loop.

If this is right

  • RNNs supplied with explicit interactions still exhibit accuracy gaps directly proportional to measured Factorization Regret.
  • A theoretically predicted failure mode appears in which model confidence decouples from actual accuracy when interactions are not fully utilized.
  • RCCs that learn interactions while inferring values enable compositional generalization to novel combinations of the relevant variables.
  • RCCs support offline learning in novel action spaces after the interactions have been recovered.
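The confidence-accuracy decoupling named in the second bullet can be monitored with a simple diagnostic: compare mean confidence (the probability assigned to the chosen answer) with the fraction of correct answers. The function name and the example numbers below are hypothetical. This is a minimal sketch of one possible diagnostic, not the analysis used in the paper.

```python
def confidence_accuracy_gap(beliefs, targets):
    """Return (mean confidence, accuracy, gap) for a batch of categorical beliefs.

    beliefs: list of probability vectors over the goal variable's realizations.
    targets: list of true realization indices.
    A large positive gap means the model is confident but often wrong.
    """
    if len(beliefs) != len(targets) or not beliefs:
        raise ValueError("beliefs and targets must be non-empty and equally long")
    confidences, hits = [], []
    for belief, target in zip(beliefs, targets):
        choice = max(range(len(belief)), key=belief.__getitem__)  # argmax
        confidences.append(belief[choice])
        hits.append(1.0 if choice == target else 0.0)
    mean_conf = sum(confidences) / len(confidences)
    accuracy = sum(hits) / len(hits)
    return mean_conf, accuracy, mean_conf - accuracy


# Hypothetical final-step beliefs from four episodes (made-up numbers).
beliefs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.85, 0.15]]
targets = [0, 1, 1, 1]
mean_conf, accuracy, gap = confidence_accuracy_gap(beliefs, targets)
print(mean_conf, accuracy, gap)
```

In this made-up batch the model is right in two of four episodes but reports high confidence in all of them, so the gap is clearly positive.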

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same separation of inference and interaction learning could be tested in other partially observable settings where only a subset of latent factors receive direct reward.
  • If RCCs scale, they suggest a route to building goal-directed agents that treat variable interactions as reusable modules rather than re-learning them for every new task.
  • The framework offers a concrete metric (Factorization Regret) that could be tracked during training of other latent-variable models to diagnose generalization bottlenecks.

Load-bearing premise

The parametric interactions among latent variables can be disentangled from value inference in a way that stays stable when the model must discover those interactions from data alone.

What would settle it

Train RCCs on the Cognitive Gridworld and test whether they achieve lower Factorization Regret and higher accuracy on held-out combinations of latent variables than standard RNNs or embedding models that do not separate inference from interaction learning.
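A held-out-combination test of this kind needs a split in which testing variables never serve as goals during training. The sketch below follows the split described in the Figure 5 caption (training episodes contain at most one testing variable, which is never the goal; testing episodes consist entirely of testing variables). The function name, the integer variable IDs, and the exhaustive enumeration are assumptions made for illustration.

```python
import itertools


def make_contexts(train_vars, test_vars, context_size):
    """Enumerate (goal, context) pairs for a compositional split.

    Training contexts contain at most one testing variable, and the goal is
    always a training variable. Testing contexts contain only testing variables.
    """
    train_set, test_set = set(train_vars), set(test_vars)
    if train_set & test_set:
        raise ValueError("training and testing variables must be disjoint")
    all_vars = sorted(train_set | test_set)
    train_eps, test_eps = [], []
    for context in itertools.combinations(all_vars, context_size):
        n_test = sum(v in test_set for v in context)
        if n_test == context_size:
            # Any variable in an all-testing context may serve as the goal.
            test_eps.extend((goal, context) for goal in context)
        elif n_test <= 1:
            # Only training variables may be goals during training.
            train_eps.extend((goal, context) for goal in context
                             if goal in train_set)
        # Contexts mixing two or more testing variables with training
        # variables are left out of both sets.
    return train_eps, test_eps


train_eps, test_eps = make_contexts(train_vars=[0, 1, 2], test_vars=[3, 4],
                                    context_size=2)
print(len(train_eps), len(test_eps))
```

The pair (3, 4) appears only at test time, so any success on it has to come from embeddings learned while 3 and 4 were non-goal members of training contexts.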

Figures

Figures reproduced from arXiv: 2603.27134 by John Schwarcz.

Figure 1: Environment schematic for C = 2. Observations ot are generated stochastically. Variable interactions Z parameterize the likelihood PZ(o | r) and variable realizations (r1, r2) fix the probability of sampling each observable o i. Thus, while the real world involves a vast number of latent variables, the agent’s goal effectively reduces the world to a context of C relevant variables, specifically, the goal …
Figure 2: The cost of Naive Bayes grows with time and interactions. (a) Example Joint (matrices) and marginalized (vectors) likelihoods. (b) Top: Accuracy of Joint (left) and Naive (right) Bayes across varying context sizes. Bottom: Relative accuracy (left) and Semantic Interaction Information (right). Circles mark four equidistant reference time-points throughout inference. given the realization of a latent variabl…
Figure 3: Recurrent Neural Networks align with theoretical predictions. (a) Architecture (left) and gradient flow (right) of the Classifier. Only the goal belief-state receives a gradient. (b-d) Same as Figure 2b (for C = 1, 2) with Fully Trained and Echo State Networks. Markers indicate four equidistant reference time-points throughout inference. shifts rightward, reflecting a growing probability of a hit. Unexpect…
Figure 4: Failure to capture SII can induce hallucinations. (a) Sequential updating of example posteriors under Joint and Naive inference. (b) Distribution of hits and misses at each step, pooled over episodes. Misinterpreting evidence yields episodes with performance below chance. 3.3 EXPERIMENT 2: LEARNING INTERACTIONS REQUIRES VARIATIONAL INFERENCE Thus far, we have established how Interaction Information can imp…
Figure 5: Compositional embeddings are learned indirectly via goals. Schematic demonstration of compositional generalization in latent space. Training episodes (top) contain at most one testing variable, which is never the goal (green). Testing episodes (bottom) consist entirely of testing variables. Success requires testing variable embeddings to be learned through their implicit relationships to training goals. T…
Figure 6: A variational architecture learns compositional embeddings from reward. (a) Relevant variables interact via learnable embeddings to form interactions. (b) Forward pass and gradient flow of the Classifier and Generator. The Classifier learns from rewards while the Generator uses self-supervised-learning (SSL). (c) Testing episode accuracy of the Classifier throughout training. intrinsic reward is given by …
Figure 7: Conditional generative modeling enables optimization in compositional spaces. (a) Schematic illustrating the mapping of preferred observations (Ω) to their respective likelihoods and the cumulative landscape (accumulated over i in subsection 3.4). (b) An example traversal, from the lowest to the highest point on the landscape, changes observations to best match the agent’s preference. (c) Controller learni…
Figure 8: Example offline learning trajectories w/ Generator. The evolution of the deterministic policy, argmax_r π(r), is plotted throughout offline training from initialization (red circles) to the end of training (green stars). Trajectories are overlaid on the preference landscapes to demonstrate navigation through an internal Cognitive Gridworld. 4 DISCUSSION In this work, we attempted to formalize the ability to …
Figure 9: A flexible process for embedding Gridworld structure into latent space. The embeddings of relevant variables are compressed into interactions which are then expanded to a discrete probability distribution over possible realizations of the world. The full process consists of first (i) compressing embedding vectors to their scalar interactions. Then (ii) expanding pairwise interactions to pairs of vectors,…
Figure 10: Additional examples of Bayesian inference for C = 2. Representative examples of belief-state updating under Joint and Naive inference. A.3 LEARNING DYNAMICS Future theoretical work is still needed to specify the relationship between learning, dynamics and computation. For instance, we observed that while average performance improves throughout training for both Fully Trained and Echo State networks (Figu…
Figure 11: Early correlation between accuracy and SII predicts eventual performance (C = 2). (a) Throughout learning, the testing accuracy at the final step of inference saturates to the performance of either Joint or Naive Bayes. (b) Final step testing accuracy during early training. (c) Correlation between Semantic Interaction Information (SII) and accuracy at the final step of inference. A negative correlation …
Figure 12: Belief representations are initially factorized. Top 2 Principal Components of the marginal beliefs after a single observation. Beliefs are colored by the realization sum ($r_c + r_{c'}$), difference ($r_c - r_{c'}$), and belief entropy ($-\sum_c \sum_r B_{tcr} \ln B_{tcr}$). $R^2$ indicates the variance explained of the respective variables by the top 2 components.
Figure 13: Failing to capture interaction information causes entanglement over time. Same as …
Figure 14: Dis-entanglement correlates with learned dynamics of dimensionality. Cross-correlations between Bayesian dis-entanglement and network (a) absolute distance, (b) L2 norm, (c) Hoyer’s sparsity and (d) Participation Ratio of network dynamics. Only data from 20 ≤ t ≤ T was analyzed to isolate the steady-state response profile. relationships, Representation Classification Chains can be viewed as the potential …
Figure 15: Additional learning trajectories of the Controller trained Offline w/ Generator. Representative examples of the Controller exploring an internally generated Cognitive Gridworld. A.6 ENVIRONMENT HYPERPARAMETERS • T (Trajectory / inference steps): 30 • dE (Embedding dimensionality): 30 • |S| (Total latent variables / states): 500 • R (Possible realizations): 10 • do (Observation dimensions): 5 • λ (Likeliho…
Figure 16: Network results extend to environments with 3 interacting variables. (a) Same as Figure 3b-d, for 1, 2 and 3 relevant variables. (b) Divergence of network marginal beliefs from Bayesian marginal beliefs. Over the course of inference, the Fully Trained network’s beliefs (solid lines) diverge from Naive Bayes (on the x-axis) while staying aligned with Joint Bayes (on the y-axis). Conversely, the Echo State …
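The belief entropy named in the Figure 12 caption, the negative sum over variables c and realizations r of B_tcr ln B_tcr, can be computed directly from the marginal beliefs at a given step. The sketch below is a minimal reading of that formula; the function name and the nested-list layout of the beliefs are assumptions, while C = 2 and R = 10 are taken from the captions above.

```python
import math


def belief_entropy(beliefs):
    """Sum of entropies (in nats) of the marginal beliefs at one time step.

    beliefs[c][r] is the probability assigned to realization r of variable c.
    Zero-probability entries contribute nothing, by the convention 0 ln 0 = 0.
    """
    return -sum(p * math.log(p) for belief in beliefs for p in belief if p > 0.0)


# Uniform beliefs over R = 10 realizations for C = 2 variables give the
# maximum possible value, C * ln(R).
C, R = 2, 10
uniform = [[1.0 / R] * R for _ in range(C)]
print(belief_entropy(uniform), C * math.log(R))
```

A fully certain belief (all mass on one realization per variable) gives zero, so the quantity ranges from 0 to C ln R.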
Original abstract

Are there still barriers to generalization once all of the relevant variables are known? We address this question via a framework that casts compositional generalization as a variational inference problem over latent variables with parametric interactions. To explore this framework, we develop the Cognitive Gridworld, a stationary Partially Observable Markov Decision Process (POMDP) in which observations are generated jointly by multiple latent variables, yet feedback is provided only for a single goal variable. This setting allows us to describe Factorization Regret: an information-theoretic quantity that measures the contribution of latent variable interactions to task performance. Using this metric, we first analyze Recurrent Neural Networks (RNNs) that are explicitly provided with the interactions and find that Factorization Regret explains the accuracy gap between Echo State and Fully Trained networks. Additionally, our analysis uncovers a theoretically predicted failure mode, where confidence becomes decoupled from accuracy. These results suggest that utilizing the interactions between relevant variables is a non-trivial capability. We then address a harder regime where the interactions themselves must be learned by an embedding model. Learning how variables interact while learning how to infer their values is a variational inference problem. We approach this dilemma via Representation Classification Chains (RCCs), a novel architecture which disentangles variable inference and parameter estimation. We demonstrate that, by learning how variables interact, RCCs facilitate compositional generalization to novel combinations of relevant variables and offline learning in novel action spaces. Together, these results establish a theoretically grounded setting for researching, developing and evaluating goal-directed generalist agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper frames compositional generalization as variational inference over latent variables with parametric interactions in a new POMDP called the Cognitive Gridworld, where observations depend on multiple latents but feedback is given only on a goal variable. It defines Factorization Regret as an information-theoretic quantity measuring the performance contribution of latent interactions. The work first analyzes RNNs supplied with explicit interactions, showing that Factorization Regret accounts for accuracy differences between Echo State and fully trained networks and identifying a confidence-accuracy decoupling failure mode. It then introduces Representation Classification Chains (RCCs) that learn interactions while inferring values, claiming these enable compositional generalization to novel variable combinations and offline learning in new action spaces.

Significance. If the RCC disentanglement mechanism and the mediating role of Factorization Regret are rigorously validated, the framework would offer a principled information-theoretic lens on compositional generalization in latent-space RL, together with a new stationary POMDP benchmark. The explicit linkage between interaction learning and generalization performance, plus the identification of a theoretically predicted failure mode, would be useful for designing generalist agents; however, the current absence of equations and quantitative results limits immediate impact.

major comments (2)
  1. [Abstract] Abstract: the claim that RCCs 'disentangle variable inference and parameter estimation' to solve the variational inference problem is load-bearing for the central result, yet no equations, loss terms, or architectural constraints are supplied showing how interaction parameters are isolated from value inference (e.g., whether they appear only in a dedicated factorization term or remain coupled through shared embeddings). Without this isolation, reported gains could arise from joint non-factorized fitting rather than the claimed mechanism.
  2. [Abstract] Abstract: Factorization Regret is introduced as an information-theoretic quantity that 'explains the accuracy gap' between Echo State and Fully Trained networks, but no definition, derivation, or numerical results (error bars, data-exclusion criteria) are provided; this prevents verification that the metric is independent of parameterization choices and actually mediates the observed generalization.
minor comments (1)
  1. [Abstract] The abstract would be strengthened by including at least one key equation for Factorization Regret and a brief statement of the RCC loss or architecture constraint.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity on the requested details.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that RCCs 'disentangle variable inference and parameter estimation' to solve the variational inference problem is load-bearing for the central result, yet no equations, loss terms, or architectural constraints are supplied showing how interaction parameters are isolated from value inference (e.g., whether they appear only in a dedicated factorization term or remain coupled through shared embeddings). Without this isolation, reported gains could arise from joint non-factorized fitting rather than the claimed mechanism.

    Authors: We agree that the abstract would benefit from a more explicit pointer to the isolation mechanism. Section 4 of the manuscript defines RCCs with separate inference and parameterization modules: variable inference uses a dedicated encoder whose outputs feed only into a value head, while interaction parameters are learned via a classification chain with an explicit factorization loss (Equation 7) that operates on a frozen embedding and does not back-propagate into the inference path. This architectural constraint prevents the coupling the referee correctly flags. We will revise the abstract to reference this separation and the dedicated loss term. revision: yes

  2. Referee: [Abstract] Abstract: Factorization Regret is introduced as an information-theoretic quantity that 'explains the accuracy gap' between Echo State and Fully Trained networks, but no definition, derivation, or numerical results (error bars, data-exclusion criteria) are provided; this prevents verification that the metric is independent of parameterization choices and actually mediates the observed generalization.

    Authors: We accept that the abstract omits these supporting elements. The definition appears in Section 3.1 as the expected reduction in reward entropy attributable to latent interactions (I(R; interactions) minus a baseline entropy term), with the derivation following from the chain rule on the joint posterior. Numerical results, including error bars across 10 seeds and exclusion of runs that failed to reach 80% training accuracy, are shown in Figure 3 and Table 2. We will add a concise definition and citation to these results in the revised abstract. revision: yes
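For orientation, two standard information-theoretic identities are relevant to the kind of definition described in the response above. They are textbook results written in generic symbols (X, Y and Z are placeholders, not the paper's notation). They are not the paper's definition of Factorization Regret, which is not reproduced on this page.

```latex
% Chain rule for mutual information
I(X; Y, Z) = I(X; Z) + I(X; Y \mid Z)

% Interaction information (one common sign convention)
I(X; Y; Z) = I(X; Y \mid Z) - I(X; Y)
```

Under this sign convention, a positive value means that conditioning on Z increases the dependence between X and Y, which is the kind of effect a fully factorized (naive) model cannot represent.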

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain

Full rationale

The paper introduces Factorization Regret as a new information-theoretic quantity and RCCs as a novel architecture for disentangling inference and parameter estimation in a variational setting. The abstract and provided text define the metric, apply it to RNNs with explicit interactions, and demonstrate RCC performance on learned interactions without any equations or steps that reduce predictions or claims to fitted inputs by construction. No self-citations appear as load-bearing premises, and the central claims rest on the introduced framework plus empirical analysis rather than tautological renaming or self-referential definitions. The derivation remains self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on the assumption that observations are generated by multiple interacting latent variables whose interactions can be variationally inferred; no explicit free parameters or invented entities are named in the abstract.

free parameters (1)
  • parametric interaction terms
    Interactions between latent variables are treated as learnable parameters whose form is not derived from first principles.
axioms (1)
  • domain assumption: Observations are generated jointly by multiple latent variables, with feedback only on a single goal variable
    Core modeling choice for the Cognitive Gridworld POMDP.

pith-pipeline@v0.9.0 · 5556 in / 1191 out tokens · 33040 ms · 2026-05-14T23:11:28.344346+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

122 extracted references · 122 canonical work pages · 7 internal anchors

  1. [1]

    Burgess, Nicholas Watters, Alexander Lerchner, and Irina Higgins

    Alessandro Achille, Tom Eccles, Lo ¨ıc Matthey, Christopher P. Burgess, Nicholas Watters, Alexander Lerchner, and Irina Higgins. Life-long disentangled representation learning with cross-domain latent homologies. InNeural Information Processing Systems, 2018. URL https://api.semanticscholar.org/CorpusID:52049801

  2. [2]

    Is conditional generative modeling all you need for decision-making? arXiv preprint arXiv:2211.15657 , 2022

    Anurag Ajay, Yilun Du, Abhi Gupta, Joshua B. Tenenbaum, T. Jaakkola, and Pulkit Agrawal. Is conditional generative modeling all you need for decision-making?ArXiv, abs/2211.15657, 2022. URLhttps://api.semanticscholar.org/CorpusID: 254044710

  3. [3]

    An exact analytical relation among recall, precision, and classification accuracy in information retrieval.Boston College, Boston, Technical Report BCCS-02, 1: 1–22, 2002

    Sergio A Alvarez. An exact analytical relation among recall, precision, and classification accuracy in information retrieval.Boston College, Boston, Technical Report BCCS-02, 1: 1–22, 2002

  4. [4]

    V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

    Mahmoud Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Mojtaba Komeili, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zho- lus, Sergio Arnaud, Abha Gejji, Ada Martin, Francois Robert Hogan, Daniel Dugas, Pi- otr Bojanowski, Vasil Khalidov, Patrick Labatut, Francisco Massa, Marc Szafraniec, Kapil Krishnakumar, Yong L...

  5. [5]

    Optimal control of markov processes with incomplete state information

    Karl Johan ˚Astr¨om. Optimal control of markov processes with incomplete state information. Journal of Mathematical Analysis and Applications, 10:174–205, 1965. URLhttps:// api.semanticscholar.org/CorpusID:121222106

  6. [6]

    The sparseness of mixed selectivity neurons controls the generalization–discrimination trade-off.The Journal of Neuroscience, 33:3844 – 3856, 2013

    Omri Barak, Mattia Rigotti, and Stefano Fusi. The sparseness of mixed selectivity neurons controls the generalization–discrimination trade-off.The Journal of Neuroscience, 33:3844 – 3856, 2013. URLhttps://api.semanticscholar.org/CorpusID:1766932

  7. [7]

    Raunak Basu, Robert Gebauer, Tim Herfurth, Simon Kolb, Zahra Golipour, Tatjana Tchumatchenko, and Hiroshi T. Ito. The orbitofrontal cortex maps future navigational goals.Nature, 599:449 – 452, 2021. URLhttps://api.semanticscholar.org/ CorpusID:240072183

  8. [8]

    Muller, James C

    Timothy Edward John Behrens, Timothy H. Muller, James C. R. Whittington, Shirley Mark, Alon B. Baram, Kimberly L. Stachenfeld, and Zeb Kurth-Nelson. What is a cognitive map? organizing knowledge for flexible behavior.Neuron, 100:490–509, 2018. URLhttps: //api.semanticscholar.org/CorpusID:53105626

  9. [9]

    Dynamic programming.Science, 153:34 – 37, 1957

    Richard Bellman. Dynamic programming.Science, 153:34 – 37, 1957. URLhttps: //api.semanticscholar.org/CorpusID:271544899

  10. [10]

    Predictive learning enables compositional repre- sentations.bioRxiv, pp

    Gauthier Boeshertz and Claudia Clopath. Predictive learning enables compositional repre- sentations.bioRxiv, pp. 2025–09, 2025

  11. [11]

    Bowler, Dua Azhar, Cambria M Jensen, Hyun-Woo Lee, and James G

    John C. Bowler, Dua Azhar, Cambria M Jensen, Hyun-Woo Lee, and James G. Heys. Struc- tured experience shapes strategy learning and neural dynamics in the medial entorhinal cortex.bioRxiv, 2025. URLhttps://api.semanticscholar.org/CorpusID: 278664552

  12. [12]

    Brunton, Matthew M

    Bingni W. Brunton, Matthew M. Botvinick, and Carlos D. Brody. Rats and humans can optimally accumulate evidence for decision-making.Science, 340:95 – 98, 2013. URL https://api.semanticscholar.org/CorpusID:13098239. 12

  13. [13]

    Spatial coding and attractor dynamics of grid cells in the entorhinal cortex.Current Opinion in Neurobiology, 25:169–175, 2014

    Yoram Burak. Spatial coding and attractor dynamics of grid cells in the entorhinal cortex.Current Opinion in Neurobiology, 25:169–175, 2014. URLhttps://api. semanticscholar.org/CorpusID:16681043

  14. [14]

    Bussell, Ryan P

    Jennifer J. Bussell, Ryan P. Badman, Christian David M ´arton, Ethan S. Bromberg-Martin, Larry Abbott, Kanaka Rajan, and Richard Axel. Representations of the intrinsic value of information in mouse orbitofrontal cortex.bioRxiv, 2024. URLhttps://api. semanticscholar.org/CorpusID:264171514

  15. [15]

    Charles M. Butter. Perseveration in extinction and in discrimination reversal tasks following selective frontal ablations in macaca mulatta.Physiology & Behavior, 4:163–171, 1969. URL https://api.semanticscholar.org/CorpusID:17920166

  16. [16]

    Stephanie C. Y . Chan, Yael Niv, and Kenneth A. Norman. A probability distribution over latent causes, in the orbitofrontal cortex.The Journal of Neuroscience, 36:7817 – 7828,

  17. [17]

    URLhttps://api.semanticscholar.org/CorpusID:9673546

  18. [18]

    On the Measure of Intelligence

    Franc ¸ois Chollet. On the measure of intelligence.arXiv preprint arXiv:1911.01547, 2019

  19. [19]

    Arc prize 2024: Technical report

    Francois Chollet, Mike Knoop, Gregory Kamradt, and Bryan Landers. Arc prize 2024: Tech- nical report.ArXiv, abs/2412.04604, 2024. URLhttps://api.semanticscholar. org/CorpusID:274581906

  20. [20]

    Yogita Chudasama and Trevor William Robbins. Dissociable contributions of the or- bitofrontal and infralimbic cortex to pavlovian autoshaping and discrimination reversal learn- ing: Further evidence for the functional heterogeneity of the rodent frontal cortex.The Jour- nal of Neuroscience, 23:8771 – 8780, 2003. URLhttps://api.semanticscholar. org/CorpusI...

  21. [21]

    Churchland and Krishna V

    Mark M. Churchland and Krishna V . Shenoy. Preparatory activity and the expansive null- space.Nature reviews. Neuroscience, 2024. URLhttps://api.semanticscholar. org/CorpusID:268250917

  22. [22]

    Information processing capacity of dynamical systems.Scientific Reports, 2, 2012

    Joni Dambre, David Verstraeten, Benjamin Schrauwen, and Serge Massar. Information processing capacity of dynamical systems.Scientific Reports, 2, 2012. URLhttps: //api.semanticscholar.org/CorpusID:7342429

  23. [23]

    Victor de Lafuente, Mehrdad Jazayeri, and Michael N. Shadlen. Representation of accumu- lating evidence for a decision in two parietal areas.The Journal of Neuroscience, 35:4306 – 4318, 2015. URLhttps://api.semanticscholar.org/CorpusID:14214715

  24. [24]

    Rebecca Dias, Trevor William Robbins, and Angela C. Roberts. Dissociation in prefrontal cortex of affective and attentional shifts.Nature, 380:69–72, 1996. URLhttps://api. semanticscholar.org/CorpusID:4301013

  25. [25]

    Audrey Duarte, Richard N. A. Henson, Robert T. Knight, Tina Emery, and Kim S. Gra- ham. Orbito-frontal cortex is necessary for temporal context memory.Journal of Cognitive Neuroscience, 22:1819–1831, 2010. URLhttps://api.semanticscholar.org/ CorpusID:14909943

  26. [26]

    Dubreuil, Adrian Valente, Manuel Beir ´an, Francesca Mastrogiuseppe, and Srdjan Ostojic

    Alexis M. Dubreuil, Adrian Valente, Manuel Beir ´an, Francesca Mastrogiuseppe, and Srdjan Ostojic. The role of population structure in computations through neural dynamics.Nature Neuroscience, 25:783 – 794, 2022. URLhttps://api.semanticscholar.org/ CorpusID:256838997

  27. [27]

    Porter, Catherine E Munro, and Howard Eichenbaum

    Anja Farovik, Ryan Place, Sam McKenzie, Blake S. Porter, Catherine E Munro, and Howard Eichenbaum. Orbitofrontal cortex encodes memories within value-based schemas and rep- resents contexts that guide memory retrieval.The Journal of Neuroscience, 35:8333 – 8344,

  28. [28]

    URLhttps://api.semanticscholar.org/CorpusID:17512263. 13

  29. [29]

    Functional coupling between the prefrontal cortex and dopamine neurons in the ventral tegmental area.Journal of Neuroscience, 27(20):5414–5421, 2007

    Ming Gao, Chang-Liang Liu, Shen Yang, Guo-Zhang Jin, Benjamin S Bunney, and Wei-Xing Shi. Functional coupling between the prefrontal cortex and dopamine neurons in the ventral tegmental area.Journal of Neuroscience, 27(20):5414–5421, 2007

  30. [30]

    Garvert, Tankred Saanum, Eric Schulz, Nicolas W

    Mona M. Garvert, Tankred Saanum, Eric Schulz, Nicolas W. Schuck, and Chris- tian F. Doeller. Hippocampal spatio-predictive cognitive maps adaptively guide reward generalization.Nature Neuroscience, 26:615 – 626, 2023. URLhttps://api. semanticscholar.org/CorpusID:257924320

  31. [31]

    Gershman and Yael Niv

    Samuel J. Gershman and Yael Niv. Learning latent structure: carving nature at its joints.Current Opinion in Neurobiology, 20:251–256, 2010. URLhttps://api. semanticscholar.org/CorpusID:10255984

  32. [32]

    Interaction information for causal inference: The case of directed triangle.2017 IEEE International Symposium on Information Theory (ISIT), pp

    AmirEmad Ghassami and Negar Kiyavash. Interaction information for causal inference: The case of directed triangle.2017 IEEE International Symposium on Information Theory (ISIT), pp. 1326–1330, 2017. URLhttps://api.semanticscholar.org/CorpusID: 8283977

  33. [33]

    Gold and Michael N

    Joshua I. Gold and Michael N. Shadlen. The neural basis of decision making.Annual review of neuroscience, 30:535–74, 2007. URLhttps://api.semanticscholar. org/CorpusID:6842034

  34. [34]

    Pa ˇsukonis, Jimmy Ba, and Timothy P

    Danijar Hafner, J. Pa ˇsukonis, Jimmy Ba, and Timothy P. Lillicrap. Mastering diverse con- trol tasks through world models.Nature, 640:647 – 653, 2025. URLhttps://api. semanticscholar.org/CorpusID:277508993

  35. [35]

    A framework for intelligence and cortical function based on grid cells in the neocortex.Frontiers in Neu- ral Circuits, 12, 2018

    Jeff Hawkins, Marcus Lewis, Mirko Klukas, Scott Purdy, and Subutai Ahmad. A framework for intelligence and cortical function based on grid cells in the neocortex.Frontiers in Neu- ral Circuits, 12, 2018. URLhttps://api.semanticscholar.org/CorpusID: 57761278

  36. [36]

    Springer Science & Business Media, 2001

    Steven C Hayes, Dermot Barnes-Holmes, and Bryan Roche.Relational frame theory: A post- Skinnerian account of human language and cognition. Springer Science & Business Media, 2001

  37. [37]

    Hennig, Sandra A

    Jay A. Hennig, Sandra A. Romero Pinto, Takahiro Yamaguchi, Scott W. Linderman, Naoshige Uchida, and Samuel J. Gershman. Emergence of belief-like representations through reinforcement learning.PLOS Computational Biology, 19, 2023. URLhttps: //api.semanticscholar.org/CorpusID:258051351

  38. [38]

    Hollerman and Wolfram Schultz

    Jeffrey R. Hollerman and Wolfram Schultz. Dopamine neurons report an error in the temporal prediction of reward during learning.Nature Neuroscience, 1:304–309, 1998. URLhttps: //api.semanticscholar.org/CorpusID:7785929

  39. [39]

    J. Hornak, John P. O’Doherty, Jessica Bramham, Edmund T. Rolls, Robin G. Morris, Peter R. Bullock, and C. E. Polkey. Reward-related reversal learning after surgical excisions in orbito-frontal or dorsolateral prefrontal cortex in humans. Journal of Cognitive Neuroscience, 16:463–478, 2004. URL https://api.semanticscholar.org/CorpusID:132678

  40. [40]

    Timothy M. Hospedales, Antreas Antoniou, Paul Micaelli, and Amos J. Storkey. Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44:5149–5169, 2020. URL https://api.semanticscholar.org/CorpusID:215744839

  41. [41]

    Peiyun Hu, Zachary Chase Lipton, Anima Anandkumar, and Deva Ramanan. Active learning with partial feedback. ArXiv, abs/1802.07427, 2018. URL https://api.semanticscholar.org/CorpusID:3534906

  42. [42]

    Masanobu Inubushi and Kazuyuki Yoshimura. Reservoir computing beyond memory-nonlinearity trade-off. Scientific Reports, 7, 2017. URL https://api.semanticscholar.org/CorpusID:10886282

  43. [43]

    Alicia Izquierdo, Robin K. Suda, and Elisabeth A. Murray. Bilateral orbital prefrontal cortex lesions in rhesus monkeys disrupt choices guided by both reward value and reward contingency. The Journal of Neuroscience, 24:7540–7548, 2004. URL https://api.semanticscholar.org/CorpusID:17542448

  44. [44]

    Yong Sang Jo and Sheri J. Y. Mizumori. Prefrontal regulation of neuronal activity in the ventral tegmental area. Cerebral cortex, 26(10):4057–4068, 2016. URL https://api.semanticscholar.org/CorpusID:4875389

  45. [45]

    Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning and acting in partially observable stochastic domains. Artif. Intell., 101:99–134, 1998. URL https://api.semanticscholar.org/CorpusID:5613003

  46. [46]

    Thorsten Kahnt, Jakob Heinzle, Soyoung Q. Park, and John-Dylan Haynes. The neural code of reward anticipation in human orbitofrontal cortex. Proceedings of the National Academy of Sciences, 107:6010–6015, 2010. URL https://api.semanticscholar.org/CorpusID:22879670

  47. [47]

    Yul HR Kang, Frederike H. Petzschner, Daniel M. Wolpert, and Michael N. Shadlen. Piercing of consciousness as a threshold-crossing operation. Current Biology, 27:2285–2295.e6. URL https://api.semanticscholar.org/CorpusID:27618011

  49. [49]

    Gili Karni, Yael Niv, and Nathaniel Daw. How goals affect information seeking. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 47, 2025

  50. [50]

    Kenneth Kay, Natalie Biderman, Ramin Khajeh, Manuel Beiran, Christopher J. Cueva, Daphna Shohamy, Greg Jensen, Xue-Xin Wei, Vincent P. Ferrera, and L.F. Abbott. Emergent neural dynamics and geometry for generalization in a transitive inference task. PLOS Computational Biology, 20, 2023. URL https://api.semanticscholar.org/CorpusID:260381252

  51. [51]

    Roozbeh Kiani and Michael N. Shadlen. Representation of confidence associated with a decision by neurons in the parietal cortex. Science, 324:759–764, 2009. URL https://api.semanticscholar.org/CorpusID:11581812

  52. [52]

    Eric B. Knudsen and Joni D. Wallis. Closed-loop theta stimulation in the orbitofrontal cortex prevents reward-based learning. Neuron, 2020. URL https://api.semanticscholar.org/CorpusID:212644121

  53. [53]

    Artemy Kolchinsky and David H. Wolpert. Semantic information, autonomous agency and non-equilibrium statistical physics. Interface Focus, 8, 2018. URL https://api.semanticscholar.org/CorpusID:53566383

  54. [54]

    Brenden M. Lake and Marco Baroni. Human-like systematic generalization through a meta-learning neural network. Nature, 623:115–121, 2023. URL https://api.semanticscholar.org/CorpusID:264489248

  55. [55]

    Andrew Kyle Lampinen, Martin Engelcke, Yuxuan Li, Arslan Chaudhry, and James L. McClelland. Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences. 2025. URL https://api.semanticscholar.org/CorpusID:281410976

  56. [56]

    Denis C L Lan, Laurence T Hunt, and Christopher Summerfield. Goal-directed navigation in humans and deep reinforcement learning agents relies on an adaptive mix of vector-based and transition-based strategies. PLOS Biology, 23, 2025. URL https://api.semanticscholar.org/CorpusID:280389881

  57. [57]

    Huixin Lin and Jingfeng Zhou. Hippocampal and orbitofrontal neurons contribute to complementary aspects of associative structure. Nature Communications, 15, 2024. URL https://api.semanticscholar.org/CorpusID:270638438

  58. [58]

    Daniel J. Lodge. The medial prefrontal and orbitofrontal cortices differentially regulate dopamine system function. Neuropsychopharmacology, 36:1227–1236, 2011. URL https://api.semanticscholar.org/CorpusID:28747941

  59. [59]

    Valerio Mante, David Sussillo, Krishna V. Shenoy, and William T. Newsome. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503:78–84, 2013. URL https://api.semanticscholar.org/CorpusID:4450696

  61. [61]

    Valerio Mante, David Sussillo, Krishna V Shenoy, and William T Newsome. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474):78–84, 2013

  62. [62]

    Kerry McAlonan and Verity J. Brown. Orbital prefrontal cortex mediates reversal learning and not attentional set shifting in the rat. Behavioural Brain Research, 146:97–103, 2003. URL https://api.semanticscholar.org/CorpusID:11359123

  63. [63]

    Kevin Miller, Maria Eckstein, Matt Botvinick, and Zeb Kurth-Nelson. Cognitive model discovery via disentangled RNNs. Advances in Neural Information Processing Systems, 36:61377–61394, 2023

  64. [64]

    Walter Mischel and Ebbe B. Ebbesen. Attention in delay of gratification. Journal of Personality and Social Psychology, 16:329–337, 1970. URL https://api.semanticscholar.org/CorpusID:53464175

  65. [65]

    Eda Mizrak, Nichole R. Bouffard, Laura A. Libby, Erie D. Boorman, and Charan Ranganath. The hippocampus and orbitofrontal cortex jointly represent task structure during memory-guided decision making. Cell reports, 37:110065, 2021. URL https://api.semanticscholar.org/CorpusID:244792239

  66. [66]

    George E. Monahan. State of the art—a survey of partially observable markov decision processes: Theory, models, and algorithms.Management Science, 28:1–16, 1982. URL https://api.semanticscholar.org/CorpusID:123582406

  67. [67]

    Paul S. Muhle-Karbe, Hannah Sheahan, Giovanni Pezzulo, Hugo J. Spiers, Samson Chien, Nicolas W. Schuck, and Christopher Summerfield. Goal-seeking compresses neural codes for space in the human hippocampus and orbitofrontal cortex. Neuron, 111:3885–3899.e6. URL https://api.semanticscholar.org/CorpusID:255850293

  69. [69]

    Eda Mızrak, Nichole R. Bouffard, Laura A. Libby, Erie D. Boorman, and Charan Ranganath. The hippocampus and orbitofrontal cortex jointly represent task structure during memory-guided decision making. Cell reports, 37:110065, 2021. URL https://api.semanticscholar.org/CorpusID:244792239

  70. [70]

    Vijay Mohan K. Namboodiri, James M. Otis, Kay van Heeswijk, Elisa S Voets, Rizk A. Alghorazi, Jose Rodríguez-Romaguera, Stefan Mihalas, and Garret D. Stuber. Single-cell activity tracking reveals that orbitofrontal neurons acquire and maintain a long-term memory to guide behavioral adaptation. Nature neuroscience, 22:1110–1121, 2019. URL https://api.s...

  71. [71]

    Yael Niv, Reka Daniel, Andra Geana, Samuel J. Gershman, Yuan Chang Leong, Angela Radulescu, and Robert C. Wilson. Reinforcement learning in multidimensional environments relies on attention mechanisms. The Journal of Neuroscience, 35:8145–8157, 2015. URL https://api.semanticscholar.org/CorpusID:18446484

  72. [72]

    Meta-learning of Sequential Strategies

    Pedro A. Ortega, Jane X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Manfred Otto Heess, Joel Veness, Alexander Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin J. Miller, Mohammad Gheshlaghi Azar, Ian Osband, Neil C. Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, H. V ....

  73. [73]

    Camillo Padoa-Schioppa. Range-adapting representation of economic value in the orbitofrontal cortex. The Journal of Neuroscience, 29:14004–14014, 2009. URL https://api.semanticscholar.org/CorpusID:7643973

  74. [74]

    Camillo Padoa-Schioppa and John A Assad. The representation of economic value in the orbitofrontal cortex is invariant for changes of menu. Nature Neuroscience, 11:95–102, 2008. URL https://api.semanticscholar.org/CorpusID:901185

  75. [75]

    Gabriel Pelletier and Lesley K. Fellows. A critical role for human ventromedial frontal lobe in value comparison of complex objects based on attribute configuration. The Journal of Neuroscience, 39:4124–4132, 2019. URL https://api.semanticscholar.org/CorpusID:76659569

  76. [76]

    Michael Pereira, Pierre Mégevand, Mi Xue Tan, Wenwen Chang, Shuo Wang, Ali Rezai, Margitta Seeck, Marco Vincenzo Corniola, Shahan Momjian, Fosco Bernasconi, Olaf Blanke, and Nathan Faivre. Evidence accumulation relates to perceptual consciousness and monitoring. Nature Communications, 12, 2021. URL https://api.semanticscholar.org/CorpusID:235268827

  77. [77]

    Philip and Hemang. SimpleBench: The Text Benchmark in which Unspecialized Human Performance Exceeds that of Current Frontier Models. https://simple-bench.com/, October 2024. Technical Report

  78. [78]

    Todd M. Preuss and Steven P. Wise. Evolution of prefrontal cortex. Neuropsychopharmacology, 47:3–19, 2021. URL https://api.semanticscholar.org/CorpusID:236940889

  79. [79]

    Alexandra Proca, Fernando E. Rosas, Andrea I. Luppi, Daniel Bor, Matthew Crosby, and Pedro A. M. Mediano. Synergistic information supports modality integration and flexible learning in neural networks solving multiple tasks. PLOS Computational Biology, 20, 2024. URL https://api.semanticscholar.org/CorpusID:252734834

  80. [80]

    Yidan Qiu, Huakang Li, Jiajun Liao, Kemeng Chen, Xiaoyan Wu, Bingyi Liu, and Ruiwang Huang. Forming cognitive maps for abstract spaces: the roles of the human hippocampus and orbitofrontal cortex. Communications Biology, 7, 2024. URL https://api.semanticscholar.org/CorpusID:269499660

Showing first 80 references.