pith. machine review for the scientific record.

arxiv: 2604.02348 · v1 · submitted 2026-02-17 · 💻 cs.LG

Recognition: 2 Lean theorem links

Contextual Intelligence: The Next Leap for Reinforcement Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 21:51 UTC · model grok-4.3

classification 💻 cs.LG
keywords contextual reinforcement learning · allogenic contexts · autogenic contexts · zero-shot generalization · multi-time-scale modeling · context taxonomy · RL safety · real-world transfer

The pith

Reinforcement learning agents need context treated as a first-class modeling primitive that separates environment-imposed factors from agent-driven ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that current contextual RL treats context as a single static input, which limits how well policies transfer to new situations. It proposes splitting contexts into allogenic types set by the environment and autogenic types generated by the agent itself, then outlines three directions: learning across mixed context types, modeling variables that change at different speeds, and adding abstract descriptors such as roles or uncertainties. If these steps succeed, agents would reason explicitly about their own identity, the constraints the world places on them, and how both shift over time. This matters because standard RL policies often fail outside their training distribution, restricting practical use in robotics or control tasks.

Core claim

We propose a taxonomy that divides contexts into allogenic factors imposed by the environment and autogenic factors produced by the agent, then identify three necessary research directions: learning with heterogeneous contexts so agents can model mutual influences, multi-time-scale modeling to handle slow-changing allogenic variables separately from fast-changing autogenic ones, and integration of abstract high-level contexts such as roles and regulatory regimes. Treating context as a first-class primitive in this way would let agents reason about who they are, what the world permits, and how both evolve.

What carries the argument

The taxonomy that splits allogenic (environment-imposed) from autogenic (agent-driven) contexts, which supplies the structure for the three research directions.
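A minimal sketch of that primitive, in Python, with hypothetical field names (the paper prescribes no concrete schema; this is an illustrative reading, not the authors' design):

    from dataclasses import dataclass, field

    @dataclass
    class AllogenicContext:
        """Environment-imposed factors; static or slow-changing within an episode."""
        gravity: float = 9.81            # physical parameter (illustrative)
        regulatory_regime: str = "none"  # abstract, non-physical descriptor
        resource_budget: float = 1.0     # externally imposed limit

    @dataclass
    class AutogenicContext:
        """Agent-driven factors; may change within an episode."""
        role: str = "explorer"           # self-assigned role (illustrative)
        goal: tuple = ()                 # fast-changing objective
        uncertainty: float = 1.0         # the agent's own epistemic estimate

    @dataclass
    class Context:
        """Context as a first-class primitive: the two types stay separate
        rather than being flattened into one monolithic vector."""
        allogenic: AllogenicContext = field(default_factory=AllogenicContext)
        autogenic: AutogenicContext = field(default_factory=AutogenicContext)

Keeping the two types as distinct fields, instead of one concatenated vector, is what would let downstream learners handle them with different mechanisms.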

If this is right

  • Agents can explicitly track how their actions alter the world and how the world alters them through the taxonomy levels.
  • Separate learning mechanisms become appropriate for slowly evolving allogenic variables versus rapidly changing autogenic variables within an episode.
  • Inclusion of abstract contexts such as roles and uncertainties produces behavior that respects non-physical constraints.
  • Zero-shot generalization improves because agents no longer treat context as a monolithic input.
  • Real-world deployment becomes safer and more efficient once agents maintain consistent reasoning across changing conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same separation could be tested in hierarchical or meta-learning setups to see whether adaptation speed increases when context types are distinguished.
  • Agents built this way might maintain stable behavior when regulatory regimes or resource limits shift, a property not directly tested in current benchmarks.
  • The approach suggests a concrete experiment: compare generalization curves on environments where allogenic factors are held fixed while autogenic factors vary within episodes.
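The third bullet admits a compact protocol sketch. Everything below is hypothetical: make_env, set_autogenic, goal_space, and context are invented interfaces, not from the paper.

    import random

    def generalization_run(policy, make_env, n_episodes=100, seed=0):
        """Hold allogenic factors fixed for the whole run; resample autogenic
        factors within each episode, then compare return curves against a
        monolithic-context agent on the same schedule."""
        rng = random.Random(seed)
        fixed_allogenic = {"gravity": 9.81, "friction": 0.5}  # frozen per run
        returns = []
        for _ in range(n_episodes):
            env = make_env(allogenic=fixed_allogenic)         # hypothetical ctor
            obs, done, total = env.reset(), False, 0.0
            while not done:
                if rng.random() < 0.1:  # autogenic context shifts mid-episode
                    env.set_autogenic(goal=rng.choice(env.goal_space))
                obs, reward, done, _ = env.step(policy(obs, env.context))
                total += reward
            returns.append(total)
        return returns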

Load-bearing premise

That explicitly separating allogenic from autogenic contexts and following the three research directions will produce agents with substantially better zero-shot generalization and real-world safety than existing contextual RL methods.

What would settle it

Training agents that implement the proposed taxonomy and directions, then comparing them with standard contextual RL baselines on the same tasks: finding no measurable gain in zero-shot transfer success or safety metrics would refute the claim.

Figures

Figures reproduced from arXiv: 2604.02348 by André Biedenkapp.

Figure 1. Schematic of a cMDP with a common state and action space. (Full image available at source.)
Original abstract

Reinforcement learning (RL) has produced spectacular results in games, robotics, and continuous control. Yet, despite these successes, learned policies often fail to generalize beyond their training distribution, limiting real-world impact. Recent work on contextual RL (cRL) shows that exposing agents to environment characteristics -- contexts -- can improve zero-shot transfer. So far, the community has treated context as a monolithic, static observable, an approach that constrains the generalization capabilities of RL agents. To achieve contextual intelligence we first propose a novel taxonomy of contexts that separates allogenic (environment-imposed) from autogenic (agent-driven) factors. We identify three fundamental research directions that must be addressed to promote truly contextual intelligence: (1) Learning with heterogeneous contexts to explicitly exploit the taxonomy levels so agents can reason about their influence on the world and vice versa; (2) Multi-time-scale modeling to recognize that allogenic variables evolve slowly or remain static, whereas autogenic variables may change within an episode, potentially requiring different learning mechanisms; (3) Integration of abstract, high-level contexts to incorporate roles, resource & regulatory regimes, uncertainties, and other non-physical descriptors that crucially influence behavior. We envision context as a first-class modeling primitive, empowering agents to reason about who they are, what the world permits, and how both evolve over time. By doing so, we aim to catalyze a new generation of context-aware agents that can be deployed safely and efficiently in the real world.
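For orientation, the contextual MDP (cMDP) behind the abstract and Figure 1 is standardly formalized, following Hallak et al. [30], as a family of MDPs indexed by a context and sharing one state and action space:

    % cMDP: one MDP per context c, with shared state space S and action space A
    \mathcal{M} = \{\, M_c \,\}_{c \in \mathcal{C}}, \qquad
    M_c = \bigl( \mathcal{S}, \mathcal{A}, p_c(s' \mid s, a), r_c(s, a), \rho_c \bigr)

where $p_c$, $r_c$, and $\rho_c$ are the context-conditioned transition kernel, reward function, and initial-state distribution. On this reading, the paper's proposal is to structure $c$ itself as a pair $(c_{\text{allogenic}}, c_{\text{autogenic}})$ rather than treat it as one flat, static vector.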

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript is a position paper arguing that current contextual reinforcement learning (cRL) treats context as a monolithic, static observable, limiting zero-shot generalization. It proposes a novel taxonomy distinguishing allogenic (environment-imposed) from autogenic (agent-driven) contexts and outlines three research directions: (1) learning with heterogeneous contexts to exploit taxonomy levels, (2) multi-time-scale modeling to handle differing evolution rates, and (3) integration of abstract contexts such as roles and regulatory regimes. The vision is to elevate context to a first-class primitive enabling agents to reason about self, world constraints, and temporal evolution for safer, more transferable real-world deployment.

Significance. If the taxonomy and directions yield concrete algorithms with measurable gains in generalization and safety, the work could meaningfully shift RL research toward more structured context handling. The absence of any empirical results, formal definitions, derivations, or illustrative examples means the significance remains aspirational and depends entirely on subsequent validation.

major comments (2)
  1. [Abstract and taxonomy introduction] The claim that explicitly separating allogenic from autogenic contexts will produce substantially improved zero-shot generalization and safety is load-bearing yet unsupported; no concrete example, formal distinction, or comparison to existing cRL methods (e.g., context as state augmentation) is provided to show why the split is non-trivial or beneficial.
  2. [Three research directions] The multi-time-scale modeling direction asserts that allogenic variables evolve slowly while autogenic ones may change within an episode, but it offers no mechanism, algorithm sketch, or learning rule to operationalize this difference, leaving the feasibility of the proposal unaddressed.
minor comments (2)
  1. [Introduction] The manuscript would benefit from a short related-work subsection contrasting the proposed taxonomy with prior contextual RL surveys or frameworks to clarify novelty.
  2. [Taxonomy] Terminology: 'Allogenic' and 'autogenic' are used without citation to their origins in other fields; adding one or two references would improve accessibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our position paper. The comments highlight important areas where the vision can be clarified and strengthened. We address each major comment below and indicate planned revisions.

Point-by-point responses
  1. Referee: [Abstract and taxonomy introduction] The claim that explicitly separating allogenic from autogenic contexts will produce substantially improved zero-shot generalization and safety is load-bearing yet unsupported; no concrete example, formal distinction, or comparison to existing cRL methods (e.g., context as state augmentation) is provided to show why the split is non-trivial or beneficial.

    Authors: We acknowledge that the manuscript, as a position paper, presents the taxonomy as a conceptual foundation rather than an empirically validated claim. The distinction aims to capture that allogenic contexts (e.g., fixed environmental parameters like gravity or regulatory constraints) are typically invariant within an episode, while autogenic contexts (e.g., agent-internal goals or learned representations) can evolve dynamically. This separation is intended to enable agents to reason differently about external constraints versus self-generated factors, potentially improving transfer. We will revise the taxonomy section to include a concrete illustrative example (such as a robotic navigation task with fixed terrain properties versus variable task objectives) and a brief comparison to standard context-as-state-augmentation baselines to clarify why the split is non-trivial. Full empirical demonstration of generalization gains is beyond the scope of this vision paper and is flagged as future work. revision: partial

  2. Referee: [Three research directions] The multi-time-scale modeling direction asserts that allogenic variables evolve slowly while autogenic ones may change within an episode, but it offers no mechanism, algorithm sketch, or learning rule to operationalize this difference, leaving the feasibility of the proposal unaddressed.

    Authors: We agree that feasibility would benefit from an operational sketch. In the revised manuscript we will expand the multi-time-scale direction with a high-level conceptual mechanism, for example by suggesting the use of separate temporal abstraction layers or timescale-specific encoders (drawing on ideas from hierarchical RL and multi-timescale recurrent models) that allow the agent to maintain slow-updating representations for allogenic factors and faster adaptation for autogenic ones. This will be presented as an illustrative direction rather than a complete algorithm, consistent with the position-paper format. revision: yes
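Two hedged sketches to make the rebuttal concrete. First, the baseline named in response 1, context as plain state augmentation, is ordinarily just concatenation (illustrative code, not the authors'):

    import numpy as np

    def augment_observation(state: np.ndarray, context: np.ndarray) -> np.ndarray:
        """Monolithic baseline: the context vector is concatenated onto the state
        and fed to an otherwise unmodified policy, so allogenic and autogenic
        factors become indistinguishable, the treatment the paper argues against."""
        return np.concatenate([state, context])

    # e.g. a 4-dim CartPole state plus a 2-dim context (pole length, cart mass)
    obs = augment_observation(np.zeros(4), np.array([0.5, 1.0]))
    assert obs.shape == (6,)

Second, one plausible reading of the "timescale-specific encoders" in response 2, with hypothetical sizes and update schedule, in PyTorch:

    import torch
    import torch.nn as nn

    class TwoTimescaleContextEncoder(nn.Module):
        """One encoder per context type: the allogenic code is refreshed only at
        episode boundaries (slow or static factors), while the autogenic encoder
        updates every step (fast, within-episode factors)."""

        def __init__(self, allo_dim=8, auto_dim=8, hidden=32):
            super().__init__()
            self.allo_enc = nn.Linear(allo_dim, hidden)   # slow path, per episode
            self.auto_enc = nn.GRUCell(auto_dim, hidden)  # fast path, per step
            self.register_buffer("allo_code", torch.zeros(1, hidden))

        def reset(self, allo_ctx):
            # Called once per episode with the (near-)static allogenic context.
            self.allo_code = torch.tanh(self.allo_enc(allo_ctx))
            return torch.zeros_like(self.allo_code)  # fresh autogenic hidden state

        def step(self, auto_ctx, h):
            # Called every step; the policy conditions on both codes.
            h = self.auto_enc(auto_ctx, h)
            return torch.cat([self.allo_code, h], dim=-1), h

    enc = TwoTimescaleContextEncoder()
    h = enc.reset(torch.zeros(1, 8))
    z, h = enc.step(torch.zeros(1, 8), h)  # z: shape (1, 64), slow + fast codes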

Circularity Check

0 steps flagged

No significant circularity: position paper with no derivations

full rationale

The manuscript is a position paper proposing a context taxonomy (allogenic vs. autogenic) and three high-level research directions. It contains no equations, fitted parameters, formal derivations, proofs, or algorithms. The central claims are aspirational visions for future contextual RL agents rather than assertions whose correctness reduces to self-citation chains or definitions by construction. No load-bearing step equates a prediction to its own inputs or imports uniqueness from prior author work as an unverified premise. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central vision rests on the domain assumption that current contextual RL is limited by monolithic context treatment and that the proposed split plus three directions will overcome generalization failures; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: Current contextual RL treats context as a monolithic, static observable.
    Stated directly in the abstract as the constraint on generalization capabilities.
  • ad hoc to paper: Separating allogenic from autogenic contexts and addressing heterogeneous, multi-time-scale, and abstract contexts will improve agent reasoning and transfer.
    Core premise of the proposed taxonomy and three research directions.

pith-pipeline@v0.9.0 · 5554 in / 1281 out tokens · 36298 ms · 2026-05-15T21:51:56.906740+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 3 internal anchors

  [1] M. Abdolshah, H. Le, T. K. George, S. Gupta, S. Rana, and S. Venkatesh. 2021. A New Representation of Successor Features for Transfer across Dissimilar Environments. In Proceedings of the 38th International Conference on Machine Learning (ICML'21) (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 1–9.
  [2] S. V. Albrecht, F. Christianos, and L. Schäfer. 2024. Multi-agent reinforcement learning: Foundations and modern approaches. MIT Press.
  [3] M. Andrychowicz, B. Baker, M. Chociej, R. Józefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, J. Schneider, S. Sidor, J. Tobin, P. Welinder, L. Weng, and W. Zaremba. 2020. Learning dexterous in-hand manipulation. International Journal of Robotics Research 39, 1 (2020).
  [4–5] J. Beck, R. Vuorio, E. Z. Liu, Z. Xiong, L. M. Zintgraf, C. Finn, and S. Whiteson. 2025. A Tutorial on Meta-Reinforcement Learning. Found. Trends Mach. Learn. 18, 2-3 (2025), 224–384. https://doi.org/10.1561/2200000080
  [6] M. G. Bellemare, S. Candido, P. Samuel Castro, J. Gong, M. C. Machado, S. Moitra, S. S. Ponda, and Z. Wang. 2020. Autonomous navigation of stratospheric balloons using reinforcement learning. Nature 588, 7836 (2020), 77–82.
  [7] R. Bellman. 1957. A Markovian decision process. Journal of Mathematics and Mechanics (1957), 679–684.
  [8] J. Benad, F. Röder, M. V. Butz, and M. Eppe. 2025. Shared dynamic model aligned hypernetworks for contextual reinforcement learning. In Eighteenth European Workshop on Reinforcement Learning. https://openreview.net/forum?id=6gdvQqkFKT
  [9] C. Benjamins, T. Eimer, F. Schubert, A. Mohan, S. Döhler, A. Biedenkapp, B. Rosenhan, F. Hutter, and M. Lindauer. 2023. Contextualize Me – The Case for Context in Reinforcement Learning. Transactions on Machine Learning Research (2023).
  [10] M. Beukman, D. Jarvis, R. Klein, S. James, and B. Rosman. 2023. Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies. In Proceedings of the 36th International Conference on Advances in Neural Information Processing Systems (NeurIPS'23), A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.). Curran Associates.
  [11] A. Biedenkapp, R. Rajan, F. Hutter, and M. Lindauer. 2021. TempoRL: Learning When to Act. In Proceedings of the 38th International Conference on Machine Learning (ICML'21) (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 914–924.
  [12–13] A. Biedenkapp, D. Speck, S. Sievers, F. Hutter, M. Lindauer, and J. Seipp. Learning Domain-Independent Policies for Open List Selection. In Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning (PRL@ICAPS'22), M. Katz, H. Palacios, and V. Gómez (Eds.).
  [14] P. Bordne, M. A. Hasan, E. Bergman, N. Awad, and A. Biedenkapp. 2024. CANDID DAC: Leveraging Coupled Action Dimensions with Importance Differences in DAC. In Proc. of AutoML'24, Workshop Track. https://openreview.net/forum?id=ZCCZYfstkG
  [15] P. F. Christiano, J. Leike, T. B. Brown, M. Martic, S. Legg, and D. Amodei. 2017. Deep Reinforcement Learning from Human Preferences. In Proceedings of the 30th International Conference on Advances in Neural Information Processing Systems (NeurIPS'17), I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett (Eds...
  [16] K. Cobbe, C. Hesse, J. Hilton, and J. Schulman. 2020. Leveraging Procedural Generation to Benchmark Reinforcement Learning. In Proceedings of the 37th International Conference on Machine Learning (ICML'20), H. Daume III and A. Singh (Eds.), Vol. 98. Proceedings of Machine Learning Research.
  [17] J. Degrave, F. Felici, J. Buchli, M. Neunert, B. Tracey, F. Carpanese, T. Ewalds, R. Hafner, A. Abdolmaleki, D. de las Casas, C. Donner, L. Fritz, C. Galperti, A. Huber, J. Keeling, M. Tsimpoukelli, J. Kay, A. Merle, Jean-M. Moret, S. Noury, F. Pesamosca, D. Pfau, O. Sauter, C. Sommariva, S. Coda, B. Duval, A. Fasoli, P. Kohli, K. Kavukcuoglu, D. Hassabis...
  [18] C. Ding, L. Zhou, Y. Li, and X. Rong. 2020. Locomotion Control of Quadruped Robots With Online Center of Mass Adaptation and Payload Identification. IEEE Access 8 (2020), 224578–224587. https://doi.org/10.1109/ACCESS.2020.3044933
  [19] T. Eimer, C. Benjamins, and M. Lindauer. 2021. Hyperparameters in Contextual RL are Highly Situational. In Workshop on Ecological Theory of Reinforcement Learning (EcoRL@NeurIPS'21).
  [20] T. Eimer, A. Biedenkapp, F. Hutter, and M. Lindauer. 2021. Self-Paced Context Evaluation for Contextual Reinforcement Learning. In Proceedings of the 38th International Conference on Machine Learning (ICML'21) (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 2948–2958.
  [21] L. Engwegen, D. Brinks, and W. Boehmer. 2025. Modular Recurrence in Contextual MDPs for Universal Morphology Control. In Eighteenth European Workshop on Reinforcement Learning. https://openreview.net/forum?id=0fn0ii1njp
  [22] B. Evans, A. Thankaraj, and L. Pinto. 2022. Context is Everything: Implicit Identification for Dynamics Adaptation. In 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2642–2648.
  [23] B. Eysenbach, R. R. Salakhutdinov, and S. Levine. 2019. Search on the Replay Buffer: Bridging Planning and Reinforcement Learning. In Proceedings of the 32nd International Conference on Advances in Neural Information Processing Systems (NeurIPS'19), H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alche Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran As...
  [24] D. Ghosh, J. Rahme, A. Kumar, A. Zhang, R. P. Adams, and S. Levine. 2021. Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability. In Proceedings of the 34th International Conference on Advances in Neural Information Processing Systems (NeurIPS'21), M. Ranzato, A. Beygelzimer, K. Nguyen, P. Liang, J. Vaughan, and Y. Dauph...
  [25] S. Griesbach and C. D'Eramo. 2025. Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization. Reinforcement Learning Journal 6 (2025), 1140–1157.
  [26] B. Grooten, P. MacAlpine, K. Subramanian, P. R. Wurman, and P. Stone. 2026. Out-of-Distribution Generalization with a SPARC: Racing 100 Unseen Vehicles with a Single Policy. In Proceedings of the Fortieth AAAI Conference on Artificial Intelligence. AAAI Press.
  [27] C. Gumbsch, N. Sajid, G. Martius, and M. V. Butz. 2024. Learning Hierarchical World Models with Adaptive Temporal Abstractions from Discrete Latent Dynamics. In The Twelfth International Conference on Learning Representations (ICLR'24). ICLR. https://openreview.net/forum?id=TjCDNssXKU
  [28] D. Ha, A. M. Dai, and Q. V. Le. 2017. HyperNetworks. In The Fifth International Conference on Learning Representations (ICLR'17). ICLR, OpenReview.net.
  [29] D. Hafner, J. Pasukonis, J. Ba, and T. P. Lillicrap. 2025. Mastering diverse control tasks through world models. Nature 640, 8059 (2025), 647–653.
  [30] A. Hallak, D. Di Castro, and S. Mannor. 2015. Contextual Markov Decision Processes. arXiv:1502.02259 [stat.ML] (2015).
  [31] High-Level Expert Group on AI. 2019. Ethics guidelines for trustworthy AI. Report. European Commission. https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai
  [32] M. Iannotta, Y. Yang, J. A. Stork, E. Schaffernicht, and T. Stoyanov. 2025. Can Context Bridge the Reality Gap? Sim-to-Real Transfer of Context-Aware Policies. arXiv:2511.04249 [cs.RO] (2025). https://doi.org/10.48550/arXiv.2511.04249
  [33] E. Kaufmann, L. Bauersfeld, A. Loquercio, M. Müller, V. Koltun, and D. Scaramuzza. 2023. Champion-level drone racing using deep reinforcement learning. Nature 620, 7976 (2023), 982–987. https://doi.org/10.1038/S41586-023-06419-4
  [34] R. Kirk, A. Zhang, E. Grefenstette, and T. Rocktäschel. 2023. A Survey of Zero-shot Generalisation in Deep Reinforcement Learning. Journal of Artificial Intelligence Research (JAIR) 76 (2023), 201–264.
  [35] P. Klink, C. D'Eramo, J. Peters, and J. Pajarinen. 2020. Self-Paced Deep Reinforcement Learning. In Proceedings of the 33rd International Conference on Advances in Neural Information Processing Systems (NeurIPS'20), H. Larochelle, M. Ranzato, R. Hadsell, M.-F. Balcan, and H. Lin (Eds.). Curran Associates, 9216–9227.
  [36] S. Levine, A. Kumar, G. Tucker, and J. Fu. 2020. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems. arXiv:2005.01643 [cs.LG] abs/2005.01643 (2020).
  [37] V. Mnih, K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves, M. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529–533.
  [38] A. Modi, N. Jiang, S. Singh, and A. Tewari. 2018. Markov Decision Processes With Continuous Side Information. In Algorithmic Learning Theory (ALT'18), Vol. 83. 597–618.
  [39] A. Mohan, A. Zhang, and M. Lindauer. 2024. Structure in Deep Reinforcement Learning: A Survey and Open Problems. Journal of Artificial Intelligence Research 79 (2024).
  [40] T. Camaret Ndir, A. Biedenkapp, and N. Awad. 2024. Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning. In Seventeenth European Workshop on Reinforcement Learning. https://openreview.net/forum?id=51XSWH0mgN
  [41] K. Panaganti, Z. Xu, D. Kalathil, and M. Ghavamzadeh. 2022. Robust Reinforcement Learning using Offline Data. In Proceedings of the 35th International Conference on Advances in Neural Information Processing Systems (NeurIPS'22), S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.). Curran Associates.
  [42] X. Bin Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel. 2018. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization. In Proc. of ICRA'18. IEEE, 1–8.
  [43] C. Perez, F. P. Such, and T. Karaletsos. 2020. Generalized Hidden Parameter MDPs: Transferable Model-Based RL in a Handful of Trials. In Proceedings of the AAAI Conference on Artificial Intelligence, F. Rossi, V. Conitzer, and F. Sha (Eds.). AAAI Press, 5403–5411.
  [44] S. Prasanna, K. Farid, R. Rajan, and A. Biedenkapp. 2024. Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization. Reinforcement Learning Journal 1 (2024).
  [45] R. Rajan, J. Diaz, S. Guttikonda, F. Ferreira, A. Biedenkapp, J. Ole von H., and F. Hutter. 2023. MDP Playground: An Analysis and Debug Testbed for Reinforcement Learning. Journal of Artificial Intelligence Research (JAIR) 77 (2023), 821–890.
  [46] S. Reed, K. Zolna, E. Parisotto, S. Gómez C., A. Novikov, G. Barth-maron, M. Giménez, Y. Sulsky, J. Kay, J. T. Springenberg, T. Eccles, J. Bruce, A. Razavi, A. Edwards, N. Heess, Y. Chen, R. Hadsell, O. Vinyals, M. Bordbar, and N. de Freitas. 2022. A Generalist Agent. Transactions on Machine Learning Research (2022). https://openreview.net/forum?id=1ikK0kH...
  [47] S. Russell. 2022. Human-Compatible Artificial Intelligence. In Human-Like Machine Intelligence, S. H. Muggleton and N. Chater (Eds.). Oxford University Press, 3–23.
  [48] V. Saxena, J. Ba, and D. Hafner. 2021. Clockwork Variational Autoencoders. In Proceedings of the 34th International Conference on Advances in Neural Information Processing Systems (NeurIPS'21), M. A. Ranzato, A. Beygelzimer, Y. N. Dauphin, P. Liang, and J. W. Vaughan (Eds.). Curran Associates, 29246–29257.
  [49] S. Sodhani, A. Zhang, and J. Pineau. 2021. Multi-Task Reinforcement Learning with Context-based Representations. In Proceedings of the 38th International Conference on Machine Learning (ICML'21) (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 9767–9779.
  [50] R. S. Sutton and A. G. Barto. 2018. Reinforcement Learning: An Introduction (2nd ed.). MIT Press.
  [51] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel. 2017. Domain randomization for transferring deep neural networks from simulation to the real world. In Proc. of IROS'17. IEEE, 23–30.
  [52–53] O. Vinyals, I. Babuschkin, W. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. Agapiou, M. Jaderberg, A. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. Paine, C. Gulcehre, Z. Wang, T. Pfaff, Y. Wu, R. Ring, ... 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 7782 (2019), 350–354.
  [54] J. Wang, M. King, N. Porcel, Z. Kurth-Nelson, T. Zhu, C. Deck, P. Choy, M. Cassin, M. Reynolds, H. F. Song, G. Buttimore, D. P. Reichert, N. C. Rabinowitz, L. Matthey, D. Hassabis, A. Lerchner, and M. M. Botvinick. 2021. Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents. In Proceedings of the 34th International Conference on ...
  [55] P. R. Wurman, S. Barrett, K. Kawamoto, J. MacGlashan, K. Subramanian, T. J. Walsh, R. Capobianco, A. Devlic, F. Eckert, F. Fuchs, L. Gilpin, P. Khandelwal, V. Raj Kompella, H. Lin, P. MacAlpine, D. Oller, T. Seno, C. Sherstan, M. D. Thomure, H. Aghabozorgi, L. Barrett, R. Douglas, D. Whitehead, P. Dürr, P. Stone, M. Spranger, and H. Kitano. 2022. Outracin...
  [56] D. Yarats, R. Fergus, A. Lazaric, and L. Pinto. 2021. Reinforcement Learning with Prototypical Representations. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 11920–11931.
  [57] W. Yu, J. Tan, C. K. Liu, and G. Turk. 2017. Preparing for the Unknown: Learning a Universal Policy with Online System Identification. arXiv preprint arXiv:1702.02453 (2017).
  [58] W. Yu, J. Tan, C. K. Liu, and G. Turk. 2017. Preparing for the Unknown: Learning a Universal Policy with Online System Identification. In Robotics: Science and Systems XIII.
  [59] A. Zhang, S. Sodhani, K. Khetarpal, and J. Pineau. 2021. Learning Robust State Abstractions for Hidden-parameter Block MDPs. In The Ninth International Conference on Learning Representations (ICLR'21). ICLR.
  [60] H. Zhang, H. Chen, C. Xiao, B. Li, M. Liu, D. S. Boning, and C.-J. Hsieh. 2020. Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations. In Proceedings of the 33rd International Conference on Advances in Neural Information Processing Systems (NeurIPS'20), H. Larochelle, M. Ranzato, R. Hadsell, M.-F. Balcan, and H.-T. Lin ...
  [61] W. Zhou, L. Pinto, and A. Gupta. 2019. Environment Probing Interaction Policies. In The Seventh International Conference on Learning Representations (ICLR'19). ICLR, OpenReview.net.

Appendix (fragment): On the Relativity of Context. We ultimately believe that context is relative. The frame of reference determines if a context is allogenic, autogenic, or somewhere in be...