pith. machine review for the scientific record.

arxiv: 2604.02348 · v1 · submitted 2026-02-17 · 💻 cs.LG

Recognition: 2 Lean theorem links

Contextual Intelligence: The Next Leap for Reinforcement Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 21:51 UTC · model grok-4.3

classification 💻 cs.LG
keywords contextual reinforcement learning · allogenic contexts · autogenic contexts · zero-shot generalization · multi-time-scale modeling · context taxonomy · RL safety · real-world transfer

The pith

Reinforcement learning agents need context treated as a first-class modeling primitive that separates environment-imposed factors from agent-driven ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that current contextual RL treats context as a single static input, which limits how well policies transfer to new situations. It proposes splitting contexts into allogenic types set by the environment and autogenic types generated by the agent itself, then outlines three directions: learning across mixed context types, modeling variables that change at different speeds, and adding abstract descriptors such as roles or uncertainties. If these steps succeed, agents would reason explicitly about their own identity, the constraints the world places on them, and how both shift over time. This matters because standard RL policies often fail outside their training distribution, restricting practical use in robotics or control tasks.

Core claim

We propose a taxonomy that divides contexts into allogenic factors imposed by the environment and autogenic factors produced by the agent, then identify three necessary research directions: learning with heterogeneous contexts so agents can model mutual influences, multi-time-scale modeling to handle slow-changing allogenic variables separately from fast-changing autogenic ones, and integration of abstract high-level contexts such as roles and regulatory regimes. Treating context as a first-class primitive in this way would let agents reason about who they are, what the world permits, and how both evolve.

What carries the argument

The taxonomy that splits allogenic (environment-imposed) from autogenic (agent-driven) contexts, which supplies the structure for the three research directions.
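A minimal sketch of that primitive, in Python, with hypothetical field names (the paper prescribes no concrete schema; this is an illustrative reading, not the authors' design):

    from dataclasses import dataclass, field

    @dataclass
    class AllogenicContext:
        """Environment-imposed factors; static or slow-changing within an episode."""
        gravity: float = 9.81            # physical parameter (illustrative)
        regulatory_regime: str = "none"  # abstract, non-physical descriptor
        resource_budget: float = 1.0     # externally imposed limit

    @dataclass
    class AutogenicContext:
        """Agent-driven factors; may change within an episode."""
        role: str = "explorer"           # self-assigned role (illustrative)
        goal: tuple = ()                 # fast-changing objective
        uncertainty: float = 1.0         # the agent's own epistemic estimate

    @dataclass
    class Context:
        """Context as a first-class primitive: the two types stay separate
        rather than being flattened into one monolithic vector."""
        allogenic: AllogenicContext = field(default_factory=AllogenicContext)
        autogenic: AutogenicContext = field(default_factory=AutogenicContext)

Keeping the two types as distinct fields, instead of one concatenated vector, is what would let downstream learners handle them with different mechanisms.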

If this is right

  • Agents can explicitly track how their actions alter the world and how the world alters them through the taxonomy levels.
  • Separate learning mechanisms become appropriate for slowly evolving allogenic variables versus rapidly changing autogenic variables within an episode.
  • Inclusion of abstract contexts such as roles and uncertainties produces behavior that respects non-physical constraints.
  • Zero-shot generalization improves because agents no longer treat context as a monolithic input.
  • Real-world deployment becomes safer and more efficient once agents maintain consistent reasoning across changing conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same separation could be tested in hierarchical or meta-learning setups to see whether adaptation speed increases when context types are distinguished.
  • Agents built this way might maintain stable behavior when regulatory regimes or resource limits shift, a property not directly tested in current benchmarks.
  • The approach suggests a concrete experiment: compare generalization curves on environments where allogenic factors are held fixed while autogenic factors vary within episodes.
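The third bullet admits a compact protocol sketch. Everything below is hypothetical: make_env, set_autogenic, goal_space, and context are invented interfaces, not from the paper.

    import random

    def generalization_run(policy, make_env, n_episodes=100, seed=0):
        """Hold allogenic factors fixed for the whole run; resample autogenic
        factors within each episode, then compare return curves against a
        monolithic-context agent on the same schedule."""
        rng = random.Random(seed)
        fixed_allogenic = {"gravity": 9.81, "friction": 0.5}  # frozen per run
        returns = []
        for _ in range(n_episodes):
            env = make_env(allogenic=fixed_allogenic)         # hypothetical ctor
            obs, done, total = env.reset(), False, 0.0
            while not done:
                if rng.random() < 0.1:  # autogenic context shifts mid-episode
                    env.set_autogenic(goal=rng.choice(env.goal_space))
                obs, reward, done, _ = env.step(policy(obs, env.context))
                total += reward
            returns.append(total)
        return returns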

Load-bearing premise

That explicitly separating allogenic from autogenic contexts and following the three research directions will produce agents with substantially better zero-shot generalization and real-world safety than existing contextual RL methods.

What would settle it

Training agents that implement the proposed taxonomy and directions, then comparing them with standard contextual RL baselines on the same tasks: finding no measurable gain in zero-shot transfer success or safety metrics would refute the claim.

Figures

Figures reproduced from arXiv: 2604.02348 by André Biedenkapp.

Figure 1. Schematic of a cMDP with a common state and action space. (Full image available at source.)
Original abstract

Reinforcement learning (RL) has produced spectacular results in games, robotics, and continuous control. Yet, despite these successes, learned policies often fail to generalize beyond their training distribution, limiting real-world impact. Recent work on contextual RL (cRL) shows that exposing agents to environment characteristics -- contexts -- can improve zero-shot transfer. So far, the community has treated context as a monolithic, static observable, an approach that constrains the generalization capabilities of RL agents. To achieve contextual intelligence we first propose a novel taxonomy of contexts that separates allogenic (environment-imposed) from autogenic (agent-driven) factors. We identify three fundamental research directions that must be addressed to promote truly contextual intelligence: (1) Learning with heterogeneous contexts to explicitly exploit the taxonomy levels so agents can reason about their influence on the world and vice versa; (2) Multi-time-scale modeling to recognize that allogenic variables evolve slowly or remain static, whereas autogenic variables may change within an episode, potentially requiring different learning mechanisms; (3) Integration of abstract, high-level contexts to incorporate roles, resource & regulatory regimes, uncertainties, and other non-physical descriptors that crucially influence behavior. We envision context as a first-class modeling primitive, empowering agents to reason about who they are, what the world permits, and how both evolve over time. By doing so, we aim to catalyze a new generation of context-aware agents that can be deployed safely and efficiently in the real world.
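For orientation, the contextual MDP (cMDP) behind the abstract and Figure 1 is standardly formalized, following Hallak et al. [30], as a family of MDPs indexed by a context and sharing one state and action space:

    % cMDP: one MDP per context c, with shared state space S and action space A
    \mathcal{M} = \{\, M_c \,\}_{c \in \mathcal{C}}, \qquad
    M_c = \bigl( \mathcal{S}, \mathcal{A}, p_c(s' \mid s, a), r_c(s, a), \rho_c \bigr)

where $p_c$, $r_c$, and $\rho_c$ are the context-conditioned transition kernel, reward function, and initial-state distribution. On this reading, the paper's proposal is to structure $c$ itself as a pair $(c_{\text{allogenic}}, c_{\text{autogenic}})$ rather than treat it as one flat, static vector.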

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript is a position paper arguing that current contextual reinforcement learning (cRL) treats context as a monolithic, static observable, limiting zero-shot generalization. It proposes a novel taxonomy distinguishing allogenic (environment-imposed) from autogenic (agent-driven) contexts and outlines three research directions: (1) learning with heterogeneous contexts to exploit taxonomy levels, (2) multi-time-scale modeling to handle differing evolution rates, and (3) integration of abstract contexts such as roles and regulatory regimes. The vision is to elevate context to a first-class primitive enabling agents to reason about self, world constraints, and temporal evolution for safer, more transferable real-world deployment.

Significance. If the taxonomy and directions yield concrete algorithms with measurable gains in generalization and safety, the work could meaningfully shift RL research toward more structured context handling. The absence of any empirical results, formal definitions, derivations, or illustrative examples means the significance remains aspirational and depends entirely on subsequent validation.

major comments (2)
  1. [Abstract and taxonomy introduction] The claim that explicitly separating allogenic from autogenic contexts will produce substantially improved zero-shot generalization and safety is load-bearing yet unsupported; no concrete example, formal distinction, or comparison to existing cRL methods (e.g., context as state augmentation) is provided to show why the split is non-trivial or beneficial.
  2. [Three research directions] The multi-time-scale modeling direction asserts that allogenic variables evolve slowly while autogenic ones may change within an episode, but it offers no mechanism, algorithm sketch, or learning rule to operationalize this difference, leaving the feasibility of the proposal unaddressed.
minor comments (2)
  1. [Introduction] The manuscript would benefit from a short related-work subsection contrasting the proposed taxonomy with prior contextual RL surveys or frameworks to clarify novelty.
  2. [Taxonomy] Terminology: 'Allogenic' and 'autogenic' are used without citation to their origins in other fields; adding one or two references would improve accessibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our position paper. The comments highlight important areas where the vision can be clarified and strengthened. We address each major comment below and indicate planned revisions.

Point-by-point responses
  1. Referee: [Abstract and taxonomy introduction] The claim that explicitly separating allogenic from autogenic contexts will produce substantially improved zero-shot generalization and safety is load-bearing yet unsupported; no concrete example, formal distinction, or comparison to existing cRL methods (e.g., context as state augmentation) is provided to show why the split is non-trivial or beneficial.

    Authors: We acknowledge that the manuscript, as a position paper, presents the taxonomy as a conceptual foundation rather than an empirically validated claim. The distinction aims to capture that allogenic contexts (e.g., fixed environmental parameters like gravity or regulatory constraints) are typically invariant within an episode, while autogenic contexts (e.g., agent-internal goals or learned representations) can evolve dynamically. This separation is intended to enable agents to reason differently about external constraints versus self-generated factors, potentially improving transfer. We will revise the taxonomy section to include a concrete illustrative example (such as a robotic navigation task with fixed terrain properties versus variable task objectives) and a brief comparison to standard context-as-state-augmentation baselines to clarify why the split is non-trivial. Full empirical demonstration of generalization gains is beyond the scope of this vision paper and is flagged as future work. revision: partial

  2. Referee: [Three research directions] The multi-time-scale modeling direction asserts that allogenic variables evolve slowly while autogenic ones may change within an episode, but it offers no mechanism, algorithm sketch, or learning rule to operationalize this difference, leaving the feasibility of the proposal unaddressed.

    Authors: We agree that feasibility would benefit from an operational sketch. In the revised manuscript we will expand the multi-time-scale direction with a high-level conceptual mechanism, for example by suggesting the use of separate temporal abstraction layers or timescale-specific encoders (drawing on ideas from hierarchical RL and multi-timescale recurrent models) that allow the agent to maintain slow-updating representations for allogenic factors and faster adaptation for autogenic ones. This will be presented as an illustrative direction rather than a complete algorithm, consistent with the position-paper format. revision: yes
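Two hedged sketches to make the rebuttal concrete. First, the baseline named in response 1, context as plain state augmentation, is ordinarily just concatenation (illustrative code, not the authors'):

    import numpy as np

    def augment_observation(state: np.ndarray, context: np.ndarray) -> np.ndarray:
        """Monolithic baseline: the context vector is concatenated onto the state
        and fed to an otherwise unmodified policy, so allogenic and autogenic
        factors become indistinguishable, the treatment the paper argues against."""
        return np.concatenate([state, context])

    # e.g. a 4-dim CartPole state plus a 2-dim context (pole length, cart mass)
    obs = augment_observation(np.zeros(4), np.array([0.5, 1.0]))
    assert obs.shape == (6,)

Second, one plausible reading of the "timescale-specific encoders" in response 2, with hypothetical sizes and update schedule, in PyTorch:

    import torch
    import torch.nn as nn

    class TwoTimescaleContextEncoder(nn.Module):
        """One encoder per context type: the allogenic code is refreshed only at
        episode boundaries (slow or static factors), while the autogenic encoder
        updates every step (fast, within-episode factors)."""

        def __init__(self, allo_dim=8, auto_dim=8, hidden=32):
            super().__init__()
            self.allo_enc = nn.Linear(allo_dim, hidden)   # slow path, per episode
            self.auto_enc = nn.GRUCell(auto_dim, hidden)  # fast path, per step
            self.register_buffer("allo_code", torch.zeros(1, hidden))

        def reset(self, allo_ctx):
            # Called once per episode with the (near-)static allogenic context.
            self.allo_code = torch.tanh(self.allo_enc(allo_ctx))
            return torch.zeros_like(self.allo_code)  # fresh autogenic hidden state

        def step(self, auto_ctx, h):
            # Called every step; the policy conditions on both codes.
            h = self.auto_enc(auto_ctx, h)
            return torch.cat([self.allo_code, h], dim=-1), h

    enc = TwoTimescaleContextEncoder()
    h = enc.reset(torch.zeros(1, 8))
    z, h = enc.step(torch.zeros(1, 8), h)  # z: shape (1, 64), slow + fast codes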

Circularity Check

0 steps flagged

No significant circularity: position paper with no derivations

full rationale

The manuscript is a position paper proposing a context taxonomy (allogenic vs. autogenic) and three high-level research directions. It contains no equations, fitted parameters, formal derivations, proofs, or algorithms. The central claims are aspirational visions for future contextual RL agents rather than assertions whose correctness reduces to self-citation chains or definitions by construction. No load-bearing step equates a prediction to its own inputs or imports uniqueness from prior author work as an unverified premise. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central vision rests on the domain assumption that current contextual RL is limited by monolithic context treatment and that the proposed split plus three directions will overcome generalization failures; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: Current contextual RL treats context as a monolithic, static observable.
    Stated directly in the abstract as the constraint on generalization capabilities.
  • ad hoc to paper: Separating allogenic from autogenic contexts and addressing heterogeneous, multi-time-scale, and abstract contexts will improve agent reasoning and transfer.
    Core premise of the proposed taxonomy and three research directions.

pith-pipeline@v0.9.0 · 5554 in / 1281 out tokens · 36298 ms · 2026-05-15T21:51:56.906740+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 3 internal anchors

  [1] M. Abdolshah, H. Le, T. K. George, S. Gupta, S. Rana, and S. Venkatesh. 2021. A New Representation of Successor Features for Transfer across Dissimilar Environments. In Proceedings of the 38th International Conference on Machine Learning (ICML'21) (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 1–9.
  [2] S. V. Albrecht, F. Christianos, and L. Schäfer. 2024. Multi-agent reinforcement learning: Foundations and modern approaches. MIT Press.
  [3] M. Andrychowicz, B. Baker, M. Chociej, R. Józefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, J. Schneider, S. Sidor, J. Tobin, P. Welinder, L. Weng, and W. Zaremba. 2020. Learning dexterous in-hand manipulation. International Journal of Robotics Research 39, 1 (2020).
  [4–5] J. Beck, R. Vuorio, E. Z. Liu, Z. Xiong, L. M. Zintgraf, C. Finn, and S. Whiteson. 2025. A Tutorial on Meta-Reinforcement Learning. Found. Trends Mach. Learn. 18, 2-3 (2025), 224–384. https://doi.org/10.1561/2200000080
  [6] M. G. Bellemare, S. Candido, P. Samuel Castro, J. Gong, M. C. Machado, S. Moitra, S. S. Ponda, and Z. Wang. 2020. Autonomous navigation of stratospheric balloons using reinforcement learning. Nature 588, 7836 (2020), 77–82.
  [7] R. Bellman. 1957. A Markovian decision process. Journal of Mathematics and Mechanics (1957), 679–684.
  [8] J. Benad, F. Röder, M. V. Butz, and M. Eppe. 2025. Shared dynamic model aligned hypernetworks for contextual reinforcement learning. In Eighteenth European Workshop on Reinforcement Learning. https://openreview.net/forum?id=6gdvQqkFKT
  [9] C. Benjamins, T. Eimer, F. Schubert, A. Mohan, S. Döhler, A. Biedenkapp, B. Rosenhan, F. Hutter, and M. Lindauer. 2023. Contextualize Me – The Case for Context in Reinforcement Learning. Transactions on Machine Learning Research (2023).
  [10] M. Beukman, D. Jarvis, R. Klein, S. James, and B. Rosman. 2023. Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies. In Proceedings of the 36th International Conference on Advances in Neural Information Processing Systems (NeurIPS'23), A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.). Curran Associates.
  [11] A. Biedenkapp, R. Rajan, F. Hutter, and M. Lindauer. 2021. TempoRL: Learning When to Act. In Proceedings of the 38th International Conference on Machine Learning (ICML'21) (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 914–924.
  [12–13] A. Biedenkapp, D. Speck, S. Sievers, F. Hutter, M. Lindauer, and J. Seipp. Learning Domain-Independent Policies for Open List Selection. In Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning (PRL@ICAPS'22), M. Katz, H. Palacios, and V. Gómez (Eds.).
  [14] P. Bordne, M. A. Hasan, E. Bergman, N. Awad, and A. Biedenkapp. 2024. CANDID DAC: Leveraging Coupled Action Dimensions with Importance Differences in DAC. In Proc. of AutoML'24, Workshop Track. https://openreview.net/forum?id=ZCCZYfstkG
  [15] P. F. Christiano, J. Leike, T. B. Brown, M. Martic, S. Legg, and D. Amodei. 2017. Deep Reinforcement Learning from Human Preferences. In Proceedings of the 30th International Conference on Advances in Neural Information Processing Systems (NeurIPS'17), I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett (Eds...
  [16] K. Cobbe, C. Hesse, J. Hilton, and J. Schulman. 2020. Leveraging Procedural Generation to Benchmark Reinforcement Learning. In Proceedings of the 37th International Conference on Machine Learning (ICML'20), H. Daume III and A. Singh (Eds.), Vol. 98. Proceedings of Machine Learning Research.
  [17] J. Degrave, F. Felici, J. Buchli, M. Neunert, B. Tracey, F. Carpanese, T. Ewalds, R. Hafner, A. Abdolmaleki, D. de las Casas, C. Donner, L. Fritz, C. Galperti, A. Huber, J. Keeling, M. Tsimpoukelli, J. Kay, A. Merle, Jean-M. Moret, S. Noury, F. Pesamosca, D. Pfau, O. Sauter, C. Sommariva, S. Coda, B. Duval, A. Fasoli, P. Kohli, K. Kavukcuoglu, D. Hassabis...
  [18] C. Ding, L. Zhou, Y. Li, and X. Rong. 2020. Locomotion Control of Quadruped Robots With Online Center of Mass Adaptation and Payload Identification. IEEE Access 8 (2020), 224578–224587. https://doi.org/10.1109/ACCESS.2020.3044933
  [19] T. Eimer, C. Benjamins, and M. Lindauer. 2021. Hyperparameters in Contextual RL are Highly Situational. In Workshop on Ecological Theory of Reinforcement Learning (EcoRL@NeurIPS'21).
  [20] T. Eimer, A. Biedenkapp, F. Hutter, and M. Lindauer. 2021. Self-Paced Context Evaluation for Contextual Reinforcement Learning. In Proceedings of the 38th International Conference on Machine Learning (ICML'21) (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 2948–2958.
  [21] L. Engwegen, D. Brinks, and W. Boehmer. 2025. Modular Recurrence in Contextual MDPs for Universal Morphology Control. In Eighteenth European Workshop on Reinforcement Learning. https://openreview.net/forum?id=0fn0ii1njp
  [22] B. Evans, A. Thankaraj, and L. Pinto. 2022. Context is Everything: Implicit Identification for Dynamics Adaptation. In 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2642–2648.
  [23] B. Eysenbach, R. R. Salakhutdinov, and S. Levine. 2019. Search on the Replay Buffer: Bridging Planning and Reinforcement Learning. In Proceedings of the 32nd International Conference on Advances in Neural Information Processing Systems (NeurIPS'19), H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alche Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran As...
  [24] D. Ghosh, J. Rahme, A. Kumar, A. Zhang, R. P. Adams, and S. Levine. 2021. Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability. In Proceedings of the 34th International Conference on Advances in Neural Information Processing Systems (NeurIPS'21), M. Ranzato, A. Beygelzimer, K. Nguyen, P. Liang, J. Vaughan, and Y. Dauph...
  [25] S. Griesbach and C. D'Eramo. 2025. Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization. Reinforcement Learning Journal 6 (2025), 1140–1157.
  [26] B. Grooten, P. MacAlpine, K. Subramanian, P. R. Wurman, and P. Stone. 2026. Out-of-Distribution Generalization with a SPARC: Racing 100 Unseen Vehicles with a Single Policy. In Proceedings of the Fortieth AAAI Conference on Artificial Intelligence. AAAI Press.
  [27] C. Gumbsch, N. Sajid, G. Martius, and M. V. Butz. 2024. Learning Hierarchical World Models with Adaptive Temporal Abstractions from Discrete Latent Dynamics. In The Twelfth International Conference on Learning Representations (ICLR'24). ICLR. https://openreview.net/forum?id=TjCDNssXKU
  [28] D. Ha, A. M. Dai, and Q. V. Le. 2017. HyperNetworks. In The Fifth International Conference on Learning Representations (ICLR'17). ICLR, OpenReview.net.
  [29] D. Hafner, J. Pasukonis, J. Ba, and T. P. Lillicrap. 2025. Mastering diverse control tasks through world models. Nature 640, 8059 (2025), 647–653.
  [30] A. Hallak, D. Di Castro, and S. Mannor. 2015. Contextual Markov Decision Processes. arXiv:1502.02259 [stat.ML] (2015).
  [31] High-Level Expert Group on AI. 2019. Ethics guidelines for trustworthy AI. Report. European Commission. https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai
  [32] M. Iannotta, Y. Yang, J. A. Stork, E. Schaffernicht, and T. Stoyanov. 2025. Can Context Bridge the Reality Gap? Sim-to-Real Transfer of Context-Aware Policies. arXiv:2511.04249 [cs.RO] (2025). https://doi.org/10.48550/arXiv.2511.04249
  [33] E. Kaufmann, L. Bauersfeld, A. Loquercio, M. Müller, V. Koltun, and D. Scaramuzza. 2023. Champion-level drone racing using deep reinforcement learning. Nature 620, 7976 (2023), 982–987. https://doi.org/10.1038/S41586-023-06419-4
  [34] R. Kirk, A. Zhang, E. Grefenstette, and T. Rocktäschel. 2023. A Survey of Zero-shot Generalisation in Deep Reinforcement Learning. Journal of Artificial Intelligence Research (JAIR) 76 (2023), 201–264.
  [35] P. Klink, C. D'Eramo, J. Peters, and J. Pajarinen. 2020. Self-Paced Deep Reinforcement Learning. In Proceedings of the 33rd International Conference on Advances in Neural Information Processing Systems (NeurIPS'20), H. Larochelle, M. Ranzato, R. Hadsell, M.-F. Balcan, and H. Lin (Eds.). Curran Associates, 9216–9227.
  [36] S. Levine, A. Kumar, G. Tucker, and J. Fu. 2020. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems. arXiv:2005.01643 [cs.LG] abs/2005.01643 (2020).
  [37] V. Mnih, K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves, M. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529–533.
  [38] A. Modi, N. Jiang, S. Singh, and A. Tewari. 2018. Markov Decision Processes With Continuous Side Information. In Algorithmic Learning Theory (ALT'18), Vol. 83. 597–618.
  [39] A. Mohan, A. Zhang, and M. Lindauer. 2024. Structure in Deep Reinforcement Learning: A Survey and Open Problems. Journal of Artificial Intelligence Research 79 (2024).
  [40] T. Camaret Ndir, A. Biedenkapp, and N. Awad. 2024. Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning. In Seventeenth European Workshop on Reinforcement Learning. https://openreview.net/forum?id=51XSWH0mgN
  [41] K. Panaganti, Z. Xu, D. Kalathil, and M. Ghavamzadeh. 2022. Robust Reinforcement Learning using Offline Data. In Proceedings of the 35th International Conference on Advances in Neural Information Processing Systems (NeurIPS'22), S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.). Curran Associates.
  [42] X. Bin Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel. 2018. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization. In Proc. of ICRA'18. IEEE, 1–8.
  [43] C. Perez, F. P. Such, and T. Karaletsos. 2020. Generalized Hidden Parameter MDPs: Transferable Model-Based RL in a Handful of Trials. In Proceedings of the AAAI Conference on Artificial Intelligence, F. Rossi, V. Conitzer, and F. Sha (Eds.). AAAI Press, 5403–5411.
  [44] S. Prasanna, K. Farid, R. Rajan, and A. Biedenkapp. 2024. Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization. Reinforcement Learning Journal 1 (2024).
  [45] R. Rajan, J. Diaz, S. Guttikonda, F. Ferreira, A. Biedenkapp, J. Ole von H., and F. Hutter. 2023. MDP Playground: An Analysis and Debug Testbed for Reinforcement Learning. Journal of Artificial Intelligence Research (JAIR) 77 (2023), 821–890.
  [46] S. Reed, K. Zolna, E. Parisotto, S. Gómez C., A. Novikov, G. Barth-maron, M. Giménez, Y. Sulsky, J. Kay, J. T. Springenberg, T. Eccles, J. Bruce, A. Razavi, A. Edwards, N. Heess, Y. Chen, R. Hadsell, O. Vinyals, M. Bordbar, and N. de Freitas. 2022. A Generalist Agent. Transactions on Machine Learning Research (2022). https://openreview.net/forum?id=1ikK0kH...
  [47] S. Russell. 2022. Human-Compatible Artificial Intelligence. In Human-Like Machine Intelligence, S. H. Muggleton and N. Chater (Eds.). Oxford University Press, 3–23.
  [48] V. Saxena, J. Ba, and D. Hafner. 2021. Clockwork Variational Autoencoders. In Proceedings of the 34th International Conference on Advances in Neural Information Processing Systems (NeurIPS'21), M. A. Ranzato, A. Beygelzimer, Y. N. Dauphin, P. Liang, and J. W. Vaughan (Eds.). Curran Associates, 29246–29257.
  [49] S. Sodhani, A. Zhang, and J. Pineau. 2021. Multi-Task Reinforcement Learning with Context-based Representations. In Proceedings of the 38th International Conference on Machine Learning (ICML'21) (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 9767–9779.
  [50] R. S. Sutton and A. G. Barto. 2018. Reinforcement Learning: An Introduction (2nd ed.). MIT Press.
  [51] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel. 2017. Domain randomization for transferring deep neural networks from simulation to the real world. In Proc. of IROS'17. IEEE, 23–30.
  [52–53] O. Vinyals, I. Babuschkin, W. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. Agapiou, M. Jaderberg, A. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. Paine, C. Gulcehre, Z. Wang, T. Pfaff, Y. Wu, R. Ring, ... 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 7782 (2019), 350–354.
  [54] J. Wang, M. King, N. Porcel, Z. Kurth-Nelson, T. Zhu, C. Deck, P. Choy, M. Cassin, M. Reynolds, H. F. Song, G. Buttimore, D. P. Reichert, N. C. Rabinowitz, L. Matthey, D. Hassabis, A. Lerchner, and M. M. Botvinick. 2021. Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents. In Proceedings of the 34th International Conference on ...
  [55] P. R. Wurman, S. Barrett, K. Kawamoto, J. MacGlashan, K. Subramanian, T. J. Walsh, R. Capobianco, A. Devlic, F. Eckert, F. Fuchs, L. Gilpin, P. Khandelwal, V. Raj Kompella, H. Lin, P. MacAlpine, D. Oller, T. Seno, C. Sherstan, M. D. Thomure, H. Aghabozorgi, L. Barrett, R. Douglas, D. Whitehead, P. Dürr, P. Stone, M. Spranger, and H. Kitano. 2022. Outracin...
  [56] D. Yarats, R. Fergus, A. Lazaric, and L. Pinto. 2021. Reinforcement Learning with Prototypical Representations. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 11920–11931.
  [57] W. Yu, J. Tan, C. K. Liu, and G. Turk. 2017. Preparing for the Unknown: Learning a Universal Policy with Online System Identification. arXiv preprint arXiv:1702.02453 (2017).
  [58] W. Yu, J. Tan, C. K. Liu, and G. Turk. 2017. Preparing for the Unknown: Learning a Universal Policy with Online System Identification. In Robotics: Science and Systems XIII.
  [59] A. Zhang, S. Sodhani, K. Khetarpal, and J. Pineau. 2021. Learning Robust State Abstractions for Hidden-parameter Block MDPs. In The Ninth International Conference on Learning Representations (ICLR'21). ICLR.
  [60] H. Zhang, H. Chen, C. Xiao, B. Li, M. Liu, D. S. Boning, and C.-J. Hsieh. 2020. Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations. In Proceedings of the 33rd International Conference on Advances in Neural Information Processing Systems (NeurIPS'20), H. Larochelle, M. Ranzato, R. Hadsell, M.-F. Balcan, and H.-T. Lin ...
  [61] W. Zhou, L. Pinto, and A. Gupta. 2019. Environment Probing Interaction Policies. In The Seventh International Conference on Learning Representations (ICLR'19). ICLR, OpenReview.net.

Appendix (fragment): On the Relativity of Context. We ultimately believe that context is relative. The frame of reference determines if a context is allogenic, autogenic, or somewhere in be...