Recognition: 2 Lean theorem links
Contextual Intelligence: The Next Leap for Reinforcement Learning
Pith reviewed 2026-05-15 21:51 UTC · model grok-4.3
The pith
Reinforcement learning agents need context treated as a first-class modeling primitive that separates environment-imposed factors from agent-driven ones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a taxonomy that divides contexts into allogenic factors imposed by the environment and autogenic factors produced by the agent, then identify three necessary research directions: learning with heterogeneous contexts so agents can model mutual influences, multi-time-scale modeling to handle slow-changing allogenic variables separately from fast-changing autogenic ones, and integration of abstract high-level contexts such as roles and regulatory regimes. Treating context as a first-class primitive in this way would let agents reason about who they are, what the world permits, and how both evolve.
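To make the taxonomy concrete, here is a minimal sketch of what context as a first-class primitive could look like in code. Every field name below (gravity, goal, role, and so on) is an illustrative assumption, not something drawn from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class AllogenicContext:
    """Environment-imposed factors: typically static or slowly varying."""
    gravity: float = 9.81             # physical parameter fixed by the world
    max_torque: float = 1.0           # resource limit imposed on the agent
    regime: str = "default"           # abstract constraint, e.g. a rule set

@dataclass
class AutogenicContext:
    """Agent-driven factors: may change within a single episode."""
    goal: tuple = (0.0, 0.0)          # current task objective
    role: str = "explorer"            # self-assigned high-level role
    belief_uncertainty: float = 1.0   # the agent's own epistemic state

@dataclass
class Context:
    """Context as a first-class primitive: the split is explicit, so
    learning code can treat the two halves differently."""
    allogenic: AllogenicContext = field(default_factory=AllogenicContext)
    autogenic: AutogenicContext = field(default_factory=AutogenicContext)
```

The point of the split is that downstream learning code can dispatch on the two halves instead of consuming one flat, undifferentiated context vector.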
What carries the argument
The taxonomy that splits allogenic (environment-imposed) from autogenic (agent-driven) contexts, which supplies the structure for the three research directions.
If this is right
- Agents can explicitly track how their actions alter the world and how the world alters them through the taxonomy levels.
- Separate learning mechanisms become appropriate for slowly evolving allogenic variables versus rapidly changing autogenic variables within an episode (see the sketch after this list).
- Inclusion of abstract contexts such as roles and uncertainties produces behavior that respects non-physical constraints.
- Zero-shot generalization improves because agents no longer treat context as a monolithic input.
- Real-world deployment becomes safer and more efficient once agents maintain consistent reasoning across changing conditions.
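As a toy illustration of the second bullet, a single incremental update rule with two step sizes already captures the intended separation. The signals and rates below are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical context variables estimated from noisy observations.
allogenic_est, autogenic_est = 0.0, 0.0
ALPHA_SLOW, ALPHA_FAST = 0.01, 0.5  # one step size per timescale

for step in range(1000):
    # Allogenic signal: effectively constant (e.g., a friction coefficient).
    allogenic_obs = 0.8 + 0.05 * rng.standard_normal()
    # Autogenic signal: drifts within the episode (e.g., goal progress).
    autogenic_obs = np.sin(step / 50.0) + 0.05 * rng.standard_normal()

    # Same incremental rule, different rates: slow averaging suits the
    # near-static factor, fast tracking suits the intra-episode one.
    allogenic_est += ALPHA_SLOW * (allogenic_obs - allogenic_est)
    autogenic_est += ALPHA_FAST * (autogenic_obs - autogenic_est)

print(round(float(allogenic_est), 3), round(float(autogenic_est), 3))
```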
Where Pith is reading between the lines
- The same separation could be tested in hierarchical or meta-learning setups to see whether adaptation speed increases when context types are distinguished.
- Agents built this way might maintain stable behavior when regulatory regimes or resource limits shift, a property not directly tested in current benchmarks.
- The approach suggests a concrete experiment, sketched below: compare generalization curves on environments where allogenic factors are held fixed while autogenic factors vary within episodes.
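A hedged sketch of that experiment's scaffolding follows. The functions make_env, train, and evaluate are placeholders for whatever cRL stack would actually be used, and both context grids are invented for illustration.

```python
# Hypothetical protocol: hold allogenic factors fixed per environment,
# optionally vary autogenic factors within episodes, and compare zero-shot
# generalization across the two training regimes.

ALLOGENIC_GRID = [{"gravity": g} for g in (3.7, 9.8, 24.8)]     # fixed per env
AUTOGENIC_GRID = [{"goal": goal} for goal in ((0, 1), (1, 0), (1, 1))]

def make_env(allogenic):
    """Placeholder: build an environment whose allogenic factors are frozen."""
    return {"allogenic": allogenic}

def train(env, resample_autogenic, autogenic_grid):
    """Placeholder: train a policy, optionally resampling the autogenic
    context mid-episode (the manipulation under test)."""
    return {"env": env, "resampled": resample_autogenic}

def evaluate(policy, held_out_autogenic):
    """Placeholder: zero-shot return on autogenic contexts unseen in training."""
    return 0.0

def run_condition(vary_autogenic_within_episode):
    scores = []
    for allo in ALLOGENIC_GRID:
        env = make_env(allo)
        policy = train(env, vary_autogenic_within_episode, AUTOGENIC_GRID[:-1])
        scores.append(evaluate(policy, held_out_autogenic=AUTOGENIC_GRID[-1:]))
    return scores

# The comparison of interest: generalization curves with versus without
# intra-episode autogenic variation, allogenic factors held fixed throughout.
baseline, treatment = run_condition(False), run_condition(True)
```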
Load-bearing premise
That explicitly separating allogenic from autogenic contexts and following the three research directions will produce agents with substantially better zero-shot generalization and real-world safety than existing contextual RL methods.
What would settle it
Training agents on the proposed taxonomy and directions and measuring no measurable gain in zero-shot transfer success or safety metrics relative to standard contextual RL baselines on the same tasks.
Original abstract
Reinforcement learning (RL) has produced spectacular results in games, robotics, and continuous control. Yet, despite these successes, learned policies often fail to generalize beyond their training distribution, limiting real-world impact. Recent work on contextual RL (cRL) shows that exposing agents to environment characteristics -- contexts -- can improve zero-shot transfer. So far, the community has treated context as a monolithic, static observable, an approach that constrains the generalization capabilities of RL agents. To achieve contextual intelligence we first propose a novel taxonomy of contexts that separates allogenic (environment-imposed) from autogenic (agent-driven) factors. We identify three fundamental research directions that must be addressed to promote truly contextual intelligence: (1) Learning with heterogeneous contexts to explicitly exploit the taxonomy levels so agents can reason about their influence on the world and vice versa; (2) Multi-time-scale modeling to recognize that allogenic variables evolve slowly or remain static, whereas autogenic variables may change within an episode, potentially requiring different learning mechanisms; (3) Integration of abstract, high-level contexts to incorporate roles, resource & regulatory regimes, uncertainties, and other non-physical descriptors that crucially influence behavior. We envision context as a first-class modeling primitive, empowering agents to reason about who they are, what the world permits, and how both evolve over time. By doing so, we aim to catalyze a new generation of context-aware agents that can be deployed safely and efficiently in the real world.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a position paper arguing that current contextual reinforcement learning (cRL) treats context as a monolithic, static observable, limiting zero-shot generalization. It proposes a novel taxonomy distinguishing allogenic (environment-imposed) from autogenic (agent-driven) contexts and outlines three research directions: (1) learning with heterogeneous contexts to exploit taxonomy levels, (2) multi-time-scale modeling to handle differing evolution rates, and (3) integration of abstract contexts such as roles and regulatory regimes. The vision is to elevate context to a first-class primitive enabling agents to reason about self, world constraints, and temporal evolution for safer, more transferable real-world deployment.
Significance. If the taxonomy and directions yield concrete algorithms with measurable gains in generalization and safety, the work could meaningfully shift RL research toward more structured context handling. The absence of any empirical results, formal definitions, derivations, or illustrative examples means the significance remains aspirational and depends entirely on subsequent validation.
Major comments (2)
- [Abstract and taxonomy introduction] The claim that explicitly separating allogenic from autogenic contexts will produce substantially improved zero-shot generalization and safety is load-bearing yet unsupported; no concrete example, formal distinction, or comparison to existing cRL methods (e.g., context as state augmentation) is provided to show why the split is non-trivial or beneficial.
- [Three research directions] The multi-time-scale modeling direction asserts that allogenic variables evolve slowly while autogenic variables may change within an episode, but it offers no mechanism, algorithm sketch, or learning rule to operationalize this difference, leaving the feasibility of the proposal unaddressed.
Minor comments (2)
- [Introduction] The manuscript would benefit from a short related-work subsection contrasting the proposed taxonomy with prior contextual RL surveys or frameworks to clarify novelty.
- [Taxonomy] Terminology: 'Allogenic' and 'autogenic' are used without citation to their origins in other fields; adding one or two references would improve accessibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our position paper. The comments highlight important areas where the vision can be clarified and strengthened. We address each major comment below and indicate planned revisions.
Point-by-point responses
Referee: [Abstract and taxonomy introduction] The claim that explicitly separating allogenic from autogenic contexts will produce substantially improved zero-shot generalization and safety is load-bearing yet unsupported; no concrete example, formal distinction, or comparison to existing cRL methods (e.g., context as state augmentation) is provided to show why the split is non-trivial or beneficial.
Authors: We acknowledge that the manuscript, as a position paper, presents the taxonomy as a conceptual foundation rather than an empirically validated claim. The distinction aims to capture that allogenic contexts (e.g., fixed environmental parameters like gravity or regulatory constraints) are typically invariant within an episode, while autogenic contexts (e.g., agent-internal goals or learned representations) can evolve dynamically. This separation is intended to enable agents to reason differently about external constraints versus self-generated factors, potentially improving transfer. We will revise the taxonomy section to include a concrete illustrative example (such as a robotic navigation task with fixed terrain properties versus variable task objectives) and a brief comparison to standard context-as-state-augmentation baselines to clarify why the split is non-trivial. Full empirical demonstration of generalization gains is beyond the scope of this vision paper and is flagged as future work. [Revision: partial]
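A minimal sketch of the kind of illustrative example promised here, assuming a toy grid-navigation task. The environment, its parameters, and both context encodings are hypothetical, not taken from the paper.

```python
import numpy as np

# Toy grid navigation (hypothetical). Terrain roughness is allogenic:
# sampled once per environment and fixed thereafter. The goal is
# autogenic: the task or the agent may switch it mid-episode.
class NavEnv:
    def __init__(self, terrain_roughness, rng):
        self.terrain = terrain_roughness      # allogenic, fixed for the env
        self.goal = np.array([4, 4])          # autogenic, mutable
        self.pos = np.array([0, 0])
        self.rng = rng

    def step(self, action):
        # Rougher terrain makes moves fail more often (allogenic effect).
        if self.rng.random() > self.terrain:
            self.pos = np.clip(self.pos + np.asarray(action), 0, 4)
        done = bool((self.pos == self.goal).all())
        return self.pos.copy(), (1.0 if done else -0.1), done

    def switch_goal(self, new_goal):
        self.goal = np.asarray(new_goal)      # autogenic change within episode

# Monolithic baseline: one flat vector, the type information is lost.
def monolithic_context(env):
    return np.concatenate([[env.terrain], env.goal])

# Taxonomy-aware encoding: the split survives for the learner to exploit.
def split_context(env):
    return {"allogenic": np.array([env.terrain]), "autogenic": env.goal.copy()}

env = NavEnv(terrain_roughness=0.3, rng=np.random.default_rng(0))
env.switch_goal((2, 3))  # the goal moves; the terrain does not
```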
Referee: [Three research directions] The multi-time-scale modeling direction asserts that allogenic variables evolve slowly while autogenic variables may change within an episode, but it offers no mechanism, algorithm sketch, or learning rule to operationalize this difference, leaving the feasibility of the proposal unaddressed.
Authors: We agree that feasibility would benefit from an operational sketch. In the revised manuscript we will expand the multi-time-scale direction with a high-level conceptual mechanism, for example by suggesting separate temporal abstraction layers or timescale-specific encoders (drawing on ideas from hierarchical RL and multi-timescale recurrent models) that allow the agent to maintain slow-updating representations for allogenic factors and faster adaptation for autogenic ones. This will be presented as an illustrative direction rather than a complete algorithm, consistent with the position-paper format. [Revision: yes]
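A rough sketch of what such timescale-specific encoders might look like, assuming simple linear maps with tanh nonlinearities; the dimensions, update rates, and mean pooling are placeholder assumptions rather than anything proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# One slow encoder for allogenic observations, updated once per episode;
# one fast recurrent state for autogenic observations, updated every step.
D_OBS, D_CTX = 8, 4
W_slow = rng.standard_normal((D_CTX, D_OBS)) * 0.1
W_fast = rng.standard_normal((D_CTX, D_OBS)) * 0.1
W_rec = rng.standard_normal((D_CTX, D_CTX)) * 0.1

z_slow = np.zeros(D_CTX)          # allogenic representation, episode timescale
for episode in range(5):
    episode_obs = rng.standard_normal((20, D_OBS))

    # Slow path: summarize the whole episode once (mean pooling here),
    # then blend it into the persistent allogenic state.
    z_slow = 0.9 * z_slow + 0.1 * np.tanh(W_slow @ episode_obs.mean(axis=0))

    z_fast = np.zeros(D_CTX)      # autogenic representation, step timescale
    for obs in episode_obs:
        z_fast = np.tanh(W_fast @ obs + W_rec @ z_fast)
        policy_input = np.concatenate([z_slow, z_fast])  # fed to the policy
```

The two-rate design mirrors clockwork-style multi-timescale recurrent models (cf. [48]): the slow state changes once per episode while the fast state tracks every step.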
Circularity Check
No significant circularity: position paper with no derivations
Full rationale
The manuscript is a position paper proposing a context taxonomy (allogenic vs. autogenic) and three high-level research directions. It contains no equations, fitted parameters, formal derivations, proofs, or algorithms. The central claims are aspirational visions for future contextual RL agents rather than assertions whose correctness reduces to self-citation chains or definitions by construction. No load-bearing step equates a prediction to its own inputs or imports uniqueness from prior author work as an unverified premise. The derivation chain is therefore self-contained and non-circular.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: current contextual RL treats context as a monolithic, static observable.
- Ad hoc to paper: separating allogenic from autogenic contexts and addressing heterogeneous, multi-time-scale, and abstract contexts will improve agent reasoning and transfer.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (tag: unclear)
  Unclear relation between the paper passage and the cited Recognition theorem.
  Passage: "We propose a novel taxonomy of contexts that separates allogenic (environment-imposed) from autogenic (agent-driven) factors... three fundamental research directions: (1) Learning with heterogeneous contexts... (2) Multi-time-scale modeling... (3) Integration of abstract, high-level contexts"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tag: unclear)
  Unclear relation between the paper passage and the cited Recognition theorem.
  Passage: "We envision context as a first-class modeling primitive, empowering agents to reason about who they are, what the world permits, and how both evolve over time."
What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] M. Abdolshah, H. Le, T. K. George, S. Gupta, S. Rana, and S. Venkatesh. 2021. A New Representation of Successor Features for Transfer across Dissimilar Environments. In Proceedings of the 38th International Conference on Machine Learning (ICML’21) (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 1–9.
- [2] S. V. Albrecht, F. Christianos, and L. Schäfer. 2024. Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. MIT Press.
- [3] M. Andrychowicz, B. Baker, M. Chociej, R. Józefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, J. Schneider, S. Sidor, J. Tobin, P. Welinder, L. Weng, and W. Zaremba. 2020. Learning dexterous in-hand manipulation. International Journal of Robotics Research 39, 1 (2020).
- [4] J. Beck, R. Vuorio, E. Z. Liu, Z. Xiong, L. M. Zintgraf, C. Finn, and S. Whiteson.
- [5] A Tutorial on Meta-Reinforcement Learning. Found. Trends Mach. Learn. 18, 2-3 (2025), 224–384. https://doi.org/10.1561/2200000080
- [6] M. G. Bellemare, S. Candido, P. Samuel Castro, J. Gong, M. C. Machado, S. Moitra, S. S. Ponda, and Z. Wang. 2020. Autonomous navigation of stratospheric balloons using reinforcement learning. Nature 588, 7836 (2020), 77–82.
- [7] R. Bellman. 1957. A Markovian decision process. Journal of Mathematics and Mechanics (1957), 679–684.
- [8]
- [9] C. Benjamins, T. Eimer, F. Schubert, A. Mohan, S. Döhler, A. Biedenkapp, B. Rosenhan, F. Hutter, and M. Lindauer. 2023. Contextualize Me – The Case for Context in Reinforcement Learning. Transactions on Machine Learning Research (2023).
- [10] M. Beukman, D. Jarvis, R. Klein, S. James, and B. Rosman. 2023. Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies. In Proceedings of the 36th International Conference on Advances in Neural Information Processing Systems (NeurIPS’23), A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.). Curran Associates.
- [11] A. Biedenkapp, R. Rajan, F. Hutter, and M. Lindauer. 2021. TempoRL: Learning When to Act. In Proceedings of the 38th International Conference on Machine Learning (ICML’21) (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 914–924.
- [12]
- [13] Learning Domain-Independent Policies for Open List Selection. In Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning (PRL@ICAPS’22), M. Katz, H. Palacios, and V. Gómez (Eds.).
- [14]
- [15] P. F. Christiano, J. Leike, T. B. Brown, M. Martic, S. Legg, and D. Amodei. 2017. Deep Reinforcement Learning from Human Preferences. In Proceedings of the 30th International Conference on Advances in Neural Information Processing Systems (NeurIPS’17), I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett (Eds.).
- [16] K. Cobbe, C. Hesse, J. Hilton, and J. Schulman. 2020. Leveraging Procedural Generation to Benchmark Reinforcement Learning. In Proceedings of the 37th International Conference on Machine Learning (ICML’20), H. Daume III and A. Singh (Eds.), Vol. 98. Proceedings of Machine Learning Research.
- [17] J. Degrave, F. Felici, J. Buchli, M. Neunert, B. Tracey, F. Carpanese, T. Ewalds, R. Hafner, A. Abdolmaleki, D. de las Casas, C. Donner, L. Fritz, C. Galperti, A. Huber, J. Keeling, M. Tsimpoukelli, J. Kay, A. Merle, Jean-M. Moret, S. Noury, F. Pesamosca, D. Pfau, O. Sauter, C. Sommariva, S. Coda, B. Duval, A. Fasoli, P. Kohli, K. Kavukcuoglu, D. Hassabis… 2022.
- [18] C. Ding, L. Zhou, Y. Li, and X. Rong. 2020. Locomotion Control of Quadruped Robots With Online Center of Mass Adaptation and Payload Identification. IEEE Access 8 (2020), 224578–224587. https://doi.org/10.1109/ACCESS.2020.3044933
- [19]
- [20] T. Eimer, A. Biedenkapp, F. Hutter, and M. Lindauer. 2021. Self-Paced Context Evaluation for Contextual Reinforcement Learning. In Proceedings of the 38th International Conference on Machine Learning (ICML’21) (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 2948–2958.
- [21] L. Engwegen, D. Brinks, and W. Boehmer. 2025. Modular Recurrence in Contextual MDPs for Universal Morphology Control. In Eighteenth European Workshop on Reinforcement Learning. https://openreview.net/forum?id=0fn0ii1njp
- [22]
- [23] B. Eysenbach, R. R. Salakhutdinov, and S. Levine. 2019. Search on the Replay Buffer: Bridging Planning and Reinforcement Learning. In Proceedings of the 32nd International Conference on Advances in Neural Information Processing Systems (NeurIPS’19), H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alche Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates.
- [24] D. Ghosh, J. Rahme, A. Kumar, A. Zhang, R. P. Adams, and S. Levine. 2021. Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability. In Proceedings of the 34th International Conference on Advances in Neural Information Processing Systems (NeurIPS’21), M. Ranzato, A. Beygelzimer, K. Nguyen, P. Liang, J. Vaughan, and Y. Dauphin (Eds.).
- [25] S. Griesbach and C. D’Eramo. 2025. Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization. Reinforcement Learning Journal 6 (2025), 1140–1157.
- [26] B. Grooten, P. MacAlpine, K. Subramanian, P. R. Wurman, and P. Stone. 2026. Out-of-Distribution Generalization with a SPARC: Racing 100 Unseen Vehicles with a Single Policy. In Proceedings of the Fortieth AAAI Conference on Artificial Intelligence. AAAI Press.
- [27] C. Gumbsch, N. Sajid, G. Martius, and M. V. Butz. 2024. Learning Hierarchical World Models with Adaptive Temporal Abstractions from Discrete Latent Dynamics. In The Twelfth International Conference on Learning Representations (ICLR’24). https://openreview.net/forum?id=TjCDNssXKU
- [28] D. Ha, A. M. Dai, and Q. V. Le. 2017. HyperNetworks. In The Fifth International Conference on Learning Representations (ICLR’17). OpenReview.net.
- [29]
- [30] A. Hallak, D. Di Castro, and S. Mannor. 2015. Contextual Markov Decision Processes. arXiv:1502.02259 [stat.ML] (2015).
- [31] High-Level Expert Group on AI. 2019. Ethics Guidelines for Trustworthy AI. Report. European Commission. https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai
- [32] M. Iannotta, Y. Yang, J. A. Stork, E. Schaffernicht, and T. Stoyanov. 2025. Can Context Bridge the Reality Gap? Sim-to-Real Transfer of Context-Aware Policies. arXiv:2511.04249 [cs.RO] (2025). https://doi.org/10.48550/arXiv.2511.04249
- [33] E. Kaufmann, L. Bauersfeld, A. Loquercio, M. Müller, V. Koltun, and D. Scaramuzza. 2023. Champion-level drone racing using deep reinforcement learning. Nature 620, 7976 (2023), 982–987. https://doi.org/10.1038/S41586-023-06419-4
- [34] R. Kirk, A. Zhang, E. Grefenstette, and T. Rocktäschel. 2023. A Survey of Zero-shot Generalisation in Deep Reinforcement Learning. Journal of Artificial Intelligence Research 76 (2023), 201–264.
- [35] P. Klink, C. D’Eramo, J. Peters, and J. Pajarinen. 2020. Self-Paced Deep Reinforcement Learning. In Proceedings of the 33rd International Conference on Advances in Neural Information Processing Systems (NeurIPS’20), H. Larochelle, M. Ranzato, R. Hadsell, M.-F. Balcan, and H. Lin (Eds.). Curran Associates, 9216–9227.
- [36] S. Levine, A. Kumar, G. Tucker, and J. Fu. 2020. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems. arXiv:2005.01643 [cs.LG] (2020).
- [37] V. Mnih, K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves, M. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529–533.
- [38] A. Modi, N. Jiang, S. Singh, and A. Tewari. 2018. Markov Decision Processes With Continuous Side Information. In Algorithmic Learning Theory (ALT’18), Vol. 83, 597–618.
- [39]
- [40] T. Camaret Ndir, A. Biedenkapp, and N. Awad. 2024. Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning. In Seventeenth European Workshop on Reinforcement Learning. https://openreview.net/forum?id=51XSWH0mgN
- [41] K. Panaganti, Z. Xu, D. Kalathil, and M. Ghavamzadeh. 2022. Robust Reinforcement Learning using Offline Data. In Proceedings of the 35th International Conference on Advances in Neural Information Processing Systems (NeurIPS’22), S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.). Curran Associates.
- [42] X. Bin Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel. 2018. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization. In Proceedings of ICRA’18. IEEE, 1–8.
- [43] C. Perez, F. P. Such, and T. Karaletsos. 2020. Generalized Hidden Parameter MDPs: Transferable Model-Based RL in a Handful of Trials. In Proceedings of the AAAI Conference on Artificial Intelligence, F. Rossi, V. Conitzer, and F. Sha (Eds.). AAAI Press, 5403–5411.
- [44] S. Prasanna, K. Farid, R. Rajan, and A. Biedenkapp. 2024. Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization. Reinforcement Learning Journal 1 (2024).
- [45]
- [46] S. Reed, K. Zolna, E. Parisotto, S. Gómez C., A. Novikov, G. Barth-maron, M. Giménez, Y. Sulsky, J. Kay, J. T. Springenberg, T. Eccles, J. Bruce, A. Razavi, A. Edwards, N. Heess, Y. Chen, R. Hadsell, O. Vinyals, M. Bordbar, and N. de Freitas. 2022. A Generalist Agent. Transactions on Machine Learning Research (2022). https://openreview.net/forum?id=1ikK0kH…
- [47] S. Russell. 2022. Human-Compatible Artificial Intelligence. In Human-Like Machine Intelligence, S. H. Muggleton and N. Chater (Eds.). Oxford University Press, 3–23.
- [48] V. Saxena, J. Ba, and D. Hafner. 2021. Clockwork Variational Autoencoders. In Proceedings of the 34th International Conference on Advances in Neural Information Processing Systems (NeurIPS’21), M. A. Ranzato, A. Beygelzimer, Y. N. Dauphin, P. Liang, and J. W. Vaughan (Eds.). Curran Associates, 29246–29257.
- [49] S. Sodhani, A. Zhang, and J. Pineau. 2021. Multi-Task Reinforcement Learning with Context-based Representations. In Proceedings of the 38th International Conference on Machine Learning (ICML’21) (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 9767–9779.
- [50] R. S. Sutton and A. G. Barto. 2018. Reinforcement Learning: An Introduction (2nd ed.). MIT Press.
- [51]
- [52] O. Vinyals, I. Babuschkin, W. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. Agapiou, M. Jaderberg, A. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. Paine, C. Gulcehre, Z. Wang, T. Pfaff, Y. Wu, R. Ring…
- [53] Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 7782 (2019), 350–354.
- [54] J. Wang, M. King, N. Porcel, Z. Kurth-Nelson, T. Zhu, C. Deck, P. Choy, M. Cassin, M. Reynolds, H. F. Song, G. Buttimore, D. P. Reichert, N. C. Rabinowitz, L. Matthey, D. Hassabis, A. Lerchner, and M. M. Botvinick. 2021. Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents. In Proceedings of the 34th International Conference on…
- [55] P. R. Wurman, S. Barrett, K. Kawamoto, J. MacGlashan, K. Subramanian, T. J. Walsh, R. Capobianco, A. Devlic, F. Eckert, F. Fuchs, L. Gilpin, P. Khandelwal, V. Raj Kompella, H. Lin, P. MacAlpine, D. Oller, T. Seno, C. Sherstan, M. D. Thomure, H. Aghabozorgi, L. Barrett, R. Douglas, D. Whitehead, P. Dürr, P. Stone, M. Spranger, and H. Kitano. 2022. Outracin…
- [56] D. Yarats, R. Fergus, A. Lazaric, and L. Pinto. 2021. Reinforcement Learning with Prototypical Representations. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), M. Meila and T. Zhang (Eds.). PMLR, 11920–11931.
- [57] W. Yu, J. Tan, C. K. Liu, and G. Turk. 2017. Preparing for the Unknown: Learning a Universal Policy with Online System Identification. arXiv:1702.02453 (2017).
- [58] W. Yu, J. Tan, C. K. Liu, and G. Turk. 2017. Preparing for the Unknown: Learning a Universal Policy with Online System Identification. In Robotics: Science and Systems XIII.
- [59]
- [60] H. Zhang, H. Chen, C. Xiao, B. Li, M. Liu, D. S. Boning, and C.-J. Hsieh. 2020. Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations. In Proceedings of the 33rd International Conference on Advances in Neural Information Processing Systems (NeurIPS’20), H. Larochelle, M. Ranzato, R. Hadsell, M.-F. Balcan, and H.-T. Lin (Eds.).
- [61] W. Zhou, L. Pinto, and A. Gupta. 2019. Environment Probing Interaction Policies. In The Seventh International Conference on Learning Representations (ICLR’19). OpenReview.net.
Appendix: On the Relativity of Context
We ultimately believe that context is relative. The frame of reference determines if a context is allogenic, autogenic, or somewhere in be…