pith. machine review for the scientific record.

arxiv: 2605.12224 · v1 · submitted 2026-05-12 · 💻 cs.LG

Recognition: no theorem link

Intrinsic Vicarious Conditioning for Deep Reinforcement Learning

Authors on Pith · no claims yet

Pith reviewed 2026-05-13 05:57 UTC · model grok-4.3

classification 💻 cs.LG
keywords reinforcement learning · intrinsic rewards · vicarious conditioning · deep RL · memory-based methods · continual learning · low-shot learning · non-descriptive terminals

The pith

Vicarious conditioning supplies intrinsic rewards in deep reinforcement learning without requiring demonstrators' policies or reward functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces vicarious conditioning, drawn from psychological literature, as an intrinsic reward source for reinforcement learning agents. It implements the four steps of attention, retention, reproduction, and reinforcement through memory-based methods that operate without any access to a demonstrator agent's policy or reward function. This setup supports low-shot learning and addresses environments where terminal conditions provide no explicit reward signal. Evaluations in the MiniWorld Sidewalk and Box2D CarRacing environments show the approach produces longer episodes by steering agents away from non-descriptive endings toward more stable states. The work positions the method as more suitable for single-life or continual learning scenarios than direct conditioning approaches.

Core claim

By implementing the four psychological steps of vicarious conditioning through memory-based mechanisms, reinforcement learning agents receive intrinsic rewards that discourage non-descriptive terminal conditions and guide behavior toward desirable states, all without access to the demonstrating agent's policy or reward function.

What carries the argument

Memory-based implementation of vicarious conditioning's four steps (attention, retention, reproduction, reinforcement) that generates intrinsic rewards from observed demonstrations.

Load-bearing premise

Memory-based versions of attention, retention, reproduction, and reinforcement can produce useful intrinsic rewards even with no access to the demonstrator agent's policy or reward function.
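To make concrete what this premise commits to, here is a minimal sketch of one possible memory-based intrinsic reward: the agent is rewarded for resembling remembered demonstrator states tagged as desirable and penalized for resembling states that preceded non-descriptive terminals. The paper supplies no formula, so every choice below (cosine similarity, max-over-memory retrieval, the two memory banks) is an illustrative assumption, not the authors' mechanism.

```python
import numpy as np

def intrinsic_reward(obs_embedding, desirable_memory, aversive_memory, scale=1.0):
    """Illustrative memory-based intrinsic reward. Similarity to remembered
    desirable demonstrator states is rewarded; similarity to states that
    preceded non-descriptive terminals is penalized. The banks hold
    embeddings of observed demonstrator states -- no policy or reward
    function of the demonstrator is accessed."""
    def max_cosine(x, bank):
        # Highest cosine similarity between x and any stored embedding.
        if len(bank) == 0:
            return 0.0
        bank = np.asarray(bank)
        sims = bank @ x / (np.linalg.norm(bank, axis=1) * np.linalg.norm(x) + 1e-8)
        return float(sims.max())
    return scale * (max_cosine(obs_embedding, desirable_memory)
                    - max_cosine(obs_embedding, aversive_memory))
```

A state near a remembered desirable embedding scores positively; one near a remembered pre-terminal embedding scores negatively, which is the sign structure the episode-length claim relies on.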

What would settle it

A controlled comparison in the Sidewalk environment in which agents using the vicarious conditioning module fail to achieve reliably longer episodes than standard reinforcement learning baselines that lack any demonstrator input.

Figures

Figures reproduced from arXiv: 2605.12224 by Alex Ororbia, Ferat Sahin, Jamison Heard, Rodney A Sanchez.

Figure 1. Sidewalk intrinsic-only training curves across thresholds …
Figure 2. CarRacing positive VC training curves across …
Figure 3. Depicts the social learning frameworks where attention and retention use low-shot learning …
Figure 4. Demonstrates the Siamese network that is then used as a gate for the Siamese LSTM …
Figure 5. Collectively demonstrates the POMDP wrapper and region that considers collision …
Figure 6. Describes an example of avoidance behavior for vicarious conditioning where the agent …
Figure 7. Describes an example of enticement behavior for vicarious conditioning where the agent …
Figure 8. Demonstrates the achieved average and standard deviation of the accuracies for SMANN …
Figure 9. Training curves for the intrinsic-only condition in the Sidewalk environment across three …
Figure 10. Training curves for the intrinsic-and-extrinsic condition in the Sidewalk environment …
Figure 11. Training curves for positive vicarious conditioning with extrinsic rewards in the Car…
Figure 12. Training curves for negative vicarious conditioning with extrinsic rewards in the …
Figure 13. Training curves for composite vicarious conditioning with extrinsic rewards in the …
Original abstract

Advancements in reinforcement learning have produced a variety of complex and useful intrinsic driving forces; crucially, these drivers operate under a direct conditioning paradigm. This form of conditioning limits our agents' capacity by restricting how they learn from the environment as well as from others. Off-policy or learn-by-example methods can learn from demonstrators' representations, but they require access to the demonstrating agent's policies or their reward functions. Our work overcomes this direct sampling limitation by introducing vicarious conditioning as an intrinsic reward mechanism. We draw from psychological and biological literature to provide a foundation for vicarious conditioning and use memory-based methods to implement its four steps: attention, retention, reproduction, and reinforcement. Crucially, our vicarious conditioning paradigms support low-shot learning and do not require the demonstrator agent's policy nor its reward functions. We evaluate our approach in the MiniWorld Sidewalk environment, one of the few public environments that features a non-descriptive terminal condition (no reward provided upon agent death), and extend it to Box2D's CarRacing environment. Our results across both environments demonstrate that vicarious conditioning enables longer episode lengths by discouraging the agent from non-descriptive terminal conditions and guiding the agent toward desirable states. Overall, this work emulates a cognitively-plausible learning paradigm better suited to problems such as single-life learning or continual learning.
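The intrinsic-plus-extrinsic conditions the abstract describes can be pictured as a thin environment wrapper that mixes the two signals. The sketch below follows the Gymnasium five-tuple step convention; `intrinsic_fn` and `beta` are placeholders for the paper's unspecified vicarious-conditioning module and weighting, not its actual interface.

```python
class IntrinsicRewardWrapper:
    """Minimal sketch of combining an extrinsic reward with an intrinsic
    term. In a non-descriptive terminal condition the extrinsic reward is
    zero even on agent death, so the intrinsic term is the only signal
    steering the agent away from such endings."""
    def __init__(self, env, intrinsic_fn, beta=0.1):
        self.env = env
        self.intrinsic_fn = intrinsic_fn  # assumed: maps observation -> float
        self.beta = beta                  # assumed: weight on the intrinsic term

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, extrinsic, terminated, truncated, info = self.env.step(action)
        reward = extrinsic + self.beta * self.intrinsic_fn(obs)
        return obs, reward, terminated, truncated, info
```

Under this framing, the paper's "intrinsic-only" condition is the special case where the extrinsic term is zero everywhere.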

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes vicarious conditioning as a new intrinsic reward mechanism for deep RL. It draws on psychological literature to define four steps (attention, retention, reproduction, reinforcement) realized via memory-based methods; the resulting intrinsic reward is claimed to let agents learn from demonstrators without access to the demonstrator's policy or reward function. The approach is evaluated in MiniWorld Sidewalk (chosen for its non-descriptive terminal condition) and Box2D CarRacing, with the central empirical claim that it produces longer episodes by discouraging non-descriptive terminals and guiding agents toward desirable states. The work is positioned as more cognitively plausible for single-life and continual learning settings.

Significance. If the memory-based implementation can be shown to generate useful intrinsic rewards from raw observations alone, the result would offer a novel route to vicarious learning in RL that avoids the usual requirement for demonstrator internals. This could be relevant for sparse-reward or non-descriptive-terminal environments and for paradigms that aim to emulate human-like observational learning. No machine-checked proofs, reproducible code releases, or parameter-free derivations are described, so the significance rests entirely on whether the empirical gains are reproducible and mechanistically grounded.

major comments (2)
  1. [Abstract / Methods] Abstract and Methods: the manuscript states that memory-based methods realize the four psychological steps and produce an intrinsic reward, yet supplies neither the memory architecture, state encoding, reward formula, nor update rule. This is load-bearing for the central claim that the mechanism infers desirable states from raw observations without demonstrator policy or reward access; without the concrete realization it is impossible to determine whether reported episode-length gains are artifacts of an implicit channel or genuine vicarious conditioning.
  2. [Results] Results: the abstract asserts that vicarious conditioning enables longer episode lengths in both MiniWorld Sidewalk and CarRacing, but the provided text contains no quantitative metrics, error bars, ablation studies, or comparison against baselines that isolate the contribution of the four-step memory implementation. This undermines the empirical support for the claim that the method discourages non-descriptive terminals.
minor comments (2)
  1. [Introduction] The psychological literature citations that ground the four steps are referenced but not listed with specific sources or page numbers in the abstract; adding them would improve traceability.
  2. [Methods] Notation for the intrinsic reward and memory components is not introduced, making it difficult to follow how attention, retention, reproduction, and reinforcement map onto RL primitives.
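One way to make that mapping concrete is a single update that walks the four steps in order. The manuscript does not fix this mapping, so every name, threshold, and operation below is hypothetical, chosen only to show how the steps could line up with RL primitives.

```python
def vicarious_steps(demo_obs, agent_obs, memory, encode, tau=0.5):
    """Hypothetical mapping of the four vicarious-conditioning steps onto
    RL primitives, using scalar observations for simplicity."""
    # 1. attention: keep only demonstrator observations salient enough to encode
    #    (salience-by-magnitude is an assumption; the paper does not define it)
    salient = [o for o in demo_obs if abs(o) > tau]
    # 2. retention: store encodings of the attended observations in memory
    memory.extend(encode(o) for o in salient)
    # 3. reproduction: retrieve the stored encoding closest to the agent's state
    z = encode(agent_obs)
    nearest = min(memory, key=lambda m: abs(m - z)) if memory else None
    # 4. reinforcement: similarity to the retrieved memory becomes the
    #    intrinsic reward
    return 0.0 if nearest is None else 1.0 / (1.0 + abs(nearest - z))
```

Writing the loop out this way also shows where an implicit channel could hide: everything the reward can express must pass through `memory`, which is exactly the audit the referee's first major comment asks for.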

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and commit to revisions that will strengthen the manuscript's clarity and empirical grounding.

Point-by-point responses
  1. Referee: [Abstract / Methods] Abstract and Methods: the manuscript states that memory-based methods realize the four psychological steps and produce an intrinsic reward, yet supplies neither the memory architecture, state encoding, reward formula, nor update rule. This is load-bearing for the central claim that the mechanism infers desirable states from raw observations without demonstrator policy or reward access; without the concrete realization it is impossible to determine whether reported episode-length gains are artifacts of an implicit channel or genuine vicarious conditioning.

    Authors: We agree that the current manuscript presents the four steps at a conceptual level without the concrete implementation details. The Methods section does not specify the memory architecture, state encoding from raw observations, the exact intrinsic reward formula, or the update rules. This omission limits the ability to evaluate whether the reported gains arise from genuine vicarious conditioning. In the revised manuscript we will add a dedicated subsection detailing the memory architecture (including its structure and capacity), the state encoding process, the mathematical formulation of the intrinsic reward derived from attention, retention, reproduction, and reinforcement, and the corresponding update rules. These additions will make the mechanism fully specified and allow readers to assess its validity independent of any implicit channels. revision: yes

  2. Referee: [Results] Results: the abstract asserts that vicarious conditioning enables longer episode lengths in both MiniWorld Sidewalk and CarRacing, but the provided text contains no quantitative metrics, error bars, ablation studies, or comparison against baselines that isolate the contribution of the four-step memory implementation. This undermines the empirical support for the claim that the method discourages non-descriptive terminals.

    Authors: We acknowledge that the current manuscript text provides only a qualitative description of longer episode lengths without supporting quantitative data. No specific metrics, error bars, ablation studies, or baseline comparisons are included to isolate the contribution of the four-step implementation. In the revised version we will expand the Results section to report mean episode lengths with standard errors across multiple random seeds for both environments, include ablation studies that remove or modify individual steps of the vicarious conditioning process, and add comparisons against relevant baselines (standard PPO, other intrinsic reward methods, and random exploration). These quantitative results and analyses will directly support the claim that the mechanism discourages non-descriptive terminals. revision: yes
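The statistic the rebuttal commits to, mean episode length with a standard error across seeds, is simple to pin down. The sketch below computes it from per-seed episode-length lists; the seed data in the usage example is invented purely to illustrate the computation.

```python
import math

def mean_and_se(episode_lengths_per_seed):
    """Mean episode length and standard error across random seeds.
    Each inner list holds the episode lengths recorded under one seed;
    the per-seed means are treated as the independent samples."""
    per_seed_means = [sum(lengths) / len(lengths) for lengths in episode_lengths_per_seed]
    n = len(per_seed_means)
    mean = sum(per_seed_means) / n
    # Sample variance of the per-seed means (Bessel-corrected),
    # then the standard error of their mean.
    var = sum((m - mean) ** 2 for m in per_seed_means) / (n - 1)
    return mean, math.sqrt(var / n)
```

For example, `mean_and_se([[10, 20], [20, 30], [30, 40]])` returns a mean of 25 with a standard error of about 5.77, the "mean ± SE" figure the revised Results section would report per condition.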

Circularity Check

0 steps flagged

No significant circularity; claims rest on external psychological literature and empirical evaluation

Full rationale

The paper grounds vicarious conditioning in external psychological and biological literature, then implements the four steps (attention, retention, reproduction, reinforcement) via memory-based methods without providing equations that reduce the intrinsic reward to a fitted parameter or self-referential definition. The central result—longer episodes via avoidance of non-descriptive terminals—is presented as an empirical outcome across MiniWorld and CarRacing environments rather than a derivation that collapses to its own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results is shown to substitute for independent derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the premise that psychological literature supplies a sound basis for the four conditioning steps and that memory-based methods can realize them in RL without demonstrator internals.

axioms (1)
  • domain assumption Psychological and biological literature on vicarious conditioning provides a valid and transferable foundation for designing intrinsic rewards in artificial agents.
    Invoked to justify the four-step implementation.
invented entities (1)
  • Vicarious conditioning intrinsic reward no independent evidence
    purpose: To enable learning from demonstrators without their policies or reward functions via memory-based attention, retention, reproduction, and reinforcement.
    New mechanism introduced to overcome direct-sampling limitations of prior intrinsic methods.

pith-pipeline@v0.9.0 · 5536 in / 1321 out tokens · 34627 ms · 2026-05-13T05:57:29.463570+00:00 · methodology

discussion (0)

