
arxiv: 2506.21872 · v2 · submitted 2025-06-27 · 💻 cs.LG · cs.AI

A Survey of Continual Reinforcement Learning

Pith reviewed 2026-05-19 08:24 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: continual reinforcement learning, survey, taxonomy, knowledge storage, knowledge transfer, lifelong learning, sequential decision making, agent adaptation

The pith

Continual reinforcement learning methods fall into four categories based on how they store and transfer knowledge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The survey reviews how reinforcement learning agents can keep learning new tasks while preserving what they have already learned. It organizes the literature around metrics, tasks, benchmarks, and scenario settings. The main step forward is a taxonomy that places methods into four groups according to their approach to storing knowledge internally or moving it across tasks. Readers interested in agents that operate in changing environments gain a clearer map of which techniques reuse past experience most effectively.

Core claim

The paper proposes a new taxonomy of CRL methods, categorizing them into four types from the perspective of knowledge storage and/or transfer. This framework organizes existing approaches by whether agents retain information in shared structures, isolated components, or through explicit transfer mechanisms between tasks.

What carries the argument

A four-type taxonomy that groups methods by their knowledge storage and transfer strategies, providing a lens for comparing how agents retain and reuse prior learning.

If this is right

  • Methods become easier to compare when grouped by their specific handling of retained knowledge.
  • Under-explored categories within the taxonomy point to concrete opportunities for new algorithm designs.
  • Benchmark suites can be expanded to evaluate performance across all four types rather than a narrow subset.
  • Researchers can identify which storage or transfer mechanisms best support long sequences of tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same storage-and-transfer lens could be applied to continual learning outside reinforcement learning to test its broader usefulness.
  • Experiments that measure knowledge retention rates per category would provide a direct test of whether the taxonomy predicts practical differences.
  • As new methods appear that combine storage and transfer in unexpected ways, the four categories may need subdivision or merging.
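
The retention experiment suggested above could be sketched roughly as follows. This is a minimal illustration, not a metric defined by the survey: it assumes each run is summarized by a performance matrix where `perf[i][j]` is the score on task `j` after training through task `i`, and it defines a task's retention rate as its final score divided by its score right after it was learned. The function names and the labels `category_A` and `category_B` are hypothetical placeholders.

```python
from statistics import mean
from typing import Dict, List

# Assumption: perf[i][j] = score on task j after training on tasks 0..i.
Matrix = List[List[float]]


def retention_rates(perf: Matrix) -> List[float]:
    """Per-task retention: final score divided by score right after learning."""
    final_row = perf[-1]
    rates = []
    for j, row in enumerate(perf):
        learned = row[j]  # score on task j immediately after training on it
        # Guard against division by zero when the task was never learned.
        rates.append(final_row[j] / learned if learned > 0 else 0.0)
    return rates


def mean_retention_by_category(runs: Dict[str, List[Matrix]]) -> Dict[str, float]:
    """Average the per-task retention over all runs in each category."""
    return {
        category: mean(mean(retention_rates(m)) for m in matrices)
        for category, matrices in runs.items()
    }


# Hypothetical toy data: one two-task run per placeholder category.
runs = {
    "category_A": [[[1.0, 0.0], [0.5, 0.8]]],
    "category_B": [[[1.0, 0.0], [1.0, 0.5]]],
}
print(mean_retention_by_category(runs))
```

If the per-category averages differ consistently across benchmarks, that would be some evidence that the taxonomy tracks a practical difference; similar averages would suggest the grouping is mainly organizational.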

Load-bearing premise

That the existing body of continual reinforcement learning work can be partitioned comprehensively and usefully into four categories defined by knowledge storage and transfer.

What would settle it

A substantial set of recent CRL papers that cannot be assigned to any of the four categories without forcing the fit would indicate that the taxonomy does not cover the field.

Figures

Figures reproduced from arXiv: 2506.21872 by Bo An, Chaofan Pan, Jiye Liang, Tianrui Li, Wei Wei, Xin Yang, Yanhua Li.

Figure 1: The setting of CRL. Different classes arrive sequentially, and the …
Figure 2: A comparison of four RL paradigms. A. Definition The term “Continual Reinforcement Learning” can be broken down into two main components: “continual” and “reinforcement learning”. While “reinforcement learning” remains the core subject of study, the term “continual” emphasizes the extension of traditional RL to a dynamic, multi-task framework, where agents continuously learn, adapt, and retain knowledge …
Figure 3: The triangular balance of plasticity, stability, and scalability in CRL.
Figure 4: Timeline illustrating the key developments, by order and interval, in …
Figure 5: Illustration of the general structure of a CRL method, organized by the knowledge that is stored and/or transferred.
Figure 6: The framework of policy reuse in CRL methods. Stored policies …
Figure 7: The framework of policy decomposition in CRL methods. Factor decomposition, multi-head network, hierarchical decomposition, and modular …
Figure 8: The framework of policy merging in CRL methods. Distillation, …
Figure 9: The framework of experience-focused methods. Some methods use a …
Figure 10: The framework of dynamic-focused methods. Direct modeling …
Figure 11: The framework of reward-focused methods.
Original abstract

Reinforcement Learning (RL) is an important machine learning paradigm for solving sequential decision-making problems. Recent years have witnessed remarkable progress in this field due to the rapid development of deep neural networks. However, the success of RL currently relies on extensive training data and computational resources. In addition, RL's limited ability to generalize across tasks restricts its applicability in dynamic and real-world environments. With the arisen of Continual Learning (CL), Continual Reinforcement Learning (CRL) has emerged as a promising research direction to address these limitations by enabling agents to learn continuously, adapt to new tasks, and retain previously acquired knowledge. In this survey, we provide a comprehensive examination of CRL, focusing on its core concepts, challenges, and methodologies. Firstly, we conduct a detailed review of existing works, organizing and analyzing their metrics, tasks, benchmarks, and scenario settings. Secondly, we propose a new taxonomy of CRL methods, categorizing them into four types from the perspective of knowledge storage and/or transfer. Finally, our analysis highlights the unique challenges of CRL and provides practical insights into future directions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. This manuscript is a survey on Continual Reinforcement Learning (CRL). It reviews existing works by organizing and analyzing their metrics, tasks, benchmarks, and scenario settings. The central contribution is a proposed new taxonomy that categorizes CRL methods into four types from the perspective of knowledge storage and/or transfer. The paper concludes by highlighting unique challenges of CRL and providing practical insights into future directions.

Significance. If the taxonomy partitions the surveyed literature comprehensively and insightfully, without forced assignments, the survey would offer a useful organizational framework for a growing subfield. The systematic review of metrics, tasks, benchmarks, and scenarios adds practical value for standardizing evaluation and comparison across CRL studies.

minor comments (3)
  1. Abstract: 'With the arisen of Continual Learning' should be revised to 'With the rise of Continual Learning' or 'With the emergence of Continual Learning' for grammatical accuracy.
  2. Taxonomy section: Provide explicit criteria or decision rules used to assign methods to each of the four categories, with at least one concrete example per category drawn from the reviewed literature to demonstrate the partition's utility and lack of overlap.
  3. Benchmarks and scenarios review: Include a summary table listing the key benchmarks, their task characteristics, and which taxonomy category the associated methods primarily fall into, to improve readability and allow quick cross-referencing.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of our survey on Continual Reinforcement Learning and for recommending minor revision. We appreciate the recognition of the value in our review of metrics, tasks, benchmarks, and scenario settings, as well as the proposed taxonomy based on knowledge storage and transfer.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

This is a survey paper whose central contribution is a literature review and a proposed organizational taxonomy of existing CRL methods into four categories based on knowledge storage and/or transfer. No derivations, equations, predictions, fitted parameters, or mathematical claims are present. The taxonomy is explicitly framed as an organizational framework derived from reviewing the body of prior work rather than from any self-referential definitions, self-citations that bear the load of the argument, or reductions of results to inputs by construction. The paper is therefore self-contained as a descriptive survey with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Because this is a survey, its central claims rest on the representativeness of the selected literature and the utility of the proposed four-type taxonomy; no free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5727 in / 1112 out tokens · 35270 ms · 2026-05-19T08:24:15.176799+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What the tags mean:

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. An LLM-Driven Closed-Loop Autonomous Learning Framework for Robots Facing Uncovered Tasks in Open Environments

    cs.RO 2026-04 unverdicted novelty 6.0

    Robots autonomously convert LLM-guided experiences into a reusable local method library, reducing average execution time from 7.7772s to 6.7779s and LLM calls per task from 1.0 to 0.2 in repeated-task experiments.
