A Survey of Continual Reinforcement Learning

Bo An; Chaofan Pan; Jiye Liang; Tianrui Li; Wei Wei; Xin Yang; Yanhua Li

arXiv: 2506.21872 · v2 · submitted 2025-06-27 · cs.LG · cs.AI


Chaofan Pan, Xin Yang, Yanhua Li, Wei Wei, Tianrui Li, Bo An, Jiye Liang

Pith reviewed 2026-05-19 08:24 UTC · model grok-4.3


keywords: continual reinforcement learning, survey, taxonomy, knowledge storage, knowledge transfer, lifelong learning, sequential decision making, agent adaptation


The pith

Continual reinforcement learning methods fall into four categories based on how they store and transfer knowledge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The survey reviews how reinforcement learning agents can keep learning new tasks while preserving what they have already learned. It organizes the literature around metrics, tasks, benchmarks, and scenario settings. The main step forward is a taxonomy that places methods into four groups according to their approach to storing knowledge internally or moving it across tasks. Readers interested in agents that operate in changing environments gain a clearer map of which techniques reuse past experience most effectively.

Core claim

The paper proposes a new taxonomy of CRL methods, categorizing them into four types from the perspective of knowledge storage and/or transfer. This framework organizes existing approaches by whether agents retain information in shared structures, isolated components, or through explicit transfer mechanisms between tasks.

What carries the argument

The four-type taxonomy that groups methods by their knowledge storage and transfer strategies, which provides a lens for comparing how agents retain and reuse prior learning.

If this is right

Methods become easier to compare when grouped by their specific handling of retained knowledge.
Under-explored categories within the taxonomy point to concrete opportunities for new algorithm designs.
Benchmark suites can be expanded to evaluate performance across all four types rather than a narrow subset.
Researchers can identify which storage or transfer mechanisms best support long sequences of tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same storage-and-transfer lens could be applied to continual learning outside reinforcement learning to test its broader usefulness.
Experiments that measure knowledge retention rates per category would provide a direct test of whether the taxonomy predicts practical differences.
As new methods appear that combine storage and transfer in unexpected ways, the four categories may need subdivision or merging.
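
The retention experiment suggested above could be sketched as follows. This is a minimal illustration, not a procedure taken from the survey: it assumes a commonly used forgetting measure (best earlier performance on a task minus final performance on it), and the performance matrices and the helper names `forgetting` and `forgetting_by_category` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def forgetting(perf):
    """Average forgetting for one run.

    perf[i][j] is the performance on task j measured after training on
    task i (only entries with i >= j are used). Forgetting on task j is
    the best performance seen before the final stage minus the final one.
    """
    n = len(perf)
    if n < 2:
        return 0.0  # a single task cannot be forgotten
    drops = []
    for j in range(n - 1):
        best_earlier = max(perf[i][j] for i in range(j, n - 1))
        drops.append(best_earlier - perf[n - 1][j])
    return mean(drops)


def forgetting_by_category(runs):
    """Group runs by taxonomy category and average their forgetting."""
    grouped = defaultdict(list)
    for category, perf in runs:
        grouped[category].append(forgetting(perf))
    return {category: mean(values) for category, values in grouped.items()}


# Hypothetical numbers, for illustration only (not results from any paper).
example_runs = [
    ("policy-focused", [[0.9, 0.0, 0.0], [0.8, 0.9, 0.0], [0.7, 0.8, 0.9]]),
    ("experience-focused", [[0.9, 0.0, 0.0], [0.9, 0.9, 0.0], [0.9, 0.9, 0.9]]),
]
result = forgetting_by_category(example_runs)
```

If the taxonomy predicts practical differences, the per-category averages in `result` should separate clearly once real evaluation data replaces the placeholder matrices.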

Load-bearing premise

That the existing body of continual reinforcement learning work can be partitioned comprehensively and usefully into four categories defined by knowledge storage and transfer.

What would settle it

A substantial set of recent CRL papers that resist assignment to any of the four categories without forcing would indicate the taxonomy does not cover the field.
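
One way to make this test concrete is a simple coverage count over annotated methods. The sketch below is an assumption-laden illustration: the four category names are the ones quoted from the paper elsewhere on this page, while the function `coverage_report`, the annotation format, and the placeholder method names are hypothetical.

```python
CATEGORIES = (
    "policy-focused",
    "experience-focused",
    "dynamic-focused",
    "reward-focused",
)


def coverage_report(assignments):
    """Summarize how cleanly methods fall into the four categories.

    assignments maps a method name to the set of categories an annotator
    judged it to fit. An empty set means the method resisted assignment.
    """
    clean, overlapping, unassigned = [], [], []
    for method, cats in assignments.items():
        unknown = set(cats) - set(CATEGORIES)
        if unknown:
            raise ValueError(f"{method}: unknown categories {sorted(unknown)}")
        if len(cats) == 1:
            clean.append(method)
        elif len(cats) > 1:
            overlapping.append(method)
        else:
            unassigned.append(method)
    total = len(assignments)
    return {
        "clean_fraction": len(clean) / total if total else 0.0,
        "overlapping": sorted(overlapping),
        "unassigned": sorted(unassigned),
    }


# Placeholder methods, not real papers.
sample = {
    "method_a": {"policy-focused"},
    "method_b": {"experience-focused", "dynamic-focused"},
    "method_c": set(),
    "method_d": {"reward-focused"},
}
report = coverage_report(sample)
```

A low `clean_fraction`, or a long `unassigned` list, on a representative sample of recent work would be the kind of evidence described above.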

Figures

Figures reproduced from arXiv: 2506.21872 by Bo An, Chaofan Pan, Jiye Liang, Tianrui Li, Wei Wei, Xin Yang, Yanhua Li.

**Figure 2.** A comparison of four RL paradigms. A. Definition: The term “Continual Reinforcement Learning” can be broken down into two main components: “continual” and “reinforcement learning”. While “reinforcement learning” remains the core subject of study, the term “continual” emphasizes the extension of traditional RL to a dynamic, multi-task framework, where agents continuously learn, adapt, and retain knowledge …

**Figure 3.** The triangular balance of plasticity, stability, and scalability in CRL.

**Figure 4.** Timeline illustrating the key developments, by order and interval, in …

**Figure 5.** Illustration of the general structure of a CRL method, organized by the knowledge that is stored and/or transferred.

**Figure 6.** The framework of policy reuse in CRL methods. Stored policies …

**Figure 7.** The framework of policy decomposition in CRL methods. Factor decomposition, multi-head network, hierarchical decomposition, and modular …

**Figure 8.** The framework of policy merging in CRL methods. Distillation, …

**Figure 9.** The framework of experience-focused methods. Some methods use a …

**Figure 10.** The framework of dynamic-focused methods. Direct modeling …

**Figure 11.** The framework of reward-focused methods.

Original abstract

Reinforcement Learning (RL) is an important machine learning paradigm for solving sequential decision-making problems. Recent years have witnessed remarkable progress in this field due to the rapid development of deep neural networks. However, the success of RL currently relies on extensive training data and computational resources. In addition, RL's limited ability to generalize across tasks restricts its applicability in dynamic and real-world environments. With the arisen of Continual Learning (CL), Continual Reinforcement Learning (CRL) has emerged as a promising research direction to address these limitations by enabling agents to learn continuously, adapt to new tasks, and retain previously acquired knowledge. In this survey, we provide a comprehensive examination of CRL, focusing on its core concepts, challenges, and methodologies. Firstly, we conduct a detailed review of existing works, organizing and analyzing their metrics, tasks, benchmarks, and scenario settings. Secondly, we propose a new taxonomy of CRL methods, categorizing them into four types from the perspective of knowledge storage and/or transfer. Finally, our analysis highlights the unique challenges of CRL and provides practical insights into future directions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)


This survey's main value is a new four-category taxonomy for CRL methods organized around knowledge storage and transfer. That framing is the part worth paying attention to. The paper reviews existing CRL work in detail. It pulls together information on metrics, tasks, benchmarks, and different scenario settings. It also discusses the core challenges like catastrophic forgetting and the need for adaptation in dynamic environments. The analysis at the end gives some practical thoughts on where the field should go next.

This kind of organized review is useful because the CRL literature has been expanding quickly. Having a consistent way to group methods helps people see patterns and avoid duplicating effort. On the downside, the taxonomy's effectiveness depends on whether most methods fit cleanly into those four buckets. If many approaches combine elements from more than one category, the division might not be as sharp as hoped. The paper treats the taxonomy as a helpful lens rather than the only possible one, which keeps expectations reasonable.

Readers who are new to CRL or looking for a way to navigate the papers will get the most out of this. It is not a technical advance with new proofs or experiments, but it supports incremental progress by making the existing results easier to access and compare. I would send this to peer review. The structure is clear, the claims are modest, and the review appears thorough enough to warrant referee feedback.

Referee Report

0 major / 3 minor

Summary. This manuscript is a survey on Continual Reinforcement Learning (CRL). It reviews existing works by organizing and analyzing their metrics, tasks, benchmarks, and scenario settings. The central contribution is a proposed new taxonomy that categorizes CRL methods into four types from the perspective of knowledge storage and/or transfer. The paper concludes by highlighting unique challenges of CRL and providing practical insights into future directions.

Significance. If the taxonomy successfully partitions the surveyed literature in a comprehensive and insightful manner without forcing, the survey would offer a useful organizational framework for a growing subfield. The systematic review of metrics, tasks, benchmarks, and scenarios adds practical value for standardizing evaluation and comparison across CRL studies.

minor comments (3)

Abstract: 'With the arisen of Continual Learning' should be revised to 'With the rise of Continual Learning' or 'With the emergence of Continual Learning' for grammatical accuracy.
Taxonomy section: Provide explicit criteria or decision rules used to assign methods to each of the four categories, with at least one concrete example per category drawn from the reviewed literature to demonstrate the partition's utility and lack of overlap.
Benchmarks and scenarios review: Include a summary table listing the key benchmarks, their task characteristics, and which taxonomy category the associated methods primarily fall into, to improve readability and allow quick cross-referencing.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of our survey on Continual Reinforcement Learning and for recommending minor revision. We appreciate the recognition of the value in our review of metrics, tasks, benchmarks, and scenario settings, as well as the proposed taxonomy based on knowledge storage and transfer.

Circularity Check

0 steps flagged

No significant circularity identified


This is a survey paper whose central contribution is a literature review and a proposed organizational taxonomy of existing CRL methods into four categories based on knowledge storage and/or transfer. No derivations, equations, predictions, fitted parameters, or mathematical claims are present. The taxonomy is explicitly framed as an organizational framework derived from reviewing the body of prior work rather than from any self-referential definitions, self-citations that bear the load of the argument, or reductions of results to inputs by construction. The paper is therefore self-contained as a descriptive survey with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey the central claims rest on the representativeness of the selected literature and the utility of the proposed four-type taxonomy; no free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5727 in / 1112 out tokens · 35270 ms · 2026-05-19T08:24:15.176799+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "we propose a new taxonomy of CRL methods, categorizing them into four types from the perspective of knowledge storage and/or transfer... policy-focused, experience-focused, dynamic-focused, and reward-focused methods"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "triangular balance among plasticity, stability, and scalability... stability-plasticity dilemma"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

An LLM-Driven Closed-Loop Autonomous Learning Framework for Robots Facing Uncovered Tasks in Open Environments
cs.RO 2026-04 unverdicted novelty 6.0

Robots autonomously convert LLM-guided experiences into a reusable local method library, reducing average execution time from 7.7772s to 6.7779s and LLM calls per task from 1.0 to 0.2 in repeated-task experiments.

Reference graph

Works this paper leans on

209 extracted references · 209 canonical work pages · cited by 1 Pith paper · 11 internal anchors

[1]

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018

work page 2018
[2]

Human-level control through deep reinforcement learning,

V . Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al. , “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015

work page 2015
[3]

A general reinforcement learning algorithm that masters chess, shogi, and go through self-play,

D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Si- monyan, and D. Hassabis, “A general reinforcement learning algorithm that masters chess, shogi, and go through self-play,” Science, vol. 362, no. 6419, pp. 1140–1144, 2018

work page 2018
[4]

Improved prediction of protein-protein interactions using AlphaFold2,

P. Bryant, G. Pozzati, and A. Elofsson, “Improved prediction of protein-protein interactions using AlphaFold2,” Nature Communica- tions, vol. 13, no. 1, p. 1265, 2022

work page 2022
[5]

Learning high-accuracy error decoding for quantum processors,

J. Bausch, A. W. Senior, F. J. H. Heras, T. Edlich, A. Davies, M. New- man, C. Jones, K. Satzinger, M. Y . Niu, S. Blackwell, G. Holland, D. Kafri, J. Atalaya, C. Gidney, D. Hassabis, S. Boixo, H. Neven, and P. Kohli, “Learning high-accuracy error decoding for quantum processors,” Nature, vol. 635, no. 8040, pp. 834–840, 2024

work page 2024
[6]

Training language models to follow instructions with human feedback,

L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. F. Christiano, J. Leike, and R. Lowe, “Training language models to follow instructions with human feedback,” in NeurIPS, vol. 35, 2022, pp. 27 730–27 744

work page 2022
[7]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI, D. Guo, and D. Y . et al., “DeepSeek-R1: Incentiviz- ing reasoning capability in LLMs via reinforcement learning,” ArXiv preprint, vol. abs/2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[8]

Deepther- mal: Combustion optimization for thermal power generating units using offline reinforcement learning,

X. Zhan, H. Xu, Y . Zhang, X. Zhu, H. Yin, and Y . Zheng, “Deepther- mal: Combustion optimization for thermal power generating units using offline reinforcement learning,” inAAAI, vol. 36, no. 4, 2022, pp. 4680– 4688

work page 2022
[9]

Magnetic control of tokamak plasmas through deep reinforcement learning,

J. Degrave, F. Felici, J. Buchli, M. Neunert, B. Tracey, F. Carpanese, T. Ewalds, R. Hafner, A. Abdolmaleki, D. de las Casas, C. Donner, L. Fritz, C. Galperti, A. Huber, J. Keeling, M. Tsimpoukelli, J. Kay, A. Merle, J.-M. Moret, S. Noury, F. Pesamosca, D. Pfau, O. Sauter, C. Sommariva, S. Coda, B. Duval, A. Fasoli, P. Kohli, K. Kavukcuoglu, D. Hassabis, ...

work page 2022
[10]

Dense reinforcement learning for safety validation of autonomous vehicles,

S. Feng, H. Sun, X. Yan, H. Zhu, Z. Zou, S. Shen, and H. X. Liu, “Dense reinforcement learning for safety validation of autonomous vehicles,” Nature, vol. 615, no. 7953, pp. 620–627, 2023

work page 2023
[11]

Grandmaster level in StarCraft II using multi-agent reinforcement learning,

O. Vinyals, I. Babuschkin, and W. M. e. a. Czarnecki, “Grandmaster level in StarCraft II using multi-agent reinforcement learning,” Nature, vol. 575, no. 7782, pp. 350–354, 2019

work page 2019
[12]

Towards sample efficient reinforcement learning,

Y . Yu, “Towards sample efficient reinforcement learning,” in IJCAI, 2018, pp. 5739–5743

work page 2018
[13]

Ding and H

Z. Ding and H. Dong, Challenges of Reinforcement Learning. Springer Singapore, 2020

work page 2020
[14]

Challenges of real-world reinforcement learning: definitions, benchmarks and analysis,

G. Dulac-Arnold, N. Levine, D. J. Mankowitz, J. Li, C. Paduraru, S. Gowal, and T. Hester, “Challenges of real-world reinforcement learning: definitions, benchmarks and analysis,”Machine Learning, vol. 110, no. 9, pp. 2419–2468, 2021

work page 2021
[15]

Biological underpinnings for lifelong learning machines,

D. Kudithipudi, M. Aguilar-Simon, and J. e. a. Babb, “Biological underpinnings for lifelong learning machines,” Nature Machine Intel- ligence, vol. 4, no. 3, pp. 196–210, 2022

work page 2022
[16]

Continual lifelong learning with neural networks: A review,

G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, “Continual lifelong learning with neural networks: A review,” Neural Networks, vol. 113, pp. 54–71, 2019

work page 2019
[17]

A continual learning survey: Defying forgetting in classification tasks,

M. De Lange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis, G. Slabaugh, and T. Tuytelaars, “A continual learning survey: Defying forgetting in classification tasks,” IEEE Transactions on Pattern Anal- ysis and Machine Intelligence , vol. 44, no. 7, pp. 3366–3385, 2022

work page 2022
[18]

A comprehensive survey of continual learning: Theory, method and application,

L. Wang, X. Zhang, H. Su, and J. Zhu, “A comprehensive survey of continual learning: Theory, method and application,” IEEE Transac- tions on Pattern Analysis and Machine Intelligence , vol. 46, no. 8, pp. 5362–5383, 2024

work page 2024
[19]

Federated continual learning via knowledge fusion: A survey,

X. Yang, H. Yu, X. Gao, H. Wang, J. Zhang, and T. Li, “Federated continual learning via knowledge fusion: A survey,”IEEE Transactions on Knowledge and Data Engineering , vol. 36, no. 8, pp. 3832–3850, 2024

work page 2024
[20]

CHILD: A first step towards continual learning,

M. B. Ring, “CHILD: A first step towards continual learning,” Machine Learning, vol. 28, no. 1, pp. 77–104, 1997

work page 1997
[21]

Towards continual reinforcement learning: A review and perspectives,

K. Khetarpal, M. Riemer, I. Rish, and D. Precup, “Towards continual reinforcement learning: A review and perspectives,” Journal of Artifi- cial Intelligence Research , vol. 75, pp. 1401–1476, 2022

work page 2022
[22]

Fast TRAC: A parameter-free optimizer for lifelong reinforcement learning,

A. Muppidi, Z. Zhang, and H. Yang, “Fast TRAC: A parameter-free optimizer for lifelong reinforcement learning,” in NeurIPS, 2024

work page 2024
[23]

A comprehensive survey of forgetting in deep learning beyond continual learning,

Z. Wang, E. Yang, L. Shen, and H. Huang, “A comprehensive survey of forgetting in deep learning beyond continual learning,” IEEE Trans- actions on Pattern Analysis and Machine Intelligence , vol. 47, no. 3, pp. 1464–1483, 2025. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. XX, NO. XX, XX 2025 23

work page 2025
[24]

Markov decision processes,

M. L. Puterman, “Markov decision processes,” in Stochastic Models , 1990, vol. 2, pp. 331–434

work page 1990
[25]

R. J. Boucherie and N. M. Van Dijk, Markov Decision Processes in Practice. Springer, 2017

work page 2017
[26]

Prioritized experience replay,

T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” in ICLR, 2016, pp. 1–21

work page 2016
[27]

Dueling network architectures for deep reinforcement learning,

Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and N. de Freitas, “Dueling network architectures for deep reinforcement learning,” in ICML, vol. 48, 2016, pp. 1995–2003

work page 2016
[28]

Deep recurrent Q-Learning for partially observable mdps,

M. J. Hausknecht and P. Stone, “Deep recurrent Q-Learning for partially observable mdps,” in AAAI Fall Symposia, 2015, pp. 29–37

work page 2015
[29]

Asynchronous methods for deep reinforcement learning,

V . Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” in ICML, vol. 48, 2016, pp. 1928–1937

work page 2016
[30]

Continuous control with deep reinforce- ment learning,

T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y . Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforce- ment learning,” in ICLR, 2016, pp. 1–14

work page 2016
[31]

Addressing function ap- proximation error in actor-critic methods,

S. Fujimoto, H. van Hoof, and D. Meger, “Addressing function ap- proximation error in actor-critic methods,” in ICML, vol. 80, 2018, pp. 1582–1591

work page 2018
[32]

Trust region policy optimization,

J. Schulman, S. Levine, P. Abbeel, M. I. Jordan, and P. Moritz, “Trust region policy optimization,” in ICML, vol. 37, 2015, pp. 1889–1897

work page 2015
[33]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” ArXiv preprint , vol. abs/1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[34]

Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor,

T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor,” in ICML, vol. 80, 2018, pp. 1856–1865

work page 2018
[35]

A wholistic view of continual learning with deep neural networks: Forgotten lessons and the bridge to active and open world learning,

M. Mundt, Y . Hong, I. Pliushch, and V . Ramesh, “A wholistic view of continual learning with deep neural networks: Forgotten lessons and the bridge to active and open world learning,” Neural Networks, vol. 160, pp. 306–336, 2023

work page 2023
[36]

Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges,

T. Lesort, V . Lomonaco, A. Stoian, D. Maltoni, D. Filliat, and N. Díaz- Rodríguez, “Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges,” Information Fusion, vol. 58, pp. 52–68, 2020

work page 2020
[37]

Continual variational autoencoder via continual generative knowledge distillation,

F. Ye and A. G. Bors, “Continual variational autoencoder via continual generative knowledge distillation,” in AAAI, vol. 37, no. 9, 2023, pp. 10 918–10 926

work page 2023
[38]

iCaRL: Incremental classifier and representation learning,

S. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert, “iCaRL: Incremental classifier and representation learning,” in CVPR, 2017, pp. 5533–5542

work page 2017
[39]

Continual learning with deep generative replay,

H. Shin, J. K. Lee, J. Kim, and J. Kim, “Continual learning with deep generative replay,” in NeurIPS, vol. 30, 2017, pp. 2990–2999

work page 2017
[40]

Class-incremental learning via deep model consolida- tion,

J. Zhang, J. Zhang, S. Ghosh, D. Li, S. Tasci, L. Heck, H. Zhang, and C.-C. J. Kuo, “Class-incremental learning via deep model consolida- tion,” in WACV, 2020

work page 2020
[41]

Differential privacy preservation in robust continual learning,

A. Hassanpour, M. Moradikia, B. Yang, A. Abdelhadi, C. Busch, and J. Fierrez, “Differential privacy preservation in robust continual learning,” IEEE Access, vol. 10, pp. 24 273–24 287, 2022

work page 2022
[42]

Overcoming catastrophic forgetting in neural networks,

J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell, “Overcoming catastrophic forgetting in neural networks,”Proceedings of the National Academy of Sciences , vol. 114, no. 13, pp. 3521–3526, 2017

work page 2017
[43]

Continual learning via inter-task synaptic mapping,

F. Mao, W. Weng, M. Pratama, and E. Y . K. Yee, “Continual learning via inter-task synaptic mapping,” Knowledge-Based Systems, vol. 222, p. 106947, 2021

work page 2021
[44]

Learning without forgetting,

Z. Li and D. Hoiem, “Learning without forgetting,” IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 40, no. 12, pp. 2935–2947, 2018

work page 2018
[45]

Class-incremental learning by knowl- edge distillation with adaptive feature consolidation,

M. Kang, J. Park, and B. Han, “Class-incremental learning by knowl- edge distillation with adaptive feature consolidation,” in CVPR, 2022, pp. 16 050–16 059

work page 2022
[46]

PackNet: Adding multiple tasks to a single network by iterative pruning,

A. Mallya and S. Lazebnik, “PackNet: Adding multiple tasks to a single network by iterative pruning,” in CVPR, 2018, pp. 7765–7773

work page 2018
[47]

Piggyback: Adapting a single network to multiple tasks by learning to mask weights,

A. Mallya, D. Davis, and S. Lazebnik, “Piggyback: Adapting a single network to multiple tasks by learning to mask weights,” in ECCV, 2018

work page 2018
[48]

Lifelong generative modelling using dynamic expansion graph model,

F. Ye and A. G. Bors, “Lifelong generative modelling using dynamic expansion graph model,” in AAAI, vol. 36, no. 8, 2022, pp. 8857–8865

work page 2022
[49]

Few-shot incremental learning with continually evolved classifiers,

C. Zhang, N. Song, G. Lin, Y . Zheng, P. Pan, and Y . Xu, “Few-shot incremental learning with continually evolved classifiers,” in CVPR, 2021, pp. 12 455–12 464

work page 2021
[50]

Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines

Y .-C. Hsu, Y .-C. Liu, and Z. Kira, “Re-evaluating continual learning scenarios: A categorization and case for strong baselines,” ArXiv preprint, vol. abs/1810.12488, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[51]

Three types of incremental learning,

G. M. van de Ven, T. Tuytelaars, and A. S. Tolias, “Three types of incremental learning,” Nature Machine Intelligence, vol. 4, no. 12, pp. 1185–1197, 2022

work page 2022
[52]

A definition of continual reinforcement learning,

D. Abel, A. Barreto, B. V . Roy, D. Precup, H. P. van Hasselt, and S. Singh, “A definition of continual reinforcement learning,” in NeurIPS, vol. 36, 2023, pp. 50 377–50 407

work page 2023
[53]

Loss of plasticity in continual deep reinforcement learning,

Z. Abbas, R. Zhao, J. Modayil, A. White, and M. C. Machado, “Loss of plasticity in continual deep reinforcement learning,” in CoLLAs, vol. 232, 2023, pp. 620–636

work page 2023
[54]

A survey of multi-task deep reinforcement learning,

N. Vithayathil Varghese and Q. H. Mahmoud, “A survey of multi-task deep reinforcement learning,” Electronics, vol. 9, no. 9, 2020

work page 2020
[55]

Transfer learning in deep reinforcement learning: A survey,

Z. Zhu, K. Lin, A. K. Jain, and J. Zhou, “Transfer learning in deep reinforcement learning: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 45, no. 11, pp. 13 344–13 362, 2023

work page 2023
[56]

Continual reinforcement learning with complex synapses,

C. Kaplanis, M. Shanahan, and C. Clopath, “Continual reinforcement learning with complex synapses,” in ICML, vol. 80, 2018, pp. 2502– 2511

work page 2018
[57]

Loss of plasticity in deep continual learning,

S. Dohare, J. F. Hernandez-Garcia, Q. Lan, P. Rahman, A. R. Mah- mood, and R. S. Sutton, “Loss of plasticity in deep continual learning,” Nature, vol. 632, no. 8026, pp. 768–774, 2024

work page 2024
[58]

Plasticity Loss in Deep Reinforcement Learning: A Survey

T. Klein, L. Miklautz, K. Sidak, C. Plant, and S. Tschiatschek, “Plas- ticity loss in deep reinforcement learning: A survey,” ArXiv preprint, vol. abs/2411.04832, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[59]

A study of plasticity loss in on-policy deep reinforcement learning,

A. Juliani and J. T. Ash, “A study of plasticity loss in on-policy deep reinforcement learning,” in NeurIPS, vol. 37, 2024, pp. 113 884– 113 910

work page 2024
[60]

Contin- ual world: A robotic benchmark for continual reinforcementlearning,

M. Wolczyk, M. Zajac, R. Pascanu, L. Kucinski, and P. Milos, “Contin- ual world: A robotic benchmark for continual reinforcementlearning,” in NeurIPS, vol. 34, 2021, pp. 28 496–28 510

work page 2021
[61]

Disentangling transfer in continual reinforcement learning,

——, “Disentangling transfer in continual reinforcement learning,” in NeurIPS, vol. 35, 2022, pp. 6304–6317

work page 2022
[62]

Self-composing policies for scalable continual reinforcement learning,

M. Malagon, J. Ceberio, and J. A. Lozano, “Self-composing policies for scalable continual reinforcement learning,” inICML, vol. 235, 2024, pp. 34 432–34 460

work page 2024
[63]

Continuous coordination as a realistic scenario for lifelong learning,

H. Nekoei, A. Badrinaaraayanan, A. C. Courville, and S. Chandar, “Continuous coordination as a realistic scenario for lifelong learning,” in ICML, vol. 139, 2021, pp. 8016–8024

work page 2021
[64]

L2Explorer: A lifelong reinforcement learning assessment environment,

E. C. Johnson, E. Q. Nguyen, B. Schreurs, C. S. Ewulum, C. Ashcraft, N. M. Fendley, M. M. Baker, A. New, and G. K. Vallabha, “L2Explorer: A lifelong reinforcement learning assessment environment,” ArXiv preprint, vol. abs/2203.07454, 2022

work page arXiv 2022
[65]

Building a subspace of policies for scalable continual learning,

J. Gaya, T. Doan, L. Caccia, L. Soulier, L. Denoyer, and R. Raileanu, “Building a subspace of policies for scalable continual learning,” in ICLR, 2023, pp. 1–28

work page 2023
[66]

Model-based lifelong reinforcement learning with bayesian exploration,

H. Fu, S. Yu, M. Littman, and G. Konidaris, “Model-based lifelong reinforcement learning with bayesian exploration,” in NeurIPS, vol. 35, 2022, pp. 32 369–32 382

work page 2022
[67]

Continual reinforcement learning in 3D non-stationary environments,

V . Lomonaco, K. Desai, E. Culurciello, and D. Maltoni, “Continual reinforcement learning in 3D non-stationary environments,” in CVPR, 2020

work page 2020
[68]

CORA: Benchmarks, baselines, and metrics as a platform for continual rein- forcement learning agents,

S. Powers, E. Xing, E. Kolve, R. Mottaghi, and A. Gupta, “CORA: Benchmarks, baselines, and metrics as a platform for continual rein- forcement learning agents,” in CoLLAs, vol. 199, 2022, pp. 705–743

[70] T. Tomilin, M. Fang, Y. Zhang, and M. Pechenizkiy, “COOM: A game benchmark for continual reinforcement learning,” in NeurIPS, vol. 36, 2023, pp. 67 794–67 832

[71] D. Abel, Y. Jinnai, S. Y. Guo, G. D. Konidaris, and M. L. Littman, “Policy and value transfer in lifelong reinforcement learning,” in ICML, vol. 80, 2018, pp. 20–29

[72] M. Chevalier-Boisvert, B. Dai, M. Towers, R. Perez-Vicente, L. Willems, S. Lahlou, S. Pal, P. S. Castro, and J. Terry, “Minigrid & miniworld: Modular & customizable reinforcement learning environments for goal-oriented tasks,” in NeurIPS, 2023

[73] C. Beattie, J. Z. Leibo, D. Teplyashin, T. Ward, M. Wainwright, H. Küttler, A. Lefrancq, S. Green, V. Valdés, A. Sadik et al., “DeepMind Lab,” ArXiv preprint, vol. abs/1612.03801, 2016

[74] J. Schwarz, W. Czarnecki, J. Luketina, A. Grabska-Barwinska, Y. W. Teh, R. Pascanu, and R. Hadsell, “Progress & compress: A scalable framework for continual learning,” in ICML, vol. 80, 2018, pp. 4535–4544

[75] Gymnasium: A Standard Interface for Reinforcement Learning Environments

M. Towers, A. Kwiatkowski, J. Terry, J. U. Balis, G. D. Cola, T. Deleu, M. Goulão, A. Kallinteris, M. Krimmel, A. KG, R. Perez-Vicente, A. Pierré, S. Schulhoff, J. J. Tai, H. Tan, and O. G. Younis, “Gymnasium: A standard interface for reinforcement learning envir...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[76] S. Kessler, J. Parker-Holder, P. J. Ball, S. Zohren, and S. J. Roberts, “Same state, different task: Continual reinforcement learning without interference,” in AAAI, vol. 36, no. 7, 2022, pp. 7143–7151

[77] E. Todorov, T. Erez, and Y. Tassa, “MuJoCo: A physics engine for model-based control,” in IROS, 2012, pp. 5026–5033

[78] C. Kaplanis, M. Shanahan, and C. Clopath, “Policy consolidation for continual reinforcement learning,” in ICML, vol. 97, 2019, pp. 3242–3251

[79] L. Espeholt, H. Soyer, R. Munos, K. Simonyan, V. Mnih, T. Ward, Y. Doron, V. Firoiu, T. Harley, I. Dunning, S. Legg, and K. Kavukcuoglu, “IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures,” in ICML, vol. 80, 2018, pp. 1406–1415

[80] N. Anand and D. Precup, “Prediction and control in continual reinforcement learning,” in NeurIPS, vol. 36, 2023, pp. 63 779–63 817

[81] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, “The arcade learning environment: An evaluation platform for general agents,” Journal of Artificial Intelligence Research, vol. 47, no. 1, pp. 253–279, 2013
