A Survey of Continual Reinforcement Learning
Pith reviewed 2026-05-19 08:24 UTC · model grok-4.3
The pith
Continual reinforcement learning methods fall into four categories based on how they store and transfer knowledge.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper proposes a new taxonomy of CRL methods, categorizing them into four types from the perspective of knowledge storage and/or transfer. This framework organizes existing approaches by whether agents retain information in shared structures, isolated components, or through explicit transfer mechanisms between tasks.
What carries the argument
The four-type taxonomy that groups methods by their knowledge storage and transfer strategies, which provides a lens for comparing how agents retain and reuse prior learning.
If this is right
- Methods become easier to compare when grouped by their specific handling of retained knowledge.
- Under-explored categories within the taxonomy point to concrete opportunities for new algorithm designs.
- Benchmark suites can be expanded to evaluate performance across all four types rather than a narrow subset.
- Researchers can identify which storage or transfer mechanisms best support long sequences of tasks.
Where Pith is reading between the lines
- The same storage-and-transfer lens could be applied to continual learning outside reinforcement learning to test its broader usefulness.
- Experiments that measure knowledge retention rates per category would provide a direct test of whether the taxonomy predicts practical differences.
- As new methods appear that combine storage and transfer in unexpected ways, the four categories may need subdivision or merging.
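One way to make the retention experiment above concrete is to compute an average-forgetting score per run and then average it within each taxonomy category. The sketch below assumes the forgetting definition commonly used in continual learning (best earlier score on a task minus its final score). That metric is not taken from the survey itself. The function names, the category labels `type_A` and `type_B`, and all numbers are hypothetical placeholders.

```python
from collections import defaultdict
from statistics import mean


def average_forgetting(perf):
    """perf[t][i] is the score on task i measured after training on task t.

    Forgetting for task i is the best score it reached before the final
    stage minus its score after the final stage. The last task is skipped
    because nothing has been trained after it.
    """
    n_tasks = len(perf)
    if n_tasks < 2:
        return 0.0
    final = perf[-1]
    drops = []
    for i in range(n_tasks - 1):
        best_earlier = max(perf[t][i] for t in range(i, n_tasks - 1))
        drops.append(best_earlier - final[i])
    return mean(drops)


def forgetting_by_category(runs):
    """runs: iterable of (category, perf_matrix); returns mean forgetting per category."""
    grouped = defaultdict(list)
    for category, perf in runs:
        grouped[category].append(average_forgetting(perf))
    return {category: mean(values) for category, values in grouped.items()}


# Placeholder values for illustration only; these are not results from any paper.
toy_runs = [
    ("type_A", [[0.9, 0.0, 0.0], [0.7, 0.8, 0.0], [0.6, 0.7, 0.9]]),
    ("type_B", [[0.9, 0.0, 0.0], [0.9, 0.8, 0.0], [0.8, 0.8, 0.9]]),
]
print(forgetting_by_category(toy_runs))
```

A lower score means more knowledge is retained. Whether the four categories actually differ on such a score is exactly what the proposed experiment would have to establish.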
Load-bearing premise
That the existing body of continual reinforcement learning work can be partitioned comprehensively and usefully into four categories defined by knowledge storage and transfer.
What would settle it
A substantial set of recent CRL papers that resist assignment to any of the four categories without forcing would indicate the taxonomy does not cover the field.
Original abstract
Reinforcement Learning (RL) is an important machine learning paradigm for solving sequential decision-making problems. Recent years have witnessed remarkable progress in this field due to the rapid development of deep neural networks. However, the success of RL currently relies on extensive training data and computational resources. In addition, RL's limited ability to generalize across tasks restricts its applicability in dynamic and real-world environments. With the arisen of Continual Learning (CL), Continual Reinforcement Learning (CRL) has emerged as a promising research direction to address these limitations by enabling agents to learn continuously, adapt to new tasks, and retain previously acquired knowledge. In this survey, we provide a comprehensive examination of CRL, focusing on its core concepts, challenges, and methodologies. Firstly, we conduct a detailed review of existing works, organizing and analyzing their metrics, tasks, benchmarks, and scenario settings. Secondly, we propose a new taxonomy of CRL methods, categorizing them into four types from the perspective of knowledge storage and/or transfer. Finally, our analysis highlights the unique challenges of CRL and provides practical insights into future directions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This manuscript is a survey on Continual Reinforcement Learning (CRL). It reviews existing works by organizing and analyzing their metrics, tasks, benchmarks, and scenario settings. The central contribution is a proposed new taxonomy that categorizes CRL methods into four types from the perspective of knowledge storage and/or transfer. The paper concludes by highlighting unique challenges of CRL and providing practical insights into future directions.
Significance. If the taxonomy successfully partitions the surveyed literature in a comprehensive and insightful manner without forcing, the survey would offer a useful organizational framework for a growing subfield. The systematic review of metrics, tasks, benchmarks, and scenarios adds practical value for standardizing evaluation and comparison across CRL studies.
minor comments (3)
- Abstract: 'With the arisen of Continual Learning' should be revised to 'With the rise of Continual Learning' or 'With the emergence of Continual Learning' for grammatical accuracy.
- Taxonomy section: Provide explicit criteria or decision rules used to assign methods to each of the four categories, with at least one concrete example per category drawn from the reviewed literature to demonstrate the partition's utility and lack of overlap.
- Benchmarks and scenarios review: Include a summary table listing the key benchmarks, their task characteristics, and which taxonomy category the associated methods primarily fall into, to improve readability and allow quick cross-referencing.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of our survey on Continual Reinforcement Learning and for recommending minor revision. We appreciate the recognition of the value in our review of metrics, tasks, benchmarks, and scenario settings, as well as the proposed taxonomy based on knowledge storage and transfer.
Circularity Check
No significant circularity identified
This is a survey paper whose central contribution is a literature review and a proposed organizational taxonomy of existing CRL methods into four categories based on knowledge storage and/or transfer. No derivations, equations, predictions, fitted parameters, or mathematical claims are present. The taxonomy is explicitly framed as an organizational framework derived from reviewing the body of prior work rather than from any self-referential definitions, self-citations that bear the load of the argument, or reductions of results to inputs by construction. The paper is therefore self-contained as a descriptive survey with no circular steps.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "we propose a new taxonomy of CRL methods, categorizing them into four types from the perspective of knowledge storage and/or transfer... policy-focused, experience-focused, dynamic-focused, and reward-focused methods"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "triangular balance among plasticity, stability, and scalability... stability-plasticity dilemma"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- An LLM-Driven Closed-Loop Autonomous Learning Framework for Robots Facing Uncovered Tasks in Open Environments: Robots autonomously convert LLM-guided experiences into a reusable local method library, reducing average execution time from 7.7772s to 6.7779s and LLM calls per task from 1.0 to 0.2 in repeated-task experiments.