arxiv: 2603.11395 · v2 · pith:B4LNMOKDnew · submitted 2026-03-12 · 💻 cs.LG · cs.AI

ARROW: Augmented Replay for RObust World models

Pith reviewed 2026-05-21 11:15 UTC · model grok-4.3

keywords: continual reinforcement learning · world models · replay buffers · catastrophic forgetting · model-based RL · DreamerV3 · Atari · Procgen

The pith

ARROW adds a short-term and a long-term replay buffer to DreamerV3, reducing forgetting on unrelated tasks while keeping forward transfer on related tasks comparable to the baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ARROW to address continual reinforcement learning, where agents must learn new skills without losing old ones. It extends the DreamerV3 world model with a memory-efficient replay buffer that splits into a short-term buffer holding recent experiences and a long-term buffer that uses intelligent sampling to keep task diversity. This bio-inspired design replays experiences to the predictive world model rather than directly to the policy, aiming to avoid the large memory needs and forgetting seen in standard model-free replay methods. Tests on Atari games without shared structure and Procgen variants with shared structure show the approach yields substantially less forgetting than same-size baselines on non-overlapping tasks.

Core claim

ARROW maintains two complementary buffers: a short-term buffer for recent experiences and a long-term buffer that preserves task diversity through intelligent sampling, demonstrating substantially less forgetting on tasks without shared structure compared to model-free and model-based baselines with replay buffers of the same size, while maintaining comparable forward transfer.

What carries the argument

The distribution-matching long-term buffer that intelligently samples past experiences to maintain task diversity during world model training.
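
A minimal sketch of the two-buffer idea follows. It is not the paper's implementation: the text above does not specify how the long-term buffer samples, so reservoir sampling is swapped in here as one well-known way to keep a fixed-size store roughly matched to the distribution of the whole experience stream. The class name `DualReplayBuffer`, the capacities, and the 50/50 batch split are all assumptions made for illustration.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Illustrative only: FIFO short-term buffer plus a reservoir-sampled long-term buffer."""

    def __init__(self, short_capacity, long_capacity, seed=0):
        self.short = deque(maxlen=short_capacity)  # holds only the most recent items
        self.long = []                             # fixed-size sample of the whole stream
        self.long_capacity = long_capacity
        self.seen = 0                              # number of items offered so far
        self.rng = random.Random(seed)

    def add(self, item):
        self.short.append(item)  # deque drops the oldest item when full
        self.seen += 1
        if len(self.long) < self.long_capacity:
            self.long.append(item)
        else:
            # Reservoir sampling (assumed, not from the paper): every item seen
            # so far ends up stored with probability long_capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.long_capacity:
                self.long[j] = item

    def sample(self, batch_size):
        # Assumed 50/50 split between recent and long-term experience.
        n_short = min(batch_size // 2, len(self.short))
        n_long = min(batch_size - n_short, len(self.long))
        return (self.rng.sample(list(self.short), n_short)
                + self.rng.sample(self.long, n_long))
```

Under this assumption, the long-term store tends to hold tasks in roughly the proportions in which they appeared in the stream, whereas a single FIFO buffer of the same total size would hold only the latest task once enough steps have passed.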

If this is right

  • Model-based methods can achieve better retention in continual RL without needing larger replay memory than model-free baselines.
  • Replaying experiences to the world model supports retention on tasks with no shared structure.
  • Forward transfer remains intact on tasks that do share structure, such as Procgen CoinRun variants.
  • Bio-inspired replay mechanisms offer a scalable path for continual learning in reinforcement learning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The dual-buffer design might allow world models to act as compact, reusable memory stores for sequences of many more tasks.
  • Similar sampling strategies could be tested in non-RL continual learning domains where distribution shift also drives forgetting.
  • If the long-term buffer scales well, it could lower the memory footprint for deployed agents that encounter open-ended task streams.

Load-bearing premise

The intelligent sampling from the long-term buffer preserves task diversity and prevents the distribution shift that causes forgetting, without introducing instabilities into world model training.

What would settle it

Running the same Atari continual learning experiments but replacing the intelligent sampling with uniform random selection from the long-term buffer and measuring whether the reduction in forgetting disappears.

Figures

Figures reproduced from arXiv: 2603.11395 by Abdallah Al Siyabi, Abdulaziz Alyahya, Gideon Kowadlo, Levin Kuhlmann, Luke Yang, Markus R. Ernst.

Figure 1: World Model Learning. (A) Images drawn from the replay buffer are encoded to and reconstructed …
Figure 2: Experiment setup. (A) Augmented buffer used in ARROW. (B) Continual learning tasks with …
Figure 3: Atari median normalized performance (Eq. 1). Shaded area depicts 0.25 and 0.75 quartiles of 5 seeds. Bold line segments indicate training of task. (A) Default order of tasks (one-cycle). (B) Reversed order of tasks (one-cycle). (C) Default order of tasks (two-cycle). The dotted vertical line marks the end of cycle 1 and the beginning of cycle 2.
Figure 4: Atari metrics shown as median with (0.25 - 0.75) quartile confidence intervals, across 5 seeds, and calculated using normalized scores (Eq. 1). (A) Default task order (one-cycle). (B) Reversed task order (one-cycle). (C) Default task order (two-cycle). From the accompanying results text: ARROW maintains the highest WC-ACC (0.618), confirming that its stability advantage is robust to task ordering.
Figure 5: CoinRun median normalized performance (Eq. 1). Shaded area depicts 0.25 and 0.75 quartiles of 5 seeds. Bold line segments indicate training of task. (A) Default order of tasks (one-cycle). (B) Reversed order of tasks (one-cycle). (C) Default order of tasks (two-cycle). The dotted vertical line marks the end of cycle 1 and the beginning of cycle 2.
Figure 6: CoinRun metrics shown as median with (0.25 - 0.75) quartile confidence intervals, across 5 seeds, and calculated using normalized scores (Eq. 1). (A) Default order of tasks (one-cycle). (B) Reversed order of tasks (one-cycle). (C) Default order of tasks (two-cycle).
Original abstract

Continual reinforcement learning challenges agents to acquire new skills while retaining previously learned ones with the goal of improving performance in both past and future tasks. Most existing approaches rely on model-free methods with replay buffers to mitigate catastrophic forgetting; however, these solutions often face significant scalability challenges due to large memory demands. Drawing inspiration from neuroscience, where the brain replays experiences to a predictive World Model rather than directly to the policy, we present ARROW (Augmented Replay for RObust World models), a model-based continual RL algorithm that extends DreamerV3 with a memory-efficient, distribution-matching replay buffer. Unlike standard fixed-size FIFO buffers, ARROW maintains two complementary buffers: a short-term buffer for recent experiences and a long-term buffer that preserves task diversity through intelligent sampling. We evaluate ARROW on two challenging continual RL settings: Tasks without shared structure (Atari), and tasks with shared structure, where knowledge transfer is possible (Procgen CoinRun variants). Compared to model-free and model-based baselines with replay buffers of the same-size, ARROW demonstrates substantially less forgetting on tasks without shared structure, while maintaining comparable forward transfer. Our findings highlight the potential of model-based RL and bio-inspired approaches for continual reinforcement learning, warranting further research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes ARROW, a model-based continual RL method extending DreamerV3 with dual replay buffers: a short-term buffer for recent experiences and a long-term buffer that uses distribution-matching sampling to preserve task diversity. It evaluates the approach on Atari (tasks without shared structure) and Procgen CoinRun variants (tasks with shared structure), claiming substantially less forgetting than model-free and model-based baselines that use replay buffers of the same size, while maintaining comparable forward transfer.

Significance. If the empirical results hold, the work would be significant for continual RL by showing that a bio-inspired, model-based replay mechanism can reduce catastrophic forgetting more effectively than standard FIFO buffers while remaining memory-efficient. The dual-buffer design and explicit focus on distribution matching provide a concrete, testable alternative to existing replay strategies.

major comments (2)
  1. Abstract and Experiments section: The central claim of 'substantially less forgetting' on non-shared tasks is not supported by any quantitative metrics, error bars, run counts, or ablation results in the provided text. Without these, the magnitude and reliability of the improvement cannot be assessed and the comparison to same-size-buffer baselines remains unverifiable.
  2. Method section (distribution-matching sampling): The description of how the long-term buffer performs 'intelligent sampling' to preserve task diversity is missing algorithmic details, pseudocode, or a precise objective. This step is load-bearing for the weakest assumption and for the claimed advantage over standard replay; its absence prevents verification that no new instabilities are introduced in world-model training.
minor comments (2)
  1. The abstract refers to 'replay buffers of the same-size' for baselines; clarify whether total memory footprint or per-buffer capacity is matched, as ARROW maintains two buffers.
  2. Add a table or plot in the results section that directly reports forgetting metrics (e.g., performance drop on previous tasks) with standard deviations for ARROW versus each baseline.
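
To make the second minor comment concrete, here is a hedged sketch of one common way to compute a forgetting score with a standard deviation across seeds. The paper's own metric definitions are not reproduced in this text, so the definition used here (best earlier score on a task minus its final score) is a standard convention rather than the authors' formula, and the function names and the `perf[t][i]` layout are hypothetical.

```python
from statistics import mean, stdev


def forgetting_per_task(perf):
    """perf[t][i] is the score on task i measured after training on task t.

    Returns, for every task except the last, the drop from its best earlier
    score to its score at the end of the task sequence.
    """
    n_tasks = len(perf)
    final = perf[-1]
    drops = []
    for i in range(n_tasks - 1):  # the last task has had no chance to be forgotten
        best_earlier = max(perf[t][i] for t in range(i, n_tasks - 1))
        drops.append(best_earlier - final[i])
    return drops


def summarize(per_seed_perf):
    """Mean and sample standard deviation of average forgetting across seeds."""
    per_seed = [mean(forgetting_per_task(p)) for p in per_seed_perf]
    return mean(per_seed), stdev(per_seed)  # stdev needs at least two seeds
```

Reporting the pair returned by `summarize` for ARROW and for each baseline, one row per method, would give the kind of table the comment asks for.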

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, indicating the revisions we will incorporate in the next version to strengthen the presentation and verifiability of our results.

  1. Referee: Abstract and Experiments section: The central claim of 'substantially less forgetting' on non-shared tasks is not supported by any quantitative metrics, error bars, run counts, or ablation results in the provided text. Without these, the magnitude and reliability of the improvement cannot be assessed and the comparison to same-size-buffer baselines remains unverifiable.

    Authors: We agree that the abstract as currently written does not contain explicit quantitative metrics, error bars, or run counts, which limits immediate assessment of the effect size. The full experiments section reports comparative performance on Atari and Procgen, but to address this directly we will revise the abstract to include key quantitative results (e.g., average forgetting reduction percentages relative to same-size FIFO baselines) and will add explicit statements on the number of independent runs (5 seeds per condition) together with standard error bars. We will also expand the experiments section with a dedicated ablation table isolating the contribution of the long-term distribution-matching buffer. These additions will make the magnitude and reliability of the improvement verifiable without altering the underlying claims. revision: yes

  2. Referee: Method section (distribution-matching sampling): The description of how the long-term buffer performs 'intelligent sampling' to preserve task diversity is missing algorithmic details, pseudocode, or a precise objective. This step is load-bearing for the weakest assumption and for the claimed advantage over standard replay; its absence prevents verification that no new instabilities are introduced in world-model training.

    Authors: We acknowledge that the current method description is insufficiently precise on the distribution-matching sampling mechanism. In the revised manuscript we will add a dedicated subsection with the exact objective (minimizing a divergence measure between the empirical task distribution in the long-term buffer and a uniform target distribution over observed tasks), the sampling procedure, and full pseudocode. This will clarify how the buffer differs from FIFO replay and allow direct inspection of potential effects on world-model training stability. Our internal experiments showed no introduced instabilities, but the added formalization will enable readers to verify this. revision: yes
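
The objective named in this simulated response, a divergence between the buffer's empirical task distribution and a uniform target, can be sketched as follows. This is an illustration under stated assumptions: the response does not say which divergence is used, so KL divergence is chosen here for concreteness, and the function name `kl_to_uniform` and its arguments are hypothetical.

```python
import math
from collections import Counter


def kl_to_uniform(task_labels, all_tasks):
    """KL(p || u) in nats.

    p is the empirical distribution of task labels stored in the buffer and
    u is uniform over all_tasks. Assumes every label appears in all_tasks.
    """
    if not task_labels:
        raise ValueError("buffer is empty")
    counts = Counter(task_labels)
    total = len(task_labels)
    k = len(all_tasks)
    kl = 0.0
    for task in all_tasks:
        p = counts[task] / total
        if p > 0:  # a task absent from the buffer contributes 0 (0 * log 0 := 0)
            kl += p * math.log(p * k)
    return kl
```

A perfectly balanced buffer scores 0, and the score grows as the buffer drifts toward a single task, reaching `log(k)` when only one of `k` tasks remains.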

Circularity Check

0 steps flagged

No significant circularity

The paper describes an empirical extension of DreamerV3 using dual replay buffers (short-term and long-term with intelligent sampling) and reports performance via direct experimental comparison to model-free and model-based baselines on Atari and Procgen tasks. No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claims rest on observed reductions in forgetting and maintained transfer, which are externally falsifiable through replication rather than reducing to the method's own definitions or prior author work by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not introduce or rely on any explicit free parameters, axioms, or invented entities beyond the standard assumptions of DreamerV3 and replay-buffer methods.

pith-pipeline@v0.9.0 · 5768 in / 1200 out tokens · 32110 ms · 2026-05-21T11:15:45.473975+00:00 · methodology

A Tabular data & additional results

Variant   Procgen flag                  Description
CoinRun   —                             regularly rendered game
+NB       use_backgrounds = False       removes decorative backgrounds
+RT       restrict_themes = True        restricts the set of level themes
+GA       use_generated_assets = True   enables procedurally generated assets
+MA       use_monochrome_assets = Tr...