arxiv: 2503.08751 · v3 · submitted 2025-03-11 · 💻 cs.CV · cs.LG

Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning

Pith reviewed 2026-05-23 00:18 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords disentangled representations · world models · reinforcement learning · video prediction · latent distillation · semantic knowledge transfer · visual variations · model-based RL

The pith

Disentangled world models transfer semantic knowledge from distracting videos to reinforcement learning via latent distillation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that visual RL agents facing environment variations can achieve higher sample efficiency by extracting and transferring semantic knowledge from unlabeled distracting videos rather than learning representations from scratch. It pretrains an action-free video prediction model offline with disentanglement regularization, then distills the resulting capability into the world model. During online adaptation the model receives an additional disentanglement constraint while actions and rewards from environment interactions increase data diversity and strengthen the representations.

Core claim

We introduce Disentangled World Models that pretrain an action-free video prediction model offline with disentanglement regularization to extract semantic knowledge from distracting videos. The disentanglement capability of the pretrained model is transferred to the world model through latent distillation. For online finetuning we add a disentanglement constraint to the world model; the incorporation of actions and rewards from environment interactions enriches data diversity and strengthens the learned representations.
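
The disentanglement regularization used in the offline stage can be illustrated with a β-VAE-style objective. Figure 5 refers to β-VAE traversals during pretraining, but the exact loss is not given on this page, so the following is only a minimal sketch under stated assumptions: the function names, the squared-error reconstruction term, and the default `beta=4.0` are illustrative choices, not values taken from the paper.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Squared-error reconstruction plus a beta-weighted KL regularizer.

    beta > 1 pushes the posterior toward the factorized prior, which is the
    usual lever for encouraging disentangled latent dimensions.
    The default beta is a placeholder, not a value from the paper.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * gaussian_kl_to_standard_normal(mu, log_var)
```

In a video prediction model this term would be applied per frame to the latent posterior; how the paper combines it with the prediction loss is not stated here.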

What carries the argument

Offline-to-online latent distillation that transfers disentanglement capability from a pretrained action-free video prediction model to the RL world model.
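
One common way to realize latent distillation is to penalize the divergence between the latent distribution of the frozen pretrained model (teacher) and that of the world model (student) on the same frames. The sketch below assumes diagonal Gaussian latents and a KL(teacher || student) direction; both are assumptions made for illustration, since the paper's actual distillation objective is not reproduced on this page. All function names are hypothetical.

```python
import math


def diag_gaussian_kl(mu_t, log_var_t, mu_s, log_var_s):
    """KL( N(mu_t, var_t) || N(mu_s, var_s) ) for diagonal Gaussians, summed over dims."""
    total = 0.0
    for mt, lvt, ms, lvs in zip(mu_t, log_var_t, mu_s, log_var_s):
        total += (
            0.5 * (lvs - lvt)
            + (math.exp(lvt) + (mt - ms) ** 2) / (2.0 * math.exp(lvs))
            - 0.5
        )
    return total


def distillation_loss(teacher_seq, student_seq):
    """Mean per-frame KL over a clip; each element is a (mu, log_var) pair.

    The teacher latents are treated as fixed targets (assumption): only the
    student world model would receive gradients from this term.
    """
    kls = [
        diag_gaussian_kl(mt, lvt, ms, lvs)
        for (mt, lvt), (ms, lvs) in zip(teacher_seq, student_seq)
    ]
    return sum(kls) / len(kls)
```

The loss is zero when the student reproduces the teacher's latents exactly and grows as the means or variances drift apart.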

If this is right

  • RL agents reach target performance with fewer environment interactions on visual benchmarks that contain variations.
  • Semantic factors are separated in the world model without requiring task-specific labels during the offline pretraining stage.
  • Online adaptation gains from richer data that includes both visual observations and the actions plus rewards collected during interaction.
  • The world model inherits interpretable disentanglement properties that persist through the transfer and finetuning stages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pretraining-plus-distillation pattern could be applied in any domain where large amounts of unlabeled video exist but direct RL interaction data remain expensive.
  • If the distracting videos contain semantics that conflict with the downstream task, the latent distillation step may degrade rather than improve the world model.
  • Extending the approach to include multiple distinct video sources during pretraining might further increase the robustness of the transferred representations.

Load-bearing premise

Semantic variations present in the chosen distracting videos are relevant to the target RL tasks and can be disentangled in a form that transfers usefully without negative transfer or loss of task-critical information.

What would settle it

Run the full pipeline on a benchmark where the distracting videos contain only variations unrelated to the RL task and measure whether sample efficiency and final performance remain at or above the level of a standard world model without the offline pretraining step.
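
The comparison described above can be scored with two simple quantities: the number of environment steps needed to reach a target return, and the gap in final return. The helper names below are hypothetical and the code is a sketch of the bookkeeping only; it assumes both runs were evaluated at the same step counts.

```python
def steps_to_threshold(steps, returns, target):
    """First environment-step count at which the return reaches the target, else None."""
    for s, r in zip(steps, returns):
        if r >= target:
            return s
    return None


def compare_runs(steps, baseline_returns, pretrained_returns, target):
    """Summarize sample efficiency and final performance for two learning curves."""
    return {
        "baseline_steps": steps_to_threshold(steps, baseline_returns, target),
        "pretrained_steps": steps_to_threshold(steps, pretrained_returns, target),
        # Positive gap means the pretrained variant ends with a higher return.
        "final_gap": pretrained_returns[-1] - baseline_returns[-1],
    }
```

Under this scoring, the test would count against the method if, with task-unrelated videos, `pretrained_steps` exceeds `baseline_steps` or `final_gap` is negative across seeds.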

Figures

Figures reproduced from arXiv: 2503.08751 by Baao Xie, Liaomo Zheng, Qi Wang, Shiyu Wang, Wenjun Zeng, Xiaokang Yang, Xin Jin, Yunbo Wang, Zhipeng Zhang.

Figure 1: Overview of our proposed framework. The key idea…
Figure 2: Architecture of Disentangled World Models.
Figure 3: Example image observations of our modified DMC and MuJoCo…
Figure 4: Comparison of DisWM against visual RL baselines, including…
Figure 5: Visualization of traversals of β-VAE during the pretraining phase. Panel labels: Object; Background color; Robot arm color.
Figure 6: Fine-grained disentanglement results on MuJoCo…
Figure 7: These figures illustrate the ablation studies and sensitivity analyses of DisWM on DMC…
Figure 8: Performance of DisWM on DMC Cartpole Swingup with different video datasets.

5. Related Work. Visual MBRL. Visual RL learns control policies from raw pixels, which has achieved remarkable performance in various tasks [3, 4, 31, 37], while prior RL studies focus on learning policies from low-dimensional states. Extant approaches can be divided into two main directions: model-free RL [18, 19, 24, 32, 42, 44,…
Original abstract

Training visual reinforcement learning (RL) in practical scenarios presents a significant challenge, i.e., RL agents suffer from low sample efficiency in environments with variations. While various approaches have attempted to alleviate this issue by disentangled representation learning, these methods usually start learning from scratch without prior knowledge of the world. This paper, in contrast, tries to learn and understand underlying semantic variations from distracting videos via offline-to-online latent distillation and flexible disentanglement constraints. To enable effective cross-domain semantic knowledge transfer, we introduce an interpretable model-based RL framework, dubbed Disentangled World Models (DisWM). Specifically, we pretrain the action-free video prediction model offline with disentanglement regularization to extract semantic knowledge from distracting videos. The disentanglement capability of the pretrained model is then transferred to the world model through latent distillation. For finetuning in the online environment, we exploit the knowledge from the pretrained model and introduce a disentanglement constraint to the world model. During the adaptation phase, the incorporation of actions and rewards from online environment interactions enriches the diversity of the data, which in turn strengthens the disentangled representation learning. Experimental results validate the superiority of our approach on various benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces Disentangled World Models (DisWM), a model-based RL framework that pretrains an action-free video prediction model offline on distracting videos using disentanglement regularization to extract semantic knowledge. This pretrained capability is transferred to the world model via latent distillation; during online finetuning a disentanglement constraint is added, with actions and rewards from environment interactions claimed to enrich data diversity and strengthen disentangled representations. The central claim is that this offline-to-online pipeline yields superior performance on various visual RL benchmarks in environments with variations.

Significance. If the empirical claims hold, the work could provide a practical route for transferring semantic factors from offline video data into RL world models, potentially improving sample efficiency and robustness without learning representations entirely from scratch. The approach relies on standard techniques in disentangled representation learning and latent distillation rather than novel theoretical derivations or parameter-free results.

major comments (2)
  1. [Abstract] The manuscript asserts that 'Experimental results validate the superiority of our approach on various benchmarks' yet supplies no quantitative results, ablation studies, baseline comparisons, or implementation details. This absence prevents verification that the proposed pipeline (offline pretraining + latent distillation + online disentanglement constraint) is responsible for any measured gains rather than online data alone.
  2. [Abstract] The central claim requires that semantic variations in the chosen distracting videos align with task-critical factors in the target RL environments and survive latent distillation without negative transfer. The text states that 'the incorporation of actions and rewards … strengthens the disentangled representation learning' but provides no mechanism, analysis, or ablation demonstrating that video-derived factors are the relevant ones or that distillation (versus online interactions) drives the improvement.
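
The attribution question raised in both comments is usually answered with a component ablation. As a rough sketch, the configurations to run can be enumerated by toggling the three pipeline components named in comment 1; the component names and the pruning rule below are assumptions for illustration, not the ablation design used in the paper.

```python
from itertools import product

# Hypothetical switch names for the three components listed in comment 1.
COMPONENTS = ("offline_pretraining", "latent_distillation", "online_constraint")


def ablation_grid():
    """Enumerate on/off settings, skipping those that distill without a pretrained teacher."""
    grid = []
    for flags in product((False, True), repeat=len(COMPONENTS)):
        cfg = dict(zip(COMPONENTS, flags))
        if cfg["latent_distillation"] and not cfg["offline_pretraining"]:
            continue  # distillation needs a pretrained model to distill from
        grid.append(cfg)
    return grid
```

The all-off configuration corresponds to a standard world model trained on online data alone, which is the baseline the referee asks the gains to be separated from.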

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We agree that the abstract requires strengthening to better substantiate the claims with quantitative highlights and clarifications. We will revise the abstract in the next version to address both points while preserving its concise nature, and we point to the detailed experiments, ablations, and analyses already present in the full manuscript.

Point-by-point responses
  1. Referee: [Abstract] The manuscript asserts that 'Experimental results validate the superiority of our approach on various benchmarks' yet supplies no quantitative results, ablation studies, baseline comparisons, or implementation details. This absence prevents verification that the proposed pipeline (offline pretraining + latent distillation + online disentanglement constraint) is responsible for any measured gains rather than online data alone.

    Authors: We acknowledge that the current abstract does not embed specific numbers or direct references to ablations. In the revision we will add concise quantitative highlights (e.g., relative sample-efficiency gains on the reported visual RL benchmarks) and explicit mentions of the ablation studies and baseline comparisons that appear in Sections 4 and 5. These additions will make clear that the measured improvements arise from the full offline-to-online pipeline rather than online data alone.
    Revision: yes

  2. Referee: [Abstract] The central claim requires that semantic variations in the chosen distracting videos align with task-critical factors in the target RL environments and survive latent distillation without negative transfer. The text states that 'the incorporation of actions and rewards … strengthens the disentangled representation learning' but provides no mechanism, analysis, or ablation demonstrating that video-derived factors are the relevant ones or that distillation (versus online interactions) drives the improvement.

    Authors: The manuscript already contains the requested mechanism (latent distillation of the pretrained disentangled factors followed by an online disentanglement constraint) and supporting ablations (Section 5.3) that isolate the contribution of actions/rewards and show that distillation improves over purely online training without negative transfer. To satisfy the abstract-level concern we will insert a short clause referencing these results and noting that the chosen video variations were selected to match task-critical factors. This keeps the abstract concise while directing readers to the evidence.
    Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical pipeline with independent experimental validation

full rationale

The paper presents a standard offline-to-online transfer pipeline: pretrain an action-free video model with disentanglement regularization on distracting videos, distill latents to a world model, then finetune online with added actions/rewards and a disentanglement constraint. No derivation, equation, or claim reduces a reported prediction or result to a fitted quantity by construction, nor invokes a self-citation chain or uniqueness theorem to force the architecture. The central claims rest on benchmark comparisons rather than self-referential definitions, making the work self-contained against external evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The abstract provides insufficient detail to enumerate concrete free parameters or background axioms; the central claim rests on the domain assumption that disentanglement regularization extracts transferable semantics and that latent distillation preserves them.

axioms (2)
  • [domain assumption] Disentanglement regularization applied to action-free video prediction extracts semantic factors relevant to downstream RL tasks.
    Invoked to justify the offline pretraining step.
  • [domain assumption] Latent distillation transfers disentanglement capability without substantial information loss.
    Central to the knowledge-transfer mechanism.
invented entities (1)
  • Disentangled World Models (DisWM) [no independent evidence]
    Purpose: end-to-end framework combining offline video pretraining, latent distillation, and online disentanglement constraint.
    New named model introduced to realize the described pipeline.

pith-pipeline@v0.9.0 · 5772 in / 1353 out tokens · 85232 ms · 2026-05-23T00:18:53.078035+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Mask World Model: Predicting What Matters for Robust Robot Policy Learning

    cs.RO · 2026-04 · unverdicted · novelty 7.0

    Mask World Model predicts semantic mask dynamics with video diffusion and integrates it with a diffusion policy head, outperforming RGB world models on LIBERO and RLBench while showing better real-world generalization...

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Diffusion for world modeling: Visual details matter in atari

    Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, and François Fleuret. Diffusion for world modeling: Visual details matter in atari. In NeurIPS, 2024.

  2. [2]

    Representation learning: A review and new perspectives

    Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. TPAMI, 35(8):1798–1828, 2013.

  3. [3]

    Environment agnostic representation for visual reinforcement learning

    Hyesong Choi, Hunsang Lee, Seongwon Jeong, and Dongbo Min. Environment agnostic representation for visual reinforcement learning. In ICCV, pages 263–273, 2023.

  4. [4]

    Local-guided global: Paired similarity representation for visual reinforcement learning

    Hyesong Choi, Hunsang Lee, Wonil Song, Sangryul Jeon, Kwanghoon Sohn, and Dongbo Min. Local-guided global: Paired similarity representation for visual reinforcement learning. In CVPR, pages 15072–15082, 2023.

  5. [5]

    Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

    Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289, 2015.

  6. [6]

    Conditional mutual information for disentangled representations in reinforcement learning

    Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah Hanna, and Stefano Albrecht. Conditional mutual information for disentangled representations in reinforcement learning. In NeurIPS, 2023.

  7. [7]

    Temporal disentanglement of representations for improved generalisation in reinforcement learning

    Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah P Hanna, and Stefano V Albrecht. Temporal disentanglement of representations for improved generalisation in reinforcement learning. In ICLR, 2023.

  8. [8]

    Learning latent dynamics for planning from pixels

    Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. In ICML, 2019.

  9. [9]

    Dream to control: Learning behaviors by latent imagination

    Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. In ICLR, 2020.

  10. [10]

    Mastering atari with discrete world models

    Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, and Jimmy Ba. Mastering atari with discrete world models. In ICLR, 2021.

  11. [11]

    Deep hierarchical planning from pixels

    Danijar Hafner, Kuang-Huei Lee, Ian Fischer, and Pieter Abbeel. Deep hierarchical planning from pixels. arXiv preprint arXiv:2206.04114, 2022.

  12. [12]

    Mastering diverse domains through world models

    Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. Nature, 2025.

  13. [13]

    Td-mpc2: Scalable, robust world models for continuous control

    Nicklas Hansen, Hao Su, and Xiaolong Wang. Td-mpc2: Scalable, robust world models for continuous control. In ICLR, 2024.

  14. [14]

    beta-vae: Learning basic visual concepts with a constrained variational framework

    Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-vae: Learning basic visual concepts with a constrained variational framework. In ICLR,

  15. [15]

    Darla: Improving zero-shot transfer in reinforcement learning

    Irina Higgins, Arka Pal, Andrei Rusu, Loic Matthey, Christopher Burgess, Alexander Pritzel, Matthew Botvinick, Charles Blundell, and Alexander Lerchner. Darla: Improving zero-shot transfer in reinforcement learning. In ICML,

  16. [16]

    Leveraging separated world model for exploration in visually distracted environments

    Kaichen Huang, Shenghua Wan, Minghao Shao, Hai-Hang Sun, Le Gan, Shuai Feng, and De-Chuan Zhan. Leveraging separated world model for exploration in visually distracted environments. NeurIPS, 2024.

  17. [17]

    Reinforcement learning with augmented data

    Misha Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, and Aravind Srinivas. Reinforcement learning with augmented data. In NeurIPS, 2020.

  18. [18]

    Curl: Contrastive unsupervised representations for reinforcement learning

    Michael Laskin, Aravind Srinivas, and Pieter Abbeel. Curl: Contrastive unsupervised representations for reinforcement learning. In ICML, pages 5639–5650, 2020.

  19. [19]

    Generalizing consistency policy to visual rl with prioritized proximal experience regularization

    Haoran Li, Zhennan Jiang, Yuhui Chen, and Dongbin Zhao. Generalizing consistency policy to visual rl with prioritized proximal experience regularization. In NeurIPS, 2024.

  20. [20]

    Open-world reinforcement learning over long short-term imagination

    Jiajian Li, Qi Wang, Yunbo Wang, Xin Jin, Yang Li, Wenjun Zeng, and Xiaokang Yang. Open-world reinforcement learning over long short-term imagination. In ICLR, 2025.

  21. [21]

    Learning to model the world with language

    Jessy Lin, Yuqing Du, Olivia Watkins, Danijar Hafner, Pieter Abbeel, Dan Klein, and Anca Dragan. Learning to model the world with language. In ICML, 2024.

  22. [22]

    Structured state space models for in-context reinforcement learning

    Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, and Feryal Behbahani. Structured state space models for in-context reinforcement learning. NeurIPS, 2023.

  23. [23]

    Harmonydream: Task harmonization inside world models

    Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, and Mingsheng Long. Harmonydream: Task harmonization inside world models. In ICML, 2024.

  24. [24]

    Vip: Towards universal visual reward and representation via value-implicit pre-training

    Yecheng Jason Ma, Shagun Sodhani, Dinesh Jayaraman, Osbert Bastani, Vikash Kumar, and Amy Zhang. Vip: Towards universal visual reward and representation via value-implicit pre-training. In ICLR, 2023.

  25. [25]

    Visualizing data using t-sne

    Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. JMLR, 9(Nov):2579–2605, 2008.

  26. [26]

    Choreographer: Learning and adapting skills in imagination

    Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Alexandre Lacoste, and Sai Rajeswar. Choreographer: Learning and adapting skills in imagination. In ICLR, 2023.

  27. [27]

    Genrl: Multimodal-foundation world models for generalization in embodied agents

    Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Aaron C Courville, and Sai Rajeswar Mudumba. Genrl: Multimodal-foundation world models for generalization in embodied agents. NeurIPS, 2024.

  28. [28]

    Iso-dream: Isolating and leveraging noncontrollable visual dynamics in world models

    Minting Pan, Xiangming Zhu, Yunbo Wang, and Xiaokang Yang. Iso-dream: Isolating and leveraging noncontrollable visual dynamics in world models. In NeurIPS, 2022.

  29. [29]

    Reinforcement learning with action-free pre-training from videos

    Younggyo Seo, Kimin Lee, Stephen L James, and Pieter Abbeel. Reinforcement learning with action-free pre-training from videos. In ICML, 2022.

  30. [30]

    Multi-view masked world models for visual robotic manipulation

    Younggyo Seo, Junsu Kim, Stephen James, Kimin Lee, Jinwoo Shin, and Pieter Abbeel. Multi-view masked world models for visual robotic manipulation. In ICML, 2023.

  31. [31]

    A simple framework for generalization in visual rl under dynamic scene perturbations

    Wonil Song, Hyesong Choi, Kwanghoon Sohn, and Dongbo Min. A simple framework for generalization in visual rl under dynamic scene perturbations. NeurIPS, 37:121790–121826, 2024.

  32. [32]

    Decoupling representation learning from reinforcement learning

    Adam Stooke, Kimin Lee, Pieter Abbeel, and Michael Laskin. Decoupling representation learning from reinforcement learning. In ICML, pages 9870–9879, 2021.

  33. [33]

    Transfer rl across observation feature spaces via model-based regularization

    Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew Cohen, and Furong Huang. Transfer rl across observation feature spaces via model-based regularization. In ICLR, 2022.

  34. [34]

    DeepMind Control Suite

    Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, et al. Deepmind control suite. arXiv preprint arXiv:1801.00690, 2018.

  35. [35]

    Mujoco: A physics engine for model-based control

    Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In IROS, 2012.

  36. [36]

    Making offline rl online: Collaborative world models for offline visual reinforcement learning

    Qi Wang, Junming Yang, Yunbo Wang, Xin Jin, Wenjun Zeng, and Xiaokang Yang. Making offline rl online: Collaborative world models for offline visual reinforcement learning. In NeurIPS, 2024.

  37. [37]

    Unsupervised visual attention and invariance for reinforcement learning

    Xudong Wang, Long Lian, and Stella X Yu. Unsupervised visual attention and invariance for reinforcement learning. In CVPR, pages 6677–6687, 2021.

  38. [38]

    Disentangled representation learning

    Xin Wang, Hong Chen, Zihao Wu, Wenwu Zhu, et al. Disentangled representation learning. TPAMI, 2024.

  39. [39]

    Pre-training contextualized world models with in-the-wild videos for reinforcement learning

    Jialong Wu, Haoyu Ma, Chaoyi Deng, and Mingsheng Long. Pre-training contextualized world models with in-the-wild videos for reinforcement learning. NeurIPS, 2023.

  40. [40]

    Navinerf: Nerf-based 3d representation disentanglement by latent semantic navigation

    Baao Xie, Bohan Li, Zequn Zhang, Junting Dong, Xin Jin, Jingyu Yang, and Wenjun Zeng. Navinerf: Nerf-based 3d representation disentanglement by latent semantic navigation. In ICCV, 2023.

  41. [41]

    Graph-based unsupervised disentangled representation learning via multimodal large language models

    Baao Xie, Qiuyu Chen, Yunnan Wang, Zequn Zhang, Xin Jin, and Wenjun Zeng. Graph-based unsupervised disentangled representation learning via multimodal large language models. In NeurIPS, 2024.

  42. [42]

    Mastering visual continuous control: Improved data-augmented reinforcement learning

    Denis Yarats, Rob Fergus, Alessandro Lazaric, and Lerrel Pinto. Mastering visual continuous control: Improved data-augmented reinforcement learning. In ICLR, 2022.

  43. [43]

    Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning

    Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Karol Hausman, Chelsea Finn, and Sergey Levine. Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning. In CoRL, 2019.

  44. [44]

    Learning invariant representations for reinforcement learning without reconstruction

    Amy Zhang, Rowan McAllister, Roberto Calandra, Yarin Gal, and Sergey Levine. Learning invariant representations for reinforcement learning without reconstruction. In ICLR,

  45. [45]

    Prelar: World model pre-training with learnable action representation

    Lixuan Zhang, Meina Kan, Shiguang Shan, and Xilin Chen. Prelar: World model pre-training with learnable action representation. In ECCV, 2024.

  46. [46]

    Storm: Efficient stochastic transformer based world models for reinforcement learning

    Weipu Zhang, Gang Wang, Jian Sun, Yetian Yuan, and Gao Huang. Storm: Efficient stochastic transformer based world models for reinforcement learning. In NeurIPS, 2023.

  47. [47]

    Taco: Temporal latent action-driven contrastive loss for visual reinforcement learning

    Ruijie Zheng, Xiyao Wang, Yanchao Sun, Shuang Ma, Jieyu Zhao, Huazhe Xu, Hal Daumé III, and Furong Huang. Taco: Temporal latent action-driven contrastive loss for visual reinforcement learning. In NeurIPS, 2023.

  48. [48]

    Transfer learning in deep reinforcement learning: A survey

    Zhuangdi Zhu, Kaixiang Lin, Anil K Jain, and Jiayu Zhou. Transfer learning in deep reinforcement learning: A survey. TPAMI, 45(11):13344–13362, 2023.