pith. machine review for the scientific record.

arxiv: 2605.05951 · v1 · submitted 2026-05-07 · 💻 cs.AI


HaM-World: Soft-Hamiltonian World Models with Selective Memory for Planning


Pith reviewed 2026-05-08 10:51 UTC · model grok-4.3

classification 💻 cs.AI
keywords: world models · Hamiltonian dynamics · model-based planning · DeepMind Control Suite · selective state-space models · Mamba · reinforcement learning

The pith

A world model that splits latent states into a soft-Hamiltonian (position-momentum) subspace and a separate context memory with selective state-space updates stabilizes long-horizon planning and improves robustness to dynamics shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that standard world-model latents lack two structures: history-conditioned memory for approximate Markov completeness, and geometric separation of configuration, momentum, and task semantics. Without them, imagined rollouts diverge over long horizons or under distribution shifts. By decomposing the latent into a canonical (q, p) subspace that evolves along an energy-derived Hamiltonian vector field plus learnable residuals, and a context subspace c updated by Mamba selective memory, the planner obtains a single structured state usable for dynamics prediction, reward estimation, rollouts, and cross-entropy-method action search. A sympathetic reader would care because these two structures directly target the sources of instability that currently limit model-based reinforcement learning on real control tasks. Empirical results on four DeepMind Control Suite domains show the highest average AUC, long-horizon rollout error reduced to 45 percent of a strong baseline, and superior returns under twelve out-of-distribution perturbations.

Core claim

HaM-World decomposes the latent state into a canonical (q, p) subspace that evolves through an energy-derived Hamiltonian vector field plus learnable residual and control dynamics, together with a context subspace c informed by Mamba selective state-space memory. This single latent representation is shared by dynamics prediction, reward and value estimation, imagined rollouts, and CEM planning. On four DeepMind Control Suite tasks the model records the highest average AUC of 117.9, reduces long-horizon rollout error to 45 percent of a strong baseline, and wins 11 of 12 cells in multi-horizon MSE comparisons. Under twelve out-of-distribution perturbations spanning dynamics shifts, action delay, and observation masking, HaM-World achieves the highest return in every condition.
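The paper's equations are not reproduced here, but the (q, p) update described above can be sketched in a few lines. Everything below is illustrative, not the authors' implementation: a quadratic energy stands in for the learned Hamiltonian, and `residual_fn` for the paper's learnable residual/control dynamics conditioned on the context subspace c.

```python
import numpy as np

def hamiltonian(q, p, W):
    # Illustrative quadratic energy standing in for the learned H(q, p).
    return 0.5 * p @ p + 0.5 * q @ W @ q

def soft_hamiltonian_step(q, p, c, W, residual_fn, dt=0.01):
    # Leapfrog step along the Hamiltonian vector field (dq/dt = dH/dp,
    # dp/dt = -dH/dq), followed by a soft correction from a learnable
    # residual conditioned on the context subspace c. Names are illustrative.
    p_half = p - 0.5 * dt * (W @ q)        # -dH/dq for the quadratic stand-in
    q_new = q + dt * p_half                # dH/dp = p
    p_new = p_half - 0.5 * dt * (W @ q_new)
    res_q, res_p = residual_fn(q_new, p_new, c)
    return q_new + dt * res_q, p_new + dt * res_p
```

With the residual set to zero this reduces to a symplectic integrator, which is the intuition behind bounded action-free energy drift.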

What carries the argument

The soft-Hamiltonian vector field acting on the (q, p) subspace, augmented by learnable residuals and fed by Mamba selective memory into the context subspace c.
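As a rough intuition for what "selective" means in the Mamba memory, a minimal scalar-input selective scan might look like the following. This is a deliberate simplification of Mamba's actual parameterization (no gating, no convolution, diagonal state); all weight names are illustrative.

```python
import numpy as np

def selective_scan(x, A, W_B, W_C, W_dt):
    # Minimal selective state-space recurrence: the step size dt and the
    # input/output projections B_t, C_t depend on the current input x_t,
    # so the recurrence decides per step what to write and what to read.
    # Shapes (illustrative): x (T,), A (N,) negative diagonal, W_B/W_C (N,).
    h = np.zeros_like(A)
    ys = []
    for x_t in x:
        dt = np.log1p(np.exp(W_dt * x_t))        # softplus keeps dt positive
        B_t, C_t = W_B * x_t, W_C * x_t          # input-dependent projections
        h = np.exp(dt * A) * h + dt * B_t * x_t  # discretized state update
        ys.append(C_t @ h)
    return np.array(ys)
```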

If this is right

  • Bounded action-free Hamiltonian-energy drift across rollouts
  • Structured energy variation that increases under policy-driven rollouts
  • Coherent control-induced energy transfer between subspaces
  • Highest return in every tested out-of-distribution condition
  • Consistent wins across 3-, 5-, and 7-step MSE cells on control tasks
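The first of these predictions is straightforward to check given a rollout. A plausible drift metric (our formulation, not necessarily the paper's) is:

```python
import numpy as np

def energy_drift(H_values):
    # Relative drift of the learned energy H along an action-free rollout:
    # maximum deviation from the initial value, normalized by its magnitude.
    # "Bounded drift" means this stays small as the horizon grows.
    H = np.asarray(H_values, dtype=float)
    return float(np.max(np.abs(H - H[0])) / (np.abs(H[0]) + 1e-8))
```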

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same (q, p) plus context decomposition could be grafted onto other world-model backbones to test whether the geometric organization itself, rather than the specific Mamba memory, drives the stability gains.
  • If the soft-Hamiltonian constraint truly limits energy drift, the approach may extend naturally to domains where physical conservation laws are known, such as robotic manipulation with contact dynamics.
  • A direct test on tasks with longer horizons or partial observability would reveal whether the selective memory continues to supply sufficient Markov completeness when the planning horizon exceeds the training distribution.
  • The observed energy-transfer diagnostics suggest that adding explicit dissipation terms to the c subspace might further reduce error accumulation in highly dissipative environments.

Load-bearing premise

Decomposing the latent state into a canonical (q, p) subspace evolving through an energy-derived Hamiltonian vector field plus learnable residuals, together with a context subspace c and Mamba memory, provides approximate Markov completeness and geometric organization sufficient to stabilize planning.

What would settle it

Long-horizon rollouts that exhibit unbounded Hamiltonian-energy drift or HaM-World failing to outperform strong baselines on additional tasks under the same twelve out-of-distribution perturbations would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.05951 by Haodong Cui, Haoyun Tang, Keyao Xu, Kun Wang, Zhandong Mei.

Figure 1. Architecture of HaM-World. Observations are encoded into Hamiltonian state…
Figure 2. Return learning curves over four tasks (mean curve with seed envelope). HaM-World…
Figure 3. Cheetah Run Hamiltonian diagnostics. From left to right: Hamiltonian energy under…
Figure 4. H contours and control coupling. The same diagnostic explains how external control moves rollouts across energy layers: high |push| = |∇pH · control| consistently yields larger equipotential crossing rates across thresholds. (D) Visualizing Hamiltonian geometry and control coupling. Figure 4a visualizes the learned Hamiltonian on a canonical slice (q5, p5): color denotes H(q, p), white curves are energy co…
Figure 5. Representative keyframes from one best-seed evaluation rollout, with one row per task and uniformly sampled…
original abstract

World models enable model-based planning through learned latent dynamics, but imagined rollouts become unstable as the planning horizon grows or the dynamics distribution shifts. We argue that this instability reflects two missing structures in planner-facing latents: history-conditioned memory for approximate Markov completeness, and geometric organization that separates configuration, momentum, and task semantics. We propose HaM-World (HMW), a structured world model that decomposes the latent state into a canonical (q, p) subspace and a context subspace c, while using Mamba selective state-space memory as the history-conditioned input to the same latent dynamics. Within this interface, (q, p) evolves through an energy-derived Hamiltonian vector field plus learnable residual/control dynamics, while c captures semantic, dissipative, and non-conservative factors. This gives the planner a single latent state shared by dynamics prediction, reward/value estimation, imagined rollouts, and CEM action search. On four DeepMind Control Suite tasks, HaM-World reaches the highest Avg. AUC (117.9, +9.5%), reduces long-horizon rollout error to 45% of a strong baseline model, and wins 11/12 k in {3,5,7} MSE cells. Under 12 OOD perturbations spanning dynamics shifts, action delay, and observation masking, HaM-World achieves the highest return in every condition, with average OOD-return gains of 10.2% on Finger Spin and 13.6% on Reacher Easy. Mechanism diagnostics further show bounded action-free Hamiltonian-energy drift, structured energy variation under policy rollouts, and coherent control-induced energy transfer, supporting the intended Soft-Hamiltonian dynamics design.
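The "CEM action search" the abstract refers to is the standard cross-entropy method over imagined rollouts. A minimal sketch, with `imagined_return` standing in for the world model's rollout-and-score step (the paper's actual population sizes and schedules are not stated here):

```python
import numpy as np

def cem_plan(imagined_return, horizon, act_dim,
             iters=5, pop=64, n_elite=6, seed=0):
    # Cross-entropy method: sample action sequences from a Gaussian,
    # score each with the model's imagined return, refit the Gaussian
    # to the top scorers, and return the first action of the final mean.
    rng = np.random.default_rng(seed)
    mu = np.zeros((horizon, act_dim))
    sigma = np.ones((horizon, act_dim))
    for _ in range(iters):
        cand = rng.normal(mu, sigma, size=(pop, horizon, act_dim))
        scores = np.array([imagined_return(a) for a in cand])
        elites = cand[np.argsort(scores)[-n_elite:]]
        mu = elites.mean(axis=0)
        sigma = elites.std(axis=0) + 1e-6   # keep sampling non-degenerate
    return mu[0]
```

In model-predictive fashion, only the first action is executed and the search is rerun at the next step.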

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes HaM-World, a structured world model for model-based planning that decomposes the latent state into a canonical (q, p) subspace evolving under an energy-derived soft-Hamiltonian vector field plus learnable residuals, a context subspace c capturing semantic and dissipative factors, and Mamba selective state-space memory for history-conditioned dynamics. The shared latent state supports dynamics prediction, reward estimation, imagined rollouts, and CEM planning. On four DeepMind Control Suite tasks, it reports the highest Avg. AUC (117.9, +9.5%), long-horizon rollout error reduced to 45% of a strong baseline, wins in 11/12 MSE cells across horizons, and highest returns under all 12 OOD perturbations (dynamics shifts, action delay, observation masking), with mechanism diagnostics showing bounded energy drift.

Significance. If the results hold, this demonstrates that injecting Hamiltonian geometric structure alongside selective memory into planner-facing latents can stabilize long-horizon rollouts and improve OOD robustness in world models. The paper supplies concrete, multi-task empirical gains on standard benchmarks together with energy-drift diagnostics that directly test the intended soft-Hamiltonian design.

major comments (2)
  1. [Section 5.2 and 5.3] The central claim attributes rollout stability and OOD gains to the joint (q, p) Hamiltonian evolution plus Mamba memory, yet the manuscript contains no ablation that retains the latent decomposition and Mamba input while replacing the energy-derived Hamiltonian vector field with an unstructured learned dynamics function. Without this control, the 55% rollout-error reduction and consistent OOD gains cannot be attributed to the Hamiltonian term rather than the decomposition or memory alone (abstract; Section 5.2 rollout results; Section 5.3 mechanism diagnostics).
  2. [Section 5.3] The mechanism diagnostics confirm bounded action-free energy drift and coherent control-induced energy transfer under the full model, but provide no comparative diagnostics under a non-Hamiltonian dynamics variant; this leaves open whether the geometric organization is load-bearing for the reported stability (Section 5.3).
minor comments (2)
  1. [Abstract] The abstract states 'wins 11/12 k in {3,5,7} MSE cells' without defining k or the cells; this should be clarified or cross-referenced to the corresponding table or figure.
  2. [Model description] The precise mathematical form combining the Hamiltonian vector field with the learnable residual/control dynamics is described in prose but would benefit from an explicit equation in the model section.
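For readers wondering what such an equation might look like, a plausible form (our notation, not the paper's) combines the symplectic part with residual and control terms:

```latex
\dot q = \nabla_p H_\theta(q,p) + r^q_\phi(q,p,c) + g^q_\psi(q,p,c)\,u,
\qquad
\dot p = -\nabla_q H_\theta(q,p) + r^p_\phi(q,p,c) + g^p_\psi(q,p,c)\,u
```

where H_θ is the learned energy, r the learnable residual dynamics conditioned on the context subspace c, and g the control coupling driven by the action u.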

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our paper. We address the major concerns regarding the attribution of performance gains to the Hamiltonian structure by committing to additional ablation studies in the revised manuscript.

point-by-point responses
  1. Referee: [Section 5.2 and 5.3] The central claim attributes rollout stability and OOD gains to the joint (q, p) Hamiltonian evolution plus Mamba memory, yet the manuscript contains no ablation that retains the latent decomposition and Mamba input while replacing the energy-derived Hamiltonian vector field with an unstructured learned dynamics function. Without this control, the 55% rollout-error reduction and consistent OOD gains cannot be attributed to the Hamiltonian term rather than the decomposition or memory alone (abstract; Section 5.2 rollout results; Section 5.3 mechanism diagnostics).

    Authors: We agree that an ablation isolating the Hamiltonian vector field is necessary to strengthen the causal attribution of the observed improvements. The current experiments demonstrate the overall effectiveness of HaM-World, but do not directly compare against a non-Hamiltonian dynamics model with the same decomposition and memory components. In the revised version, we will introduce this control experiment: a variant that uses the same latent decomposition into (q, p) and c subspaces, Mamba memory, but replaces the energy-derived soft-Hamiltonian dynamics with a standard learned residual dynamics function (e.g., MLP-based). We will evaluate this variant on the same rollout error metrics, OOD perturbations, and include energy diagnostics for comparison. This will be detailed in updated Sections 5.2 and 5.3, allowing readers to assess the specific contribution of the geometric structure. revision: yes

  2. Referee: [Section 5.3] The mechanism diagnostics confirm bounded action-free energy drift and coherent control-induced energy transfer under the full model, but provide no comparative diagnostics under a non-Hamiltonian dynamics variant; this leaves open whether the geometric organization is load-bearing for the reported stability (Section 5.3).

    Authors: We concur that comparative energy diagnostics are important to confirm that the bounded drift and structured energy behavior are due to the soft-Hamiltonian design rather than other model components. As part of the ablation study outlined in response to the first comment, we will compute and report the action-free energy drift and control-induced energy transfer for the non-Hamiltonian variant. This will provide direct evidence on whether the geometric organization is load-bearing for stability. These results will be incorporated into Section 5.3 of the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No load-bearing circularity; performance claims rest on external benchmark comparisons

full rationale

The paper's core contribution is an architectural proposal (latent decomposition into (q,p) + c with soft-Hamiltonian evolution plus Mamba memory) whose value is asserted via direct empirical comparisons on DeepMind Control Suite tasks, long-horizon MSE, and 12 OOD perturbations. No equation or 'prediction' is shown to equal its own fitted inputs by construction, and no uniqueness theorem or ansatz is imported via self-citation to force the design. The reported gains (AUC 117.9, 55% error reduction, consistent OOD wins) are therefore falsifiable against baselines rather than tautological. A score of 2 accounts for the possibility of minor unexamined self-citations in the full text while confirming the central claims remain independently testable.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the assumption that latent dynamics admit a useful decomposition into conservative Hamiltonian and context components, plus learned parameters for the full model.

free parameters (1)
  • neural network weights for dynamics, memory, and reward models
    All components are learned from data; no specific count or values given in abstract.
axioms (1)
  • domain assumption: latent states can be structured into (q, p) and c subspaces, with Hamiltonian evolution providing geometric organization and approximate Markov completeness
    Invoked to justify the interface for dynamics prediction, rollouts, and planning.
invented entities (1)
  • Soft-Hamiltonian vector field in latent (q, p) subspace (no independent evidence)
    purpose: To model conservative dynamics while allowing learnable residuals and control
    New postulated structure for the dynamics; no independent evidence outside the model performance.

pith-pipeline@v0.9.0 · 5615 in / 1390 out tokens · 41598 ms · 2026-05-08T10:51:35.495355+00:00 · methodology


Reference graph

Works this paper leans on

57 extracted references · 3 canonical work pages · 3 internal anchors

  1. [1]

Recurrent world models facilitate policy evolution

    David Ha and Jürgen Schmidhuber. Recurrent world models facilitate policy evolution. Advances in Neural Information Processing Systems, 31, 2018

  2. [2]

    Mastering diverse control tasks through world models.Nature, 640(8059):647–653, 2025

    Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse control tasks through world models.Nature, 640(8059):647–653, 2025

  3. [3]

    Hamiltonian neural networks.Advances in neural information processing systems, 32, 2019

    Samuel Greydanus, Misko Dzamba, and Jason Yosinski. Hamiltonian neural networks.Advances in neural information processing systems, 32, 2019

  4. [4]

    Symplectic ode-net: Learning hamiltonian dynamics with control

    Yaofeng Desmond Zhong, Biswadip Dey, and Amit Chakraborty. Symplectic ode-net: Learning hamiltonian dynamics with control. InInternational Conference on Learning Representations, 2020

  5. [5]

    When to trust your model: Model-based policy optimization.Advances in neural information processing systems, 32, 2019

    Michael Janner, Justin Fu, Marvin Zhang, and Sergey Levine. When to trust your model: Model-based policy optimization.Advances in neural information processing systems, 32, 2019

  6. [6]

    Denoised MDPs: Learning world models better than the world itself

    Tongzhou Wang, Simon Du, Antonio Torralba, Phillip Isola, Amy Zhang, and Yuandong Tian. Denoised MDPs: Learning world models better than the world itself. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of ...

  7. [7]

    Learning world models with identifiable factorization.Advances in Neural Information Processing Systems, 36:31831–31864, 2023

    Yuren Liu, Biwei Huang, Zhengmao Zhu, Honglong Tian, Mingming Gong, Yang Yu, and Kun Zhang. Learning world models with identifiable factorization.Advances in Neural Information Processing Systems, 36:31831–31864, 2023

  8. [8]

    Mamba: Linear-time sequence modeling with selective state spaces

    Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. In First Conference on Language Modeling, 2024

  9. [9]

    Action-conditioned hamiltonian generative networks (ac-hgn) for supervised and reinforcement learning

    Arne Troch, Kevin Mets, and Siegfried Mercelis. Action-conditioned hamiltonian generative networks (ac-hgn) for supervised and reinforcement learning. In7th Annual Learning for Dynamics & Control Conference, 04-06 June, 2025, Ann Arbor, Michigan, USA, pages 310–322, 2025

  10. [10]

    Self-supervised learning from images with a joint- embedding predictive architecture

    Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, and Nicolas Ballas. Self-supervised learning from images with a joint- embedding predictive architecture. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15619–15629, 2023

  11. [11]

    The cross-entropy method for combinatorial and continuous optimization

    Reuven Y Rubinstein. The cross-entropy method for combinatorial and continuous optimization. Methodology and computing in applied probability, 1(2):127–190, 1999

  12. [12]

    DeepMind Control Suite

    Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, Timothy Lillicrap, and Martin Riedmiller. Deepmind control suite.arXiv preprint arXiv:1801.00690, 2018

  13. [13]

    Dyna, an integrated architecture for learning, planning, and reacting.ACM Sigart Bulletin, 2(4):160–163, 1991

    Richard S Sutton. Dyna, an integrated architecture for learning, planning, and reacting.ACM Sigart Bulletin, 2(4):160–163, 1991

  14. [14]

    Learning latent dynamics for planning from pixels

    Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. InInternational conference on machine learning, pages 2555–2565. PMLR, 2019

  15. [15]

    Dream to control: Learning behaviors by latent imagination

    Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. InInternational Conference on Learning Representa- tions, 2020

  16. [16]

    Mastering atari with discrete world models

    Danijar Hafner, Timothy P Lillicrap, Mohammad Norouzi, and Jimmy Ba. Mastering atari with discrete world models. InInternational Conference on Learning Representations, 2021. 10

  17. [17]

    Mastering atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020

    Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Si- mon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. Mastering atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020

  18. [18]

    Temporal difference learning for model predictive control

    Nicklas A Hansen, Hao Su, and Xiaolong Wang. Temporal difference learning for model predictive control. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 8387–8406. PMLR, 1...

  19. [19]

    TD-MPC2: Scalable, robust world models for continuous control

    Nicklas Hansen, Hao Su, and Xiaolong Wang. TD-MPC2: Scalable, robust world models for continuous control. InThe Twelfth International Conference on Learning Representations, 2024

  20. [20]

    Learning hierarchical world models with adaptive temporal abstractions from discrete latent dynamics

    Christian Gumbsch, Noor Sajid, Georg Martius, and Martin V Butz. Learning hierarchical world models with adaptive temporal abstractions from discrete latent dynamics. InThe Twelfth International Conference on Learning Representations, 2024

  21. [21]

    Bootstrapped model predictive control

    Yuhang Wang, Hanwei Guo, Sizhe Wang, Long Qian, and Xuguang Lan. Bootstrapped model predictive control. In Y . Yue, A. Garg, N. Peng, F. Sha, and R. Yu, editors,International Conference on Learning Representations, volume 2025, pages 54241–54259, 2025

  22. [22]

    PWM: Policy learn- ing with multi-task world models

    Ignat Georgiev, Varun Giridhar, Nicklas Hansen, and Animesh Garg. PWM: Policy learn- ing with multi-task world models. InThe Thirteenth International Conference on Learning Representations, 2025

  23. [23]

    Parallelizing model-based reinforcement learning over the sequence length.Advances in Neural Information Processing Systems, 37: 131398–131433, 2024

    ZiRui Wang, Yue Deng, Junfeng Long, and Yin Zhang. Parallelizing model-based reinforcement learning over the sequence length.Advances in Neural Information Processing Systems, 37: 131398–131433, 2024

  24. [24]

    SOMBRL: Scalable and optimistic model-based RL

    Bhavya Sukhija, Lenart Treven, Carmelo Sferrazza, Florian Dorfler, Pieter Abbeel, and Andreas Krause. SOMBRL: Scalable and optimistic model-based RL. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  25. [25]

    Proximal Policy Optimization Algorithms

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

  26. [26]

    Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor

    Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor. InInternational conference on machine learning, pages 1861–1870. Pmlr, 2018

  27. [27]

    Transformers are sample-efficient world models

    Vincent Micheli, Eloi Alonso, and François Fleuret. Transformers are sample-efficient world models. InThe Eleventh International Conference on Learning Representations, 2023

  28. [28]

    Storm: Efficient stochastic transformer based world models for reinforcement learning.Advances in Neural Information Processing Systems, 36:27147–27166, 2023

    Weipu Zhang, Gang Wang, Jian Sun, Yetian Yuan, and Gao Huang. Storm: Efficient stochastic transformer based world models for reinforcement learning.Advances in Neural Information Processing Systems, 36:27147–27166, 2023

  29. [29]

    ivideogpt: Interactive videogpts are scalable world models.Advances in Neural Information Processing Systems, 37:68082–68119, 2024

    Jialong Wu, Shaofeng Yin, Ningya Feng, Xu He, Dong Li, Jianye Hao, and Mingsheng Long. ivideogpt: Interactive videogpts are scalable world models.Advances in Neural Information Processing Systems, 37:68082–68119, 2024

  30. [30]

    Diffusion for world modeling: Visual details matter in atari.Advances in Neural Information Processing Systems, 37:58757–58791, 2024

    Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, and François Fleuret. Diffusion for world modeling: Visual details matter in atari.Advances in Neural Information Processing Systems, 37:58757–58791, 2024

  31. [31]

    EDELINE: Enhancing memory in diffusion-based world models via linear-time sequence modeling

    Jia-Hua Lee, Bor-Jiun Lin, Wei-Fang Sun, and Chun-Yi Lee. EDELINE: Enhancing memory in diffusion-based world models via linear-time sequence modeling. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  32. [32]

    Genie: Generative interactive environments

    Jake Bruce, Michael D Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, et al. Genie: Generative interactive environments. InForty-first International Conference on Machine Learning, 2024. 11

  33. [33]

    Learning interactive real-world simulators

    Sherry Yang, Yilun Du, Seyed Kamyar Seyed Ghasemipour, Jonathan Tompson, Leslie Pack Kaelbling, Dale Schuurmans, and Pieter Abbeel. Learning interactive real-world simulators. In The Twelfth International Conference on Learning Representations, 2024

  34. [34]

    Genrl: Multimodal-foundation world models for generalization in embodied agents.Advances in neural information processing systems, 37:27529–27555, 2024

    Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Aaron Courville, and Sai Rajeswar. Genrl: Multimodal-foundation world models for generalization in embodied agents.Advances in neural information processing systems, 37:27529–27555, 2024

  35. [35]

    Operator world models for reinforcement learning.Advances in Neural Information Processing Systems, 37: 111432–111463, 2024

    Pietro Novelli, Marco Pratticò, Massimiliano Pontil, and Carlo Ciliberto. Operator world models for reinforcement learning.Advances in Neural Information Processing Systems, 37: 111432–111463, 2024

  36. [36]

    Policy-shaped prediction: avoiding distractions in model-based reinforcement learning.Advances in Neural Information Processing Systems, 37: 13124–13148, 2024

    Miles Hutson, Isaac Kauvar, and Nick Haber. Policy-shaped prediction: avoiding distractions in model-based reinforcement learning.Advances in Neural Information Processing Systems, 37: 13124–13148, 2024

  37. [37]

    Understanding world or predicting future? a comprehensive survey of world models.ACM Computing Surveys, 58(3):1–38, 2025

    Jingtao Ding, Yunke Zhang, Yu Shang, Yuheng Zhang, Zefang Zong, Jie Feng, Yuan Yuan, Hongyuan Su, Nian Li, Nicholas Sukiennik, et al. Understanding world or predicting future? a comprehensive survey of world models.ACM Computing Surveys, 58(3):1–38, 2025

  38. [38]

    Revisiting feature prediction for learning visual representations from video.Transactions on Machine Learning Research, 2024

    Adrien Bardes, Quentin Garrido, Jean Ponce, Xinlei Chen, Michael Rabbat, Yann LeCun, Mido Assran, and Nicolas Ballas. Revisiting feature prediction for learning visual representations from video.Transactions on Machine Learning Research, 2024. ISSN 2835-8856. Featured Certification

  39. [39]

    Connecting joint-embedding predictive architecture with contrastive self-supervised learning.Advances in neural information processing systems, 37: 2348–2377, 2024

    Shentong Mo and Shengbang Tong. Connecting joint-embedding predictive architecture with contrastive self-supervised learning.Advances in neural information processing systems, 37: 2348–2377, 2024

  40. [40]

    V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

    Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, et al. V-jepa 2: Self-supervised video models enable understanding, prediction and planning.arXiv preprint arXiv:2506.09985, 2025

  41. [41]

    Learning invariant representations for reinforcement learning without reconstruction

    Amy Zhang, Rowan Thomas McAllister, Roberto Calandra, Yarin Gal, and Sergey Levine. Learning invariant representations for reinforcement learning without reconstruction. InInter- national Conference on Learning Representations, 2021

  42. [42]

    Dreaming: Model-based reinforcement learning by latent imagination without reconstruction

    Masashi Okada and Tadahiro Taniguchi. Dreaming: Model-based reinforcement learning by latent imagination without reconstruction. In2021 ieee international conference on robotics and automation (icra), pages 4209–4215. IEEE, 2021

  43. [43]

    Dreamerpro: Reconstruction-free model-based rein- forcement learning with prototypical representations

    Fei Deng, Ingook Jang, and Sungjin Ahn. Dreamerpro: Reconstruction-free model-based rein- forcement learning with prototypical representations. InInternational conference on machine learning, pages 4956–4975. PMLR, 2022

  44. [44]

    TD- JEPA: Latent-predictive representations for zero-shot reinforcement learning

    Marco Bagatella, Matteo Pirotta, Ahmed Touati, Alessandro Lazaric, and Andrea Tirinzoni. TD- JEPA: Latent-predictive representations for zero-shot reinforcement learning. InThe Fourteenth International Conference on Learning Representations, 2026

  45. [45]

    Solar: Deep structured representations for model-based reinforcement learning

    Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew Johnson, and Sergey Levine. Solar: Deep structured representations for model-based reinforcement learning. In International conference on machine learning, pages 7444–7453. PMLR, 2019

  46. [46]

    Planning to explore via self-supervised world models

    Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, and Deepak Pathak. Planning to explore via self-supervised world models. InInternational conference on machine learning, pages 8583–8592. PMLR, 2020

  47. [47]

    Disentangled world models: Learning to transfer semantic knowledge from distracting videos for reinforcement learning

    Qi Wang, Zhipeng Zhang, Baao Xie, Xin Jin, Yunbo Wang, Shiyu Wang, Liaomo Zheng, Xiaokang Yang, and Wenjun Zeng. Disentangled world models: Learning to transfer semantic knowledge from distracting videos for reinforcement learning. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 2599–2608, 2025. 12

  48. [48]

    Dymodreamer: World modeling with dynamic modulation

    Boxuan Zhang, Runqing Wang, Wei Xiao, Weipu Zhang, Jian Sun, Gao Huang, Jie Chen, and Gang Wang. Dymodreamer: World modeling with dynamic modulation. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  49. [49]

Dynamics-aligned latent imagination in contextual world models for zero-shot generalization

    Frank Röder, Jan Benad, Manfred Eppe, and Pradeep Kr. Banerjee. Dynamics-aligned latent imagination in contextual world models for zero-shot generalization. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  50. [50]

    DMWM: Dual-mind world model with long-term imagination

    Lingyi Wang, Rashed Shelim, Walid Saad, and Naren Ramakrishnan. DMWM: Dual-mind world model with long-term imagination. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  51. [51]

Stable port-hamiltonian neural networks

    Fabian J. Roth, Dominik K. Klein, Maximilian Kannapinn, Jan Peters, and Oliver Weeger. Stable port-hamiltonian neural networks. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  52. [52]

    Roboscape: Physics-informed embodied world model

    Yu Shang, Xin Zhang, Yinzhou Tang, Lei Jin, Chen Gao, Wei Wu, and Yong Li. Roboscape: Physics-informed embodied world model. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  53. [53]

    Spectral normalization for generative adversarial networks

    Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. InInternational Conference on Learning Representations, 2018

  54. [54]

    Structured state space models for in-context reinforcement learning

    Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, and Feryal Behbahani. Structured state space models for in-context reinforcement learning. Advances in Neural Information Processing Systems, 36:47016–47031, 2023

  55. [55]

    Decision mamba: A multi-grained state space model with self-evolution regularization for offline rl.Advances in neural information processing systems, 37:22827–22849, 2024

    Qi Lv, Xiang Deng, Gongwei Chen, Michael Y Wang, and Liqiang Nie. Decision mamba: A multi-grained state space model with self-evolution regularization for offline rl.Advances in neural information processing systems, 37:22827–22849, 2024

  56. [56]

    Accelerating model- based reinforcement learning with state-space world models

    Elie Aljalbout, Maria Krinner, Angel Romero, and Davide Scaramuzza. Accelerating model- based reinforcement learning with state-space world models. InICLR 2025 Workshop on World Models: Understanding, Modelling and Scaling, 2025

  57. [57]

    Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

    Tri Dao and Albert Gu. Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning, volume 235 ofProceeding...