Toward Hardware-Agnostic Quadrupedal World Models via Morphology Conditioning
Pith reviewed 2026-05-10 16:50 UTC · model grok-4.3
The pith
A quadrupedal world model generalizes zero-shot to new robot morphologies by conditioning on their engineering specifications.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By explicitly conditioning the generative dynamics on robot engineering specifications rather than treating physical properties as latent variables inferred from motion history, the Quadrupedal World Model disentangles environmental dynamics from morphology and functions as a neural simulator that supports zero-shot locomotion control across different quadrupedal embodiments within a bounded distribution.
What carries the argument
A morphology-conditioned generative dynamics model that takes explicit engineering specifications as conditioning input, separating embodiment from environmental physics.
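The mechanism can be sketched minimally. Everything below is an assumption about shape rather than the paper's actual architecture: a linear layer with tanh stands in for the physical morphology encoder, and a single linear map stands in for the generative dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper's architecture is not specified here.
STATE_DIM, ACTION_DIM, MORPH_DIM, HIDDEN = 12, 4, 6, 32

# Morphology encoder: maps engineering specs (limb lengths, masses,
# actuator constants) to an embedding.
W_enc = rng.normal(0.0, 0.1, (MORPH_DIM, HIDDEN))

# Conditioned dynamics: next-state prediction from (state, action, embedding).
W_dyn = rng.normal(0.0, 0.1, (STATE_DIM + ACTION_DIM + HIDDEN, STATE_DIM))

def predict_next_state(state, action, morph_specs):
    """One-step prediction conditioned on explicit specs, not inferred latents."""
    z = np.tanh(morph_specs @ W_enc)        # morphology embedding
    x = np.concatenate([state, action, z])  # condition the dynamics on z
    return x @ W_dyn

state = rng.normal(size=STATE_DIM)
action = rng.normal(size=ACTION_DIM)
go1_specs = rng.normal(size=MORPH_DIM)   # stand-in spec vectors for two robots
spot_specs = rng.normal(size=MORPH_DIM)

# The same weights serve both embodiments; only the conditioning input changes.
pred_go1 = predict_next_state(state, action, go1_specs)
pred_spot = predict_next_state(state, action, spot_specs)
```

Swapping `morph_specs` is the entire adaptation step, which is what removes the adaptation lag attributed to implicit system identification: no motion history needs to be observed before the dynamics reflect the new body.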
If this is right
- Zero-shot transfer of locomotion policies to new quadruped hardware without retraining or adaptation.
- Elimination of safety risks from adaptation lag that occurs when inferring morphology from motion history.
- One model serving as a shared simulator across multiple quadruped designs.
- Faster iteration on robot hardware because behaviors learned in the conditioned model transfer directly.
Where Pith is reading between the lines
- The same conditioning strategy could be tested on other legged platforms such as bipeds to check whether the disentanglement principle generalizes beyond quadrupeds.
- Combining the morphology encoder with limited real-world fine-tuning on a target robot might extend reliable operation beyond the current interpolation range.
- Designers could use the model to simulate candidate robot geometries before building them, treating morphology as a controllable input variable.
Load-bearing premise
Explicitly supplying engineering specifications is sufficient to disentangle morphology-specific effects from shared environmental dynamics without residual confusion.
What would settle it
Measure prediction error of the trained model on a quadruped whose limb lengths or masses lie well outside the training distribution, such as a much larger or smaller robot than those seen during training, and check whether error remains low without any online adaptation.
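The proposed test can be phrased as a small protocol sketch. The simulator and model below are stubs chosen purely to make the protocol concrete (the stub model's error grows with distance from the training morphologies by construction); only the measurement loop reflects the proposal itself.

```python
import numpy as np

rng = np.random.default_rng(1)
train_center = np.array([0.25, 15.0])  # illustrative mean limb length [m], mass [kg]

def simulator(state, action, morph):
    # Stand-in for ground-truth physics of the target robot.
    return 0.9 * state + 0.1 * np.tanh(action + morph.sum())

def qwm(state, action, morph):
    # Stand-in trained model; its bias grows with distance from the
    # training morphologies, purely to illustrate the protocol.
    bias = 0.05 * np.linalg.norm(morph - train_center)
    return simulator(state, action, morph) + bias

def rollout_mse(morph, steps=50):
    """Zero-shot protocol: no fine-tuning, no latent inference from motion
    history; the model receives only the explicit morphology vector."""
    state, errs = np.zeros(2), []
    for _ in range(steps):
        action = rng.normal(size=2)
        pred = qwm(state, action, morph)
        state = simulator(state, action, morph)   # advance ground truth
        errs.append(float(np.mean((pred - state) ** 2)))
    return float(np.mean(errs))

err_in = rollout_mse(train_center)        # interpolation regime
err_out = rollout_mse(train_center * 2.0) # much larger robot, outside training range
print(err_in, err_out)  # in-range error stays low; out-of-range error is elevated
```

A flat error curve as the test morphology moves outside the training range would support the disentanglement premise; error that climbs with distance would confirm the distribution-bounded-interpolator characterization.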
Figures
read the original abstract
World models promise a paradigm shift in robotics, where an agent learns the underlying physics of its environment once to enable efficient planning and behavior learning. However, current world models are often hardware-locked specialists: a model trained on a Boston Dynamics Spot robot fails catastrophically on a Unitree Go1 due to the mismatch in kinematic and dynamic properties, as the model overfits to specific embodiment constraints rather than capturing the universal locomotion dynamics. Consequently, a slight change in actuator dynamics or limb length necessitates training a new model from scratch. In this work, we take a step towards a framework for training a generalizable Quadrupedal World Model (QWM) that disentangles environmental dynamics from robot morphology. We address the limitations of implicit system identification, where treating static physical properties (like mass or limb length) as latent variables to be inferred from motion history creates an adaptation lag that can compromise zero-shot safety and efficiency. Instead, we explicitly condition the generative dynamics on the robot's engineering specifications. By integrating a physical morphology encoder and a reward normalizer, we enable the model to serve as a neural simulator capable of generalizing across morphologies. This capability unlocks zero-shot control across a range of embodiments. We introduce, for the first time, a world model that enables zero-shot generalization to new morphologies for locomotion. While we carefully study the limitations of our method, QWM operates as a distribution-bounded interpolator within the quadrupedal morphology family rather than a universal physics engine, this work represents a significant step toward morphology-conditioned world models for legged locomotion.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a Quadrupedal World Model (QWM) that explicitly conditions generative dynamics on robot engineering specifications (via a physical morphology encoder and reward normalizer) to disentangle environmental dynamics from morphology. This is intended to overcome hardware-specific overfitting in existing world models and enable zero-shot locomotion control on new quadrupedal embodiments, in contrast to implicit system identification approaches that incur adaptation lag.
Significance. If the central claims are supported by rigorous evaluation, the work would advance hardware-agnostic world models for legged robotics by providing a practical conditioning mechanism that avoids per-embodiment retraining. The explicit use of engineering specs rather than learned latents is a clear methodological choice with potential safety benefits. The paper appropriately qualifies its scope as distribution-bounded interpolation within the quadrupedal family rather than a universal engine, which keeps the contribution proportionate.
major comments (1)
- [Abstract] Abstract: The claim of introducing 'for the first time, a world model that enables zero-shot generalization to new morphologies for locomotion' is load-bearing for the paper's contribution. However, the same paragraph qualifies the model as 'a distribution-bounded interpolator within the quadrupedal morphology family'. The evaluation must demonstrate that held-out test morphologies have engineering parameters (limb lengths, masses, actuator dynamics) lying outside the convex hull of the training distribution; otherwise the results reduce to interpolation and do not substantiate the asserted disentanglement or zero-shot transfer.
minor comments (1)
- [Abstract] The final sentence of the abstract is a run-on that mixes a limitation statement with a significance claim; splitting it would improve readability.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive review. We address the single major comment below and commit to revisions that strengthen the alignment between claims and evaluation.
read point-by-point responses
-
Referee: [Abstract] Abstract: The claim of introducing 'for the first time, a world model that enables zero-shot generalization to new morphologies for locomotion' is load-bearing for the paper's contribution. However, the same paragraph qualifies the model as 'a distribution-bounded interpolator within the quadrupedal morphology family'. The evaluation must demonstrate that held-out test morphologies have engineering parameters (limb lengths, masses, actuator dynamics) lying outside the convex hull of the training distribution; otherwise the results reduce to interpolation and do not substantiate the asserted disentanglement or zero-shot transfer.
Authors: We agree that clarifying the scope of generalization is essential. In the manuscript, 'zero-shot' specifically denotes the absence of any online adaptation, fine-tuning, or latent inference from interaction history (in contrast to implicit system identification baselines). The test morphologies are held-out samples drawn from the same quadrupedal family but with parameter combinations not encountered during training. We acknowledge that this is interpolation within a bounded distribution rather than extrapolation to arbitrary embodiments. To directly address the convex-hull concern, we will add to the revised manuscript (1) a table in the experiments section listing the concrete engineering parameters (limb lengths, masses, actuator dynamics) for every training and test morphology, and (2) an explicit analysis of whether each test morphology lies inside or outside the convex hull of the training set. If any test points fall inside the hull, we will revise the abstract and introduction language to describe the results as 'strong interpolation within the quadrupedal family' while retaining the zero-shot (no-adaptation) distinction. These changes will make the evaluation fully rigorous and proportionate to the stated claims.
Revision: yes
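The convex-hull analysis the authors commit to can be prototyped cheaply. A full membership test requires a linear program or a Delaunay lookup; the sketch below uses only the axis-aligned bounding box, which gives a one-sided certificate: since the hull is contained in the box, any point outside the box is provably outside the hull, while points inside the box remain inconclusive. All parameter values are illustrative, not taken from the paper.

```python
import numpy as np

# Training morphologies: rows are robots, columns are engineering
# parameters (e.g. limb length [m], total mass [kg]). Illustrative values.
train = np.array([
    [0.20, 12.0],
    [0.25, 15.0],
    [0.30, 20.0],
    [0.28, 14.0],
])

lo, hi = train.min(axis=0), train.max(axis=0)

def outside_hull_certificate(x):
    """True if x provably lies outside the convex hull of the training set.

    The hull is contained in the axis-aligned bounding box, so falling
    outside the box certifies extrapolation. Points inside the box need
    a full hull test (e.g. a linear program) to classify."""
    return bool(np.any(x < lo) or np.any(x > hi))

print(outside_hull_certificate(np.array([0.27, 16.0])))  # inside box: False (inconclusive)
print(outside_hull_certificate(np.array([0.45, 35.0])))  # larger robot: True (certified)
```

Test morphologies for which the certificate fires would substantiate the extrapolation claim; held-out points that all sit inside the box (and hull) would support only the interpolation reading.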
Circularity Check
No circularity: method uses explicit conditioning on provided morphology parameters without self-referential definitions or fitted predictions.
full rationale
The paper's core approach—explicitly conditioning generative dynamics on engineering specifications via a morphology encoder and reward normalizer—is presented as a direct architectural choice to avoid implicit latent inference. No equations, derivations, or results in the abstract reduce a claimed prediction or generalization to a parameter fitted from the target outcome itself. The zero-shot claim is framed as an empirical outcome of training across a morphology family and evaluating held-out cases, with an explicit qualification that the model remains a bounded interpolator. This structure is self-contained and does not rely on self-citation chains, ansatzes smuggled via prior work, or renaming of known results as new derivations.
Axiom & Free-Parameter Ledger
free parameters (2)
- morphology encoder network weights
- reward normalizer parameters
axioms (1)
- Domain assumption: Explicit conditioning on static physical properties disentangles embodiment from environmental dynamics without requiring motion-history inference.
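The reward normalizer's form is not given in the abstract. A common choice, offered here purely as an assumption, is a per-morphology running-statistics normalizer (Welford's algorithm), which keeps reward scales comparable across embodiments when, for example, a heavy robot's torque penalties dwarf a light one's:

```python
class RunningRewardNormalizer:
    """Running mean/variance reward normalizer (Welford's algorithm).

    A plausible stand-in for the paper's reward normalizer; the actual
    mechanism is unspecified, so details here are assumptions."""

    def __init__(self, eps=1e-8):
        self.count, self.mean, self.m2, self.eps = 0, 0.0, 0.0, eps

    def update(self, r):
        # Single-pass update of running mean and sum of squared deviations.
        self.count += 1
        delta = r - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (r - self.mean)

    def normalize(self, r):
        # Population variance of the rewards seen so far (1.0 before 2 samples).
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (r - self.mean) / (var ** 0.5 + self.eps)

norm = RunningRewardNormalizer()
for r in [1.0, 2.0, 3.0]:
    norm.update(r)
```

One instance per morphology (or one conditioned on the morphology embedding) would keep reward magnitudes comparable during multi-embodiment training, which is presumably what makes a single shared critic feasible.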