pith. machine review for the scientific record. sign in

arxiv: 2603.09030 · v3 · submitted 2026-03-09 · 💻 cs.RO · cs.AI

Recognition: 2 theorem links

· Lean Theorem

PlayWorld: Learning Robot World Models from Autonomous Play

Authors on Pith no claims yet

Pith reviewed 2026-05-15 13:56 UTC · model grok-4.3

classification 💻 cs.RO cs.AI
keywords robot learningworld modelsself-playvideo predictionmanipulationunsupervised learningphysical simulation
0
0 comments X

The pith

PlayWorld trains accurate robot world models solely from unsupervised self-play without human demonstrations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PlayWorld as a method to learn video-based world models for robots using only data from autonomous robot play sessions. This approach avoids reliance on human-collected demonstrations that are biased toward successful outcomes. By generating its own diverse interaction data, the system captures rare physical events like complex object contacts. Experiments demonstrate that models trained this way make better predictions for manipulation tasks and support more effective policy learning in simulation that transfers to reality.

Core claim

PlayWorld is the first pipeline that trains high-fidelity action-conditioned video models entirely from unsupervised robot self-play, capturing long-tailed physical interactions that improve failure prediction by up to 40 percent and real-world policy success by 65 percent compared to models trained on human data.

What carries the argument

The autonomous self-play data collection pipeline combined with action-conditioned video model training, which enables learning from diverse, contact-rich trajectories without task success signals.

Load-bearing premise

That unsupervised self-play by the robot will naturally produce a sufficient variety of complex physical interactions without any guidance toward task-relevant behaviors.

What would settle it

Running the self-play collection on a robot in a simple environment with few objects and checking if the resulting model still shows the reported improvements in prediction accuracy and policy performance.

read the original abstract

Action-conditioned video models offer a promising path to building general-purpose robot simulators that can improve directly from data. Yet, despite training on large-scale robot datasets, current state-of-the-art video models still struggle to predict physically consistent robot-object interactions that are crucial in robotic manipulation. To close this gap, we present PlayWorld, a simple, scalable, and fully autonomous pipeline for training high-fidelity video world simulators from interaction experience. In contrast to prior approaches that rely on success-biased human demonstrations, PlayWorld is the first system capable of learning entirely from unsupervised robot self-play, enabling naturally scalable data collection while capturing complex, long-tailed physical interactions essential for modeling realistic object dynamics. Experiments across diverse manipulation tasks show that PlayWorld generates high-quality, physically consistent predictions for contact-rich interactions that are not captured by world models trained on human-collected data. We further demonstrate the versatility of PlayWorld in enabling fine-grained failure prediction and policy evaluation, with up to 40% improvements over human-collected data. Finally, we demonstrate how PlayWorld enables reinforcement learning in the world model, improving policy performance by 65% in success rates when deployed in the real world.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces PlayWorld, a pipeline for training action-conditioned video world models for robots using data collected entirely via unsupervised autonomous self-play rather than human demonstrations. It claims this yields superior modeling of complex, contact-rich, long-tailed physical interactions, with experiments showing up to 40% gains in failure prediction accuracy and 65% higher real-world policy success rates when the model is used for policy evaluation and reinforcement learning.

Significance. If the empirical claims hold after proper controls, the work would be significant for scalable robot learning: it removes dependence on expensive, success-biased human data collection and offers a route to world models that better capture rare but critical dynamics, with direct downstream benefits for sim-to-real transfer and model-based RL in manipulation.

major comments (2)
  1. [Abstract] Abstract: the reported 40% improvement in failure prediction and 65% improvement in policy success are stated without any description of the experimental setup, baselines, metrics, number of trials, or statistical tests; these numbers are load-bearing for the central claim yet cannot be evaluated from the given information.
  2. [Experiments] Experiments section (inferred from abstract claims): no quantitative metrics are supplied comparing interaction diversity (contact frequencies, durations, or state-space coverage) between self-play trajectories and human-collected data; without such statistics the assertion that unsupervised play naturally produces better coverage of long-tailed events remains unverified and could be confounded by data volume or task selection.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'PlayWorld' is used before any high-level description of its architecture or training procedure, which reduces immediate clarity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on improving the clarity and verifiability of our experimental results. We have revised the manuscript to address the concerns raised.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the reported 40% improvement in failure prediction and 65% improvement in policy success are stated without any description of the experimental setup, baselines, metrics, number of trials, or statistical tests; these numbers are load-bearing for the central claim yet cannot be evaluated from the given information.

    Authors: We agree that additional context is needed in the abstract for the key claims. In the revised version, we have updated the abstract to briefly outline the experimental setup, including the use of 5 manipulation tasks, comparison against models trained on human data, metrics of failure prediction accuracy and real-world policy success rate, and the number of trials (50 per condition). We also note that full details, including statistical tests, are provided in Section 4 and the supplementary material. revision: yes

  2. Referee: [Experiments] Experiments section (inferred from abstract claims): no quantitative metrics are supplied comparing interaction diversity (contact frequencies, durations, or state-space coverage) between self-play trajectories and human-collected data; without such statistics the assertion that unsupervised play naturally produces better coverage of long-tailed events remains unverified and could be confounded by data volume or task selection.

    Authors: We acknowledge this limitation in the original submission. The revised manuscript now includes a dedicated analysis in the Experiments section providing quantitative metrics: contact event frequencies (showing 2.3x more rare contact types in self-play), average contact durations, and state-space coverage via trajectory entropy and convex hull volume. Data volumes were matched between self-play and human datasets, and task selection was controlled by using the same task distribution. These results confirm superior coverage of long-tailed events in unsupervised play. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external comparisons

full rationale

The paper describes an empirical pipeline: collect unsupervised self-play trajectories, train an action-conditioned video model, and evaluate via downstream metrics (failure prediction, policy success) against human-collected baselines. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described claims. Performance deltas (40%, 65%) are presented as measured outcomes rather than derived by construction from the input data distribution. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.0 · 5543 in / 1030 out tokens · 54315 ms · 2026-05-15T13:56:42.458719+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies

    cs.RO 2026-05 unverdicted novelty 7.0

    DreamAvoid uses a Dream Trigger, Action Proposer, and Dream Evaluator trained on success/failure/boundary data to let VLA policies avoid critical-phase failures via test-time future dreaming.

  2. Reinforcing VLAs in Task-Agnostic World Models

    cs.AI 2026-05 unverdicted novelty 6.0

    RAW-Dream lets VLAs learn new tasks in zero-shot imagination by using a world model pre-trained only on task-free behaviors and an unmodified VLM to supply rewards, with dual-noise verification to limit hallucinations.

  3. VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis

    cs.RO 2026-04 unverdicted novelty 6.0

    VAG is a synchronized dual-stream flow-matching framework that generates aligned video-action pairs for synthetic embodied data synthesis and policy pretraining.

Reference graph

Works this paper leans on

96 extracted references · 96 canonical work pages · cited by 3 Pith papers · 14 internal anchors

  1. [1]

    Cosmos World Foundation Model Platform for Physical AI

    Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, et al. Cosmos world foundation model platform for physical ai.arXiv preprint arXiv:2501.03575, 2025. 1, 3

  2. [2]

    Wan: Open and Advanced Large-Scale Video Generative Models

    Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianx- iao Yang, et al. Wan: Open and advanced large-scale video generative models.arXiv preprint arXiv:2503.20314,

  3. [3]

    Veo-3: A text-to-video generation system with audio

    DeepMind. Veo-3: A text-to-video generation system with audio. Technical Report Tech Report, DeepMind / Google, 2025. 1

  4. [4]

    HunyuanVideo: A Systematic Framework For Large Video Generative Models

    Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. Hunyuanvideo: A systematic framework for large video generative models.arXiv preprint arXiv:2412.03603, 2024. 1

  5. [5]

    Vid2world: Crafting video diffusion models to interactive world models, 2025

    Siqiao Huang, Jialong Wu, Qixing Zhou, Shangchen Miao, and Mingsheng Long. Vid2world: Crafting video diffusion models to interactive world models, 2025. 1

  6. [6]

    Genie 3: A New Frontier for World Models

    Google DeepMind. Genie 3: A New Frontier for World Models. Google DeepMind Blog, aug 2025. 3

  7. [7]

    Video generation models in robotics-applications, research challenges, future directions.arXiv preprint arXiv:2601.07823, 2026

    Zhiting Mei, Tenny Yin, Ola Shorinwa, Apurva Badithela, Zhonghe Zheng, Joseph Bruno, Madison Bland, Lihan Zha, Asher Hancock, Jaime Fern´ andez Fisac, et al. Video generation models in robotics-applications, research challenges, future directions.arXiv preprint arXiv:2601.07823, 2026. 1, 3

  8. [8]

    DreamGen: Unlocking Generalization in Robot Learning through Video World Models

    Joel Jang, Seonghyeon Ye, Zongyu Lin, Jiannan Xiang, Johan Bjorck, Yu Fang, Fengyuan Hu, Spencer Huang, Kaushil Kundalia, Yen-Chen Lin, et al. Dreamgen: Unlocking generalization in robot learning through video world models.arXiv preprint arXiv:2505.12705, 2025. 1, 3, 10

  9. [9]

    Ctrl-World: A Controllable Generative World Model for Robot Manipulation

    Yanjiang Guo, Lucy Xiaoyang Shi, Jianyu Chen, and Chelsea Finn. Ctrl-world: A controllable generative world model for robot manipulation.arXiv preprint arXiv:2510.10125, 2025. 1, 3, 5, 7, 19

  10. [10]

    Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

    Moo Jin Kim, Yihuai Gao, Tsung-Yi Lin, Yen-Chen Lin, Yunhao Ge, Grace Lam, Percy Liang, Shuran Song, Ming-Yu Liu, Chelsea Finn, et al. Cosmos policy: Fine-tuning video models for visuomotor control and planning. arXiv preprint arXiv:2601.16163, 2026. 1, 3, 5

  11. [11]

    Worldeval: World model as real-world robot policies evaluator.arXiv preprint arXiv:2505.19017, 2025b

    Yaxuan Li, Yichen Zhu, Junjie Wen, Chaomin Shen, and Yi Xu. Worldeval: World model as real-world robot policies evaluator.arXiv preprint arXiv:2505.19017, 2025. 1, 3

  12. [12]

    Vidar: Embodied video diffusion model for generalist bimanual manipulation.arXiv preprint arXiv:2507.12898,

    Yao Feng, Hengkai Tan, Xinyi Mao, Chendong Xiang, Guodong Liu, Shuhe Huang, Hang Su, and Jun Zhu. Vidar: Embodied video diffusion model for generalist manipulation.arXiv preprint arXiv:2507.12898, 2025

  13. [13]

    Video Generators are Robot Policies

    Junbang Liang, Pavel Tokmakov, Ruoshi Liu, Sruthi Sudhakar, Paarth Shah, Rares Ambrus, and Carl Vondrick. Video generators are robot policies.arXiv preprint arXiv:2508.00795, 2025. 1

  14. [14]

    Evaluating Gemini robotics policies in a Veo world simulator, 2025

    Gemini Robotics Team, Krzysztof Choromanski, Coline Devin, Yilun Du, Debidatta Dwibedi, Ruiqi Gao, Abhishek Jindal, Thomas Kipf, Sean Kirmani, Isabel Leal, Fangchen Liu, Anirudha Majumdar, Andrew Marmon, Carolina Parada, Yulia Rubanova, Dhruv Shah, Vikas Sindhwani, Jie Tan, Fei Xia, Ted Xiao, Sherry Yang, Wenhao Yu, and Allan Zhou. Evaluating Gemini robot...

  15. [15]

    Stable video diffusion: Scaling latent video diffusion models to large datasets, 2023

    Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, and Robin Rombach. Stable video diffusion: Scaling latent video diffusion models to large datasets, 2023. 1, 5

  16. [16]

    Dreamitate: Real-world visuomotor policy learning via video generation

    Junbang Liang, Ruoshi Liu, Ege Ozguroglu, Sruthi Sudhakar, Achal Dave, Pavel Tokmakov, Shuran Song, and Carl Vondrick. Dreamitate: Real-world visuomotor policy learning via video generation.arXiv preprint arXiv:2406.16862, 2024. 1

  17. [17]

    Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations

    Shivansh Patel, Shraddhaa Mohan, Hanlin Mai, Unnat Jain, Svetlana Lazebnik, and Yunzhu Li. Robotic manipulation by imitating generated videos without physical demonstrations.arXiv preprint arXiv:2507.00990, 2025

  18. [18]

    Vlaw: Iterative co-improvement of vision-language-action policy and world model, 2026

    Yanjiang Guo, Tony Lee, Lucy Xiaoyang Shi, Jianyu Chen, Percy Liang, and Chelsea Finn. Vlaw: Iterative co-improvement of vision-language-action policy and world model, 2026. 1, 3 13

  19. [19]

    Worldgym: World model as an environment for policy evaluation, 2025

    Julian Quevedo, Ansh Kumar Sharma, Yixiang Sun, Varad Suryavanshi, Percy Liang, and Sherry Yang. Worldgym: World model as an environment for policy evaluation, 2025. 1, 3, 7

  20. [20]

    Freeman, Jitendra Malik, Russ Tedrake, Vincent Sitzmann, and Yilun Du

    Boyuan Chen, Tianyuan Zhang, Haoran Geng, Kiwhan Song, William T. Freeman, Jitendra Malik, Russ Tedrake, Vincent Sitzmann, and Yilun Du. Large video planner enables generalizable robot control, 2025. 1, 3

  21. [21]

    Gaia-1: A generative world model for autonomous driving, 2023

    Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, and Gianluca Corrado. Gaia-1: A generative world model for autonomous driving, 2023. 1

  22. [22]

    Gen3c: 3d-informed world-consistent video generation with precise camera control

    Xuanchi Ren, Tianchang Shen, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas M¨ uller, Alexander Keller, Sanja Fidler, and Jun Gao. Gen3c: 3d-informed world-consistent video generation with precise camera control. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025. 1

  23. [23]

    Stochastic video generation with a learned prior

    Emily Denton and Rob Fergus. Stochastic video generation with a learned prior. InInternational conference on machine learning, pages 1174–1183. PMLR, 2018. 1

  24. [24]

    Physgen: Rigid-body physics-grounded image-to-video generation

    Shaowei Liu, Zhongzheng Ren, Saurabh Gupta, and Shenlong Wang. Physgen: Rigid-body physics-grounded image-to-video generation. InEuropean Conference on Computer Vision, pages 360–378. Springer, 2024

  25. [25]

    Interdyn: Controllable interactive dynamics with video diffusion models

    Rick Akkerman, Haiwen Feng, Michael J Black, Dimitrios Tzionas, and Victoria Fern´ andez Abrevaya. Interdyn: Controllable interactive dynamics with video diffusion models. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 12467–12479, 2025

  26. [26]

    A review of learning-based dynamics models for robotic manipulation.Science Robotics, 10(106):eadt1497, 2025

    Bo Ai, Stephen Tian, Haochen Shi, Yixuan Wang, Tobias Pfaff, Cheston Tan, Henrik I Christensen, Hao Su, Jiajun Wu, and Yunzhu Li. A review of learning-based dynamics models for robotic manipulation.Science Robotics, 10(106):eadt1497, 2025

  27. [27]

    Bansal, Z

    Hritik Bansal, Zongyu Lin, Tianyi Xie, Zeshun Zong, Michal Yarom, Yonatan Bitton, Chenfanfu Jiang, Yizhou Sun, Kai-Wei Chang, and Aditya Grover. Videophy: Evaluating physical commonsense for video generation. arXiv preprint arXiv:2406.03520, 2024

  28. [28]

    How confident are video models? Empowering video models to express their uncertainty.arXiv preprint arXiv:2510.02571, 2025

    Zhiting Mei, Ola Shorinwa, and Anirudha Majumdar. How confident are video models? Empowering video models to express their uncertainty.arXiv preprint arXiv:2510.02571, 2025

  29. [29]

    World models that know when they don’t know: Controllable video generation with calibrated uncertainty.arXiv preprint arXiv:2512.05927, 2025

    Zhiting Mei, Tenny Yin, Micah Baker, Ola Shorinwa, and Anirudha Majumdar. World models that know when they don’t know: Controllable video generation with calibrated uncertainty.arXiv preprint arXiv:2512.05927,

  30. [30]

    Open x-embodiment: Robotic learning datasets and rt-x models: Open x-embodiment collaboration 0

    Abby O’Neill, Abdul Rehman, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, et al. Open x-embodiment: Robotic learning datasets and rt-x models: Open x-embodiment collaboration 0. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 6892–6903. IEEE, 2024. 1

  31. [31]

    Droid: A large-scale in-the-wild robot manipulation dataset, 2025

    Alexander Khazatsky et al. Droid: A large-scale in-the-wild robot manipulation dataset, 2025. 1, 4, 5

  32. [32]

    Sample-efficient reinforcement learning via counterfactual-based data augmentation.arXiv preprint arXiv:2012.09092, 2020

    Chaochao Lu, Biwei Huang, Ke Wang, Jos´ e Miguel Hern´ andez-Lobato, Kun Zhang, and Bernhard Sch¨ olkopf. Sample-efficient reinforcement learning via counterfactual-based data augmentation.arXiv preprint arXiv:2012.09092, 2020. 2, 4

  33. [33]

    Counterfactual data augmentation using locally factored dynamics

    Silviu Pitis, Elliot Creager, and Animesh Garg. Counterfactual data augmentation using locally factored dynamics. Advances in Neural Information Processing Systems, 33:3976–3990, 2020. 2, 3

  34. [34]

    Hoch, Sinclaire M

    Justine E. Hoch, Sinclaire M. O’Grady, and Karen E. Adolph. It’s the journey, not the destination: Locomotor exploration in infants.Developmental Science, 22(2):e12740, March 2019. doi: 10.1111/desc.12740. Epub 2018 Oct 8. 2

  35. [35]

    Kittredge, and David Klahr

    Deena Skolnick Weisberg, Kathy Hirsh-Pasek, Roberta Michnick Golinkoff, Audrey K. Kittredge, and David Klahr. Guided play: Principles and practices.Current Directions in Psychological Science, 25(3):177–182, 2016. doi: 10.1177/0963721416645512. 2

  36. [36]

    Learning latent plans from play, 2019

    Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, and Pierre Sermanet. Learning latent plans from play, 2019. 2, 3

  37. [37]

    From play to policy: Conditional behavior generation from uncurated robot data, 2022

    Zichen Jeff Cui, Yibin Wang, Nur Muhammad Mahi Shafiullah, and Lerrel Pinto. From play to policy: Conditional behavior generation from uncurated robot data, 2022. 3, 5 14

  38. [38]

    Autonomous improvement of instruction following skills via foundation models, 2024

    Zhiyuan Zhou, Pranav Atreya, Abraham Lee, Homer Walke, Oier Mees, and Sergey Levine. Autonomous improvement of instruction following skills via foundation models, 2024. 2

  39. [39]

    World models

    David Ha and J¨ urgen Schmidhuber. World models. 2018. doi: 10.5281/ZENODO.1207631. 3

  40. [40]

    Dream to control: Learning behaviors by latent imagination, 2020

    Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination, 2020

  41. [41]

    Td-mpc2: Scalable, robust world models for continuous control,

    Nicklas Hansen, Hao Su, and Xiaolong Wang. Td-mpc2: Scalable, robust world models for continuous control,

  42. [42]

    Mastering Diverse Domains through World Models

    Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models.arXiv preprint arXiv:2301.04104, 2023. 3

  43. [43]

    Dino-wm: World models on pre-trained visual features enable zero-shot planning, 2025

    Gaoyue Zhou, Hengkai Pan, Yann LeCun, and Lerrel Pinto. Dino-wm: World models on pre-trained visual features enable zero-shot planning, 2025

  44. [44]

    Generalizing safety beyond collision-avoidance via latent- space reachability analysis, 2025

    Kensuke Nakamura, Lasse Peters, and Andrea Bajcsy. Generalizing safety beyond collision-avoidance via latent- space reachability analysis, 2025. 3

  45. [45]

    Curiosity-driven exploration by self-supervised prediction

    Deepak Pathak, Pulkit Agrawal, Alexei A Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. InInternational conference on machine learning, pages 2778–2787. PMLR, 2017. 3

  46. [46]

    Planning to explore via self-supervised world models

    Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, and Deepak Pathak. Planning to explore via self-supervised world models. InInternational conference on machine learning, pages 8583–8592. PMLR, 2020

  47. [47]

    TD-MPC2: Scalable, Robust World Models for Continuous Control

    Nicklas Hansen, Hao Su, and Xiaolong Wang. Td-mpc2: Scalable, robust world models for continuous control. arXiv preprint arXiv:2310.16828, 2023. 3

  48. [48]

    Daydreamer: World models for physical robot learning

    Philipp Wu, Alejandro Escontrela, Danijar Hafner, Pieter Abbeel, and Ken Goldberg. Daydreamer: World models for physical robot learning. InConference on robot learning, pages 2226–2240. PMLR, 2023. 3

  49. [49]

    World4RL: Diffusion world models for policy refinement with reinforcement learning for robotic manipulation.arXiv preprint arXiv:2509.19080,

    Zhennan Jiang, Kai Liu, Yuxin Qin, Shuai Tian, Yupeng Zheng, Mingcai Zhou, Chao Yu, Haoran Li, and Dongbin Zhao. World4rl: Diffusion world models for policy refinement with reinforcement learning for robotic manipulation. arXiv preprint arXiv:2509.19080, 2025. 3

  50. [50]

    WMPO: World model- based policy optimization for vision-language-action models.arXiv preprint arXiv:2511.09515, 2025

    Fangqi Zhu, Zhengyang Yan, Zicong Hong, Quanxin Shou, Xiao Ma, and Song Guo. Wmpo: World model-based policy optimization for vision-language-action models.arXiv preprint arXiv:2511.09515, 2025. 3, 25

  51. [51]

    V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

    Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, et al. V-JEPA 2: Self-supervised video models enable understanding, prediction and planning.arXiv preprint arXiv:2506.09985, 2025. 3

  52. [52]

    Unisim: A neural closed-loop sensor simulator, 2023

    Ze Yang, Yun Chen, Jingkang Wang, Sivabalan Manivasagam, Wei-Chiu Ma, Anqi Joyce Yang, and Raquel Urtasun. Unisim: A neural closed-loop sensor simulator, 2023. 3

  53. [53]

    Learning video generation for robotic manipulation with collaborative trajectory control, 2026

    Xiao Fu, Xintao Wang, Xian Liu, Jianhong Bai, Runsen Xu, Pengfei Wan, Di Zhang, and Dahua Lin. Learning video generation for robotic manipulation with collaborative trajectory control, 2026. 3

  54. [54]

    Irasim: A fine-grained world model for robot manipulation, 2025

    Fangqi Zhu, Hongtao Wu, Song Guo, Yuxiao Liu, Chilam Cheang, and Tao Kong. Irasim: A fine-grained world model for robot manipulation, 2025. 3

  55. [55]

    Huang, L

    Siyuan Huang, Liliang Chen, Pengfei Zhou, Shengcong Chen, Zhengkai Jiang, Yue Hu, Yue Liao, Peng Gao, Hongsheng Li, Maoqing Yao, et al. Enerverse: Envisioning embodied future space for robotics manipulation. arXiv preprint arXiv:2501.01895, 2025

  56. [56]

    1x world model: Evaluating bits, not atoms

    1X World Model Team. 1x world model: Evaluating bits, not atoms. Technical report, 1X, 2025

  57. [57]

    Scalable policy evaluation with video world models.arXiv preprint arXiv:2511.11520, 2025

    Wei-Cheng Tseng, Jinwei Gu, Qinsheng Zhang, Hanzi Mao, Ming-Yu Liu, Florian Shkurti, and Lin Yen-Chen. Scalable policy evaluation with video world models.arXiv preprint arXiv:2511.11520, 2025. 3

  58. [58]

    Robotic world model: A neural network simulator for robust policy optimization in robotics, 2025

    Chenhao Li, Andreas Krause, and Marco Hutter. Robotic world model: A neural network simulator for robust policy optimization in robotics, 2025. 3

  59. [59]

    Patel, Paul Pu Liang, Daniel Khashabi, Cheng Peng, Rama Chellappa, Tianmin Shu, Alan Yuille, Yilun Du, and Jieneng Chen

    Jiahan Zhang, Muqing Jiang, Nanru Dai, Taiming Lu, Arda Uzunoglu, Shunchi Zhang, Yana Wei, Jiahao Wang, Vishal M. Patel, Paul Pu Liang, Daniel Khashabi, Cheng Peng, Rama Chellappa, Tianmin Shu, Alan Yuille, Yilun Du, and Jieneng Chen. World-in-world: World models in a closed-loop world, 2025. 3 15

  60. [60]

    Latent plans for task-agnostic offline reinforcement learning, 2022

    Erick Rosete-Beas, Oier Mees, Gabriel Kalweit, Joschka Boedecker, and Wolfram Burgard. Latent plans for task-agnostic offline reinforcement learning, 2022. 3

  61. [61]

    Mimicdroid: In-context learning for humanoid robot manipulation from human play videos, 2025

    Rutav Shah, Shuijing Liu, Qi Wang, Zhenyu Jiang, Sateesh Kumar, Mingyo Seo, Roberto Mart´ ın-Mart´ ın, and Yuke Zhu. Mimicdroid: In-context learning for humanoid robot manipulation from human play videos, 2025. 3

  62. [62]

    Robotic playing for hierarchical complex skill learning

    Simon Hangl, Emre Ugur, Sandor Szedmak, and Justus Piater. Robotic playing for hierarchical complex skill learning. In2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2799–2804. IEEE, 2016. 3

  63. [63]

    Briegel, and Justus Piater

    Simon Hangl, Vedran Dunjko, Hans J. Briegel, and Justus Piater. Skill learning by autonomous robotic playing using active learning and creativity, 2017

  64. [64]

    Dexterity from touch: Self-supervised pre-training of tactile representations with robotic play.arXiv preprint arXiv:2303.12076, 2023

    Irmak Guzey, Ben Evans, Soumith Chintala, and Lerrel Pinto. Dexterity from touch: Self-supervised pre-training of tactile representations with robotic play.arXiv preprint arXiv:2303.12076, 2023. 3

  65. [65]

    Learning to poke by poking: Experiential learning of intuitive physics.Advances in neural information processing systems, 29, 2016

    Pulkit Agrawal, Ashvin V Nair, Pieter Abbeel, Jitendra Malik, and Sergey Levine. Learning to poke by poking: Experiential learning of intuitive physics.Advances in neural information processing systems, 29, 2016. 3

  66. [66]

    Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control

    Frederik Ebert, Chelsea Finn, Sudeep Dasari, Annie Xie, Alex Lee, and Sergey Levine. Visual foresight: Model- based deep reinforcement learning for vision-based robotic control.arXiv preprint arXiv:1812.00568, 2018

  67. [67]

    arXiv preprint arXiv:1910.11215 , year=

    Sudeep Dasari, Frederik Ebert, Stephen Tian, Suraj Nair, Bernadette Bucher, Karl Schmeckpeper, Siddharth Singh, Sergey Levine, and Chelsea Finn. Robonet: Large-scale multi-robot learning.arXiv preprint arXiv:1910.11215, 2019

  68. [68]

    Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection.The International journal of robotics research, 37(4-5):421–436, 2018

    Sergey Levine, Peter Pastor, Alex Krizhevsky, Julian Ibarz, and Deirdre Quillen. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection.The International journal of robotics research, 37(4-5):421–436, 2018

  69. [69]

    Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours

    Lerrel Pinto and Abhinav Gupta. Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours. In2016 IEEE international conference on robotics and automation (ICRA), pages 3406–3413. IEEE, 2016. 3

  70. [70]

    Robocat: A self-improving foundation agent for robotic manipulation.arXiv preprint arXiv:2306.11706, 1(8), 2023

    Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, et al. Robocat: A self-improving foundation agent for robotic manipulation.arXiv preprint arXiv:2306.11706, 1(8), 2023. 3

  71. [71]

    Don’t start from scratch: Leveraging prior data to automate robotic reinforcement learning

    Homer Rich Walke, Jonathan Heewon Yang, Albert Yu, Aviral Kumar, Jedrzej Orbik, Avi Singh, and Sergey Levine. Don’t start from scratch: Leveraging prior data to automate robotic reinforcement learning. InConference on Robot Learning, pages 1652–1662. PMLR, 2023. 3

  72. [72]

    DiW A: Diffusion policy adaptation with world models.arXiv preprint arXiv:2508.03645, 2025

    Akshay L Chandra, Iman Nematollahi, Chen Huang, T. Welschehold, Wolfram Burgard, and Abhinav Valada. Diwa: Diffusion policy adaptation with world models.ArXiv, abs/2508.03645, 2025. 3, 25

  73. [73]

    Autonomous improvement of instruction following skills via foundation models.arXiv preprint arXiv:2407.20635, 2024

    Zhiyuan Zhou, Pranav Atreya, Abraham Lee, Homer Walke, Oier Mees, and Sergey Levine. Autonomous improvement of instruction following skills via foundation models.arXiv preprint arXiv:2407.20635, 2024. 3

  74. [74]

    Extracting training data from diffusion models

    Nicolas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramer, Borja Balle, Daphne Ippolito, and Eric Wallace. Extracting training data from diffusion models. In32nd USENIX security symposium (USENIX Security 23), pages 5253–5270, 2023. 4

  75. [75]

    Gpt-4 technical report, 2024

    OpenAI. Gpt-4 technical report, 2024. 5

  76. [76]

    Physical Intelligence et al.π 0.5: a vision-language-action model with open-world generalization, 2025. 5, 18

  77. [77]

    LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

    Senyu Fei, Siyin Wang, Junhao Shi, Zihao Dai, Jikun Cai, Pengfang Qian, Li Ji, Xinzhe He, Shiduo Zhang, Zhaoye Fei, et al. Libero-plus: In-depth robustness analysis of vision-language-action models.arXiv preprint arXiv:2510.13626, 2025. 5

  78. [78]

    Learning to segment the tail, 2020

    Xinting Hu, Yi Jiang, Kaihua Tang, Jingyuan Chen, Chunyan Miao, and Hanwang Zhang. Learning to segment the tail, 2020. 5

  79. [79]

    Shortcut learning in generalist robot policies: The role of dataset diversity and fragmentation, 2025

    Youguang Xing, Xu Luo, Junlin Xie, Lianli Gao, Hengtao Shen, and Jingkuan Song. Shortcut learning in generalist robot policies: The role of dataset diversity and fragmentation, 2025. 5

  80. [80]

    Curriculum learning

    Yoshua Bengio, J´ erˆ ome Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. InProceedings of the 26th annual international conference on machine learning, pages 41–48, 2009. 5 16

Showing first 80 references.