arxiv: 2603.09030 · v3 · submitted 2026-03-09 · 💻 cs.RO · cs.AI

PlayWorld: Learning Robot World Models from Autonomous Play

Tenny Yin, Zhiting Mei, Zhonghe Zheng, Miyu Yamane, David Wang, Jade Sceats, Samuel M. Bateman, Lihan Zha, Apurva Badithela, Ola Shorinwa, Anirudha Majumdar

Pith reviewed 2026-05-15 13:56 UTC · model grok-4.3

keywords: robot learning, world models, self-play, video prediction, manipulation, unsupervised learning, physical simulation

The pith

PlayWorld trains accurate robot world models solely from unsupervised self-play without human demonstrations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PlayWorld as a method to learn video-based world models for robots using only data from autonomous robot play sessions. This approach avoids reliance on human-collected demonstrations, which are biased toward successful outcomes. By generating its own diverse interaction data, the system captures rare physical events such as complex object contacts. Experiments indicate that models trained this way predict contact-rich manipulation outcomes more accurately and support policy learning inside the learned world model, with the resulting gains carrying over to the real robot.

Core claim

The paper claims that PlayWorld is the first pipeline to train high-fidelity action-conditioned video models entirely from unsupervised robot self-play, capturing long-tailed physical interactions. The abstract reports up to 40 percent improvement in failure prediction and policy evaluation over models trained on human-collected data, and a 65 percent improvement in real-world success rate for policies refined by reinforcement learning inside the world model.

What carries the argument

The autonomous self-play data collection pipeline combined with action-conditioned video model training, which enables learning from diverse, contact-rich trajectories without task success signals.

Load-bearing premise

That unsupervised self-play by the robot will naturally produce a sufficient variety of complex physical interactions without any guidance toward task-relevant behaviors.

What would settle it

Running the self-play collection on a robot in a simple environment with few objects and checking if the resulting model still shows the reported improvements in prediction accuracy and policy performance.

Original abstract

Action-conditioned video models offer a promising path to building general-purpose robot simulators that can improve directly from data. Yet, despite training on large-scale robot datasets, current state-of-the-art video models still struggle to predict physically consistent robot-object interactions that are crucial in robotic manipulation. To close this gap, we present PlayWorld, a simple, scalable, and fully autonomous pipeline for training high-fidelity video world simulators from interaction experience. In contrast to prior approaches that rely on success-biased human demonstrations, PlayWorld is the first system capable of learning entirely from unsupervised robot self-play, enabling naturally scalable data collection while capturing complex, long-tailed physical interactions essential for modeling realistic object dynamics. Experiments across diverse manipulation tasks show that PlayWorld generates high-quality, physically consistent predictions for contact-rich interactions that are not captured by world models trained on human-collected data. We further demonstrate the versatility of PlayWorld in enabling fine-grained failure prediction and policy evaluation, with up to 40% improvements over human-collected data. Finally, we demonstrate how PlayWorld enables reinforcement learning in the world model, improving policy performance by 65% in success rates when deployed in the real world.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces PlayWorld, a pipeline for training action-conditioned video world models for robots using data collected entirely via unsupervised autonomous self-play rather than human demonstrations. It claims this yields superior modeling of complex, contact-rich, long-tailed physical interactions. The reported results are gains of up to 40% in failure prediction and policy evaluation over human-collected data, and a 65% improvement in real-world policy success rate after reinforcement learning in the world model.

Significance. If the empirical claims hold after proper controls, the work would be significant for scalable robot learning: it removes dependence on expensive, success-biased human data collection and offers a route to world models that better capture rare but critical dynamics, with direct downstream benefits for sim-to-real transfer and model-based RL in manipulation.

major comments (2)

[Abstract] The reported 40% improvement in failure prediction and 65% improvement in policy success are stated without any description of the experimental setup, baselines, metrics, number of trials, or statistical tests; these numbers are load-bearing for the central claim yet cannot be evaluated from the given information.
[Experiments] (Inferred from abstract claims.) No quantitative metrics are supplied comparing interaction diversity (contact frequencies, durations, or state-space coverage) between self-play trajectories and human-collected data; without such statistics the assertion that unsupervised play naturally produces better coverage of long-tailed events remains unverified and could be confounded by data volume or task selection.
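The first comment turns on two things a reader cannot recover from a bare percentage: whether the gain is absolute or relative, and how much uncertainty the trial count leaves. A minimal Python sketch of both follows. The success counts are hypothetical and not taken from the paper; the trial count of 50 echoes the figure quoted in the simulated rebuttal; the Wilson score interval is one common choice of interval, not necessarily the test the authors use.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def improvements(baseline: float, treated: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive for a relative gain")
    delta = treated - baseline
    return delta * 100.0, delta / baseline * 100.0


# Hypothetical counts for illustration only; NOT results from the paper.
N_TRIALS = 50
baseline_successes = 20   # e.g. policy before refinement
treated_successes = 33    # e.g. policy after refinement

base_rate = baseline_successes / N_TRIALS
treat_rate = treated_successes / N_TRIALS
abs_gain, rel_gain = improvements(base_rate, treat_rate)

print(f"absolute gain: {abs_gain:.1f} percentage points")
print(f"relative gain: {rel_gain:.1f} percent")
print("baseline 95% interval:", wilson_interval(baseline_successes, N_TRIALS))
print("treated  95% interval:", wilson_interval(treated_successes, N_TRIALS))
```

With these made-up counts the same experiment can be described either by its absolute gain or by its much larger relative gain, which is why the referee asks for the metric definition alongside the number.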

minor comments (1)

[Abstract] The name 'PlayWorld' is used before any high-level description of its architecture or training procedure, which makes the abstract harder to follow on first reading.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on improving the clarity and verifiability of our experimental results. We have revised the manuscript to address the concerns raised.

Point-by-point responses

Referee: [Abstract] The reported 40% improvement in failure prediction and 65% improvement in policy success are stated without any description of the experimental setup, baselines, metrics, number of trials, or statistical tests; these numbers are load-bearing for the central claim yet cannot be evaluated from the given information.

Authors: We agree that additional context is needed in the abstract for the key claims. In the revised version, we have updated the abstract to briefly outline the experimental setup, including the use of 5 manipulation tasks, comparison against models trained on human data, metrics of failure prediction accuracy and real-world policy success rate, and the number of trials (50 per condition). We also note that full details, including statistical tests, are provided in Section 4 and the supplementary material.

Revision: yes

Referee: [Experiments] (Inferred from abstract claims.) No quantitative metrics are supplied comparing interaction diversity (contact frequencies, durations, or state-space coverage) between self-play trajectories and human-collected data; without such statistics the assertion that unsupervised play naturally produces better coverage of long-tailed events remains unverified and could be confounded by data volume or task selection.

Authors: We acknowledge this limitation in the original submission. The revised manuscript now includes a dedicated analysis in the Experiments section providing quantitative metrics: contact event frequencies (showing 2.3x more rare contact types in self-play), average contact durations, and state-space coverage via trajectory entropy and convex hull volume. Data volumes were matched between self-play and human datasets, and task selection was controlled by using the same task distribution. These results confirm superior coverage of long-tailed events in unsupervised play.

Revision: yes
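The coverage statistics named in this response can be sketched in a few lines of Python. This is an illustrative reading, not the authors' analysis: the source does not define "trajectory entropy" or the state space, so the sketch assumes 2D object positions, a fixed grid for discretization, and Shannon entropy over occupied cells. All data points and contact labels are hypothetical. In 2D the convex hull "volume" is an area; for higher-dimensional states a library routine such as SciPy's `scipy.spatial.ConvexHull` would be a natural substitute.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


def shannon_entropy(labels: Iterable[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution over labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def grid_entropy(points: Sequence[Point], cell: float) -> float:
    """Entropy of visited grid cells; 'cell' is an assumed discretization size."""
    return shannon_entropy((math.floor(x / cell), math.floor(y / cell)) for x, y in points)


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area(points: Sequence[Point]) -> float:
    """Area of the 2D convex hull via the shoelace formula."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    twice_area = 0.0
    for i, (x1, y1) in enumerate(hull):
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical object positions and contact labels; NOT data from the paper.
human_xy = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
play_xy = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0), (0.5, 1.5)]
human_contacts = ["grasp", "grasp", "grasp", "push"]
play_contacts = ["grasp", "push", "topple", "slide"]

for name, xy, contacts in [("human", human_xy, human_contacts), ("play", play_xy, play_contacts)]:
    print(name, "hull area:", hull_area(xy),
          "grid entropy:", round(grid_entropy(xy, 1.0), 3),
          "contact entropy:", round(shannon_entropy(contacts), 3))
```

Comparing the two datasets at matched size, as the response says was done, matters here because both entropy and hull area tend to grow with the number of samples.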

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external comparisons

The paper describes an empirical pipeline: collect unsupervised self-play trajectories, train an action-conditioned video model, and evaluate via downstream metrics (failure prediction, policy success) against human-collected baselines. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described claims. Performance deltas (40%, 65%) are presented as measured outcomes rather than derived by construction from the input data distribution. The evaluation is therefore anchored to external baselines rather than to quantities the model defines for itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are described.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "PlayWorld is the first system capable of learning entirely from unsupervised robot self-play, enabling naturally scalable data collection while capturing complex, long-tailed physical interactions"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_strictMono_of_one_lt
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "curriculum learning setup to feed training data into the model in order of (auto-rated) 'difficulty': initializing with frequently occurring free space motions and static contacts"
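The curriculum described in the second quoted passage can be illustrated with a small Python sketch. Everything beyond "order training data by auto-rated difficulty" is an assumption: the `Clip` type, the numeric scores, the clip names, and the choice of cumulative stages (each stage adds harder clips to the ones already seen) are hypothetical and not confirmed by the source.

```python
import math
from typing import List, NamedTuple, Sequence


class Clip(NamedTuple):
    clip_id: str
    difficulty: float  # hypothetical auto-rated score; higher means harder


def curriculum_stages(clips: Sequence[Clip], num_stages: int) -> List[List[Clip]]:
    """Split clips into cumulative stages, easiest first.

    Stage k (0-based) holds the easiest ceil(n * (k + 1) / num_stages) clips,
    so the final stage always contains the full dataset.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(clips, key=lambda c: c.difficulty)
    stages: List[List[Clip]] = []
    for k in range(1, num_stages + 1):
        cutoff = math.ceil(len(ordered) * k / num_stages)
        stages.append(ordered[:cutoff])
    return stages


# Hypothetical clips and scores for illustration only.
clips = [
    Clip("toppling", 0.9),
    Clip("free_space", 0.1),
    Clip("sliding_push", 0.6),
    Clip("static_contact", 0.3),
]

stages = curriculum_stages(clips, num_stages=2)
for i, stage in enumerate(stages):
    print(f"stage {i}:", [c.clip_id for c in stage])
```

A cumulative schedule keeps easy clips in later stages so the model does not forget them; a strictly disjoint schedule would be an equally plausible reading of the passage.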

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies
cs.RO · 2026-05 · unverdicted · novelty 7.0
DreamAvoid uses a Dream Trigger, Action Proposer, and Dream Evaluator trained on success/failure/boundary data to let VLA policies avoid critical-phase failures via test-time future dreaming.

Reinforcing VLAs in Task-Agnostic World Models
cs.AI · 2026-05 · unverdicted · novelty 6.0
RAW-Dream lets VLAs learn new tasks in zero-shot imagination by using a world model pre-trained only on task-free behaviors and an unmodified VLM to supply rewards, with dual-noise verification to limit hallucinations.

VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis
cs.RO · 2026-04 · unverdicted · novelty 6.0
VAG is a synchronized dual-stream flow-matching framework that generates aligned video-action pairs for synthetic embodied data synthesis and policy pretraining.

Reference graph

Works this paper leans on

96 extracted references · 96 canonical work pages · cited by 3 Pith papers · 14 internal anchors

[1]

Cosmos World Foundation Model Platform for Physical AI

Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, et al. Cosmos world foundation model platform for physical ai.arXiv preprint arXiv:2501.03575, 2025. 1, 3

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

Wan: Open and Advanced Large-Scale Video Generative Models

Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianx- iao Yang, et al. Wan: Open and advanced large-scale video generative models.arXiv preprint arXiv:2503.20314,

work page internal anchor Pith review Pith/arXiv arXiv
[3]

Veo-3: A text-to-video generation system with audio

DeepMind. Veo-3: A text-to-video generation system with audio. Technical Report Tech Report, DeepMind / Google, 2025. 1

work page 2025
[4]

HunyuanVideo: A Systematic Framework For Large Video Generative Models

Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. Hunyuanvideo: A systematic framework for large video generative models.arXiv preprint arXiv:2412.03603, 2024. 1

work page internal anchor Pith review Pith/arXiv arXiv 2024
[5]

Vid2world: Crafting video diffusion models to interactive world models, 2025

Siqiao Huang, Jialong Wu, Qixing Zhou, Shangchen Miao, and Mingsheng Long. Vid2world: Crafting video diffusion models to interactive world models, 2025. 1

work page 2025
[6]

Genie 3: A New Frontier for World Models

Google DeepMind. Genie 3: A New Frontier for World Models. Google DeepMind Blog, aug 2025. 3

work page 2025
[7]

Video generation models in robotics-applications, research challenges, future directions.arXiv preprint arXiv:2601.07823, 2026

Zhiting Mei, Tenny Yin, Ola Shorinwa, Apurva Badithela, Zhonghe Zheng, Joseph Bruno, Madison Bland, Lihan Zha, Asher Hancock, Jaime Fern´ andez Fisac, et al. Video generation models in robotics-applications, research challenges, future directions.arXiv preprint arXiv:2601.07823, 2026. 1, 3

work page arXiv 2026
[8]

DreamGen: Unlocking Generalization in Robot Learning through Video World Models

Joel Jang, Seonghyeon Ye, Zongyu Lin, Jiannan Xiang, Johan Bjorck, Yu Fang, Fengyuan Hu, Spencer Huang, Kaushil Kundalia, Yen-Chen Lin, et al. Dreamgen: Unlocking generalization in robot learning through video world models.arXiv preprint arXiv:2505.12705, 2025. 1, 3, 10

work page internal anchor Pith review arXiv 2025
[9]

Ctrl-World: A Controllable Generative World Model for Robot Manipulation

Yanjiang Guo, Lucy Xiaoyang Shi, Jianyu Chen, and Chelsea Finn. Ctrl-world: A controllable generative world model for robot manipulation.arXiv preprint arXiv:2510.10125, 2025. 1, 3, 5, 7, 19

work page internal anchor Pith review arXiv 2025
[10]

Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

Moo Jin Kim, Yihuai Gao, Tsung-Yi Lin, Yen-Chen Lin, Yunhao Ge, Grace Lam, Percy Liang, Shuran Song, Ming-Yu Liu, Chelsea Finn, et al. Cosmos policy: Fine-tuning video models for visuomotor control and planning. arXiv preprint arXiv:2601.16163, 2026. 1, 3, 5

work page internal anchor Pith review Pith/arXiv arXiv 2026
[11]

Worldeval: World model as real-world robot policies evaluator.arXiv preprint arXiv:2505.19017, 2025b

Yaxuan Li, Yichen Zhu, Junjie Wen, Chaomin Shen, and Yi Xu. Worldeval: World model as real-world robot policies evaluator.arXiv preprint arXiv:2505.19017, 2025. 1, 3

work page arXiv 2025
[12]

Vidar: Embodied video diffusion model for generalist bimanual manipulation.arXiv preprint arXiv:2507.12898,

Yao Feng, Hengkai Tan, Xinyi Mao, Chendong Xiang, Guodong Liu, Shuhe Huang, Hang Su, and Jun Zhu. Vidar: Embodied video diffusion model for generalist manipulation.arXiv preprint arXiv:2507.12898, 2025

work page arXiv 2025
[13]

Video Generators are Robot Policies

Junbang Liang, Pavel Tokmakov, Ruoshi Liu, Sruthi Sudhakar, Paarth Shah, Rares Ambrus, and Carl Vondrick. Video generators are robot policies.arXiv preprint arXiv:2508.00795, 2025. 1

work page internal anchor Pith review arXiv 2025
[14]

Evaluating Gemini robotics policies in a Veo world simulator, 2025

Gemini Robotics Team, Krzysztof Choromanski, Coline Devin, Yilun Du, Debidatta Dwibedi, Ruiqi Gao, Abhishek Jindal, Thomas Kipf, Sean Kirmani, Isabel Leal, Fangchen Liu, Anirudha Majumdar, Andrew Marmon, Carolina Parada, Yulia Rubanova, Dhruv Shah, Vikas Sindhwani, Jie Tan, Fei Xia, Ted Xiao, Sherry Yang, Wenhao Yu, and Allan Zhou. Evaluating Gemini robot...

work page 2025
[15]

Stable video diffusion: Scaling latent video diffusion models to large datasets, 2023

Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, and Robin Rombach. Stable video diffusion: Scaling latent video diffusion models to large datasets, 2023. 1, 5

work page 2023
[16]

Dreamitate: Real-world visuomotor policy learning via video generation

Junbang Liang, Ruoshi Liu, Ege Ozguroglu, Sruthi Sudhakar, Achal Dave, Pavel Tokmakov, Shuran Song, and Carl Vondrick. Dreamitate: Real-world visuomotor policy learning via video generation.arXiv preprint arXiv:2406.16862, 2024. 1

work page arXiv 2024
[17]

Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations

Shivansh Patel, Shraddhaa Mohan, Hanlin Mai, Unnat Jain, Svetlana Lazebnik, and Yunzhu Li. Robotic manipulation by imitating generated videos without physical demonstrations.arXiv preprint arXiv:2507.00990, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[18]

Vlaw: Iterative co-improvement of vision-language-action policy and world model, 2026

Yanjiang Guo, Tony Lee, Lucy Xiaoyang Shi, Jianyu Chen, Percy Liang, and Chelsea Finn. Vlaw: Iterative co-improvement of vision-language-action policy and world model, 2026. 1, 3 13

work page 2026
[19]

Worldgym: World model as an environment for policy evaluation, 2025

Julian Quevedo, Ansh Kumar Sharma, Yixiang Sun, Varad Suryavanshi, Percy Liang, and Sherry Yang. Worldgym: World model as an environment for policy evaluation, 2025. 1, 3, 7

work page 2025
[20]

Freeman, Jitendra Malik, Russ Tedrake, Vincent Sitzmann, and Yilun Du

Boyuan Chen, Tianyuan Zhang, Haoran Geng, Kiwhan Song, William T. Freeman, Jitendra Malik, Russ Tedrake, Vincent Sitzmann, and Yilun Du. Large video planner enables generalizable robot control, 2025. 1, 3

work page 2025
[21]

Gaia-1: A generative world model for autonomous driving, 2023

Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, and Gianluca Corrado. Gaia-1: A generative world model for autonomous driving, 2023. 1

work page 2023
[22]

Gen3c: 3d-informed world-consistent video generation with precise camera control

Xuanchi Ren, Tianchang Shen, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas M¨ uller, Alexander Keller, Sanja Fidler, and Jun Gao. Gen3c: 3d-informed world-consistent video generation with precise camera control. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025. 1

work page 2025
[23]

Stochastic video generation with a learned prior

Emily Denton and Rob Fergus. Stochastic video generation with a learned prior. InInternational conference on machine learning, pages 1174–1183. PMLR, 2018. 1

work page 2018
[24]

Physgen: Rigid-body physics-grounded image-to-video generation

Shaowei Liu, Zhongzheng Ren, Saurabh Gupta, and Shenlong Wang. Physgen: Rigid-body physics-grounded image-to-video generation. InEuropean Conference on Computer Vision, pages 360–378. Springer, 2024

work page 2024
[25]

Interdyn: Controllable interactive dynamics with video diffusion models

Rick Akkerman, Haiwen Feng, Michael J Black, Dimitrios Tzionas, and Victoria Fern´ andez Abrevaya. Interdyn: Controllable interactive dynamics with video diffusion models. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 12467–12479, 2025

work page 2025
[26]

A review of learning-based dynamics models for robotic manipulation.Science Robotics, 10(106):eadt1497, 2025

Bo Ai, Stephen Tian, Haochen Shi, Yixuan Wang, Tobias Pfaff, Cheston Tan, Henrik I Christensen, Hao Su, Jiajun Wu, and Yunzhu Li. A review of learning-based dynamics models for robotic manipulation.Science Robotics, 10(106):eadt1497, 2025

work page 2025
[27]

Bansal, Z

Hritik Bansal, Zongyu Lin, Tianyi Xie, Zeshun Zong, Michal Yarom, Yonatan Bitton, Chenfanfu Jiang, Yizhou Sun, Kai-Wei Chang, and Aditya Grover. Videophy: Evaluating physical commonsense for video generation. arXiv preprint arXiv:2406.03520, 2024

work page arXiv 2024
[28]

How confident are video models? Empowering video models to express their uncertainty.arXiv preprint arXiv:2510.02571, 2025

Zhiting Mei, Ola Shorinwa, and Anirudha Majumdar. How confident are video models? Empowering video models to express their uncertainty.arXiv preprint arXiv:2510.02571, 2025

work page arXiv 2025
[29]

World models that know when they don’t know: Controllable video generation with calibrated uncertainty.arXiv preprint arXiv:2512.05927, 2025

Zhiting Mei, Tenny Yin, Micah Baker, Ola Shorinwa, and Anirudha Majumdar. World models that know when they don’t know: Controllable video generation with calibrated uncertainty.arXiv preprint arXiv:2512.05927,

work page arXiv
[30]

Open x-embodiment: Robotic learning datasets and rt-x models: Open x-embodiment collaboration 0

Abby O’Neill, Abdul Rehman, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, et al. Open x-embodiment: Robotic learning datasets and rt-x models: Open x-embodiment collaboration 0. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 6892–6903. IEEE, 2024. 1

work page 2024
[31]

Droid: A large-scale in-the-wild robot manipulation dataset, 2025

Alexander Khazatsky et al. Droid: A large-scale in-the-wild robot manipulation dataset, 2025. 1, 4, 5

work page 2025
[32]

Sample-efficient reinforcement learning via counterfactual-based data augmentation.arXiv preprint arXiv:2012.09092, 2020

Chaochao Lu, Biwei Huang, Ke Wang, Jos´ e Miguel Hern´ andez-Lobato, Kun Zhang, and Bernhard Sch¨ olkopf. Sample-efficient reinforcement learning via counterfactual-based data augmentation.arXiv preprint arXiv:2012.09092, 2020. 2, 4

work page arXiv 2012
[33]

Counterfactual data augmentation using locally factored dynamics

Silviu Pitis, Elliot Creager, and Animesh Garg. Counterfactual data augmentation using locally factored dynamics. Advances in Neural Information Processing Systems, 33:3976–3990, 2020. 2, 3

work page 2020
[34]

Hoch, Sinclaire M

Justine E. Hoch, Sinclaire M. O’Grady, and Karen E. Adolph. It’s the journey, not the destination: Locomotor exploration in infants.Developmental Science, 22(2):e12740, March 2019. doi: 10.1111/desc.12740. Epub 2018 Oct 8. 2

work page doi:10.1111/desc.12740 2019
[35]

Kittredge, and David Klahr

Deena Skolnick Weisberg, Kathy Hirsh-Pasek, Roberta Michnick Golinkoff, Audrey K. Kittredge, and David Klahr. Guided play: Principles and practices.Current Directions in Psychological Science, 25(3):177–182, 2016. doi: 10.1177/0963721416645512. 2

work page doi:10.1177/0963721416645512 2016
[36]

Learning latent plans from play, 2019

Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, and Pierre Sermanet. Learning latent plans from play, 2019. 2, 3

work page 2019
[37]

From play to policy: Conditional behavior generation from uncurated robot data, 2022

Zichen Jeff Cui, Yibin Wang, Nur Muhammad Mahi Shafiullah, and Lerrel Pinto. From play to policy: Conditional behavior generation from uncurated robot data, 2022. 3, 5 14

work page 2022
[38]

Autonomous improvement of instruction following skills via foundation models, 2024

Zhiyuan Zhou, Pranav Atreya, Abraham Lee, Homer Walke, Oier Mees, and Sergey Levine. Autonomous improvement of instruction following skills via foundation models, 2024. 2

work page 2024
[39]

World models

David Ha and J¨ urgen Schmidhuber. World models. 2018. doi: 10.5281/ZENODO.1207631. 3

work page doi:10.5281/zenodo.1207631 2018
[40]

Dream to control: Learning behaviors by latent imagination, 2020

Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination, 2020

work page 2020
[41]

Td-mpc2: Scalable, robust world models for continuous control,

Nicklas Hansen, Hao Su, and Xiaolong Wang. Td-mpc2: Scalable, robust world models for continuous control,

work page
[42]

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models.arXiv preprint arXiv:2301.04104, 2023. 3

work page internal anchor Pith review Pith/arXiv arXiv 2023
[43]

Dino-wm: World models on pre-trained visual features enable zero-shot planning, 2025

Gaoyue Zhou, Hengkai Pan, Yann LeCun, and Lerrel Pinto. Dino-wm: World models on pre-trained visual features enable zero-shot planning, 2025

work page 2025
[44]

Generalizing safety beyond collision-avoidance via latent- space reachability analysis, 2025

Kensuke Nakamura, Lasse Peters, and Andrea Bajcsy. Generalizing safety beyond collision-avoidance via latent- space reachability analysis, 2025. 3

work page 2025
[45]

Curiosity-driven exploration by self-supervised prediction

Deepak Pathak, Pulkit Agrawal, Alexei A Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. InInternational conference on machine learning, pages 2778–2787. PMLR, 2017. 3

work page 2017
[46]

Planning to explore via self-supervised world models

Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, and Deepak Pathak. Planning to explore via self-supervised world models. InInternational conference on machine learning, pages 8583–8592. PMLR, 2020

work page 2020
[47]

TD-MPC2: Scalable, Robust World Models for Continuous Control

Nicklas Hansen, Hao Su, and Xiaolong Wang. Td-mpc2: Scalable, robust world models for continuous control. arXiv preprint arXiv:2310.16828, 2023. 3

work page internal anchor Pith review Pith/arXiv arXiv 2023
[48]

Daydreamer: World models for physical robot learning

Philipp Wu, Alejandro Escontrela, Danijar Hafner, Pieter Abbeel, and Ken Goldberg. Daydreamer: World models for physical robot learning. InConference on robot learning, pages 2226–2240. PMLR, 2023. 3

work page 2023
[49]

World4RL: Diffusion world models for policy refinement with reinforcement learning for robotic manipulation.arXiv preprint arXiv:2509.19080,

Zhennan Jiang, Kai Liu, Yuxin Qin, Shuai Tian, Yupeng Zheng, Mingcai Zhou, Chao Yu, Haoran Li, and Dongbin Zhao. World4rl: Diffusion world models for policy refinement with reinforcement learning for robotic manipulation. arXiv preprint arXiv:2509.19080, 2025. 3

work page arXiv 2025
[50]

WMPO: World model- based policy optimization for vision-language-action models.arXiv preprint arXiv:2511.09515, 2025

Fangqi Zhu, Zhengyang Yan, Zicong Hong, Quanxin Shou, Xiao Ma, and Song Guo. Wmpo: World model-based policy optimization for vision-language-action models.arXiv preprint arXiv:2511.09515, 2025. 3, 25

work page arXiv 2025
[51]

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, et al. V-JEPA 2: Self-supervised video models enable understanding, prediction and planning.arXiv preprint arXiv:2506.09985, 2025. 3

work page internal anchor Pith review Pith/arXiv arXiv 2025
[52]

Unisim: A neural closed-loop sensor simulator, 2023

Ze Yang, Yun Chen, Jingkang Wang, Sivabalan Manivasagam, Wei-Chiu Ma, Anqi Joyce Yang, and Raquel Urtasun. Unisim: A neural closed-loop sensor simulator, 2023. 3

work page 2023
[53]

Learning video generation for robotic manipulation with collaborative trajectory control, 2026

Xiao Fu, Xintao Wang, Xian Liu, Jianhong Bai, Runsen Xu, Pengfei Wan, Di Zhang, and Dahua Lin. Learning video generation for robotic manipulation with collaborative trajectory control, 2026. 3

work page 2026
[54]

Irasim: A fine-grained world model for robot manipulation, 2025

Fangqi Zhu, Hongtao Wu, Song Guo, Yuxiao Liu, Chilam Cheang, and Tao Kong. Irasim: A fine-grained world model for robot manipulation, 2025. 3

work page 2025
[55]

Huang, L

Siyuan Huang, Liliang Chen, Pengfei Zhou, Shengcong Chen, Zhengkai Jiang, Yue Hu, Yue Liao, Peng Gao, Hongsheng Li, Maoqing Yao, et al. Enerverse: Envisioning embodied future space for robotics manipulation. arXiv preprint arXiv:2501.01895, 2025

work page arXiv 2025
[56]

1x world model: Evaluating bits, not atoms

1X World Model Team. 1x world model: Evaluating bits, not atoms. Technical report, 1X, 2025

work page 2025
[57]

Scalable policy evaluation with video world models.arXiv preprint arXiv:2511.11520, 2025

Wei-Cheng Tseng, Jinwei Gu, Qinsheng Zhang, Hanzi Mao, Ming-Yu Liu, Florian Shkurti, and Lin Yen-Chen. Scalable policy evaluation with video world models.arXiv preprint arXiv:2511.11520, 2025. 3

work page arXiv 2025
[58]

Robotic world model: A neural network simulator for robust policy optimization in robotics, 2025

Chenhao Li, Andreas Krause, and Marco Hutter. Robotic world model: A neural network simulator for robust policy optimization in robotics, 2025. 3

work page 2025
[59]

Patel, Paul Pu Liang, Daniel Khashabi, Cheng Peng, Rama Chellappa, Tianmin Shu, Alan Yuille, Yilun Du, and Jieneng Chen

Jiahan Zhang, Muqing Jiang, Nanru Dai, Taiming Lu, Arda Uzunoglu, Shunchi Zhang, Yana Wei, Jiahao Wang, Vishal M. Patel, Paul Pu Liang, Daniel Khashabi, Cheng Peng, Rama Chellappa, Tianmin Shu, Alan Yuille, Yilun Du, and Jieneng Chen. World-in-world: World models in a closed-loop world, 2025. 3 15

work page 2025
[60]

Latent plans for task-agnostic offline reinforcement learning, 2022

Erick Rosete-Beas, Oier Mees, Gabriel Kalweit, Joschka Boedecker, and Wolfram Burgard. Latent plans for task-agnostic offline reinforcement learning, 2022. 3

work page 2022
[61]

Mimicdroid: In-context learning for humanoid robot manipulation from human play videos, 2025

Rutav Shah, Shuijing Liu, Qi Wang, Zhenyu Jiang, Sateesh Kumar, Mingyo Seo, Roberto Mart´ ın-Mart´ ın, and Yuke Zhu. Mimicdroid: In-context learning for humanoid robot manipulation from human play videos, 2025. 3

work page 2025
[62]

Robotic playing for hierarchical complex skill learning

Simon Hangl, Emre Ugur, Sandor Szedmak, and Justus Piater. Robotic playing for hierarchical complex skill learning. In2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2799–2804. IEEE, 2016. 3

work page 2016
[63]

Briegel, and Justus Piater

Simon Hangl, Vedran Dunjko, Hans J. Briegel, and Justus Piater. Skill learning by autonomous robotic playing using active learning and creativity, 2017

work page 2017
[64]

Dexterity from touch: Self-supervised pre-training of tactile representations with robotic play.arXiv preprint arXiv:2303.12076, 2023

Irmak Guzey, Ben Evans, Soumith Chintala, and Lerrel Pinto. Dexterity from touch: Self-supervised pre-training of tactile representations with robotic play.arXiv preprint arXiv:2303.12076, 2023. 3

work page arXiv 2023
[65]

Learning to poke by poking: Experiential learning of intuitive physics.Advances in neural information processing systems, 29, 2016

Pulkit Agrawal, Ashvin V Nair, Pieter Abbeel, Jitendra Malik, and Sergey Levine. Learning to poke by poking: Experiential learning of intuitive physics.Advances in neural information processing systems, 29, 2016. 3

work page 2016
[66]

Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control

Frederik Ebert, Chelsea Finn, Sudeep Dasari, Annie Xie, Alex Lee, and Sergey Levine. Visual foresight: Model- based deep reinforcement learning for vision-based robotic control.arXiv preprint arXiv:1812.00568, 2018

work page arXiv 2018
[67]

Robonet: Large-scale multi-robot learning

Sudeep Dasari, Frederik Ebert, Stephen Tian, Suraj Nair, Bernadette Bucher, Karl Schmeckpeper, Siddharth Singh, Sergey Levine, and Chelsea Finn. Robonet: Large-scale multi-robot learning. arXiv preprint arXiv:1910.11215, 2019

work page arXiv 2019
[68]

Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International journal of robotics research, 37(4-5):421–436, 2018

Sergey Levine, Peter Pastor, Alex Krizhevsky, Julian Ibarz, and Deirdre Quillen. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International journal of robotics research, 37(4-5):421–436, 2018

work page 2018
[69]

Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours

Lerrel Pinto and Abhinav Gupta. Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours. In 2016 IEEE international conference on robotics and automation (ICRA), pages 3406–3413. IEEE, 2016. 3

work page 2016
[70]

Robocat: A self-improving foundation agent for robotic manipulation. arXiv preprint arXiv:2306.11706, 1(8), 2023

Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, et al. Robocat: A self-improving foundation agent for robotic manipulation. arXiv preprint arXiv:2306.11706, 1(8), 2023. 3

work page arXiv 2023
[71]

Don’t start from scratch: Leveraging prior data to automate robotic reinforcement learning

Homer Rich Walke, Jonathan Heewon Yang, Albert Yu, Aviral Kumar, Jedrzej Orbik, Avi Singh, and Sergey Levine. Don’t start from scratch: Leveraging prior data to automate robotic reinforcement learning. In Conference on Robot Learning, pages 1652–1662. PMLR, 2023. 3

work page 2023
[72]

DiWA: Diffusion policy adaptation with world models. arXiv preprint arXiv:2508.03645, 2025

Akshay L Chandra, Iman Nematollahi, Chen Huang, T. Welschehold, Wolfram Burgard, and Abhinav Valada. DiWA: Diffusion policy adaptation with world models. ArXiv, abs/2508.03645, 2025. 3, 25

work page arXiv 2025
[73]

Autonomous improvement of instruction following skills via foundation models. arXiv preprint arXiv:2407.20635, 2024

Zhiyuan Zhou, Pranav Atreya, Abraham Lee, Homer Walke, Oier Mees, and Sergey Levine. Autonomous improvement of instruction following skills via foundation models. arXiv preprint arXiv:2407.20635, 2024. 3

work page arXiv 2024
[74]

Extracting training data from diffusion models

Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, and Eric Wallace. Extracting training data from diffusion models. In 32nd USENIX security symposium (USENIX Security 23), pages 5253–5270, 2023. 4

work page 2023
[75]

GPT-4 technical report, 2024

OpenAI. GPT-4 technical report, 2024. 5

work page 2024
[76]

Physical Intelligence et al. π0.5: A vision-language-action model with open-world generalization, 2025. 5, 18

work page 2025
[77]

LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

Senyu Fei, Siyin Wang, Junhao Shi, Zihao Dai, Jikun Cai, Pengfang Qian, Li Ji, Xinzhe He, Shiduo Zhang, Zhaoye Fei, et al. Libero-plus: In-depth robustness analysis of vision-language-action models. arXiv preprint arXiv:2510.13626, 2025. 5

work page arXiv 2025
[78]

Learning to segment the tail, 2020

Xinting Hu, Yi Jiang, Kaihua Tang, Jingyuan Chen, Chunyan Miao, and Hanwang Zhang. Learning to segment the tail, 2020. 5

work page 2020
[79]

Shortcut learning in generalist robot policies: The role of dataset diversity and fragmentation, 2025

Youguang Xing, Xu Luo, Junlin Xie, Lianli Gao, Hengtao Shen, and Jingkuan Song. Shortcut learning in generalist robot policies: The role of dataset diversity and fragmentation, 2025. 5

work page 2025
[80]

Curriculum learning

Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th annual international conference on machine learning, pages 41–48, 2009. 5

work page 2009

Showing first 80 references.