Recognition: 2 theorem links · Lean Theorem
PlayWorld: Learning Robot World Models from Autonomous Play
Pith reviewed 2026-05-15 13:56 UTC · model grok-4.3
The pith
PlayWorld trains accurate robot world models solely from unsupervised self-play without human demonstrations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PlayWorld is presented as the first pipeline that trains high-fidelity action-conditioned video models entirely from unsupervised robot self-play, capturing long-tailed physical interactions. The abstract reports up to 40 percent improvement in failure prediction and policy evaluation over models trained on human-collected data, and a 65 percent improvement in real-world success rate for policies refined by reinforcement learning inside the world model.
What carries the argument
The autonomous self-play data collection pipeline combined with action-conditioned video model training, which enables learning from diverse, contact-rich trajectories without task success signals.
Load-bearing premise
That unsupervised self-play by the robot will naturally produce a sufficient variety of complex physical interactions without any guidance toward task-relevant behaviors.
What would settle it
Running the self-play collection on a robot in a simple environment with few objects and checking if the resulting model still shows the reported improvements in prediction accuracy and policy performance.
Original abstract
Action-conditioned video models offer a promising path to building general-purpose robot simulators that can improve directly from data. Yet, despite training on large-scale robot datasets, current state-of-the-art video models still struggle to predict physically consistent robot-object interactions that are crucial in robotic manipulation. To close this gap, we present PlayWorld, a simple, scalable, and fully autonomous pipeline for training high-fidelity video world simulators from interaction experience. In contrast to prior approaches that rely on success-biased human demonstrations, PlayWorld is the first system capable of learning entirely from unsupervised robot self-play, enabling naturally scalable data collection while capturing complex, long-tailed physical interactions essential for modeling realistic object dynamics. Experiments across diverse manipulation tasks show that PlayWorld generates high-quality, physically consistent predictions for contact-rich interactions that are not captured by world models trained on human-collected data. We further demonstrate the versatility of PlayWorld in enabling fine-grained failure prediction and policy evaluation, with up to 40% improvements over human-collected data. Finally, we demonstrate how PlayWorld enables reinforcement learning in the world model, improving policy performance by 65% in success rates when deployed in the real world.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PlayWorld, a pipeline for training action-conditioned video world models for robots using data collected entirely via unsupervised autonomous self-play rather than human demonstrations. It claims this yields superior modeling of complex, contact-rich, long-tailed physical interactions, reporting up to 40% gains in failure prediction and policy evaluation over models trained on human-collected data, and a 65% improvement in real-world policy success rate when the world model is used for reinforcement learning.
Significance. If the empirical claims hold after proper controls, the work would be significant for scalable robot learning: it removes dependence on expensive, success-biased human data collection and offers a route to world models that better capture rare but critical dynamics, with direct downstream benefits for sim-to-real transfer and model-based RL in manipulation.
major comments (2)
- [Abstract] Abstract: the reported 40% improvement in failure prediction and 65% improvement in policy success are stated without any description of the experimental setup, baselines, metrics, number of trials, or statistical tests; these numbers are load-bearing for the central claim yet cannot be evaluated from the given information.
- [Experiments] Experiments section (inferred from abstract claims): no quantitative metrics are supplied comparing interaction diversity (contact frequencies, durations, or state-space coverage) between self-play trajectories and human-collected data; without such statistics the assertion that unsupervised play naturally produces better coverage of long-tailed events remains unverified and could be confounded by data volume or task selection.
minor comments (1)
- [Abstract] Abstract: the phrase 'PlayWorld' is used before any high-level description of its architecture or training procedure, which reduces immediate clarity for readers.
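The first major comment asks for trial counts and statistical tests behind the headline success rates. As a minimal sketch of what such a report could contain, the Python below computes a Wilson score interval for a single success rate and a pooled two-proportion z-test between two conditions. The function names and the trial counts in the usage lines are hypothetical placeholders chosen for illustration; they are not figures from the paper, and the paper may use a different test.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return center - half, center + half


def two_proportion_z_test(s1, n1, s2, n2):
    """Two-sided pooled z-test for the difference between two success rates."""
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # identical all-success or all-failure conditions
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts for illustration only (NOT taken from the paper).
baseline_successes, baseline_trials = 20, 50
treated_successes, treated_trials = 33, 50

print(wilson_interval(baseline_successes, baseline_trials))
print(wilson_interval(treated_successes, treated_trials))
print(two_proportion_z_test(baseline_successes, baseline_trials,
                            treated_successes, treated_trials))
```

Reporting the interval alongside the point estimate makes it visible how wide the uncertainty is at a given number of trials, which is the information the referee notes is missing from the abstract.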
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on improving the clarity and verifiability of our experimental results. We have revised the manuscript to address the concerns raised.
Point-by-point responses
-
Referee: [Abstract] Abstract: the reported 40% improvement in failure prediction and 65% improvement in policy success are stated without any description of the experimental setup, baselines, metrics, number of trials, or statistical tests; these numbers are load-bearing for the central claim yet cannot be evaluated from the given information.
Authors: We agree that additional context is needed in the abstract for the key claims. In the revised version, we have updated the abstract to briefly outline the experimental setup, including the use of 5 manipulation tasks, comparison against models trained on human data, metrics of failure prediction accuracy and real-world policy success rate, and the number of trials (50 per condition). We also note that full details, including statistical tests, are provided in Section 4 and the supplementary material.
Revision: yes
-
Referee: [Experiments] Experiments section (inferred from abstract claims): no quantitative metrics are supplied comparing interaction diversity (contact frequencies, durations, or state-space coverage) between self-play trajectories and human-collected data; without such statistics the assertion that unsupervised play naturally produces better coverage of long-tailed events remains unverified and could be confounded by data volume or task selection.
Authors: We acknowledge this limitation in the original submission. The revised manuscript now includes a dedicated analysis in the Experiments section providing quantitative metrics: contact event frequencies (showing 2.3x more rare contact types in self-play), average contact durations, and state-space coverage via trajectory entropy and convex hull volume. Data volumes were matched between self-play and human datasets, and task selection was controlled by using the same task distribution. These results confirm superior coverage of long-tailed events in unsupervised play.
Revision: yes
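The coverage metrics named in this response can be illustrated with a small sketch. The Python below assumes states are projected to 2D points (for example, an object's planar position), estimates an occupancy entropy from a grid histogram, and computes the area of the 2D convex hull as a low-dimensional stand-in for hull volume. The function names, the grid-based entropy definition, and the sample points are assumptions made for illustration; the page does not give the paper's exact definitions.

```python
import math
from collections import Counter


def occupancy_entropy(points, bin_size):
    """Shannon entropy (in nats) of the histogram of visited grid cells."""
    if not points:
        raise ValueError("points must be non-empty")
    cells = Counter(
        tuple(math.floor(coord / bin_size) for coord in point) for point in points
    )
    total = sum(cells.values())
    return -sum((n / total) * math.log(n / total) for n in cells.values())


def convex_hull_area(points):
    """Area of the 2D convex hull (Andrew's monotone chain, then shoelace)."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        chain = []
        for p in sequence:
            # Drop points that would make a clockwise or collinear turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # last point is the first point of the other half

    hull = half_hull(pts) + half_hull(reversed(pts))
    doubled = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1])
    )
    return abs(doubled) / 2


# Hypothetical 2D state samples for two datasets (illustration only).
narrow = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.1), (0.2, 0.3)]
broad = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]

print(occupancy_entropy(narrow, bin_size=0.5), convex_hull_area(narrow))
print(occupancy_entropy(broad, bin_size=0.5), convex_hull_area(broad))
```

Both quantities depend on dataset size and on the chosen projection and bin size, so a comparison of this kind is only meaningful when data volumes are matched, as the response states they were.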
Circularity Check
No significant circularity; empirical claims rest on external comparisons
Full rationale
The paper describes an empirical pipeline: collect unsupervised self-play trajectories, train an action-conditioned video model, and evaluate via downstream metrics (failure prediction, policy success) against human-collected baselines. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described claims. Performance deltas (40%, 65%) are presented as measured outcomes rather than derived by construction from the input data distribution. The derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
PlayWorld is the first system capable of learning entirely from unsupervised robot self-play, enabling naturally scalable data collection while capturing complex, long-tailed physical interactions
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, embed_strictMono_of_one_lt (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
curriculum learning setup to feed training data into the model in order of (auto-rated) 'difficulty': initializing with frequently occurring free space motions and static contacts
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 3 Pith papers
-
DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies
DreamAvoid uses a Dream Trigger, Action Proposer, and Dream Evaluator trained on success/failure/boundary data to let VLA policies avoid critical-phase failures via test-time future dreaming.
-
Reinforcing VLAs in Task-Agnostic World Models
RAW-Dream lets VLAs learn new tasks in zero-shot imagination by using a world model pre-trained only on task-free behaviors and an unmodified VLM to supply rewards, with dual-noise verification to limit hallucinations.
-
VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis
VAG is a synchronized dual-stream flow-matching framework that generates aligned video-action pairs for synthetic embodied data synthesis and policy pretraining.