Hi-WM: Human-in-the-World-Model for Scalable Robot Post-Training
Pith reviewed 2026-05-09 21:23 UTC · model grok-4.3
The pith
Humans intervene inside a learned world model to correct failing robot rollouts, generating training data whose improvements transfer to physical robots.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
World models can function as reusable corrective substrates rather than only as imagination engines or evaluators. By letting humans intervene on failure-prone simulated rollouts and collecting the resulting trajectories for post-training, the approach produces policies that succeed more often when transferred to physical robots across rigid and deformable manipulation tasks and different policy backbones.
What carries the argument
Human-in-the-World-Model (Hi-WM), which embeds short human corrective actions inside an action-conditioned world model together with state caching, rollback, and branching to produce dense corrective trajectories for policy post-training.
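The collection loop described above can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: `ToyWorldModel`, `collect_corrections`, and the integer state are all hypothetical stand-ins chosen to make the cache/rollback/branch control flow explicit.

```python
class ToyWorldModel:
    """Hypothetical stand-in for the learned action-conditioned world model;
    the state is an integer so the control flow is easy to follow."""
    def __init__(self):
        self.state = 0
        self._cache = {}

    def step(self, action):
        self.state += action
        return self.state

    def save(self, key):      # state caching
        self._cache[key] = self.state

    def rollback(self, key):  # restore a cached failure state
        self.state = self._cache[key]

def collect_corrections(wm, policy, correct, failure, horizon=10, branches=3):
    """Closed-loop rollout inside the world model; on a failure-prone
    state, cache it and branch several short human corrective
    continuations from the same state (one failure -> many corrections)."""
    data = []
    for _ in range(horizon):
        wm.step(policy(wm.state))
        if failure(wm.state):
            wm.save("fail")
            for b in range(branches):
                wm.rollback("fail")
                a = correct(wm.state, b)   # short human corrective action
                data.append((wm.state, a))
                wm.step(a)
            break
    return data
```

Running this with a policy that drifts toward a failure state yields several corrective pairs anchored at the same cached state, which is the data-density effect the branching mechanism is meant to produce.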
If this is right
- Policy improvement becomes possible with far fewer physical resets, scene setups, and real-time human supervision.
- A single failure state inside the model can yield multiple corrective trajectories through branching, increasing data density around problem behaviors.
- World-model evaluation serves as a strong proxy for real-world performance, with a measured correlation of r = 0.953.
- The same framework works across rigid-object and deformable-object tasks and on multiple policy architectures.
Where Pith is reading between the lines
- One well-trained world model could support repeated post-training cycles for many different policies without additional real-world data collection.
- As world-model fidelity increases, the volume of real-world corrections needed for reliable transfer may continue to shrink.
- The approach opens a path toward using simulated interventions to target rare but costly failure modes that are hard to encounter repeatedly in the physical world.
Load-bearing premise
Short corrective actions supplied by humans inside the world model must generate trajectories whose distribution matches real dynamics closely enough that the post-trained policy actually improves when run on physical robots.
What would settle it
If policies trained on the Hi-WM corrective trajectories show no gain or a drop in real-world success rates compared with the base policy, the central claim would be refuted.
read the original abstract
Post-training is essential for turning pretrained generalist robot policies into reliable task-specific controllers, but existing human-in-the-loop pipelines remain tied to physical execution: each correction requires robot time, scene setup, resets, and operator supervision in the real world. Meanwhile, action-conditioned world models have been studied mainly for imagination, synthetic data generation, and policy evaluation. We propose Human-in-the-World-Model (Hi-WM), a post-training framework that uses a learned world model as a reusable corrective substrate for failure-targeted policy improvement. A policy is first rolled out in closed loop inside the world model; when the rollout becomes incorrect or failure-prone, a human intervenes directly in the model to provide short corrective actions. Hi-WM caches intermediate states and supports rollback and branching, allowing a single failure state to be reused for multiple corrective continuations and yielding dense supervision around behaviors that the base policy handles poorly. The resulting corrective trajectories are then added back to the training set for post-training. We evaluate Hi-WM on three real-world manipulation tasks spanning both rigid and deformable object interaction, and on two policy backbones. Hi-WM improves real-world success by 37.9 points on average over the base policy and by 19.0 points over a world-model closed-loop baseline, while world-model evaluation correlates strongly with real-world performance (r = 0.953). These results suggest that world models can serve not only as generators or evaluators, but also as effective corrective substrates for scalable robot post-training.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Human-in-the-World-Model (Hi-WM), a post-training framework in which an action-conditioned world model serves as a reusable substrate for human corrective interventions. A base policy is rolled out in closed loop inside the WM; upon detecting failure-prone states, a human supplies short corrective actions with support for state caching, rollback, and branching to generate dense supervision around weak behaviors. The resulting corrective trajectories are added to the training set for policy fine-tuning. Experiments across three real-world manipulation tasks (rigid and deformable) and two policy backbones report average real-world success gains of 37.9 percentage points over the base policy and 19.0 points over a WM closed-loop baseline, together with a strong correlation (r = 0.953) between WM-based and real-world policy evaluations.
Significance. If the transfer assumption holds, Hi-WM offers a practical route to scalable human-in-the-loop post-training that avoids repeated physical resets and supervision. The strong WM-real correlation is a concrete strength that could support using world models as cheap proxies for policy evaluation. The approach directly targets the cost bottleneck in turning generalist robot policies into reliable task-specific controllers.
major comments (3)
- Abstract and Evaluation sections: the headline gains (37.9 pp over base, 19.0 pp over WM baseline) and r = 0.953 correlation are presented without any reported trial counts per task, standard deviations, confidence intervals, or statistical significance tests, leaving the robustness of the central empirical claim difficult to assess.
- Hi-WM Framework and Methods sections: no quantitative verification is supplied that the short human corrective trajectories generated inside the learned WM have dynamics close enough to real execution (e.g., per-step prediction error on corrective segments, state-action distribution divergence, or real-world replay of WM trajectories). This fidelity assumption is load-bearing for the transfer claim.
- Baselines and Implementation details: the exact protocol for the world-model closed-loop baseline (intervention timing, human interface, number of corrections) is not specified, nor are the WM training procedure, architecture, dataset, or hyperparameters, preventing independent assessment of whether the reported advantage is attributable to the Hi-WM intervention mechanism.
minor comments (2)
- Figure 1: the pipeline diagram would be clearer with explicit annotations for the rollback/branching operations and the exact point at which human input is injected.
- Notation: the distinction between WM-internal states and real-world states is occasionally ambiguous in the text; consistent use of subscripts (e.g., s_WM vs s_real) would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important areas for improving statistical reporting, fidelity analysis, and reproducibility. We address each major comment point-by-point below and will revise the manuscript to incorporate the requested details where possible.
read point-by-point responses
- Referee: Abstract and Evaluation sections: the headline gains (37.9 pp over base, 19.0 pp over WM baseline) and r = 0.953 correlation are presented without any reported trial counts per task, standard deviations, confidence intervals, or statistical significance tests, leaving the robustness of the central empirical claim difficult to assess.
Authors: We agree that additional statistical details are essential for evaluating robustness. In the revised manuscript, we will report the exact number of trials per task and condition (25 trials were conducted per condition), include standard deviations alongside success rates, add 95% confidence intervals, and report the p-value for the correlation (r = 0.953) to establish statistical significance. These values were collected during experimentation but omitted for brevity; they will be added to the Evaluation section and referenced in the abstract. revision: yes
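The statistics promised in this response are standard and easy to reproduce. Below is a hedged sketch of two of them: a 95% Wilson score interval for a success rate over 25 trials, and the t statistic for testing whether a Pearson correlation differs from zero. The success count (20) and the number of paired evaluations (n = 10) are hypothetical; the paper does not report how many policy checkpoints enter the WM-vs-real correlation.

```python
import math

def wilson_ci(successes, trials, z=1.96):
    """95% Wilson score interval for a binomial success rate,
    e.g. task success over 25 trials per condition."""
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z * z / (4 * trials * trials)) / denom
    return center - half, center + half

def corr_t_stat(r, n):
    """t statistic (df = n - 2) for testing Pearson r != 0 over n
    paired WM/real evaluations; n here is an assumed placeholder."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

For example, 20/25 successes gives an interval of roughly (0.61, 0.91), which illustrates how wide the uncertainty remains at 25 trials; and r = 0.953 with an assumed n = 10 gives t ≈ 8.9, far above the two-sided 5% critical value of about 2.31 for 8 degrees of freedom.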
- Referee: Hi-WM Framework and Methods sections: no quantitative verification is supplied that the short human corrective trajectories generated inside the learned WM have dynamics close enough to real execution (e.g., per-step prediction error on corrective segments, state-action distribution divergence, or real-world replay of WM trajectories). This fidelity assumption is load-bearing for the transfer claim.
Authors: We acknowledge that direct quantitative fidelity checks on corrective trajectories would strengthen the transfer claim. While the reported r = 0.953 correlation between WM and real-world evaluations offers indirect support for sufficient dynamics capture, we did not compute per-step prediction errors or distribution divergences specifically on the human corrective segments. In the revision, we will add an analysis (in Methods or Appendix) with per-step MSE on held-out corrective trajectories and available divergence metrics; where real-world replay of WM trajectories was not performed, we will state this explicitly as a limitation. revision: partial
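One of the fidelity checks requested here, per-step prediction error on corrective segments, is simple to specify. The sketch below computes a mean per-step MSE between a WM-internal corrective segment (`s_wm`) and its real-world replay (`s_real`), both given as flat state vectors; this is a hypothetical instance of the proposed metric, not the paper's protocol.

```python
def per_step_mse(wm_states, real_states):
    """Mean per-step squared error between a world-model corrective
    segment and its real-world replay. Each element of wm_states and
    real_states is one flat state vector; the two segments must be
    time-aligned and equal in length."""
    assert len(wm_states) == len(real_states), "segments must align"
    total = 0.0
    for s_wm, s_real in zip(wm_states, real_states):
        # per-step error, averaged over state dimensions
        total += sum((a - b) ** 2 for a, b in zip(s_wm, s_real)) / len(s_wm)
    return total / len(wm_states)
```

Reporting this quantity on held-out corrective segments, alongside a state-action distribution divergence, would directly address the fidelity assumption the referee flags.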
- Referee: Baselines and Implementation details: the exact protocol for the world-model closed-loop baseline (intervention timing, human interface, number of corrections) is not specified, nor are the WM training procedure, architecture, dataset, or hyperparameters, preventing independent assessment of whether the reported advantage is attributable to the Hi-WM intervention mechanism.
Authors: We agree that insufficient implementation details hinder reproducibility and attribution of gains. In the revised manuscript, we will expand the Baselines and Implementation Details sections to fully specify the WM closed-loop baseline protocol (including failure detection criteria, intervention timing, human interface, and number of corrections), as well as the world model architecture, training dataset (size and composition), procedure, and all hyperparameters. This will clarify that performance differences arise from the Hi-WM rollback and branching mechanisms. revision: yes
Circularity Check
No significant circularity; empirical claims rest on external real-world benchmarks
full rationale
The paper proposes an empirical framework (Hi-WM) for generating corrective trajectories inside a learned world model and adding them to post-training data. All load-bearing results—37.9 pp average real-world success gain, 19.0 pp gain over WM-closed-loop baseline, and r=0.953 WM-real correlation—are measured on held-out physical robot tasks separate from WM training and corrective data collection. No equations, fitted parameters, or self-citations are presented that reduce these quantities to definitions or inputs internal to the paper; the validation chain therefore remains externally falsifiable rather than self-referential.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Action-conditioned world models can generate trajectories whose corrective modifications transfer to real-world policy improvement.