{"total":12,"items":[{"citing_arxiv_id":"2606.20698","ref_index":39,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SafeDojo: Safe Reinforcement Learning for VLA via Interactive World Model","primary_cat":"cs.RO","submitted_at":"2026-06-15T08:58:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SafeDojo is a new world model-based safe RL framework for VLA that outperforms baselines on SafeLIBERO and real robot tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.15032","ref_index":37,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"How Should World Models Be Evaluated for Embodied Decision-Making? A Decision-Making-Centric Position","primary_cat":"cs.LG","submitted_at":"2026-06-13T00:21:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The paper proposes an L0-L7 evidential ladder for evaluating world models in embodied decision-making, prioritizing interventional action fidelity and policy optimization utility over visual plausibility.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00113","ref_index":115,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"World Models for Robotic Manipulation: A Survey","primary_cat":"cs.RO","submitted_at":"2026-05-27T05:32:17+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Survey organizing world models for robotic manipulation into representation families, a functional taxonomy, and infrastructure roles across pretraining, post-training, and inference, while reviewing 34 datasets and evaluation protocols.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12334","ref_index":18,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Reinforcing VLAs in Task-Agnostic World Models","primary_cat":"cs.AI","submitted_at":"2026-05-12T16:16:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RAW-Dream disentangles world-model learning from task data by using a pre-trained task-agnostic world model and VLM rewards, with dual-noise filtering, to enable zero-shot VLA adaptation in simulation and real settings.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"the VLA policy without physical execution. Some frameworks [ 13, 30, 37, 44, 46] utilize world models as static simulators for post-training policies, while others [8, 21, 39] explore closed-loop systems where the policy and world model are iteratively co-trained. To provide reward signals for these imagined rollouts, existing literature generally relies on specifically engineered [ 18] or task-finetuned [8, 13, 46]) reward functions, though a few methods [10, 30, 43] explored zero-shot VLM rewards. However, a fundamental limitation shared across these approaches is their reliance on task-dependent world models. Unlike these approaches, our work takes a step towards a more generalizable framework by pairing a task-agnostic world model, capable of simulating unseen"},{"citing_arxiv_id":"2605.12090","ref_index":54,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"World Action Models: The Next Frontier in Embodied AI","primary_cat":"cs.RO","submitted_at":"2026-05-12T13:10:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Embodied World Model SWIM [34], DreamDojo [ 35], RoboDreamer [36], RoboScape [37]. . . WM for VLA Imitation Learning Ctrl-World [38], RoboScape [37], DREMA [ 39] Reinforcement Learning Dreamer to Control [ 40] DreamerV2 [ 41], Dreamer 4 [ 42], RISE [ 43] DreamerV3 [44], DayDreamer [45], World-Env [46], RoboScape-R [47] WMPO [48], WoVR [49], VLA-RFT [50], RWML [51], MoDem-V2 [52] World-Gymnast [53], RWM-U [54], World4RL [55], VIPER [ 56] PhysWorld [57], Diffusion Reward [58], GenReward [59] Evaluation Ctrl-World [38], Veo Robotics [60], Interactive World Simulator [61] WorldEval [62], WorldGym [63], dWorldEval [64] Architecture Cascaded W AM Explicit UniPi [6], VLP [ 7], RoboEnvision [9], ThisThat [ 65], TesserAct [66], MVISTA-4D [67] Say ,Dream,and Act [10], Gen2Act [68], A VDC [8], Im2Flow2Act [69], 3DFlowAction [70]"},{"citing_arxiv_id":"2605.11832","ref_index":58,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning Action Manifold with Multi-view Latent Priors for Robotic Manipulation","primary_cat":"cs.RO","submitted_at":"2026-05-12T09:21:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The method uses multi-view diffusion priors and action manifold learning to resolve depth ambiguity and improve action prediction in VLA robotic manipulation models, reporting higher success rates than baselines on LIBERO, RoboTwin, and real-robot tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Quanxin, M. Xiao, and G. Song, \"Wmpo: World model-based policy optimization for vision-language-action models,\" inInt. Conf. Learn. Represent., 2026. 2 [57] S. Fei, S. Wang, L. Ji, A. Li, S. Zhang, L. Liu, J. Hou, J. Gong, X. Zhao, and X. Qiu, \"Srpo: Self-referential policy optimization for vision-language-action models,\"arXiv:2511.15605, 2025. 2 [58] H. Li, P . Ding, R. Suo, Y. Wang, Z. Ge, D. Zang, K. Yu, M. Sun, H. Zhang, D. Wanget al., \"Vla-rft: Vision-language-action rein- forcement fine-tuning with verified rewards in world simulators,\" arXiv:2510.00406, 2025. 2 [59] Z. Jiang, S. Zhou, Y. Jiang, Z. Huang, M. Wei, Y. Chen, T. Zhou, Z. Guo, H. Lin, Q. Zhang, Y. Wang, H. Li, C. Yu, and D. Zhao,"},{"citing_arxiv_id":"2605.11665","ref_index":61,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Nautilus: From One Prompt to Plug-and-Play Robot Learning","primary_cat":"cs.RO","submitted_at":"2026-05-12T07:26:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"NAUTILUS is a prompt-driven harness that automates plug-and-play adapters, typed contracts, and validation for policies, benchmarks, and robots in learning research.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"fits policies to expert trajectories [12, 28-36], whilereinforcement learning(RL) optimizes behavior through reward-driven interaction with the environment [37-41]. More recently,vision-language- action(VLA) models have emerged as a broader architectural class that maps multimodal inputs to robot actions [42-60]; most VLAs are trained with IL-style objectives, though recent work also uses RL fine-tuning [61, 62]. A related direction is theworld action model(WAM), which routes action prediction through a learned dynamics model [63-74]. These families differ in training and deployment assumptions, but NAUTILUStargets their workflows rather than any single model class. LLMs in robotics and embodied orchestration.Recent work has applied LLMs and VLMs inside"},{"citing_arxiv_id":"2604.21138","ref_index":18,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems","primary_cat":"cs.RO","submitted_at":"2026-04-22T22:58:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Waypoint-based bi-level planning with curriculum RLVR improves multi-robot task success rates in dense-obstacle benchmarks over motion-agnostic and VLA baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.14732","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems","primary_cat":"cs.RO","submitted_at":"2026-04-16T07:46:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The World-Value-Action model enables implicit planning for VLA systems by performing inference over a learned latent representation of high-value future trajectories instead of direct action prediction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.12639","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"RoboStereo: Dual-Tower 4D Embodied World Models for Unified Policy Optimization","primary_cat":"cs.CV","submitted_at":"2026-03-13T04:16:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A dual-tower 4D embodied world model called RoboStereo reduces geometric hallucinations and delivers over 97% relative improvement on manipulation tasks via test-time augmentation, imitative learning, and open exploration.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.10503","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Towards Long-Lived Robots: Continual Learning VLA Models via Reinforcement Fine-Tuning","primary_cat":"cs.RO","submitted_at":"2026-02-11T04:05:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LifeLong-RFT applies chunking-level on-policy reinforcement learning with Quantized Action Consistency Reward, Continuous Trajectory Alignment Reward, and Format Compliance Reward to fine-tune VLA models, achieving a 22% average success rate gain over supervised fine-tuning on the LIBERO benchmark's","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.06949","ref_index":55,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos","primary_cat":"cs.RO","submitted_at":"2026-02-06T18:49:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"A Path Towards Autonomous Machine Intelligence.Open Review, 2022. 2 [54] Hengtao Li, Pengxiang Ding, Runze Suo, Yihao Wang, Zirui Ge, Dongyuan Zang, Kexian Yu, Mingyang Sun, Hongyin Zhang, Donglin Wang, et al. VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators.arXiv preprint arXiv:2510.00406, 2025. 16 [55] Qixiu Li, Yu Deng, Yaobo Liang, Lin Luo, Lei Zhou, Chengtang Yao, Lingqi Zeng, Zhiyuan Feng, Huizhi Liang, Sicheng Xu, et al. Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos.arXiv preprint arXiv:2510.21571, 2025. 4 [56] Shuang Li, Yihuai Gao, Dorsa Sadigh, and Shuran Song. Unified Video Action Model."}],"limit":50,"offset":0}