pith. machine review for the scientific record. sign in

arxiv: 2603.09030 · v3 · submitted 2026-03-09 · 💻 cs.RO · cs.AI

Recognition: unknown

PlayWorld: Learning Robot World Models from Autonomous Play

Authors on Pith no claims yet
classification 💻 cs.RO cs.AI
keywords playworldworlddatamodelsrobotinteractionslearningvideo
0
0 comments X
read the original abstract

Action-conditioned video models offer a promising path to building general-purpose robot simulators that can improve directly from data. Yet, despite training on large-scale robot datasets, current state-of-the-art video models still struggle to predict physically consistent robot-object interactions that are crucial in robotic manipulation. To close this gap, we present PlayWorld, a simple, scalable, and fully autonomous pipeline for training high-fidelity video world simulators from interaction experience. In contrast to prior approaches that rely on success-biased human demonstrations, PlayWorld is the first system capable of learning entirely from unsupervised robot self-play, enabling naturally scalable data collection while capturing complex, long-tailed physical interactions essential for modeling realistic object dynamics. Experiments across diverse manipulation tasks show that PlayWorld generates high-quality, physically consistent predictions for contact-rich interactions that are not captured by world models trained on human-collected data. We further demonstrate the versatility of PlayWorld in enabling fine-grained failure prediction and policy evaluation, with up to 40% improvements over human-collected data. Finally, we demonstrate how PlayWorld enables reinforcement learning in the world model, improving policy performance by 65% in success rates when deployed in the real world.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies

    cs.RO 2026-05 unverdicted novelty 7.0

    DreamAvoid uses a Dream Trigger, Action Proposer, and Dream Evaluator trained on success/failure/boundary data to let VLA policies avoid critical-phase failures via test-time future dreaming.

  2. Reinforcing VLAs in Task-Agnostic World Models

    cs.AI 2026-05 unverdicted novelty 6.0

    RAW-Dream lets VLAs learn new tasks in zero-shot imagination by using a world model pre-trained only on task-free behaviors and an unmodified VLM to supply rewards, with dual-noise verification to limit hallucinations.

  3. VAG: Dual-Stream Video-Action Generation for Embodied Data Synthesis

    cs.RO 2026-04 unverdicted novelty 6.0

    VAG is a synchronized dual-stream flow-matching framework that generates aligned video-action pairs for synthetic embodied data synthesis and policy pretraining.