Canonical reference
Robotic world model: A neural network simulator for robust policy optimization in robotics. arXiv preprint arXiv:2501.10100, 2025a
Canonical reference. 100% of citing Pith papers cite this work as background.
roles
background 8
polarities
background 8
citing papers explorer
- Learning Visual Feature-Based World Models via Residual Latent Action
  RLA-WM predicts residual latent actions via flow matching to create visual feature world models that outperform prior feature-based and diffusion approaches while enabling offline video-based robot RL.
- Action Images: End-to-End Policy Learning via Multiview Video Generation
  Action Images turn robot arm motions into interpretable multiview pixel videos, letting video backbones serve as zero-shot policies for end-to-end robot learning.
- Target-Bench: Can Video World Models Achieve Mapless Path Planning with Semantic Targets?
  Target-Bench shows the best off-the-shelf video world model scores only 0.341 on semantic target-approaching and directional consistency, with fine-tuning on a small robot dataset yielding measurable gains.
- Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration
  QOED selects identifiable parameter directions via Fisher matrix eigenspace analysis and modifies exploration objectives to approximate ideal information gain under bounded nuisance assumptions, yielding 21-35% performance gains in robotic tasks.
- Hi-WM: Human-in-the-World-Model for Scalable Robot Post-Training
  Hi-WM uses human interventions inside an action-conditioned world model with rollback and branching to generate dense corrective data, raising real-world success by 37.9 points on average across three manipulation tasks.
- Tempered Sequential Monte Carlo for Trajectory and Policy Optimization with Differentiable Dynamics
  Tempered sequential Monte Carlo samples from a Boltzmann-tilted distribution over controllers to optimize trajectories and policies under differentiable dynamics.
- Toward Hardware-Agnostic Quadrupedal World Models via Morphology Conditioning
  Morphology-conditioned quadrupedal world model enables zero-shot generalization to new robot embodiments for locomotion tasks.
- Safety, Security, and Cognitive Risks in World Models
  World models enable efficient AI planning but create risks from adversarial corruption, goal misgeneralization, and human bias, demonstrated via attacks that amplify errors and reduce rewards on models like RSSM and DreamerV3.
- HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model
  HAIC enables robust humanoid interactions with underactuated objects by predicting their dynamics from proprioceptive history and using a world model for adaptive control.
- RISE: Self-Improving Robot Policy with Compositional World Model
  RISE combines a controllable dynamics model and progress value model into a closed-loop self-improving pipeline that updates robot policies entirely in imagination, reporting over 35% absolute gains on three real-world tasks.
- World Models for Robotic Manipulation: A Survey
  Survey organizing world models for robotic manipulation into representation families, a functional taxonomy, and infrastructure roles across pretraining, post-training, and inference, while reviewing 34 datasets and evaluation protocols.
- Cortex 2.0: Grounding World Models in Real-World Industrial Deployment
  Cortex 2.0 introduces world-model-based planning that generates and scores future trajectories to outperform reactive vision-language-action baselines on industrial robotic tasks including pick-and-place, sorting, and unpacking.
- World Models: A Comprehensive Survey of Architectures, Methodologies, Reasoning Paradigms, and Applications
  The paper delivers a multi-axis taxonomy for world models that maps architectures, training families, reasoning strategies, and domains from early cognitive foundations through systems such as Dreamer, MuZero, and Sora while noting evaluation gaps.
- A Survey of Legged Robotics in Non-Inertial Environments: Past, Present, and Future
  A literature survey summarizing modeling, state estimation, control methods, applications, and open challenges for legged robots operating in non-inertial environments where the ground moves or accelerates.
- Self-supervised Hierarchical Visual Reasoning with World Model