hub Canonical reference
VLA-RFT: Vision-language-action reinforcement fine-tuning with verified rewards in world simulators
Canonical reference. 80% of citing Pith papers cite this work as background.
years: 2026 (17)
roles: background (5)
representative citing papers
citing papers explorer
- PearlVLA: Progressive Embodied Action-Plan Refinement in Latent Space
PearlVLA achieves SOTA on LIBERO by separating VLM representations into visual grounding and an iterative latent plan branch refined via world model queries and RefineNet with process-reward RL.
- Foresight: Iterative Reasoning About Clues that Matter for Navigation
Foresight uses iterative VLM plan proposal and critique with RL from human feedback to raise navigation success 37% and cut interventions 52% in real-world tests.
- DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.
- SafeDojo: Safe Reinforcement Learning for VLA via Interactive World Model
SafeDojo is a new world model-based safe RL framework for VLA that outperforms baselines on SafeLIBERO and real robot tasks.
- Reinforcing VLAs in Task-Agnostic World Models
RAW-Dream disentangles world-model learning from task data by using a pre-trained task-agnostic world model and VLM rewards, with dual-noise filtering, to enable zero-shot VLA adaptation in simulation and real settings.
- Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems
Waypoint-based bi-level planning with curriculum RLVR improves multi-robot task success rates in dense-obstacle benchmarks over motion-agnostic and VLA baselines.
- RoboStereo: Dual-Tower 4D Embodied World Models for Unified Policy Optimization
A dual-tower 4D embodied world model called RoboStereo reduces geometric hallucinations and delivers over 97% relative improvement on manipulation tasks via test-time augmentation, imitative learning, and open exploration.
- Towards Long-Lived Robots: Continual Learning VLA Models via Reinforcement Fine-Tuning
LifeLong-RFT applies chunking-level on-policy reinforcement learning with Quantized Action Consistency Reward, Continuous Trajectory Alignment Reward, and Format Compliance Reward to fine-tune VLA models, achieving a 22% average success rate gain over supervised fine-tuning on the LIBERO benchmark's
- WorldSample: Closed-loop Real-robot RL with World Modelling
WorldSample generates synthetic transitions from a post-trained world model grounded in real rollouts and uses Policy-Paced Learning to improve RL policies, reporting 28% higher success rates and 59% fewer training steps on contact-rich robot tasks.
- How Should World Models Be Evaluated for Embodied Decision-Making? A Decision-Making-Centric Position
The paper proposes an L0-L7 evidential ladder for evaluating world models in embodied decision-making, prioritizing interventional action fidelity and policy optimization utility over visual plausibility.
- World Pilot: Steering Vision-Language-Action Models with World-Action Priors
World Pilot augments VLA policies with world-action priors through latent and action steering pathways, reporting 84.7% success on LIBERO-Plus zero-shot OOD and top real-robot results across four tasks.
- AllDayNav: Lifelong Navigation via Real-World Reinforcement Learning
AllDayNav encodes scene dynamics into a large model's parameters via RL and a multimodal memory, achieving near-100% success rates in lifelong navigation and outperforming map-based and VLM baselines.
- World Models for Robotic Manipulation: A Survey
Survey organizing world models for robotic manipulation into representation families, a functional taxonomy, and infrastructure roles across pretraining, post-training, and inference, while reviewing 34 datasets and evaluation protocols.
- Learning Action Manifold with Multi-view Latent Priors for Robotic Manipulation
The method uses multi-view diffusion priors and action manifold learning to resolve depth ambiguity and improve action prediction in VLA robotic manipulation models, reporting higher success rates than baselines on LIBERO, RoboTwin, and real-robot tasks.
- Nautilus: From One Prompt to Plug-and-Play Robot Learning
NAUTILUS is a prompt-driven harness that automates plug-and-play adapters, typed contracts, and validation for policies, benchmarks, and robots in learning research.
- World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems
The World-Value-Action model enables implicit planning for VLA systems by performing inference over a learned latent representation of high-value future trajectories instead of direct action prediction.
- World Action Models: The Next Frontier in Embodied AI
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.