Dream.exe evaluates 8 video generation models on 101 manipulation tasks by converting generated videos into executable robot trajectories in a simulator, finding measurable success rates that visual metrics do not predict.
Robocasa365: A large-scale simulation framework for training and benchmarking generalist robots
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 8representative citing papers
SafeManip is a benchmark applying reusable LTLf templates across eight safety categories to evaluate temporal properties in robotic manipulation on VLA policies.
MemCompiler reframes memory use as state-conditioned compilation, delivering relevant guidance via text and latent channels to improve embodied agent performance up to 129% and cut latency 60% versus static injection.
ABot-M0.5 proposes a unified mobility-and-manipulation world action model using three alignment strategies that achieves state-of-the-art performance on mobile and fine-grained manipulation benchmarks.
DeMiAn re-annotates robot and egocentric videos with VLM-generated dense labels across motion, scene, pose, and reasoning aspects, then uses a learned instructor to boost policy success by 5 points on RoboCasa over task-only baselines.
The paper identifies four missing interfaces (data autolabelling, embodiment retargeting, physics-grounded world models, and video-based reward inference) as the central bottleneck beyond VLA scaling for robot intelligence.
RLDX-1 outperforms frontier VLAs such as π0.5 and GR00T N1.6 on dexterous manipulation benchmarks, reaching 86.8% success on ALLEX humanoid tasks versus around 40% for the baselines.
citing papers explorer
-
Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?
Dream.exe evaluates 8 video generation models on 101 manipulation tasks by converting generated videos into executable robot trajectories in a simulator, finding measurable success rates that visual metrics do not predict.
-
SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation
SafeManip is a benchmark applying reusable LTLf templates across eight safety categories to evaluate temporal properties in robotic manipulation on VLA policies.
-
MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents
MemCompiler reframes memory use as state-conditioned compilation, delivering relevant guidance via text and latent channels to improve embodied agent performance up to 129% and cut latency 60% versus static injection.
-
ABot-M0.5: Unified Mobility-and-Manipulation World Action Model
ABot-M0.5 proposes a unified mobility-and-manipulation world action model using three alignment strategies that achieves state-of-the-art performance on mobile and fine-grained manipulation benchmarks.
-
How to Instruct Your Robot: Dense Language Annotations Power Robot Policy Learning
DeMiAn re-annotates robot and egocentric videos with VLM-generated dense labels across motion, scene, pose, and reasoning aspects, then uses a learned instructor to boost policy success by 5 points on RoboCasa over task-only baselines.
-
Robots Need More than VLA and World Models
The paper identifies four missing interfaces (data autolabelling, embodiment retargeting, physics-grounded world models, and video-based reward inference) as the central bottleneck beyond VLA scaling for robot intelligence.
-
RLDX-1 Technical Report
RLDX-1 outperforms frontier VLAs such as π0.5 and GR00T N1.6 on dexterous manipulation benchmarks, reaching 86.8% success on ALLEX humanoid tasks versus around 40% for the baselines.
- Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond