MultiWorld is a scalable framework for multi-agent multi-view video world models that improves controllability and consistency over single-agent baselines in game and robot tasks.
Physical autoregressive model for robotic manipulation without action pretraining
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5roles
background 2polarities
background 2representative citing papers
Vision-geometry backbones using pretrained 3D world models outperform vision-language and video models for robotic manipulation by enabling direct mapping from visual input to geometric actions.
OOWM models the world as an explicit symbolic tuple with UML diagrams and trains via SFT plus GRPO to outperform text-based CoT on embodied planning benchmarks.
Survey organizing world models for robotic manipulation into representation families, a functional taxonomy, and infrastructure roles across pretraining, post-training, and inference, while reviewing 34 datasets and evaluation protocols.
IndusAgent achieves state-of-the-art zero-shot performance on industrial anomaly benchmarks by using a custom Indus-CoT dataset, dynamic tool orchestration, and gated RL to optimize anomaly classification, localization, and reasoning.
citing papers explorer
-
MultiWorld: Scalable Multi-Agent Multi-View Video World Models
MultiWorld is a scalable framework for multi-agent multi-view video world models that improves controllability and consistency over single-agent baselines in game and robot tasks.
-
Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models
Vision-geometry backbones using pretrained 3D world models outperform vision-language and video models for robotic manipulation by enabling direct mapping from visual input to geometric actions.
-
OOWM: Structuring Embodied Reasoning and Planning via Object-Oriented Programmatic World Modeling
OOWM models the world as an explicit symbolic tuple with UML diagrams and trains via SFT plus GRPO to outperform text-based CoT on embodied planning benchmarks.
-
World Models for Robotic Manipulation: A Survey
Survey organizing world models for robotic manipulation into representation families, a functional taxonomy, and infrastructure roles across pretraining, post-training, and inference, while reviewing 34 datasets and evaluation protocols.
-
IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools
IndusAgent achieves state-of-the-art zero-shot performance on industrial anomaly benchmarks by using a custom Indus-CoT dataset, dynamic tool orchestration, and gated RL to optimize anomaly classification, localization, and reasoning.