RoboWM-Bench evaluates video world models by converting their manipulation video predictions into executable actions validated in simulation, showing that visual plausibility does not guarantee physical executability.
//arxiv.org/abs/2212.06870
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
EventTrack6D tracks 6D poses of unseen objects from event cameras by reconstructing dense intensity and depth cues between frames, generalizing from synthetic training to real data at high speed.
HA-HOI produces physically plausible 4D HOI animations from monocular videos by anchoring object reconstruction to human motion and refining the result in a physics-based humanoid-object simulator.
OneViewAll achieves 92.5% ADD-0.1 accuracy on LINEMOD for novel object 6D pose estimation using only one real reference view by integrating category, symmetry, and patch-level semantic priors in a projection-equivariant alignment.
SAM 3D reconstructs 3D objects from single images with geometry, texture, and pose using human-model annotated data at scale and synthetic-to-real training, achieving 5:1 human preference wins.
A geometry-aware 4D video generation model trained with cross-view pointmap alignment to produce spatio-temporally consistent future videos from novel viewpoints for robot manipulation.
citing papers explorer
-
RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation
RoboWM-Bench evaluates video world models by converting their manipulation video predictions into executable actions validated in simulation, showing that visual plausibility does not guarantee physical executability.
-
Event6D: Event-based Novel Object 6D Pose Tracking
EventTrack6D tracks 6D poses of unseen objects from event cameras by reconstructing dense intensity and depth cues between frames, generalizing from synthetic training to real data at high speed.
-
Real2Sim in HOI: Toward Physically Plausible HOI Reconstruction from Monocular Videos
HA-HOI produces physically plausible 4D HOI animations from monocular videos by anchoring object reconstruction to human motion and refining the result in a physics-based humanoid-object simulator.
-
OneViewAll: Semantic Prior Guided One-View 6D Pose Estimation for Novel Objects
OneViewAll achieves 92.5% ADD-0.1 accuracy on LINEMOD for novel object 6D pose estimation using only one real reference view by integrating category, symmetry, and patch-level semantic priors in a projection-equivariant alignment.
-
SAM 3D: 3Dfy Anything in Images
SAM 3D reconstructs 3D objects from single images with geometry, texture, and pose using human-model annotated data at scale and synthetic-to-real training, achieving 5:1 human preference wins.
-
Geometry-aware 4D Video Generation for Robot Manipulation
A geometry-aware 4D video generation model trained with cross-view pointmap alignment to produce spatio-temporally consistent future videos from novel viewpoints for robot manipulation.
- MAPRPose: Mask-Aware Proposal and Amodal Refinement for Multi-Object 6D Pose Estimation
- From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data