View-invariant policy learning via zero-shot novel view synthesis
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields: cs.RO (5)
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: background (2)
polarities: background (2)
representative citing papers
DockAnywhere lifts single demonstrations to diverse docking points via structure-preserving augmentation and point-cloud spatial editing to improve viewpoint generalization in visuomotor policies for mobile manipulation.
UniviewVLA generates multiview future views from two cameras via world modeling, plus token compression and view selection, to boost occlusion handling in robot manipulation while matching standard benchmark performance.
A framework augments single fisheye demonstrations into multiple novel-view trajectories with obstacles via fisheye-adapted Gaussian Splatting and trajectory optimization, raising policy success rates in original and modified scenes.
WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match teleoperation success rates on five tabletop tasks with 5-8x less collection effort.
citing papers explorer
- VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis
  VistaBot integrates 4D geometry estimation and spatiotemporal view synthesis into action policies to improve cross-view generalization by 2.6-2.8x on a new VGS metric in simulation and real tasks.
- DockAnywhere: Data-Efficient Visuomotor Policy Learning for Mobile Manipulation via Novel Demonstration Generation
  DockAnywhere lifts single demonstrations to diverse docking points via structure-preserving augmentation and point-cloud spatial editing to improve viewpoint generalization in visuomotor policies for mobile manipulation.
- UniviewVLA: A Unified Multiview Vision-Language-Action Model with World Modeling
  UniviewVLA generates multiview future views from two cameras via world modeling, plus token compression and view selection, to boost occlusion handling in robot manipulation while matching standard benchmark performance.
- One Demo is Worth a Thousand Trajectories: Action-View Augmentation for Visuomotor Policies
  A framework augments single fisheye demonstrations into multiple novel-view trajectories with obstacles via fisheye-adapted Gaussian Splatting and trajectory optimization, raising policy success rates in original and modified scenes.
- WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations
  WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match teleoperation success rates on five tabletop tasks with 5-8x less collection effort.