Ma et al. A Survey on Vision-Language-Action Models for Embodied AI
2 Pith papers cite this work.
Fields: cs.RO (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Real-robot trials with OpenVLA on a UR5e arm show consistent offline-to-closed-loop gaps driven by action semantics, coordinate conventions, temporal alignment, image preprocessing, and dataset quality rather than model capacity.
Citing papers explorer:
- VICX: Generalizable Robot Manipulation via Video Generation and In-Context Operator Network
VICX decouples frozen video-based visual planning from in-context visual-to-trajectory mapping via V2T-ICON to achieve cross-task and cross-embodiment generalization in robot manipulation.
- Vision-Language-Action Models: Experimental Insights from a Real-World UR5 Platform
Real-robot trials with OpenVLA on a UR5e arm show consistent offline-to-closed-loop gaps driven by action semantics, coordinate conventions, temporal alignment, image preprocessing, and dataset quality rather than model capacity.