VP-VLA decouples high-level reasoning from low-level control in VLA models by rendering spatial anchors as visual prompts directly in the RGB observation space, outperforming end-to-end baselines.
In: Forty-first International Conference on Machine Learning (2024)
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.RO 2years
2026 2representative citing papers
citing papers explorer
-
VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
VP-VLA decouples high-level reasoning from low-level control in VLA models by rendering spatial anchors as visual prompts directly in the RGB observation space, outperforming end-to-end baselines.
- AnySlot: Goal-Conditioned Vision-Language-Action Policies for Zero-Shot Slot-Level Placement