PointVLA: Injecting the 3D World into Vision-Language-Action Models
3 Pith papers cite this work, all published in 2026.
Citing papers:
- Direct Action-Head Injection of a Grounded 3D Point Unlocks Spatial and Task Generalization
  Direct 3D point grounding injected into the action head via a two-layer MLP and adaptive layer norm boosts VLA success rates by 32-46 points on spatial and task perturbations in LIBERO-PRO.
- Fourier Features Let Agents Learn High Precision Policies with Imitation Learning
  Mapping point clouds to Fourier features improves high-precision imitation learning policies on RoboCasa, ManiSkill3, and real-robot tasks compared with Cartesian inputs.
- Sparse2Act: Learning Action-Aligned Sparse 3D Representations for Cross-Domain Robot Manipulation
  Sparse2Act pretrains sparse 3D encoders via masked action-alignment supervision, yielding reusable representations that reach 86.9% success on LIBERO-10 and enable cross-domain transfer.
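The action-head injection described for the first citing paper (a two-layer MLP turning a grounded 3D point into adaptive layer-norm parameters) can be sketched roughly as follows. This is a hypothetical, pure-Python illustration under stated assumptions, not that paper's code: the class name `PointAdaLN`, the SiLU activation, the hidden width, the random initialization, and the DiT-style `(1 + scale)` modulation are all choices made here for illustration.

```python
import math
import random


def layer_norm(h, eps=1e-6):
    """Normalize a feature vector to zero mean, unit variance (no learned affine)."""
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    return [(v - mean) / math.sqrt(var + eps) for v in h]


def linear(x, weight, bias):
    """Return weight @ x + bias, with weight stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def silu(x):
    """Numerically stable SiLU: v * sigmoid(v)."""
    out = []
    for v in x:
        if v >= 0:
            out.append(v / (1.0 + math.exp(-v)))
        else:
            e = math.exp(v)
            out.append(v * e / (1.0 + e))
    return out


class PointAdaLN:
    """Hypothetical block: a two-layer MLP maps a 3D point to a per-channel
    (scale, shift) pair that modulates a layer-normalized action-head feature."""

    def __init__(self, dim, hidden=32, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.dim = dim
        self.w1, self.b1 = init(hidden, 3), [0.0] * hidden
        # Second layer emits 2 * dim values: one scale and one shift per channel.
        self.w2, self.b2 = init(2 * dim, hidden), [0.0] * (2 * dim)

    def __call__(self, h, point):
        assert len(h) == self.dim and len(point) == 3
        cond = linear(silu(linear(point, self.w1, self.b1)), self.w2, self.b2)
        scale, shift = cond[: self.dim], cond[self.dim :]
        # (1 + scale) modulation: if the MLP outputs zeros, this is plain layer norm.
        return [n * (1.0 + s) + t for n, s, t in zip(layer_norm(h), scale, shift)]


head = PointAdaLN(dim=8)
features = [0.1 * i for i in range(8)]
grasp_point = (0.42, -0.10, 0.85)  # hypothetical (x, y, z) in metres
out = head(features, grasp_point)
print(len(out))  # 8
```

The point enters only through the scale and shift, so the action head's feature path is unchanged in shape; this is one common way to condition a network on a small side input.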
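For the second citing paper, here is a minimal sketch of mapping point-cloud coordinates to Fourier features instead of feeding raw Cartesian values. It assumes fixed log-spaced frequencies of the NeRF positional-encoding kind; the cited work may instead use random Gaussian frequencies or a different scale, and the function names `fourier_features` and `encode_point_cloud` are hypothetical.

```python
import math


def fourier_features(point, num_freqs=4):
    """Map one (x, y, z) point to [sin(2^k * pi * c), cos(2^k * pi * c)] per coordinate c.

    Assumed variant: fixed log-spaced frequencies (NeRF-style positional
    encoding), not necessarily the mapping used in the cited paper.
    """
    feats = []
    for c in point:
        for k in range(num_freqs):
            angle = (2.0 ** k) * math.pi * c
            feats.extend((math.sin(angle), math.cos(angle)))
    return feats


def encode_point_cloud(points, num_freqs=4):
    """Encode every point; each row has 3 * 2 * num_freqs features."""
    return [fourier_features(p, num_freqs) for p in points]


# Two hypothetical points 1 mm apart along x (coordinates in metres).
cloud = [(0.100, 0.200, 0.300), (0.101, 0.200, 0.300)]
enc = encode_point_cloud(cloud, num_freqs=8)
print(len(enc[0]))  # 48
gap = math.dist(enc[0], enc[1])
print(gap > 0.001)  # True
```

The raw points differ by only 0.001 in x, while their encodings are separated by a much larger distance because the highest-frequency terms change quickly; that is one common intuition for why Fourier inputs can help precision-sensitive policies.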