Segment anything
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026 2
verdicts
UNVERDICTED 2
representative citing papers
GAP pre-trains the spatial adapter on a lightweight simulated proxy task with free object masks to generate repeatable geometric keypoints, yielding higher success rates than baselines in low-data robotic manipulation on RoboMimic and ManiSkill.
citing papers explorer
- VoxAfford: Multi-Scale Voxel-Token Fusion for Open-Vocabulary 3D Affordance Detection
VoxAfford fuses multi-scale voxel features into MLLM output tokens via cross-attention with a learned compatibility gate, achieving state-of-the-art open-vocabulary 3D affordance detection with a gain of about 8% mIoU and zero-shot robot transfer.
- GAP: Geometric Anchor Pre-training for Data-Efficient Visuomotor Learning of Manipulation Tasks
GAP pre-trains the spatial adapter on a lightweight simulated proxy task with free object masks to generate repeatable geometric keypoints, yielding higher success rates than baselines in low-data robotic manipulation on RoboMimic and ManiSkill.