AnyGrasp: Robust and efficient grasp perception in spatial and temporal domains, 2023
3 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.RO (3)
Years: 2026 (3)
Verdicts: UNVERDICTED (3)

Citing papers
- Human Universal Grasping
  HUG trains a flow-matching model on a new 1M-frame egocentric human grasp dataset to generate retargetable grasps from single RGB-D images, beating baselines by 23-34% on a new 90-object benchmark.
- AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding
  AffordanceVLA proposes a VLA model with affordance-aware modules (Which2Act, Where2Act, How2Act) in a Mixture-of-Transformer trained in three stages to improve robotic manipulation.
- Enabling Extensible Embodied Capabilities with Tools
  Introduces Embodied Tool Protocol and tool externalization to improve embodied AI performance on perception and cognition tasks, with measured gains but limits on execution capabilities.