Affordance2Action introduces A2A-Bench, a manipulation-oriented benchmark for scene-level task-conditioned affordance grounding covering single- and multi-region correspondences, plus an annotation pipeline, and reports gaps in existing segmentation and VLM baselines.
Videoafford: Grounding 3d affordance from human-object-interaction videos via multimodal large language model.arXiv preprint arXiv:2602.09638
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
CompassAD benchmark and CompassNet framework for intent-driven affordance prediction on the appropriate object within multi-object 3D point clouds conditioned on natural language intent.
AFUN predicts task-conditional functional masks and 3D post-contact motion curves from RGB-D and language, trained via a standardized multi-source data pipeline, and reports large gains over baselines on segmentation, contact prediction, and motion tasks.
citing papers explorer
-
Affordance2Action: Task-Conditioned Scene-level Affordance Grounding for Real-Time Manipulation
Affordance2Action introduces A2A-Bench, a manipulation-oriented benchmark for scene-level task-conditioned affordance grounding covering single- and multi-region correspondences, plus an annotation pipeline, and reports gaps in existing segmentation and VLM baselines.
-
CompassAD: Intent-Driven 3D Affordance Grounding in Functionally Competing Objects
CompassAD benchmark and CompassNet framework for intent-driven affordance prediction on the appropriate object within multi-object 3D point clouds conditioned on natural language intent.
-
AFUN: Towards an Affordance Foundation Model for Functionality Understanding
AFUN predicts task-conditional functional masks and 3D post-contact motion curves from RGB-D and language, trained via a standardized multi-source data pipeline, and reports large gains over baselines on segmentation, contact prediction, and motion tasks.