Videoafford: Grounding 3D affordance from human-object-interaction videos via multimodal large language model. arXiv preprint arXiv:2602.09638

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

fields: cs.RO (2), cs.CV (1)

years: 2026 (3)

verdicts: unverdicted (3)

representative citing papers

AFUN: Towards an Affordance Foundation Model for Functionality Understanding

cs.RO · 2026-06-01 · unverdicted · novelty 6.0

AFUN predicts task-conditional functional masks and 3D post-contact motion curves from RGB-D and language, trained via a standardized multi-source data pipeline, and reports large gains over baselines on segmentation, contact prediction, and motion tasks.
