Qwen3.5: Towards native multimodal agents
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
method 1
citation-polarity summary
fields: cs.CV 2
years: 2026 2
verdicts: UNVERDICTED 2
roles: method 1
polarities: use method 1
representative citing papers
The NTIRE 2026 CD-FSOD Challenge report details innovative methods and performance results from 19 teams on cross-domain few-shot object detection in open- and closed-source tracks.
citing papers explorer
- SceneFunRI: Reasoning the Invisible for Task-Driven Functional Object Localization
SceneFunRI benchmark shows current VLMs struggle severely with inferring locations of invisible functional objects, with the strongest model (Gemini 3 Flash) reaching only 15.20 CAcc@75.
- The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results
The NTIRE 2026 CD-FSOD Challenge report details innovative methods and performance results from 19 teams on cross-domain few-shot object detection in open- and closed-source tracks.