FineBench is a new dense VQA benchmark for fine-grained human activity understanding in long videos, revealing weaknesses in open VLMs and showing that FineAgent improves them via localization and description modules.
EgoSchema: A diagnostic benchmark for very long-form video language understanding. Advances in Neural Information Processing Systems, 36:46212–46244, 2023.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding