hub Baseline reference

MMBench-Video: A long-form multi-shot benchmark for holistic video understanding

Baseline reference. 50% of citing Pith papers use this work as a benchmark or comparison.

12 Pith papers citing it
Baseline 50% of classified citations

citation-role summary

dataset 4 · background 2

fields

cs.CV 11 · cs.CL 1

representative citing papers

Qwen2.5-VL Technical Report

cs.CV · 2025-02-19 · unverdicted · novelty 5.0

Qwen2.5-VL reports a vision-language model family using native dynamic-resolution ViT and absolute time encoding that matches GPT-4o on document and diagram tasks while supporting hour-long videos with second-level localization.

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling

cs.CV · 2025-01-21 · unverdicted · novelty 5.0

InternVideo2.5 improves video MLLMs by incorporating dense vision task annotations via direct preference optimization and compact spatiotemporal representations via adaptive hierarchical token compression, yielding better benchmark performance, 6x longer video memory, and new capabilities like object

citing papers explorer

Showing 12 of 12 citing papers.