VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)
citing papers explorer
- RefereeBench: Are Video MLLMs Ready to be Multi-Sport Referees
  RefereeBench shows that even the strongest video MLLMs reach only around 60% accuracy on multi-sport refereeing tasks and struggle with rule application and temporal grounding.
- Video Understanding Reward Modeling: A Robust Benchmark and Performant Reward Models
  Introduces the VURB benchmark and the VUP-35K dataset to train discriminative and generative video reward models that achieve SOTA performance on VURB and VideoRewardBench.
- Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
  Video-MME-v2 is a new benchmark that applies progressive visual-to-reasoning levels and non-linear group scoring to expose gaps in video MLLM capabilities.
- EasyVideoR1: Easier RL for Video Understanding
  EasyVideoR1 delivers an optimized RL pipeline for video understanding in large vision-language models, achieving 1.47x throughput gains and aligned results on 22 benchmarks.