CyberV: Cybernetics for Test-time Scaling in Video Understanding. arXiv preprint arXiv:2506.07971, 2025
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)
citing papers explorer
- Towards One-to-Many Temporal Grounding
  Introduces the OMTG benchmark with C-Acc and EtF1 metrics, a 56k dataset, and caption/temporal rewards, reaching a state-of-the-art 43.65% EtF1 on the new benchmark.
- AVIS: Adaptive Test-Time Scaling for Vision-Language Models
  AVIS is an adaptive policy that jointly scales visual context (via key-based token pruning) and reasoning (via difficulty-predicted self-consistency) to improve the accuracy-compute curve on image and video tasks.
- Test-Time Scaling in Multimodal Foundation Models: A Comprehensive Survey of Generation and Reasoning
  A survey of test-time scaling for multimodal foundation models that introduces a three-way taxonomy of sampling, feedback, and search approaches, along with applications and benchmarks.
- Watch, Remember, Reason: Human-View Video Understanding with MLLMs
  A survey that frames video MLLM research through a human-view formulation of perceptual representations, memory states, reasoning traces, and predictions, then reviews methods, datasets, benchmarks, and open problems.