arXiv preprint arXiv:2508.10922

2025 · arXiv 2508.10922

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

NEST: Narrative Event Structures in Time for Long Video Understanding

cs.CV · 2026-06-18 · unverdicted · novelty 7.0

NEST is a new benchmark dataset for narrative event structures in long videos, with baselines reporting ETD below 8%, EL under 6%, EAE below 11%, and ERE at 35-44% F1.

Natural-Language Temporal Grounding in Hour-Long Videos is a Search Problem: A Benchmark and Empirical Decomposition

cs.CV · 2026-06-10 · conditional · novelty 7.0

Hour-long video temporal grounding is a search problem, shown by a new benchmark where all Video-LLMs collapse, frame retrieval outperforms them, 85% of failures are search-related, and a retrieve-then-ground hybrid improves results 6.7x.

