MomentSeeker: A comprehensive benchmark and a strong baseline for moment retrieval within long videos

Huaying Yuan, Jian Ni, Yueze Wang, Junjie Zhou, Zhengyang Liang, Zheng Liu, Zhao Cao, Zhicheng Dou, Ji-Rong Wen · 2025 · arXiv 2502.12558

2 Pith papers cite this work.

representative citing papers

StoryTR: Narrative-Centric Video Temporal Retrieval with Theory of Mind Reasoning

cs.AI · 2026-04-25 · unverdicted · novelty 7.0 · ref 24

StoryTR is a new benchmark and agentic data pipeline that adds explicit Theory of Mind reasoning chains to the training data of smaller video retrieval models, yielding a 15% relative IoU gain over larger baselines on narrative content.

VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

cs.CV · 2025-07-07 · unverdicted · novelty 5.0 · ref 30

VLM2Vec-V2 is a multimodal embedding model trained on MMEB-V2, an extended benchmark that adds video and visual-document tasks; it reports gains on both the new tasks and the earlier image benchmarks.
