SurveillanceVQA-589K: A Benchmark for Comprehensive Surveillance Video-Language Understanding with Large Models

4 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.CV (4)

years: 2026 (3), 2025 (1)

representative citing papers

MAVEN: A Multi-stage Agentic Annotation Pipeline for Video Reasoning Tasks

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

The MAVEN pipeline generates multi-scale spatio-temporal event descriptions from videos using agentic adaptation and refinement, then produces training data with which a fine-tuned 8B model outperforms Gemini baselines on private CCTV and AccidentBench tasks.

MetaphorVU: Towards Metaphorical Video Understanding

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

Introduces the first benchmark for metaphorical video understanding, identifies MLLM weaknesses in cross-domain mapping, and proposes an inference-time enhancement using a knowledge graph.
