VAU-R1: Advancing video anomaly understanding via reinforcement fine-tuning
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
representative citing papers
ESOM is a training-free streaming model for open-world video anomaly detection with dynamic definitions that achieves real-time single-GPU efficiency and state-of-the-art results on a new benchmark.
The survey formalizes MLLM perception as a unified vision-language capability and traces its evolution via a new five-stage taxonomy while outlining future challenges.
citing papers explorer
- Learning to Watch: Active Video Anomaly Understanding via Interleaved Policy Optimization
  Introduces Anom-π framework for active video anomaly understanding via interleaved policy optimization and iDPO under weak supervision, claiming a 2B model outperforms larger SOTA VAU models.
- ESOM: Efficiently Understanding Streaming Video Anomalies with Open-world Dynamic Definitions
  ESOM is a training-free streaming model for open-world video anomaly detection with dynamic definitions that achieves real-time single-GPU efficiency and state-of-the-art results on a new benchmark.
- From Structure to Synergy: A Survey of Vision-Language Perception Paradigm Evolution in Multimodal Large Language Models
  The survey formalizes MLLM perception as a unified vision-language capability and traces its evolution via a new five-stage taxonomy while outlining future challenges.