VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning

2025 · arXiv 2505.23504

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Learning to Watch: Active Video Anomaly Understanding via Interleaved Policy Optimization

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

Introduces Anom-π framework for active video anomaly understanding via interleaved policy optimization and iDPO under weak supervision, claiming a 2B model outperforms larger SOTA VAU models.

ESOM: Efficiently Understanding Streaming Video Anomalies with Open-world Dynamic Definitions

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

ESOM is a training-free streaming model for open-world video anomaly detection with dynamic definitions that achieves real-time single-GPU efficiency and state-of-the-art results on a new benchmark.

From Structure to Synergy: A Survey of Vision-Language Perception Paradigm Evolution in Multimodal Large Language Models

cs.CL · 2026-06-24 · unverdicted · novelty 5.0

The survey formalizes MLLM perception as a unified vision-language capability and traces its evolution via a new five-stage taxonomy while outlining future challenges.

citing papers explorer

Showing 3 of 3 citing papers.

Learning to Watch: Active Video Anomaly Understanding via Interleaved Policy Optimization

cs.CV · 2026-07-01 · unverdicted · none · ref 2

ESOM: Efficiently Understanding Streaming Video Anomalies with Open-world Dynamic Definitions

cs.CV · 2026-04-09 · unverdicted · none · ref 16

From Structure to Synergy: A Survey of Vision-Language Perception Paradigm Evolution in Multimodal Large Language Models

cs.CL · 2026-06-24 · unverdicted · none · ref 197
