Learning Deep Representations of Appearance and Motion for Anomalous Event Detection

· 2015 · cs.CV · arXiv 1510.01553

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

We present a novel unsupervised deep learning framework for anomalous event detection in complex video scenes. While most existing works merely use hand-crafted appearance and motion features, we propose Appearance and Motion DeepNet (AMDN) which utilizes deep neural networks to automatically learn feature representations. To exploit the complementary information of both appearance and motion patterns, we introduce a novel double fusion framework, combining both the benefits of traditional early fusion and late fusion strategies. Specifically, stacked denoising autoencoders are proposed to separately learn both appearance and motion features as well as a joint representation (early fusion). Based on the learned representations, multiple one-class SVM models are used to predict the anomaly scores of each input, which are then integrated with a late fusion strategy for final anomaly detection. We evaluate the proposed method on two publicly available video surveillance datasets, showing competitive performance with respect to state-of-the-art approaches.
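
The late fusion step described in the abstract can be illustrated with a minimal sketch. It assumes three per-input score lists (appearance, motion, joint), as would come from three one-class SVMs, and combines them with a normalized weighted sum. The function names, the weights, the threshold, and the convention that a higher score means more anomalous are all assumptions made for illustration; the abstract does not specify them. (Note that some libraries use the opposite sign, e.g. scikit-learn's `OneClassSVM.decision_function` returns negative values for outliers, so scores may need negating first.)

```python
from typing import Dict, List


def late_fusion(scores: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Combine per-stream anomaly scores into one fused score per input.

    `scores` maps a stream name to its list of anomaly scores; every
    stream must score the same inputs in the same order. The result is
    a weighted average, so it stays on the same scale as the inputs.
    """
    streams = list(scores)
    n = len(scores[streams[0]])
    if any(len(scores[s]) != n for s in streams):
        raise ValueError("all streams must provide the same number of scores")
    total_weight = sum(weights[s] for s in streams)
    return [
        sum(weights[s] * scores[s][i] for s in streams) / total_weight
        for i in range(n)
    ]


def detect(fused: List[float], threshold: float) -> List[bool]:
    """Flag an input as anomalous when its fused score exceeds the threshold."""
    return [score > threshold for score in fused]


# Hypothetical scores for two inputs (higher = more anomalous, by assumption).
example_scores = {
    "appearance": [0.1, 0.9],
    "motion": [0.2, 0.7],
    "joint": [0.1, 0.8],
}
# Equal weights are an illustrative choice, not the paper's setting.
example_weights = {"appearance": 1.0, "motion": 1.0, "joint": 1.0}

fused = late_fusion(example_scores, example_weights)
flags = detect(fused, threshold=0.5)
```

With these made-up numbers the second input is flagged and the first is not. In practice the weights and threshold would have to be chosen on data rather than fixed by hand.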

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Is Video Anomaly Detection Misframed? Evidence from LLM-Based and Multi-Scene Models

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

Video anomaly detection is misframed by multi-scene LLM models that reduce the task to semantic action recognition instead of capturing local scene normality, requiring a return to single-scene spatially-aware methods.

citing papers explorer

Showing 1 of 1 citing paper.

Is Video Anomaly Detection Misframed? Evidence from LLM-Based and Multi-Scene Models

cs.CV · 2026-05-12 · unverdicted · none · ref 45 · internal anchor

Video anomaly detection is misframed by multi-scene LLM models that reduce the task to semantic action recognition instead of capturing local scene normality, requiring a return to single-scene spatially-aware methods.
