arxiv: 1709.06316 · v3 · pith:3C3RWYUAnew · submitted 2017-09-19 · 💻 cs.CV

Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM

classification 💻 cs.CV
keywords saliency · video · predicting · attention · convolutional · database · method · objects

Over the past few years, deep neural networks (DNNs) have exhibited great success in predicting the saliency of images. However, there are few works that apply DNNs to predict the saliency of generic videos. In this paper, we propose a novel DNN-based video saliency prediction method. Specifically, we establish a large-scale eye-tracking database of videos (LEDOV), which provides sufficient data to train the DNN models for predicting video saliency. Through the statistical analysis of our LEDOV database, we find that human attention is normally attracted by objects, particularly moving objects or the moving parts of objects. Accordingly, we propose an object-to-motion convolutional neural network (OM-CNN) to learn spatio-temporal features for predicting the intra-frame saliency via exploring the information of both objectness and object motion. We further find from our database that there exists a temporal correlation of human attention with a smooth saliency transition across video frames. Therefore, we develop a two-layer convolutional long short-term memory (2C-LSTM) network in our DNN-based method, using the extracted features of OM-CNN as the input. Consequently, the inter-frame saliency maps of videos can be generated, which consider the transition of attention across video frames. Finally, the experimental results show that our method advances the state-of-the-art in video saliency prediction.
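
As a rough illustration of the recurrence the abstract describes, the sketch below stacks two convolutional LSTM cells so that per-frame feature maps are turned into outputs that depend on earlier frames. It is not the authors' 2C-LSTM: it assumes a single channel, untrained random 3x3 kernels, no peephole connections, and toy 4x4 inputs standing in for OM-CNN features. The names `conv2d_same`, `ConvLSTMCell`, and `run_two_layer` are hypothetical and chosen only for this example.

```python
import math
import random


def zeros(rows, cols):
    """Return a rows x cols map filled with 0.0."""
    return [[0.0] * cols for _ in range(rows)]


def conv2d_same(x, k):
    """Zero-padded 'same' cross-correlation of a 2D map x with a square kernel k."""
    rows, cols, size = len(x), len(x[0]), len(k)
    pad = size // 2
    out = zeros(rows, cols)
    for r in range(rows):
        for q in range(cols):
            total = 0.0
            for a in range(size):
                for b in range(size):
                    rr, qq = r + a - pad, q + b - pad
                    if 0 <= rr < rows and 0 <= qq < cols:
                        total += x[rr][qq] * k[a][b]
            out[r][q] = total
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class ConvLSTMCell:
    """Single-channel ConvLSTM cell: gates come from convolutions, not matrix products."""

    GATES = ("i", "f", "o", "g")  # input, forget, output, candidate

    def __init__(self, ksize=3, seed=0):
        rng = random.Random(seed)

        def kernel():
            return [[rng.uniform(-0.1, 0.1) for _ in range(ksize)] for _ in range(ksize)]

        self.wx = {g: kernel() for g in self.GATES}  # input-to-gate kernels
        self.wh = {g: kernel() for g in self.GATES}  # hidden-to-gate kernels
        self.b = {g: 0.0 for g in self.GATES}        # scalar biases (assumed zero)

    def step(self, x, h, c):
        """Advance one frame: return the new hidden map and cell map."""
        rows, cols = len(x), len(x[0])
        pre = {}
        for g in self.GATES:
            cx = conv2d_same(x, self.wx[g])
            ch = conv2d_same(h, self.wh[g])
            pre[g] = [[cx[r][q] + ch[r][q] + self.b[g] for q in range(cols)]
                      for r in range(rows)]
        new_h, new_c = zeros(rows, cols), zeros(rows, cols)
        for r in range(rows):
            for q in range(cols):
                i = sigmoid(pre["i"][r][q])
                f = sigmoid(pre["f"][r][q])
                o = sigmoid(pre["o"][r][q])
                g = math.tanh(pre["g"][r][q])
                new_c[r][q] = f * c[r][q] + i * g
                new_h[r][q] = o * math.tanh(new_c[r][q])
        return new_h, new_c


def run_two_layer(frames, layer1, layer2):
    """Feed frames through two stacked cells; layer 2 consumes layer 1's hidden map."""
    rows, cols = len(frames[0]), len(frames[0][0])
    h1, c1, h2, c2 = (zeros(rows, cols) for _ in range(4))
    outputs = []
    for x in frames:
        h1, c1 = layer1.step(x, h1, c1)
        h2, c2 = layer2.step(h1, h2, c2)
        outputs.append(h2)
    return outputs


# Toy input: a single bright pixel moving along the diagonal over three frames.
frames = [[[1.0 if (r, q) == (t, t) else 0.0 for q in range(4)] for r in range(4)]
          for t in range(3)]
outputs = run_two_layer(frames, ConvLSTMCell(seed=1), ConvLSTMCell(seed=2))
```

Because the hidden and cell maps are carried from one frame to the next, each output map depends on the frames before it, which is the mechanism the abstract relies on for smooth saliency transitions. A practical model would use multi-channel learned kernels and a final layer mapping the hidden state to a saliency map.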

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Predicting video saliency using crowdsourced mouse-tracking data

    cs.CV · 2019-06 · unverdicted · novelty 6.0

    Crowdsourced mouse-tracking data from a custom viewing system approximates eye-tracking saliency maps for videos and is improved by a proposed deep neural network.

  2. Simple vs complex temporal recurrences for video saliency prediction

    cs.CV · 2019-07 · unverdicted · novelty 4.0

    Both ConvLSTM and exponential moving average modifications to a static saliency model achieve state-of-the-art video saliency prediction on DHF1K after SALICON pre-training and yield similar maps.