Predicting video saliency using crowdsourced mouse-tracking data

Dmitriy Vatolin; Vitaliy Lyudvichenko

arxiv: 1907.00480 · v1 · pith:URL3X3QKnew · submitted 2019-06-30 · 💻 cs.CV

Pith reviewed 2026-05-25 12:23 UTC · model grok-4.3

classification 💻 cs.CV

keywords video saliency · mouse-tracking · crowdsourcing · eye-tracking approximation · saliency maps · deep neural network · peripheral vision simulation

The pith

Crowdsourced mouse-tracking data collected through a cursor-contingent viewing system can approximate eye-tracking data for video saliency maps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a mouse-contingent video system, which blurs peripheral areas based on cursor position, lets ordinary mouse movements serve as a practical substitute for gaze fixations when building saliency maps. A crowdsourcing platform then gathers this data at scale from regular computers. The authors demonstrate that the resulting maps closely track those from eye-trackers, and they introduce a deep neural network that further refines the mouse-derived maps to higher accuracy. This matters because eye-tracking hardware limits dataset size and accessibility, while mouse data runs on any device. If the approximation holds, researchers can train saliency models on far larger and more diverse video collections without specialized equipment.
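The cursor-contingent blur described above can be sketched in a few lines. This is only an illustration under stated assumptions: the page does not say which blur or falloff the authors use, so the box blur, the Gaussian falloff, and the names `box_blur`, `cursor_contingent_frame`, `sigma`, and `radius` are all hypothetical.

```python
import math

def box_blur(frame, radius=1):
    """Blur a grayscale frame (list of rows) with a square mean filter."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [frame[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out

def cursor_contingent_frame(frame, cursor_xy, sigma=2.0, radius=1):
    """Keep pixels near the cursor sharp and fade to the blurred frame with distance.

    Assumption: a Gaussian falloff around the cursor; the real system may differ.
    """
    cx, cy = cursor_xy
    blurred = box_blur(frame, radius)
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            # Weight is 1 at the cursor and approaches 0 far from it.
            sharpness = math.exp(-d2 / (2.0 * sigma ** 2))
            out[y][x] = sharpness * frame[y][x] + (1.0 - sharpness) * blurred[y][x]
    return out
```

Calling `cursor_contingent_frame(frame, (x, y))` once per displayed frame with the latest cursor position would give the viewer a reason to move the cursor to whatever they want to see clearly, which is the behaviour the recorded tracks rely on.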

Core claim

We designed a mouse-contingent video viewing system which simulates the viewers' peripheral vision based on the position of the mouse cursor. The system enables the use of mouse-tracking data recorded from an ordinary computer mouse as an alternative to real gaze fixations recorded by a more expensive eye-tracker. We developed a crowdsourcing system that enables the collection of such mouse-tracking data at large scale. Using the collected mouse-tracking data we showed that it can serve as an approximation of eye-tracking data. Moreover, trying to increase the efficiency of collected mouse-tracking data we proposed a novel deep neural network algorithm that improves the quality of mouse-tracking saliency maps.

What carries the argument

The mouse-contingent video viewing system that simulates peripheral vision from mouse cursor position, turning mouse movements into a proxy for gaze fixations used to build saliency maps.
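To make "mouse movements as a proxy for gaze fixations" concrete, here is a minimal sketch of how cursor positions from several viewers could be turned into a per-frame saliency map. It assumes the common practice of placing a Gaussian at each recorded point and normalising the result; the function name `fixation_map` and the `sigma` value are illustrative and not taken from the paper.

```python
import math

def fixation_map(points, width, height, sigma=1.5):
    """Sum an isotropic Gaussian at each (x, y) point, then normalise to sum 1.

    `points` holds the cursor (or gaze) positions of all viewers for one frame.
    """
    sal = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - px) ** 2 + (y - py) ** 2
                sal[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    total = sum(sum(row) for row in sal)
    if total > 0:
        sal = [[v / total for v in row] for row in sal]
    return sal

# Three recorded cursor positions on an 8x4 frame; two viewers agree on (2, 2).
saliency = fixation_map([(2, 2), (2, 2), (6, 1)], width=8, height=4)
```

The same function works for eye-tracker fixations, which is what makes the two data sources directly comparable as maps.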

If this is right

Mouse-tracking data gathered via crowdsourcing serves as a scalable, low-cost approximation to eye-tracking data for video saliency.
A dedicated deep neural network can measurably raise the quality of saliency maps derived from mouse-tracking inputs.
Large-scale video saliency datasets become feasible to collect without eye-tracking hardware.
Saliency prediction models can be trained on substantially bigger and more varied video sets assembled this way.
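For the refinement network in the second point, the Figure 2 caption on this page says that external prior maps are concatenated with the features of the input layer and three intermediate layers. The sketch below shows what such a concatenation could look like for a single layer. It is an assumption-laden illustration: average pooling as the resizing step, plain nested lists as tensors, and the names `avg_pool` and `concat_prior` are all hypothetical.

```python
def avg_pool(prior, out_h, out_w):
    """Downsample a 2D map to out_h x out_w by averaging equal-sized blocks."""
    h, w = len(prior), len(prior[0])
    if h % out_h or w % out_w:
        raise ValueError("prior size must be a multiple of the target size")
    bh, bw = h // out_h, w // out_w
    return [
        [
            sum(prior[y * bh + j][x * bw + i] for j in range(bh) for i in range(bw))
            / (bh * bw)
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

def concat_prior(features, prior):
    """Append the resized prior map as one extra channel.

    `features` is a list of C channels, each an H x W list of rows; the result
    has C + 1 channels and leaves the input list unmodified.
    """
    out_h, out_w = len(features[0]), len(features[0][0])
    return features + [avg_pool(prior, out_h, out_w)]
```

In a real network the same idea would be applied at each of the four layers named in the caption, with the prior map resized to that layer's resolution.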

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same cursor-contingent approach could be adapted to collect attention data for other dynamic visual tasks where eye-trackers are impractical.
The DNN refinement step implies that mouse data contains learnable, systematic deviations from true gaze that can be corrected algorithmically.
Performance of the approximation may vary with video content type, suggesting targeted validation on fast-motion or low-contrast scenes.
Hybrid training that mixes mouse-derived maps with smaller eye-tracking sets might improve model generalization beyond either data source alone.

Load-bearing premise

The mouse-contingent viewing system accurately simulates viewers' peripheral vision based on the mouse cursor position, so mouse movements reliably stand in for actual gaze fixations.

What would settle it

Side-by-side quantitative comparison of saliency maps produced from the crowdsourced mouse data against maps from simultaneous eye-tracking recordings on identical videos, checking agreement in fixation locations and saliency values.
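A comparison of this kind usually relies on standard saliency metrics. The following is a minimal sketch of three of them (linear correlation, NSS, and KL divergence) on small list-of-rows maps. It is not the authors' evaluation code; the function names and the `eps` constant are assumptions made for the example.

```python
import math

def _flatten(m):
    return [v for row in m for v in row]

def pearson_cc(map_a, map_b):
    """Linear correlation coefficient between two saliency maps."""
    a, b = _flatten(map_a), _flatten(map_b)
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def nss(saliency, fixations):
    """Normalised scanpath saliency: mean z-scored map value at (x, y) fixations."""
    vals = _flatten(saliency)
    n = len(vals)
    mean = sum(vals) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / n)
    if std == 0 or not fixations:
        return 0.0
    return sum((saliency[y][x] - mean) / std for x, y in fixations) / len(fixations)

def kl_divergence(reference, predicted, eps=1e-12):
    """KL divergence of `predicted` from `reference` after normalising both to sum 1."""
    p, q = _flatten(reference), _flatten(predicted)
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("both maps must have positive mass")
    p = [v / sp for v in p]
    q = [v / sq for v in q]
    return sum(pi * math.log(eps + pi / (qi + eps)) for pi, qi in zip(p, q))
```

Here the eye-tracking map would play the role of `reference` and the mouse-derived map the role of `predicted`, evaluated frame by frame on the same videos.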

Figures

Figures reproduced from arXiv: 1907.00480 by Dmitriy Vatolin, Vitaliy Lyudvichenko.

**Figure 1.** An example of a tutorial page and the mouse-contingent video player used in our system. The video around the cursor is sharp. To tackle this problem the semiautomatic paradigm for predicting saliency was proposed in [1]. Unlike conventional saliency models, semiautomatic approaches take eye-tracking saliency maps as an additional input and postprocess them, which enables better saliency maps using less dat…

**Figure 2.** Overview of proposed temporal semiautomatic model based on SAM-ResNet [11]. We introduce the external prior maps and concatenate them with the features of the input layer and three intermediate layers. To make the network temporal-aware we introduce new spatiotemporal features and adapt the attentive ConvLSTM module so that it can pass the states to the following frames. The made modifications are marked b…

**Figure 3.** Figure 3 shows the results and illustrates that mouse-tracking data from two observers has the same quality as eye-tracking data from a single observer, so the data collected with the proposed system can approximate eye-tracking. Note that when we estimated the eye-tracking performance of N observers, we compared them with the remaining M − N of the M observers in total. Therefore the eye-tracking curve has stopped increasin…
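The evaluation protocol mentioned in the Figure 3 caption, where N observers are scored against the remaining M − N, can be sketched as follows. This is a hedged illustration rather than the paper's procedure: the random repeated splits, the toy `count_map` and `overlap` helpers, and all parameter names are assumptions.

```python
import random

def subset_vs_rest(observers, n, build_map, score, trials=20, seed=0):
    """Average score of maps built from n observers against maps from the rest.

    `observers` is a list with one list of (x, y) points per observer.
    """
    if not 0 < n < len(observers):
        raise ValueError("n must leave at least one observer in the reference set")
    rng = random.Random(seed)
    ids = list(range(len(observers)))
    total = 0.0
    for _ in range(trials):
        rng.shuffle(ids)
        chosen = [p for i in ids[:n] for p in observers[i]]
        rest = [p for i in ids[n:] for p in observers[i]]
        total += score(build_map(chosen), build_map(rest))
    return total / trials

def count_map(points):
    """Toy 'saliency map': normalised visit counts per grid cell."""
    counts = {}
    for p in points:
        counts[p] = counts.get(p, 0) + 1
    return {cell: c / len(points) for cell, c in counts.items()}

def overlap(map_a, map_b):
    """Histogram intersection in [0, 1]; 1 means identical distributions."""
    return sum(min(v, map_b.get(cell, 0.0)) for cell, v in map_a.items())
```

Because N must stay below M for a reference set to exist, and the reference set shrinks as N grows, a curve computed this way cannot be extended to all observers, which is consistent with the caption's remark about the eye-tracking curve.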

Original abstract

This paper presents a new way of getting high-quality saliency maps for video, using a cheaper alternative to eye-tracking data. We designed a mouse-contingent video viewing system which simulates the viewers' peripheral vision based on the position of the mouse cursor. The system enables the use of mouse-tracking data recorded from an ordinary computer mouse as an alternative to real gaze fixations recorded by a more expensive eye-tracker. We developed a crowdsourcing system that enables the collection of such mouse-tracking data at large scale. Using the collected mouse-tracking data we showed that it can serve as an approximation of eye-tracking data. Moreover, trying to increase the efficiency of collected mouse-tracking data we proposed a novel deep neural network algorithm that improves the quality of mouse-tracking saliency maps.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Mouse-contingent crowdsourcing gives a cheaper route to video saliency data but the approximation to eye tracking still needs the numbers to back it up.

The paper's main move is a mouse-contingent viewing system that blurs everything outside the cursor position, lets ordinary users drive it with a regular mouse, and feeds the resulting tracks into a DNN that cleans up the saliency maps. They also built the crowdsourcing side to scale collection without eye trackers. That setup is the concrete new piece; prior work had mouse tracking for images but this version targets video with the peripheral simulation and the post-processing network.

The practical gain is real: eye-tracking hardware and lab time are expensive, so anything that drops the cost while still producing usable maps could help groups that train saliency models on limited budgets. The crowdsourcing pipeline itself looks workable on paper.

The soft spot sits right at the central claim. The abstract states that the mouse data approximates eye data and that the network improves it, yet the description supplies no AUC, NSS, KL, or other direct comparison numbers between the two on the same stimuli. The stress-test concern lands here: cursor-plus-blur does not automatically reproduce saccades, covert shifts, or the fact that mouse control is an active motor task. Without those side-by-side metrics in the full text, it is hard to know whether the approximation is close enough for downstream use or whether the DNN is mostly correcting for mouse-specific artifacts.

This is aimed at computer-vision labs that need more video saliency training data and are willing to trade some precision for scale. A reader already working on attention datasets or cheap annotation pipelines would get the most out of it. I would send it to peer review if the full manuscript contains the missing quantitative comparisons and they are at least in the ballpark of existing eye-tracking baselines; otherwise it stays preliminary.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces a mouse-contingent video viewing system that applies peripheral blur based on mouse cursor position to enable collection of crowdsourced mouse-tracking data as a low-cost proxy for eye-tracking saliency maps on videos. It asserts that the collected mouse data approximates eye-tracking data and proposes a novel DNN to improve the quality of the resulting saliency maps.

Significance. If the mouse-to-eye approximation holds with strong quantitative support, the work would be significant for computer vision by enabling scalable, low-cost collection of video saliency data via crowdsourcing, which could expand training sets for saliency prediction models. The crowdsourcing platform itself represents a practical engineering contribution.

major comments (2)

[Abstract] The central claim that 'mouse-tracking data ... can serve as an approximation of eye-tracking data' is load-bearing yet unsupported by any reported quantitative metrics (AUC, NSS, KL divergence, or correlation) or direct comparison on identical stimuli; the description supplies no validation details or baselines.
[Abstract] The mouse-contingent system is presented as simulating peripheral vision, but the manuscript provides no evidence that cursor-based blur replicates saccadic dynamics, covert attention, or natural gaze trajectories; this untested fidelity is required for the proxy claim to hold.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We agree that the abstract should better support the central claims with quantitative details from the manuscript and will revise it accordingly. We address each point below.

Referee: [Abstract] The central claim that 'mouse-tracking data ... can serve as an approximation of eye-tracking data' is load-bearing yet unsupported by any reported quantitative metrics (AUC, NSS, KL divergence, or correlation) or direct comparison on identical stimuli; the description supplies no validation details or baselines.

Authors: The abstract is a concise summary; the manuscript reports direct comparisons on identical stimuli with quantitative metrics (AUC, NSS, KL divergence, and correlation) in the experimental results and figures. To address the concern, we will revise the abstract to include key validation metrics and baselines. (Revision: yes)

Referee: [Abstract] The mouse-contingent system is presented as simulating peripheral vision, but the manuscript provides no evidence that cursor-based blur replicates saccadic dynamics, covert attention, or natural gaze trajectories; this untested fidelity is required for the proxy claim to hold.

Authors: The system applies cursor-based peripheral blur to enable scalable data collection, with proxy validity shown empirically via saliency map approximation rather than exact replication of saccades or covert attention. We will revise the abstract to clarify the system's design scope and empirical support without overstating fidelity. (Revision: partial)

Circularity Check

0 steps flagged

No circularity: empirical data collection with no derivation chain

The paper's core contribution is an empirical crowdsourcing pipeline for mouse-tracking saliency data plus a DNN post-processing step; the abstract and description contain no equations, fitted parameters, or mathematical derivations. Claims rest on direct collection and comparison to eye-tracking, which are externally falsifiable and do not reduce to self-definition or self-citation. No load-bearing uniqueness theorems, ansatzes, or renamed known results appear. This is the normal non-circular case for an applied data-collection study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the work is presented as an empirical engineering contribution.

pith-pipeline@v0.9.0 · 5654 in / 996 out tokens · 21126 ms · 2026-05-25T12:23:21.851100+00:00 · methodology

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 2 internal anchors

[1] Predicting video saliency using crowdsourced mouse-tracking data. Introduction. When watching videos, humans distribute their attention unevenly. Some objects in the video may attract more attention than the others. This distribution can be represented by per-frame saliency maps defining the importance of each frame region for viewers. The use of saliency can improve the quality of many video processing applicatio...

[2] Related work. The paper makes a contribution to two topics: cursor-based alternatives to eye tracking and semiautomatic saliency modeling. Hereafter we provide a brief overview of these topics. Cursor-based alternatives to eye tracking. There were many efforts to use mouse tracking as a cheap alternative to eye tracking. However, most of these efforts were...

[3] Cursor-based saliency for video. We propose a methodology for high-quality visual-attention estimation based on mouse-tracking data and a system collecting such data using crowdsourcing platforms. We show a participant the video in a special video player in real-time in full-screen mode. Input frames Dilated ResNet Conv LSTM Conv 1x1 Spatial features Te...

[4] Semiautomatic deep neural network. To improve saliency maps generated using the cursor positions as eye fixations we developed a new neural network algorithm. The algorithm is based on SAM [11] architecture which was originally designed to predict saliency of static images. Though SAM is a static model, its retrained ResNet version can outperform the ...

[5] Experiments. We used our cursor-based saliency system to collect mouse-movement data in 12 random videos from Hollywood-2 video saliency dataset [7] that are each 20–30 seconds long. We hired participants on Subjectify.us crowdsourcing platform, showed them 10 videos and paid them $0.15 if they watched all videos. In total, we collected data of 30 part...

[6] Conclusion. In this paper, we proposed a cheap way of getting high-quality saliency maps for video through the use of additional data. We developed a novel system that shows viewers videos in a mouse-contingent video player and collects mouse-tracking data approximating real eye fixations. We showed that mouse-tracking data can be used as an alternative...

[7] Acknowledgments. This work was partially supported by the Russian Foundation for Basic Research under Grant 19-01-00785 a
[8] Y. Gitman, M. Erofeev, D. Vatolin, B. Andrey, and F. Alexey. Semiautomatic visual-attention modeling and its application to video compression. In International Conference on Image Processing (ICIP), pages 1105–1109, 2014.

[9] T. Lu, Z. Yuan, Y. Huang, D. Wu, and H. Yu. Video retargeting with nonlinear spatial-temporal saliency fusion. In 2010 IEEE International Conference on Image Processing, pages 1801–1804, 2010.

[10] A. Borji and L. Itti. State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):185–207, 2013.

[11] Ali Borji. Saliency prediction in the deep learning era: An empirical investigation, 2018.

[12] Vitaliy Lyudvichenko, Mikhail Erofeev, Yury Gitman, and Dmitriy Vatolin. A semiautomatic saliency model and its application to video compression. In 13th IEEE International Conference on Intelligent Computer Communication and Processing, pages 403–410, 2017.

[13] Vitaliy Lyudvichenko, Mikhail Erofeev, Alexander Ploshkin, and Dmitriy Vatolin. Improving video compression with deep visual-attention models. In 2019 International Conference on Intelligent Medicine and Image Processing, 2019.

[14] S. Mathe and C. Sminchisescu. Actions in the eye: Dynamic gaze datasets and learnt saliency models for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1408–1424, 2015.

[15] Xun Huang, Chengyao Shen, Xavier Boix, and Qi Zhao. Salicon: Reducing the semantic gap in saliency prediction by adapting deep neural networks. In 2015 International Conference on Computer Vision, pages 262–270, 2015.

[16] Nam Wook Kim, Zoya Bylinskii, Michelle A. Borkin, Krzysztof Z. Gajos, Aude Oliva, Fredo Durand, and Hanspeter Pfister. Bubbleview: An interface for crowdsourcing image importance maps and tracking visual attention. ACM Trans. Comput.-Hum. Interact., 24(5):36:1–36:40, 2017.

[17] Tilke Judd, Frédo Durand, and Antonio Torralba. A benchmark of computational models of saliency to predict human fixations. Technical report, Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 2012.

[18] Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, and Rita Cucchiara. Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model. IEEE Transactions on Image Processing, 27(10):5142–5154, 2018.

[19] Pingmei Xu, Yusuke Sugano, and Andreas Bulling. Spatio-temporal modeling and prediction of visual attention in graphical user interfaces. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 3299–3310, 2016.

[20] Oleksii Sidorov, Marius Pedersen, Nam Wook Kim, and Sumit Shekhar. Are all the frames equally important? CoRR, abs/1905.07984, 2019.

[21] Wenguan Wang, Jianbing Shen, Fang Guo, Ming-Ming Cheng, and Ali Borji. Revisiting video saliency: A large-scale benchmark and a new model. 2018.

[22] Lai Jiang, Mai Xu, and Zulin Wang. Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM. CoRR, abs/1709.06316, 2017.

[23] Tilke Judd, Krista Ehinger, Frédo Durand, and Antonio Torralba. Learning to predict where humans look. In International Conference on Computer Vision (ICCV), pages 2106–2113, 2009.

About the authors. Vitaliy Lyudvichenko is a Ph.D. student of Computer Graphics and Media Lab of Computer Science department of Lomonosov Moscow State University. Dmitr...
