SalGAN: Visual Saliency Prediction with Generative Adversarial Networks

Cristian Canton Ferrer; Elisa Sayrol; Jordi Torres; Junting Pan; Kevin McGuinness; Noel E. O'Connor; Xavier Giro-i-Nieto

arXiv: 1701.01081 · v3 · pith:NPIDZ47Onew · submitted 2017-01-04 · cs.CV

Junting Pan, Cristian Canton Ferrer, Kevin McGuinness, Noel E. O'Connor, Jordi Torres, Elisa Sayrol, Xavier Giro-i-Nieto

classification: cs.CV

keywords: saliency, adversarial, network, prediction, trained, binary, generative, loss


We introduce SalGAN, a deep convolutional neural network for visual saliency prediction trained with adversarial examples. The first stage of the network consists of a generator model whose weights are learned by back-propagation computed from a binary cross entropy (BCE) loss over downsampled versions of the saliency maps. The resulting prediction is processed by a discriminator network trained to solve a binary classification task between the saliency maps generated by the generative stage and the ground truth ones. Our experiments show how adversarial training allows reaching state-of-the-art performance across different metrics when combined with a widely-used loss function like BCE. Our results can be reproduced with the source code and trained models available at https://imatge-upc.github.io/saliency-salgan-2017/.
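The two training signals described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the average-pooling `downsample`, the weighting `alpha`, the pooling `factor`, and the scalar discriminator scores are all assumptions introduced here for illustration. The released source code at the URL above is the authoritative reference.

```python
import numpy as np


def bce(pred, target, eps=1e-7):
    """Mean binary cross entropy between arrays of values in [0, 1]."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))


def downsample(sal_map, factor):
    """Average-pool a 2D map by an integer factor (assumed scheme)."""
    h, w = sal_map.shape
    h, w = h - h % factor, w - w % factor  # crop to a multiple of factor
    blocks = sal_map[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def generator_loss(pred_map, gt_map, d_score_on_pred, alpha=1.0, factor=4):
    """Content BCE on downsampled maps plus an adversarial term.

    d_score_on_pred is the discriminator's probability that the predicted
    map is real; alpha is a hypothetical weighting, not from the abstract.
    """
    content = bce(downsample(pred_map, factor), downsample(gt_map, factor))
    # The generator wants the discriminator to label its output as real (1).
    adversarial = bce(np.array([d_score_on_pred]), np.array([1.0]))
    return alpha * content + adversarial


def discriminator_loss(d_score_on_gt, d_score_on_pred):
    """Binary classification: ground truth maps -> 1, generated maps -> 0."""
    real = bce(np.array([d_score_on_gt]), np.array([1.0]))
    fake = bce(np.array([d_score_on_pred]), np.array([0.0]))
    return real + fake
```

With a fixed prediction and ground truth, `generator_loss` decreases as `d_score_on_pred` approaches 1, which is the sense in which the adversarial term pushes generated maps toward ones the discriminator cannot tell apart from ground truth.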

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Exploring deep learning for Event-Based Saliency Prediction with a Transformer-based model
cs.CV · 2026-05 · unverdicted · novelty 8.0

SEST is the first deep learning model for event-based saliency prediction, using a pretrained Swin Transformer backbone and synthetic benchmarks to outperform prior event methods while transferring to real event streams.

Believe It or Not, We Know What You Are Looking at!
cs.CV · 2019-07 · unverdicted · novelty 6.0

A two-stage CNN model generates multi-scale gaze direction fields before regressing gaze heatmaps and introduces a new video dataset annotated by in-scene observers, outperforming prior methods.

Deep Saliency Models : The Quest For The Loss Function
cs.CV · 2019-07 · conditional · novelty 6.0

Varying and combining loss functions in deep visual saliency prediction models produces significant performance gains on fixed architectures that hold across datasets and networks.

Data-centric Design of Learning-based Surgical Gaze Perception Models in Multi-Task Simulation
cs.RO · 2026-02 · unverdicted · novelty 4.0

Introduces a multi-task surgical gaze dataset comparing active execution versus passive viewing and novice versus intermediate expertise, showing passive novice labels approximate intermediate active attention with li...

Simple vs complex temporal recurrences for video saliency prediction
cs.CV · 2019-07 · unverdicted · novelty 4.0

Both ConvLSTM and exponential moving average modifications to a static saliency model achieve state-of-the-art video saliency prediction on DHF1K after SALICON pre-training and yield similar maps.

Learning Where to Look While Tracking Instruments in Robot-assisted Surgery
cs.CV · 2019-06 · unverdicted · novelty 4.0

An end-to-end multitask model with shared encoder, separate decoders, batch-Wasserstein loss, and soft attention module reports better performance than prior segmentation and saliency methods on the MICCAI robotic ins...