SalGAN: Visual Saliency Prediction with Generative Adversarial Networks
We introduce SalGAN, a deep convolutional neural network for visual saliency prediction trained with adversarial examples. The first stage of the network consists of a generator model whose weights are learned by back-propagation computed from a binary cross entropy (BCE) loss over downsampled versions of the saliency maps. The resulting prediction is processed by a discriminator network trained to solve a binary classification task between the saliency maps generated by the generative stage and the ground truth ones. Our experiments show how adversarial training allows reaching state-of-the-art performance across different metrics when combined with a widely-used loss function like BCE. Our results can be reproduced with the source code and trained models available at https://imatge-upc.github.io/saliency-salgan-2017/.
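The two objectives the abstract describes, a BCE content loss over downsampled saliency maps and a binary real-versus-generated classification loss for the discriminator, can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the average-pooling downsampler, the pooling `factor`, and the weight `alpha` that mixes the content and adversarial terms are all assumptions introduced here, and the discriminator is reduced to a single probability score.

```python
import math

EPS = 1e-7  # clamp to keep log() finite; an implementation detail assumed here


def downsample(sal_map, factor):
    """Average-pool a 2D map (list of rows) by an integer factor (assumed scheme)."""
    h, w = len(sal_map), len(sal_map[0])
    out = []
    for i in range(0, h - h % factor, factor):
        row = []
        for j in range(0, w - w % factor, factor):
            block = [sal_map[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def bce(pred, target):
    """Mean per-pixel binary cross entropy between two same-sized 2D maps."""
    total, n = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, EPS), 1.0 - EPS)
            total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            n += 1
    return total / n


def generator_loss(pred, target, d_score_on_pred, alpha=1.0, factor=2):
    """Content BCE on downsampled maps plus an adversarial term.

    d_score_on_pred is the discriminator's probability that the generated
    map is real; alpha and factor are hypothetical hyperparameters.
    """
    content = bce(downsample(pred, factor), downsample(target, factor))
    d = min(max(d_score_on_pred, EPS), 1.0 - EPS)
    adversarial = -math.log(d)  # generator wants the discriminator to say "real"
    return alpha * content + adversarial


def discriminator_loss(d_score_on_real, d_score_on_pred):
    """Binary classification loss: ground truth labelled 1, generated labelled 0."""
    r = min(max(d_score_on_real, EPS), 1.0 - EPS)
    f = min(max(d_score_on_pred, EPS), 1.0 - EPS)
    return -(math.log(r) + math.log(1.0 - f))
```

In this sketch the generator is penalized both for per-pixel disagreement with the ground truth and for producing maps the discriminator can tell apart from real ones, which is the combination the abstract credits for the reported performance.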
Forward citations
Cited by 6 Pith papers
- Exploring deep learning for Event-Based Saliency Prediction with a Transformer-based model
  SEST is the first deep learning model for event-based saliency prediction, using a pretrained Swin Transformer backbone and synthetic benchmarks to outperform prior event methods while transferring to real event streams.
- Believe It or Not, We Know What You Are Looking at!
  A two-stage CNN model generates multi-scale gaze direction fields before regressing gaze heatmaps and introduces a new video dataset annotated by in-scene observers, outperforming prior methods.
- Deep Saliency Models : The Quest For The Loss Function
  Varying and combining loss functions in deep visual saliency prediction models produces significant performance gains on fixed architectures that hold across datasets and networks.
- Data-centric Design of Learning-based Surgical Gaze Perception Models in Multi-Task Simulation
  Introduces a multi-task surgical gaze dataset comparing active execution versus passive viewing and novice versus intermediate expertise, showing passive novice labels approximate intermediate active attention with li...
- Simple vs complex temporal recurrences for video saliency prediction
  Both ConvLSTM and exponential moving average modifications to a static saliency model achieve state-of-the-art video saliency prediction on DHF1K after SALICON pre-training and yield similar maps.
- Learning Where to Look While Tracking Instruments in Robot-assisted Surgery
  An end-to-end multitask model with shared encoder, separate decoders, batch-Wasserstein loss, and soft attention module reports better performance than prior segmentation and saliency methods on the MICCAI robotic ins...