Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Kelvin Xu , Jimmy Ba , Ryan Kiros , Kyunghyun Cho , Aaron Courville , Ruslan Salakhutdinov , Richard Zemel , Yoshua Bengio

Authors on Pith no claims yet

classification 💻 cs.LG cs.CV

keywords attentionmodelautomaticallydescribeableattendbackpropagationbenchmark

0 comments

read the original abstract

Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through visualization how the model is able to automatically learn to fix its gaze on salient objects while generating the corresponding words in the output sequence. We validate the use of attention with state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
cs.CL 2017-05 accept novelty 8.0

TriviaQA is a new large-scale dataset for reading comprehension that features complex compositional questions, high lexical variability, and cross-sentence reasoning requirements, where current baselines reach only 40...
Categorical Reparameterization with Gumbel-Softmax
stat.ML 2016-11 unverdicted novelty 8.0

Gumbel-Softmax provides a continuous relaxation of categorical sampling that anneals to discrete samples for gradient-based optimization.
Heuristic Style Transfer for Real-Time, Efficient Weather Attribute Detection
cs.CV 2026-04 conditional novelty 5.0

Lightweight multi-task models using Gram matrices and PatchGAN-style architectures detect 53 weather classes from RGB images with F1 scores above 96% internally and 78% zero-shot externally, supported by a new 503k-im...