
arXiv: 1410.1090 · v1 · submitted 2014-10-04 · cs.CV · cs.CL · cs.LG


Explain Images with Multimodal Recurrent Neural Networks

keywords: model, images, m-RNN, multimodal, network, neural, recurrent, deep

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three benchmark datasets (IAPR TC-12, Flickr 8K, and Flickr 30K). Our model outperforms the state-of-the-art generative method. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.
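
The generation procedure described in the abstract (model the probability of the next word given the previous words and the image, then sample from that distribution) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the toy vocabulary, the layer sizes, the random untrained weights, the tanh nonlinearity, the additive form of the multimodal layer, and the names `ToyMRNN`, `step`, and `sample_caption`. The image feature vector stands in for the output of a convolutional network, which is not implemented here.

```python
import numpy as np

# Hypothetical toy vocabulary; index 0 starts a sentence, index 1 ends it.
VOCAB = ["<start>", "<end>", "a", "dog", "runs", "on", "grass"]
START, END = 0, 1


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


class ToyMRNN:
    """Untrained toy model: a recurrent state for the words, a fixed image
    feature vector, and a multimodal layer that combines the two."""

    def __init__(self, vocab_size, embed_dim=8, hidden_dim=16,
                 image_dim=12, multi_dim=10, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            # Small random weights; a real model would learn these.
            return rng.normal(0.0, 0.1, size=shape)

        self.E = init(vocab_size, embed_dim)       # word embeddings
        self.W_in = init(hidden_dim, embed_dim)    # word -> recurrent state
        self.W_rec = init(hidden_dim, hidden_dim)  # recurrent connection
        self.V_w = init(multi_dim, embed_dim)      # word -> multimodal
        self.V_r = init(multi_dim, hidden_dim)     # recurrent -> multimodal
        self.V_i = init(multi_dim, image_dim)      # image -> multimodal
        self.W_out = init(vocab_size, multi_dim)   # multimodal -> vocabulary
        self.hidden_dim = hidden_dim

    def step(self, word_id, h_prev, image_feat):
        """Return P(next word | previous words, image) and the new state."""
        w = self.E[word_id]
        h = np.tanh(self.W_in @ w + self.W_rec @ h_prev)
        # Assumed multimodal layer: sum of projected word, state, and image.
        m = np.tanh(self.V_w @ w + self.V_r @ h + self.V_i @ image_feat)
        return softmax(self.W_out @ m), h

    def sample_caption(self, image_feat, rng, max_len=10):
        """Sample word ids one at a time until <end> or max_len is reached."""
        h = np.zeros(self.hidden_dim)
        word_id = START
        caption = []
        for _ in range(max_len):
            probs, h = self.step(word_id, h, image_feat)
            word_id = int(rng.choice(len(probs), p=probs))
            if word_id == END:
                break
            caption.append(word_id)
        return caption


if __name__ == "__main__":
    model = ToyMRNN(len(VOCAB))
    image_feat = np.ones(12)  # placeholder for convolutional-network features
    ids = model.sample_caption(image_feat, np.random.default_rng(1))
    print(" ".join(VOCAB[i] for i in ids))
```

Because the weights are random, the sampled words are meaningless; the sketch only shows how a per-step distribution conditioned on both the sentence history and the image can drive sampling. The same per-step probabilities could in principle be multiplied along a given sentence to score it against an image, which is one way to think about the retrieval use mentioned in the abstract.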



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Microsoft COCO Captions: Data Collection and Evaluation Server

    cs.CV · 2015-04 · accept · novelty 6.0

    Microsoft COCO Captions provides 1.5 million human captions across 330,000 images and a public server to evaluate captioning models with BLEU, METEOR, ROUGE, and CIDEr.