Learning language through pictures

Afra Alishahi; Ákos Kádár; Grzegorz Chrupała

arxiv: 1506.03694 · v2 · pith:RJJSC4HZnew · submitted 2015-06-11 · 💻 cs.CL


classification 💻 cs.CL

keywords language, learning, visual, model, representations, textual, word, acquires


abstract

We propose Imaginet, a model of learning visually grounded representations of language from coupled textual and visual input. The model consists of two Gated Recurrent Unit networks with shared word embeddings, and uses a multi-task objective by receiving a textual description of a scene and trying to concurrently predict its visual representation and the next word in the sentence. Mimicking an important aspect of human language learning, it acquires meaning representations for individual words from descriptions of visual scenes. Moreover, it learns to effectively use sequential structure in semantic interpretation of multi-word phrases.
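The architecture described in the abstract can be illustrated with a minimal forward-pass sketch in plain Python. It is not the authors' implementation. Two GRU pathways read the same shared word embeddings; one predicts an image feature vector and the other predicts the next word. Several details are assumptions made only for illustration: the visual prediction is read from the final hidden state, the visual loss is mean squared error, the textual loss is cross-entropy, and the two losses are mixed with a hypothetical weight `alpha`. All names (`GRU`, `init_params`, `multitask_loss`) and all sizes are hypothetical.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRU:
    """Minimal GRU cell: forward pass only, biases omitted for brevity."""

    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        # One input and one recurrent matrix per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rand_matrix(n_hid, n_in, rng) for g in "zrh"}
        self.U = {g: rand_matrix(n_hid, n_hid, rng) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        c = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], rh))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]


def init_params(vocab, n_emb, n_hid, n_img, seed=0):
    rng = random.Random(seed)
    return {
        "E": rand_matrix(vocab, n_emb, rng),         # shared word embeddings
        "gru_v": GRU(n_emb, n_hid, rng),             # visual pathway
        "gru_t": GRU(n_emb, n_hid, rng),             # textual pathway
        "to_image": rand_matrix(n_img, n_hid, rng),  # hidden state -> image features
        "to_vocab": rand_matrix(vocab, n_hid, rng),  # hidden state -> next-word logits
    }


def multitask_loss(word_ids, image_vec, p, alpha=0.5):
    """Return (total, text_loss, visual_loss) for one sentence/image pair."""
    h_v = [0.0] * p["gru_v"].n_hid
    h_t = [0.0] * p["gru_t"].n_hid
    text_loss = 0.0
    for t, w in enumerate(word_ids):
        x = p["E"][w]  # the same embedding feeds both pathways
        h_v = p["gru_v"].step(x, h_v)
        h_t = p["gru_t"].step(x, h_t)
        if t + 1 < len(word_ids):
            # Cross-entropy of the next word under a softmax over the vocabulary.
            logits = matvec(p["to_vocab"], h_t)
            log_norm = math.log(sum(math.exp(l) for l in logits))
            text_loss -= logits[word_ids[t + 1]] - log_norm
    text_loss /= max(1, len(word_ids) - 1)
    # Assumption: the image vector is predicted from the final visual state.
    pred = matvec(p["to_image"], h_v)
    visual_loss = sum((a - b) ** 2 for a, b in zip(pred, image_vec)) / len(image_vec)
    return alpha * text_loss + (1 - alpha) * visual_loss, text_loss, visual_loss


# Toy example: a 4-word sentence over a 6-word vocabulary and a 3-d image vector.
params = init_params(vocab=6, n_emb=4, n_hid=5, n_img=3)
total, text_loss, visual_loss = multitask_loss([0, 3, 2, 5], [0.2, -0.1, 0.4], params)
print(total, text_loss, visual_loss)
```

Because the embedding matrix `E` is the only component both pathways share, gradients from both tasks would update the same word vectors during training. Training itself (backpropagation and an optimizer) is omitted from this sketch.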
