arXiv: 1609.03976 · v1 · submitted 2016-09-13 · cs.CL · cs.NE

Multimodal Attention for Neural Machine Translation

keywords: attention, image, mechanism, captioning, compared, description, language, machine

The attention mechanism is an important part of neural machine translation (NMT), where it has been reported to produce richer source representations than sequence-to-sequence models with fixed-length encoding. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultaneously focuses on an image and its natural language description to generate a description in another language. We train several variants of our proposed attention mechanism on the Multi30k multilingual image captioning dataset. We show that a dedicated attention for each modality achieves gains of up to 1.6 points in BLEU and METEOR over a textual NMT baseline.
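
The idea of a dedicated attention for each modality can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify the scoring function or how the two contexts are combined, so the dot-product scoring, the fusion by concatenation, and all names below (`attend`, `multimodal_context`, the toy vectors) are assumptions chosen for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, annotations):
    """Dot-product attention (an assumption, not the paper's scoring function).

    Returns the weighted sum of the annotation vectors and the weights.
    """
    scores = [sum(q * a for q, a in zip(query, ann)) for ann in annotations]
    weights = softmax(scores)
    dim = len(annotations[0])
    context = [
        sum(w * ann[i] for w, ann in zip(weights, annotations))
        for i in range(dim)
    ]
    return context, weights


def multimodal_context(query, text_annotations, image_annotations):
    """Run one attention per modality, then fuse the two contexts.

    Fusion by concatenation is a hypothetical choice for this sketch.
    """
    text_ctx, text_w = attend(query, text_annotations)
    image_ctx, image_w = attend(query, image_annotations)
    return text_ctx + image_ctx, text_w, image_w


# Toy inputs: a decoder state, three source-word vectors, two image-region vectors.
query = [1.0, 0.0]
text_annotations = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
image_annotations = [[0.0, 1.0], [2.0, 0.0]]

fused, text_w, image_w = multimodal_context(query, text_annotations, image_annotations)
```

Each modality gets its own normalized weight distribution, so the decoder can attend sharply to one image region while spreading its attention over several source words, or the reverse.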

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Video-guided Machine Translation with Global Video Context

    cs.CV · 2026-04 · unverdicted · novelty 4.0

    A globally video-guided multimodal translation framework retrieves semantically related video segments with a vector database and applies attention mechanisms to improve subtitle translation accuracy in long videos.