pith. machine review for the scientific record.

arxiv: 1508.04025 · v5 · submitted 2015-08-17 · 💻 cs.CL

Recognition: unknown

Effective Approaches to Attention-based Neural Machine Translation

Minh-Thang Luong, Hieu Pham, Christopher D. Manning

classification 💻 cs.CL
keywords: translation, bleu, points, source, approaches, architectures, attention, attention-based
original abstract

An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches over the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems which already incorporate known techniques such as dropout. Our ensemble model using different attention architectures has established a new state-of-the-art result in the WMT'15 English to German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker.
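The two variants the abstract describes are compact enough to sketch. Below is a minimal NumPy illustration (not the authors' code) of global attention, which scores every source hidden state, and local attention, which restricts scoring to a window of width 2D+1 around a position p_t and reweights alignments with a Gaussian of standard deviation D/2, as in the paper. The "dot" score used here is one of the paper's scoring functions; the shapes, function names, and passing p_t in directly (the paper predicts it from the target hidden state) are simplifying assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_attention(h_t, h_s):
    """Global attention: attend to ALL source hidden states.

    h_t : (d,)   current target hidden state
    h_s : (S, d) source hidden states
    Returns the context vector c_t of shape (d,).
    """
    scores = h_s @ h_t       # "dot" score for every source position
    a_t = softmax(scores)    # alignment weights over the whole source
    return a_t @ h_s         # context = weighted average of source states

def local_attention(h_t, h_s, p_t, D=2):
    """Local attention: attend only to a window [p_t - D, p_t + D].

    p_t is the (real-valued) aligned position; the paper predicts it
    from h_t, but here it is supplied by the caller for simplicity.
    Alignment weights are scaled by a Gaussian centered at p_t with
    sigma = D / 2, so positions nearer p_t count more.
    """
    S, _ = h_s.shape
    lo, hi = max(0, int(p_t) - D), min(S, int(p_t) + D + 1)
    window = h_s[lo:hi]
    a_t = softmax(window @ h_t)
    s_idx = np.arange(lo, hi)
    a_t = a_t * np.exp(-((s_idx - p_t) ** 2) / (2 * (D / 2) ** 2))
    return a_t @ window
```

Global attention costs O(S) score computations per target word, while the local variant is O(2D + 1) regardless of sentence length, which is the efficiency argument for attending to a subset of source words at a time.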

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Selective Contrastive Learning For Gloss Free Sign Language Translation

    cs.CL 2026-04 unverdicted novelty 7.0

    A pair selection strategy based on negative similarity dynamics strengthens contrastive supervision in gloss-free sign language translation by reducing noisy negatives.

  2. Jet Quenching Identification via Supervised Learning in Simulated Heavy-Ion Collisions

    hep-ph 2026-04 unverdicted novelty 6.0

    Sequential machine learning on jet declustering history trees outperforms static models at identifying jet quenching in heavy-ion collision simulations.

  3. Attention U-Net: Learning Where to Look for the Pancreas

    cs.CV 2018-04 unverdicted novelty 6.0

    Attention gates added to U-Net automatically focus on target organs in CT images and improve segmentation performance on abdominal datasets.

  4. Attention Is All You Need

    cs.CL 2017-06 unverdicted novelty 5.0

The Transformer, an architecture based entirely on attention mechanisms with no recurrence or convolutions, achieves state-of-the-art machine translation quality while being more parallelizable to train.

  5. Resource-Efficient CSI Prediction: A Gated Fusion and Factorized Projection Approach

    eess.SP 2026-05 unverdicted novelty 4.0

    A gated-fusion CSI predictor using GRU, attention, and DSLH reaches -13.84 dB NMSE with 26% fewer parameters and 2.3x higher throughput than a LinFormer baseline on 3GPP channels.

  6. Skeleton-based Coherence Modeling in Narratives

    cs.CL 2026-04 unverdicted novelty 4.0

    Sentence-level models outperform skeleton-based approaches for narrative coherence despite a new SSN network improving on cosine and Euclidean baselines.

  7. Gemma 2: Improving Open Language Models at a Practical Size

    cs.CL 2024-07 conditional novelty 3.0

    Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.