pith. sign in

arxiv: 2008.07905 · v3 · pith:2AUMUSOPnew · submitted 2020-08-18 · 💻 cs.CL

Glancing Transformer for Non-Autoregressive Neural Machine Translation

classification 💻 cs.CL
keywords transformertranslationdecodingglancingglatmachinenon-autoregressiveparallel
0
0 comments X
read the original abstract

Recent work on non-autoregressive neural machine translation (NAT) aims at improving the efficiency by parallel decoding without sacrificing the quality. However, existing NAT methods are either inferior to Transformer or require multiple decoding passes, leading to reduced speedup. We propose the Glancing Language Model (GLM), a method to learn word interdependency for single-pass parallel generation models. With GLM, we develop Glancing Transformer (GLAT) for machine translation. With only single-pass parallel decoding, GLAT is able to generate high-quality translation with 8-15 times speedup. Experiments on multiple WMT language directions show that GLAT outperforms all previous single pass non-autoregressive methods, and is nearly comparable to Transformer, reducing the gap to 0.25-0.9 BLEU points.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

    cs.CL 2024-12 unverdicted novelty 7.0

    o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.