Glancing Transformer for Non-Autoregressive Neural Machine Translation

Hao Zhou; Lei Li; Lihua Qian; Lin Qiu; Mingxuan Wang; Weinan Zhang; Yong Yu; Yu Bao

arxiv: 2008.07905 · v3 · pith:2AUMUSOPnew · submitted 2020-08-18 · 💻 cs.CL

Lihua Qian, Hao Zhou, Yu Bao, Mingxuan Wang, Lin Qiu, Weinan Zhang, Yong Yu, Lei Li

classification 💻 cs.CL

keywords transformer, translation, decoding, glancing, glat, machine, non-autoregressive, parallel

Recent work on non-autoregressive neural machine translation (NAT) aims at improving efficiency through parallel decoding without sacrificing quality. However, existing NAT methods are either inferior to Transformer or require multiple decoding passes, leading to reduced speedup. We propose the Glancing Language Model (GLM), a method to learn word interdependency for single-pass parallel generation models. With GLM, we develop Glancing Transformer (GLAT) for machine translation. With only single-pass parallel decoding, GLAT is able to generate high-quality translations with 8-15 times speedup. Experiments on multiple WMT language directions show that GLAT outperforms all previous single-pass non-autoregressive methods and is nearly comparable to Transformer, reducing the gap to 0.25-0.9 BLEU points.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
cs.CL 2024-12 unverdicted novelty 7.0

o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.