Non-Autoregressive Neural Machine Translation

Jiatao Gu, James Bradbury, Caiming Xiong, Victor OK Li, Richard Socher · 2017 · cs.CL · arXiv 1711.02281

15 Pith papers cite this work. Polarity classification is still indexing.

abstract

Existing approaches to neural machine translation condition each output word on previously generated outputs. We introduce a model that avoids this autoregressive property and produces its outputs in parallel, allowing an order of magnitude lower latency during inference. Through knowledge distillation, the use of input token fertilities as a latent variable, and policy gradient fine-tuning, we achieve this at a cost of as little as 2.0 BLEU points relative to the autoregressive Transformer network used as a teacher. We demonstrate substantial cumulative improvements associated with each of the three aspects of our training strategy, and validate our approach on IWSLT 2016 English-German and two WMT language pairs. By sampling fertilities in parallel at inference time, our non-autoregressive model achieves near-state-of-the-art performance of 29.8 BLEU on WMT 2016 English-Romanian.
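The role of fertilities in the abstract can be illustrated with a minimal sketch, assuming that each source token is copied into the decoder input as many times as its fertility, so the output length equals the sum of the fertilities and every position can then be decoded in parallel. The function name `copy_by_fertility` and the example fertility values are hypothetical and chosen only for illustration; in the model the fertilities would be predicted or sampled rather than written by hand.

```python
from typing import List, Sequence


def copy_by_fertility(source_tokens: Sequence[str],
                      fertilities: Sequence[int]) -> List[str]:
    """Repeat each source token according to its fertility.

    A fertility of 0 drops the token, 1 copies it once, 2 copies it twice,
    and so on. The result has length sum(fertilities).
    """
    if len(source_tokens) != len(fertilities):
        raise ValueError("need exactly one fertility per source token")
    if any(f < 0 for f in fertilities):
        raise ValueError("fertilities must be non-negative")

    decoder_inputs: List[str] = []
    for token, fertility in zip(source_tokens, fertilities):
        decoder_inputs.extend([token] * fertility)
    return decoder_inputs


# Hypothetical example values (not taken from the paper).
source = ["we", "accept", "it"]
fertilities = [1, 2, 0]
decoder_inputs = copy_by_fertility(source, fertilities)
print(decoder_inputs)  # ['we', 'accept', 'accept']
```

Because the decoder input length is fixed before decoding starts, no output position has to wait for an earlier one, which is the property that makes parallel generation possible.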

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Bastion: Budget-Aware Speculative Decoding with Tree-structured Block Diffusion Drafting

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

BASTION is a budget-aware speculative decoding framework with adaptive tree-structured block diffusion drafting that reports up to 6.61x speedup and 39% improvement over block-diffusion baselines.

Discrete Stochastic Localization for Non-autoregressive Generation

cs.LG · 2026-02-18 · unverdicted · novelty 7.0

Discrete Stochastic Localization lets a single trained network support an entire family of per-token SNR paths for discrete sequence generation, with masked diffusion as a special case, and improves MAUVE scores when fine-tuning pretrained checkpoints.

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation

cs.HC · 2026-05-11 · unverdicted · novelty 7.0

HapticLDM is the first latent diffusion model that generates vibrotactile signals directly from text, using dynamic text curation and global denoising to improve realism and semantic alignment over autoregressive baselines.

PlayGen-MoG: Framework for Diverse Multi-Agent Play Generation via Mixture-of-Gaussians Trajectory Prediction

cs.CV · 2026-04-02 · unverdicted · novelty 7.0

PlayGen-MoG uses a shared Mixture-of-Gaussians head across agents plus relative attention to generate diverse coordinated plays from a single static formation, achieving 1.68 yard ADE and 3.98 yard FDE with full mixture utilization on football data.

Which Tokens Need Context? A Reference-Based Analysis of Translation Responsibility Using Fertility and Entropy

cs.CL · 2026-06-28 · unverdicted · novelty 6.0

A post-hoc framework using fertility and entropy from word alignments on reference translations shows context redistributes responsibility to context tokens for function words but not content words across three language pairs.

Posterior Refinement: Fast Language Generation via Any-Order Flow Maps

cs.CL · 2026-06-23 · unverdicted · novelty 6.0

FMLM+ with Posterior Refinement bridges masked diffusion and flow map models to match discrete baseline quality in language generation using 32x fewer neural function evaluations via posterior scoring and refinement.

Flow Map Language Models: One-step Language Modeling via Continuous Denoising

cs.CL · 2026-02-18 · conditional · novelty 6.0 · 2 refs

Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.

Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation

cs.CL · 2019-06-22 · unverdicted · novelty 6.0

Reinforce-NAT and FS-decoder retrieve target sequential information for non-autoregressive translation, yielding higher BLEU than baseline NAT while preserving fast decoding and approaching autoregressive quality.

BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

BitLM replaces per-token softmax with bitwise continuous diffusion inside causal blocks to generate multiple tokens in parallel while preserving autoregressive structure.

Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts

cs.CL · 2026-05-08 · conditional · novelty 6.0

Reasoning language models extract answers from sparse, order-shuffled chain-of-thought traces with little accuracy loss.

VaaWIT: Visual-Aware Adaptation of Large Language Models for Multilingual Web Image Translation

cs.CV · 2026-05-23 · unverdicted · novelty 5.0

VaaWIT proposes DSAM and VAA modules to adapt LLMs for multilingual web image translation, claiming outperformance over open-source baselines on benchmarks.

Continuous diffusion for categorical data

cs.CL · 2022-11-28 · unverdicted · novelty 5.0

The paper proposes CDCD, a continuous-time and continuous-space diffusion framework for categorical data, and reports results on language modeling tasks.

Attending to Emotional Narratives

cs.LG · 2019-07-08 · unverdicted · novelty 4.0

Transformer and Memory Fusion Network attention mechanisms generalize to multimodal time-series emotion recognition on emotional autobiographical narratives, achieving performance comparable to human raters in some cases.

Sequence Generation: From Both Sides to the Middle

cs.CL · 2019-06-23 · unverdicted · novelty 4.0

SBSG model generates sequences bidirectionally from ends to middle via interactive attention, claiming faster decoding and better quality than autoregressive Transformer on NMT and summarization tasks.

citing papers explorer

Showing 15 of 15 citing papers.

Bastion: Budget-Aware Speculative Decoding with Tree-structured Block Diffusion Drafting
cs.LG · 2026-05-28 · unverdicted · none · ref 25 · internal anchor

Discrete Stochastic Localization for Non-autoregressive Generation
cs.LG · 2026-02-18 · unverdicted · none · ref 6 · internal anchor

Massive Activations in Large Language Models
cs.CL · 2024-02-27 · unverdicted · none · ref 35 · internal anchor

HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation
cs.HC · 2026-05-11 · unverdicted · none · ref 51

PlayGen-MoG: Framework for Diverse Multi-Agent Play Generation via Mixture-of-Gaussians Trajectory Prediction
cs.CV · 2026-04-02 · unverdicted · none · ref 4

Which Tokens Need Context? A Reference-Based Analysis of Translation Responsibility Using Fertility and Entropy
cs.CL · 2026-06-28 · unverdicted · none · ref 13 · internal anchor

Posterior Refinement: Fast Language Generation via Any-Order Flow Maps
cs.CL · 2026-06-23 · unverdicted · none · ref 4 · internal anchor

Flow Map Language Models: One-step Language Modeling via Continuous Denoising
cs.CL · 2026-02-18 · conditional · none · ref 26 · 2 links · internal anchor

Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation
cs.CL · 2019-06-22 · unverdicted · none · ref 7 · internal anchor

BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion
cs.CL · 2026-05-12 · unverdicted · none · ref 12

Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts
cs.CL · 2026-05-08 · conditional · none · ref 10

VaaWIT: Visual-Aware Adaptation of Large Language Models for Multilingual Web Image Translation
cs.CV · 2026-05-23 · unverdicted · none · ref 11 · internal anchor

Continuous diffusion for categorical data
cs.CL · 2022-11-28 · unverdicted · none · ref 24 · internal anchor

Attending to Emotional Narratives
cs.LG · 2019-07-08 · unverdicted · none · ref 23 · internal anchor

Sequence Generation: From Both Sides to the Middle
cs.CL · 2019-06-23 · unverdicted · none · ref 5 · internal anchor
