Non-Autoregressive Neural Machine Translation

14 Pith papers cite this work. Polarity classification is still indexing.

abstract

Existing approaches to neural machine translation condition each output word on previously generated outputs. We introduce a model that avoids this autoregressive property and produces its outputs in parallel, allowing an order of magnitude lower latency during inference. Through knowledge distillation, the use of input token fertilities as a latent variable, and policy gradient fine-tuning, we achieve this at a cost of as little as 2.0 BLEU points relative to the autoregressive Transformer network used as a teacher. We demonstrate substantial cumulative improvements associated with each of the three aspects of our training strategy, and validate our approach on IWSLT 2016 English-German and two WMT language pairs. By sampling fertilities in parallel at inference time, our non-autoregressive model achieves near-state-of-the-art performance of 29.8 BLEU on WMT 2016 English-Romanian.

citation-role summary

background: 1 · method: 1

representative citing papers

Discrete Stochastic Localization for Non-autoregressive Generation

cs.LG · 2026-02-18 · unverdicted · novelty 7.0

Discrete Stochastic Localization lets a single trained network support an entire family of per-token SNR paths for discrete sequence generation, with masked diffusion as a special case, and improves MAUVE scores when fine-tuning pretrained checkpoints.

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation

cs.HC · 2026-05-11 · unverdicted · novelty 7.0

HapticLDM is the first latent diffusion model that generates vibrotactile signals directly from text, using dynamic text curation and global denoising to improve realism and semantic alignment over autoregressive baselines.

Continuous diffusion for categorical data

cs.CL · 2022-11-28 · unverdicted · novelty 5.0

The paper proposes CDCD, a continuous-time and continuous-space diffusion framework for categorical data, and reports results on language modeling tasks.

Attending to Emotional Narratives

cs.LG · 2019-07-08 · unverdicted · novelty 4.0

Transformer and Memory Fusion Network attention mechanisms generalize to multimodal time-series emotion recognition on emotional autobiographical narratives, achieving performance comparable to human raters in some cases.

Sequence Generation: From Both Sides to the Middle

cs.CL · 2019-06-23 · unverdicted · novelty 4.0

The SBSG model generates sequences from both ends toward the middle via interactive attention; the authors claim faster decoding and better quality than the autoregressive Transformer on NMT and summarization tasks.
