Neural Machine Translation in Linear Time

Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, Koray Kavukcuoglu · 2016 · cs.CL · arXiv 1610.10099

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

open full Pith review browse 9 citing papers arXiv PDF

abstract

We present a novel neural network for processing sequences. The ByteNet is a one-dimensional convolutional neural network that is composed of two parts, one to encode the source sequence and the other to decode the target sequence. The two network parts are connected by stacking the decoder on top of the encoder and preserving the temporal resolution of the sequences. To address the differing lengths of the source and the target, we introduce an efficient mechanism by which the decoder is dynamically unfolded over the representation of the encoder. The ByteNet uses dilation in the convolutional layers to increase its receptive field. The resulting network has two core properties: it runs in time that is linear in the length of the sequences and it sidesteps the need for excessive memorization. The ByteNet decoder attains state-of-the-art performance on character-level language modelling and outperforms the previous best results obtained with recurrent networks. The ByteNet also achieves state-of-the-art performance on character-to-character machine translation on the English-to-German WMT translation task, surpassing comparable neural translation models that are based on recurrent networks with attentional pooling and run in quadratic time. We find that the latent alignment structure contained in the representations reflects the expected alignment between the tokens.

citation-role summary

background 3

citation-polarity summary

unclear 2 background 1

representative citing papers

Scalable Memristive-Friendly Reservoir Computing for Time Series Classification

cs.NE · 2026-04-21 · unverdicted · novelty 7.0

MARS parallel reservoirs achieve up to 21x training speedups and outperform LRU, S5, and Mamba on long sequence benchmarks while remaining gradient-free and compact.

Compressive Transformers for Long-Range Sequence Modelling

cs.LG · 2019-11-13 · unverdicted · novelty 6.0

Compressive Transformer sets new records on WikiText-103 (17.1 ppl) and Enwik8 (0.97 bpc) via memory compression and introduces the PG-19 long-range language benchmark.

Revisiting Transformer Layer Parameterization Through Causal Energy Minimization

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

CEM recasts Transformer layers as energy minimization steps, enabling constrained parameterizations like weight sharing and low-rank interactions that match standard baselines in 100M-scale language modeling.

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

cs.LG · 2021-04-27 · accept · novelty 6.0

Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.

Learning to Reformulate the Queries on the WEB

cs.IR · 2019-07-02 · unverdicted · novelty 5.0

An unsupervised character-level CNN encoder with attention-based RNN decoder, trained on Clueweb09 anchor phrases, generates query reformulations that improve retrieval on TREC collections.

Attention Is All You Need

cs.CL · 2017-06-12 · unverdicted · novelty 5.0

Pith review generated a malformed one-line summary.

Hierarchical Sequence to Sequence Voice Conversion with Limited Data

eess.AS · 2019-07-15 · unverdicted · novelty 4.0

Hierarchical seq2seq model for parallel voice conversion pretrained as autoencoder on single-speaker data then adapted to limited multispeaker data, using mel spectrograms converted via wavenet vocoder.

Improving Zero-shot Translation with Language-Independent Constraints

cs.CL · 2019-06-20 · unverdicted · novelty 4.0

Language-independent constraints and regularization in multilingual Transformer NMT yield a 2.23 BLEU average gain on zero-shot pairs from the IWSLT 2017 dataset.

A Survey of Advancing Audio Super-Resolution and Bandwidth Extension from Discriminative to Generative Models

eess.AS · 2026-05-15 · unverdicted · novelty 2.0

A structured survey of audio bandwidth extension that organizes the transition from deterministic discriminative DNNs to generative approaches including GANs, diffusion models, and flow-based methods.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Hierarchical Sequence to Sequence Voice Conversion with Limited Data eess.AS · 2019-07-15 · unverdicted · none · ref 59 · internal anchor
Hierarchical seq2seq model for parallel voice conversion pretrained as autoencoder on single-speaker data then adapted to limited multispeaker data, using mel spectrograms converted via wavenet vocoder.
A Survey of Advancing Audio Super-Resolution and Bandwidth Extension from Discriminative to Generative Models eess.AS · 2026-05-15 · unverdicted · none · ref 26 · internal anchor
A structured survey of audio bandwidth extension that organizes the transition from deterministic discriminative DNNs to generative approaches including GANs, diffusion models, and flow-based methods.

Neural Machine Translation in Linear Time

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer