MARS parallel reservoirs achieve up to 21x training speedups and outperform LRU, S5, and Mamba on long sequence benchmarks while remaining gradient-free and compact.
Neural Machine Translation in Linear Time
9 Pith papers cite this work. Polarity classification is still indexing.
abstract
We present a novel neural network for processing sequences. The ByteNet is a one-dimensional convolutional neural network that is composed of two parts, one to encode the source sequence and the other to decode the target sequence. The two network parts are connected by stacking the decoder on top of the encoder and preserving the temporal resolution of the sequences. To address the differing lengths of the source and the target, we introduce an efficient mechanism by which the decoder is dynamically unfolded over the representation of the encoder. The ByteNet uses dilation in the convolutional layers to increase its receptive field. The resulting network has two core properties: it runs in time that is linear in the length of the sequences and it sidesteps the need for excessive memorization. The ByteNet decoder attains state-of-the-art performance on character-level language modelling and outperforms the previous best results obtained with recurrent networks. The ByteNet also achieves state-of-the-art performance on character-to-character machine translation on the English-to-German WMT translation task, surpassing comparable neural translation models that are based on recurrent networks with attentional pooling and run in quadratic time. We find that the latent alignment structure contained in the representations reflects the expected alignment between the tokens.
citation-role summary
citation-polarity summary
roles
background 3representative citing papers
Compressive Transformer sets new records on WikiText-103 (17.1 ppl) and Enwik8 (0.97 bpc) via memory compression and introduces the PG-19 long-range language benchmark.
CEM recasts Transformer layers as energy minimization steps, enabling constrained parameterizations like weight sharing and low-rank interactions that match standard baselines in 100M-scale language modeling.
Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
An unsupervised character-level CNN encoder with attention-based RNN decoder, trained on Clueweb09 anchor phrases, generates query reformulations that improve retrieval on TREC collections.
Pith review generated a malformed one-line summary.
Hierarchical seq2seq model for parallel voice conversion pretrained as autoencoder on single-speaker data then adapted to limited multispeaker data, using mel spectrograms converted via wavenet vocoder.
Language-independent constraints and regularization in multilingual Transformer NMT yield a 2.23 BLEU average gain on zero-shot pairs from the IWSLT 2017 dataset.
A structured survey of audio bandwidth extension that organizes the transition from deterministic discriminative DNNs to generative approaches including GANs, diffusion models, and flow-based methods.
citing papers explorer
-
Hierarchical Sequence to Sequence Voice Conversion with Limited Data
Hierarchical seq2seq model for parallel voice conversion pretrained as autoencoder on single-speaker data then adapted to limited multispeaker data, using mel spectrograms converted via wavenet vocoder.
-
A Survey of Advancing Audio Super-Resolution and Bandwidth Extension from Discriminative to Generative Models
A structured survey of audio bandwidth extension that organizes the transition from deterministic discriminative DNNs to generative approaches including GANs, diffusion models, and flow-based methods.