Theory and Experiments on Vector Quantized Autoencoders

Arvind Neelakantan; Ashish Vaswani; Aurko Roy; Niki Parmar

arxiv: 1805.11063 · v2 · pith:YS6ECKWR · submitted 2018-05-28 · cs.LG · stat.ML

classification 💻 cs.LG stat.ML

keywords discrete latent training almost autoencoders better cifar-10 models

Deep neural networks with discrete latent variables offer the promise of better symbolic reasoning, and learning abstractions that are more useful to new tasks. There has been a surge in interest in discrete latent variable models; however, despite several recent improvements, the training of discrete latent variable models has remained challenging and their performance has mostly failed to match their continuous counterparts. Recent work on vector quantized autoencoders (VQ-VAE) has made substantial progress in this direction, with its perplexity almost matching that of a VAE on datasets such as CIFAR-10. In this work, we investigate an alternate training technique for VQ-VAE, inspired by its connection to the Expectation Maximization (EM) algorithm. Training the discrete bottleneck with EM helps us achieve better image generation results on CIFAR-10, and together with knowledge distillation, allows us to develop a non-autoregressive machine translation model whose accuracy almost matches a strong greedy autoregressive baseline Transformer, while being 3.3 times faster at inference.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Finite Scalar Quantization: VQ-VAE Made Simple
cs.CV 2023-09 conditional novelty 7.0

Finite scalar quantization simplifies VQ-VAE latents by independently rounding a few dimensions to fixed levels, producing an equivalent-sized implicit codebook with competitive performance and no collapse.

Two-Dimensional Quantization for Geometry-Aware Audio Coding
cs.SD 2025-12 unverdicted novelty 6.0

Q2D2 uses 2D geometric grid projections to quantize feature pairs in neural audio codecs, yielding implicit codebooks that improve efficiency and utilization over RVQ, VQ, and FSQ while maintaining reconstruction quality.

Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges
cs.CL 2019-07 unverdicted novelty 5.0

A single multilingual NMT model for 103 languages trained on 25B examples demonstrates transfer learning benefits for low-resource languages.