pith. machine review for the scientific record.

arxiv: 1601.06759 · v3 · submitted 2016-01-25 · 💻 cs.CV · cs.LG · cs.NE

Recognition: unknown

Pixel Recurrent Neural Networks

Authors on Pith: no claims yet
classification 💻 cs.CV · cs.LG · cs.NE
keywords: image · recurrent · deep · images · model · natural · networks · neural
Abstract

Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.
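The factorization the abstract describes can be sketched in a few lines: the model scores an image pixel by pixel in raster-scan order, with a discrete distribution over the 256 raw values at each step. This is a minimal illustration of the autoregressive factorization only, not the paper's RowLSTM or Diagonal BiLSTM architectures; `predict_logits` stands in for whatever network produces the next-pixel logits and is a placeholder assumption here.

```python
import numpy as np

def raster_scan_log_likelihood(image, predict_logits):
    """Score a grayscale image under an autoregressive model.

    image: (H, W) integer array with values in [0, 255].
    predict_logits: callable mapping the 1-D array of previously
        seen pixel values to a length-256 logit vector for the
        next pixel (a stand-in for the real network).
    Returns the total log-likelihood in nats.
    """
    flat = image.reshape(-1)
    total = 0.0
    for i, v in enumerate(flat):
        logits = predict_logits(flat[:i])
        # log-softmax via the standard max-shifted log-sum-exp
        m = logits.max()
        lse = m + np.log(np.exp(logits - m).sum())
        total += logits[v] - lse
    return total

def bits_per_dim(log_likelihood, n_pixels):
    """Convert a log-likelihood in nats to bits per pixel."""
    return -log_likelihood / (n_pixels * np.log(2.0))
```

A sanity check: a model that outputs uniform logits assigns each pixel probability 1/256, i.e. exactly 8 bits per dimension; the log-likelihood scores the paper reports are improvements below this trivial baseline.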

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. WaveNet: A Generative Model for Raw Audio

    cs.SD 2016-09 accept novelty 9.0

    WaveNet generates realistic raw audio using an autoregressive neural network with dilated convolutions, achieving state-of-the-art naturalness in speech synthesis for English and Mandarin.

  2. Density estimation using Real NVP

    cs.LG 2016-05 accept novelty 8.0

    Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.
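The affine coupling layer this summary refers to is easy to sketch: half the input passes through unchanged, the other half is scaled and shifted by functions of the first half, so the transform is trivially invertible and its log-determinant is just the sum of the log-scales. A minimal numpy sketch, with `s_t` standing in for the paper's learned scale/translation networks (an assumption here):

```python
import numpy as np

def coupling_forward(x, s_t):
    """Real NVP-style affine coupling on an even split of x.

    Returns (y, log|det J|); the Jacobian is triangular, so the
    log-determinant is simply the sum of the log-scales s.
    """
    d = len(x) // 2
    x1, x2 = x[:d], x[d:]
    s, t = s_t(x1)                      # learned networks in the real model
    y2 = x2 * np.exp(s) + t
    return np.concatenate([x1, y2]), s.sum()

def coupling_inverse(y, s_t):
    """Exact inverse of coupling_forward, no approximation needed."""
    d = len(y) // 2
    y1, y2 = y[:d], y[d:]
    s, t = s_t(y1)
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2])
```

The exactness of this inverse is what gives the model exact density evaluation, sampling, and latent inference, in contrast to the approximations needed by VAEs.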

  3. High-Resolution Image Synthesis with Latent Diffusion Models

    cs.CV 2021-12 conditional novelty 7.0

    Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders.

  4. Scaling Laws for Autoregressive Generative Modeling

    cs.LG 2020-10 accept novelty 7.0

    Autoregressive transformers follow power-law scaling laws for cross-entropy loss with nearly universal exponents relating optimal model size to compute budget across four domains.
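The power-law form behind this summary is simple to state: reducible loss falls as L(N) = (N_c / N)^alpha in model size N. The constants below are illustrative values from the language-modeling scaling-laws literature, not the per-domain fits of the cited paper, which reports distinct constants for each of its four domains:

```python
def predicted_loss(n_params, alpha=0.076, n_c=8.8e13):
    """Illustrative power law L(N) = (N_c / N) ** alpha for the
    reducible cross-entropy loss as a function of parameter count.

    alpha and n_c are assumed example constants, not fitted values
    from the cited paper.
    """
    return (n_c / n_params) ** alpha
```

One consequence worth noting: the improvement from a 10x larger model is a fixed multiplicative factor of 10^alpha, independent of where you start on the curve.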

  5. Generating Long Sequences with Sparse Transformers

    cs.LG 2019-04 unverdicted novelty 7.0

    Sparse Transformers factorize attention to handle sequences tens of thousands of steps long, achieving new state-of-the-art density modeling on Enwik8, CIFAR-10, and ImageNet-64.
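The factorized attention in this summary replaces the full O(n^2) causal mask with two sparse patterns whose composition still connects every pair of positions. A sketch of the strided variant, assuming boolean masks where True means "position i may attend to position j":

```python
import numpy as np

def strided_attention_masks(n, stride):
    """Two heads of a strided factorized attention pattern.

    local:   each position attends to the previous `stride` positions.
    strided: each position attends to positions a multiple of
             `stride` steps back.
    Each mask is O(n * n / stride) or O(n * stride) nonzeros instead
    of the O(n^2) of dense causal attention.
    """
    local = np.zeros((n, n), dtype=bool)
    strided = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1):
            if i - j < stride:
                local[i, j] = True
            if (i - j) % stride == 0:
                strided[i, j] = True
    return local, strided
```

Stacking layers that alternate the two heads lets information flow between any pair of causal positions in a constant number of hops, which is what makes the tens-of-thousands-step sequences tractable.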

  6. MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model

    cs.CV 2026-03 unverdicted novelty 6.0

    MPDiT uses a hierarchical multi-patch design in transformers to lower computation in diffusion models by handling coarse global features first then fine local details, plus faster-converging embeddings.

  7. Language Models (Mostly) Know What They Know

    cs.CL 2022-07 unverdicted novelty 6.0

    Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

  8. A General Language Assistant as a Laboratory for Alignment

    cs.CL 2021-12 conditional novelty 6.0

    Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

  9. ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance

    cs.CV 2026-04 unverdicted novelty 5.0

    ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.