Pixel recurrent neural networks

Koray Kavukcuoglu · 2016 · cs.CV · arXiv 1601.06759

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

open full Pith review browse 9 citing papers arXiv PDF

abstract

Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.

representative citing papers

WaveNet: A Generative Model for Raw Audio

cs.SD · 2016-09-12 · accept · novelty 9.0

WaveNet generates realistic raw audio using an autoregressive neural network with dilated convolutions, achieving state-of-the-art naturalness in speech synthesis for English and Mandarin.

Density estimation using Real NVP

cs.LG · 2016-05-27 · accept · novelty 8.0

Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.

High-Resolution Image Synthesis with Latent Diffusion Models

cs.CV · 2021-12-20 · conditional · novelty 7.0

Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and

Scaling Laws for Autoregressive Generative Modeling

cs.LG · 2020-10-28 · accept · novelty 7.0

Autoregressive transformers follow power-law scaling laws for cross-entropy loss with nearly universal exponents relating optimal model size to compute budget across four domains.

Generating Long Sequences with Sparse Transformers

cs.LG · 2019-04-23 · unverdicted · novelty 7.0

Sparse Transformers factorize attention to handle sequences tens of thousands long, achieving new SOTA density modeling on Enwik8, CIFAR-10, and ImageNet-64.

MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model

cs.CV · 2026-03-27 · unverdicted · novelty 6.0

MPDiT uses a hierarchical multi-patch design in transformers to lower computation in diffusion models by handling coarse global features first then fine local details, plus faster-converging embeddings.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

A General Language Assistant as a Laboratory for Alignment

cs.CL · 2021-12-01 · conditional · novelty 6.0

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance

cs.CV · 2026-04-29 · unverdicted · novelty 5.0

ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.

citing papers explorer

Showing 9 of 9 citing papers.

WaveNet: A Generative Model for Raw Audio cs.SD · 2016-09-12 · accept · none · ref 50
WaveNet generates realistic raw audio using an autoregressive neural network with dilated convolutions, achieving state-of-the-art naturalness in speech synthesis for English and Mandarin.
Density estimation using Real NVP cs.LG · 2016-05-27 · accept · none · ref 46
Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.
High-Resolution Image Synthesis with Latent Diffusion Models cs.CV · 2021-12-20 · conditional · none · ref 95
Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and
Scaling Laws for Autoregressive Generative Modeling cs.LG · 2020-10-28 · accept · none · ref 26
Autoregressive transformers follow power-law scaling laws for cross-entropy loss with nearly universal exponents relating optimal model size to compute budget across four domains.
Generating Long Sequences with Sparse Transformers cs.LG · 2019-04-23 · unverdicted · none · ref 17
Sparse Transformers factorize attention to handle sequences tens of thousands long, achieving new SOTA density modeling on Enwik8, CIFAR-10, and ImageNet-64.
MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model cs.CV · 2026-03-27 · unverdicted · none · ref 45 · internal anchor
MPDiT uses a hierarchical multi-patch design in transformers to lower computation in diffusion models by handling coarse global features first then fine local details, plus faster-converging embeddings.
Language Models (Mostly) Know What They Know cs.CL · 2022-07-11 · unverdicted · none · ref 117
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
A General Language Assistant as a Laboratory for Alignment cs.CL · 2021-12-01 · conditional · none · ref 59
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance cs.CV · 2026-04-29 · unverdicted · none · ref 31
ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.

Pixel recurrent neural networks

fields

years

verdicts

representative citing papers

citing papers explorer