pith. machine review for the scientific record.

arxiv: 1601.06759 · v3 · submitted 2016-01-25 · 💻 cs.CV · cs.LG · cs.NE

Recognition: unknown

Pixel Recurrent Neural Networks

Authors on Pith: no claims yet
classification 💻 cs.CV · cs.LG · cs.NE
keywords: image · recurrent · deep · images · model · natural · networks · neural
Abstract

Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.
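The factorization the abstract describes can be sketched in a few lines: the model scores an image pixel by pixel in raster-scan order, with a discrete distribution over the 256 raw values at each step. This is a minimal illustration of the autoregressive factorization only, not the paper's RowLSTM or Diagonal BiLSTM architectures; `predict_logits` stands in for whatever network produces the next-pixel logits and is a placeholder assumption here.

```python
import numpy as np

def raster_scan_log_likelihood(image, predict_logits):
    """Score a grayscale image under an autoregressive model.

    image: (H, W) integer array with values in [0, 255].
    predict_logits: callable mapping the 1-D array of previously
        seen pixel values to a length-256 logit vector for the
        next pixel (a stand-in for the real network).
    Returns the total log-likelihood in nats.
    """
    flat = image.reshape(-1)
    total = 0.0
    for i, v in enumerate(flat):
        logits = predict_logits(flat[:i])
        # log-softmax via the standard max-shifted log-sum-exp
        m = logits.max()
        lse = m + np.log(np.exp(logits - m).sum())
        total += logits[v] - lse
    return total

def bits_per_dim(log_likelihood, n_pixels):
    """Convert a log-likelihood in nats to bits per pixel."""
    return -log_likelihood / (n_pixels * np.log(2.0))
```

A sanity check: a model that outputs uniform logits assigns each pixel probability 1/256, i.e. exactly 8 bits per dimension; the log-likelihood scores the paper reports are improvements below this trivial baseline.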

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. WaveNet: A Generative Model for Raw Audio

    cs.SD 2016-09 accept novelty 9.0

    WaveNet generates realistic raw audio using an autoregressive neural network with dilated convolutions, achieving state-of-the-art naturalness in speech synthesis for English and Mandarin.

  2. Density estimation using Real NVP

    cs.LG 2016-05 accept novelty 8.0

    Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.
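The affine coupling layer this summary refers to is easy to sketch: half the input passes through unchanged, the other half is scaled and shifted by functions of the first half, so the transform is trivially invertible and its log-determinant is just the sum of the log-scales. A minimal numpy sketch, with `s_t` standing in for the paper's learned scale/translation networks (an assumption here):

```python
import numpy as np

def coupling_forward(x, s_t):
    """Real NVP-style affine coupling on an even split of x.

    Returns (y, log|det J|); the Jacobian is triangular, so the
    log-determinant is simply the sum of the log-scales s.
    """
    d = len(x) // 2
    x1, x2 = x[:d], x[d:]
    s, t = s_t(x1)                      # learned networks in the real model
    y2 = x2 * np.exp(s) + t
    return np.concatenate([x1, y2]), s.sum()

def coupling_inverse(y, s_t):
    """Exact inverse of coupling_forward, no approximation needed."""
    d = len(y) // 2
    y1, y2 = y[:d], y[d:]
    s, t = s_t(y1)
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2])
```

The exactness of this inverse is what gives the model exact density evaluation, sampling, and latent inference, in contrast to the approximations needed by VAEs.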

  3. High-Resolution Image Synthesis with Latent Diffusion Models

    cs.CV 2021-12 conditional novelty 7.0

    Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders.

  4. Scaling Laws for Autoregressive Generative Modeling

    cs.LG 2020-10 accept novelty 7.0

    Autoregressive transformers follow power-law scaling laws for cross-entropy loss with nearly universal exponents relating optimal model size to compute budget across four domains.
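The power-law form behind this summary is simple to state: reducible loss falls as L(N) = (N_c / N)^alpha in model size N. The constants below are illustrative values from the language-modeling scaling-laws literature, not the per-domain fits of the cited paper, which reports distinct constants for each of its four domains:

```python
def predicted_loss(n_params, alpha=0.076, n_c=8.8e13):
    """Illustrative power law L(N) = (N_c / N) ** alpha for the
    reducible cross-entropy loss as a function of parameter count.

    alpha and n_c are assumed example constants, not fitted values
    from the cited paper.
    """
    return (n_c / n_params) ** alpha
```

One consequence worth noting: the improvement from a 10x larger model is a fixed multiplicative factor of 10^alpha, independent of where you start on the curve.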

  5. Generating Long Sequences with Sparse Transformers

    cs.LG 2019-04 unverdicted novelty 7.0

    Sparse Transformers factorize attention to handle sequences tens of thousands of steps long, achieving new state-of-the-art density modeling on Enwik8, CIFAR-10, and ImageNet-64.
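The factorized attention in this summary replaces the full O(n^2) causal mask with two sparse patterns whose composition still connects every pair of positions. A sketch of the strided variant, assuming boolean masks where True means "position i may attend to position j":

```python
import numpy as np

def strided_attention_masks(n, stride):
    """Two heads of a strided factorized attention pattern.

    local:   each position attends to the previous `stride` positions.
    strided: each position attends to positions a multiple of
             `stride` steps back.
    Each mask is O(n * n / stride) or O(n * stride) nonzeros instead
    of the O(n^2) of dense causal attention.
    """
    local = np.zeros((n, n), dtype=bool)
    strided = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1):
            if i - j < stride:
                local[i, j] = True
            if (i - j) % stride == 0:
                strided[i, j] = True
    return local, strided
```

Stacking layers that alternate the two heads lets information flow between any pair of causal positions in a constant number of hops, which is what makes the tens-of-thousands-step sequences tractable.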

  6. MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model

    cs.CV 2026-03 unverdicted novelty 6.0

    MPDiT uses a hierarchical multi-patch design in transformers to lower computation in diffusion models by handling coarse global features first then fine local details, plus faster-converging embeddings.

  7. Language Models (Mostly) Know What They Know

    cs.CL 2022-07 unverdicted novelty 6.0

    Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

  8. A General Language Assistant as a Laboratory for Alignment

    cs.CL 2021-12 conditional novelty 6.0

    Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

  9. ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance

    cs.CV 2026-04 unverdicted novelty 5.0

    ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.