pith. machine review for the scientific record.

arxiv: 1605.08803 · v3 · submitted 2016-05-27 · 💻 cs.LG · cs.AI · cs.NE · stat.ML

Recognition: 3 theorem links · Lean Theorem

Density estimation using Real NVP

Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio

Pith reviewed 2026-05-11 23:49 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.NE · stat.ML
keywords density estimation · real NVP · invertible transformations · unsupervised learning · generative models · natural images · exact likelihood · latent space

The pith

Real NVP transformations provide invertible mappings that make density estimation tractable with exact likelihood computation, sampling, and latent inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces real-valued non-volume preserving transformations, called real NVP, to expand the class of usable probabilistic models for unsupervised learning. These transformations are designed to be invertible and learnable, so that the resulting models support exact log-likelihood evaluation, exact sampling from the model, exact recovery of latent variables, and an interpretable latent space. The authors apply the method to natural images and evaluate it through generated samples, likelihood scores, and direct manipulation of the latent variables on four datasets. A sympathetic reader cares because most high-dimensional density estimators previously required approximations that made some of these operations intractable or biased.

Core claim

We extend the space of such models using real-valued non-volume preserving (real NVP) transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space. We demonstrate its ability to model natural images on four datasets through sampling, log-likelihood evaluation and latent variable manipulations.

What carries the argument

real NVP transformations built from stacked affine coupling layers whose scale and translation functions are parameterized by neural networks, allowing the Jacobian determinant to be computed in closed form.
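
The mechanics are compact enough to sketch. Below is a minimal NumPy rendering of one affine coupling layer, offered as an illustration rather than the paper's implementation: `scale_net` and `translate_net` are hypothetical callables standing in for the deep convolutional conditioners, and the binary `mask` selects the partition that passes through unchanged.

```python
import numpy as np

def affine_coupling_forward(x, mask, scale_net, translate_net):
    """One affine coupling layer: the masked partition is copied, the rest
    is scaled and shifted conditioned on it. The Jacobian is triangular,
    so log|det J| is just the sum of the scale outputs."""
    x_masked = x * mask                        # partition that stays identical
    s = scale_net(x_masked) * (1.0 - mask)     # log-scale for the other partition
    t = translate_net(x_masked) * (1.0 - mask)
    y = x_masked + (1.0 - mask) * (x * np.exp(s) + t)
    return y, s.sum(axis=-1)                   # closed-form log-det Jacobian

def affine_coupling_inverse(y, mask, scale_net, translate_net):
    """Exact inverse in one pass: the copied partition reproduces s and t,
    so no iterative solve is needed."""
    y_masked = y * mask
    s = scale_net(y_masked) * (1.0 - mask)
    t = translate_net(y_masked) * (1.0 - mask)
    return y_masked + (1.0 - mask) * (y - t) * np.exp(-s)
```

Forward and inverse each cost a single pass through the conditioner networks, which is why likelihood evaluation and sampling stay equally cheap.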

If this is right

  • Any data point can be assigned an exact probability under the learned distribution (all four operations are sketched in code after this list).
  • New samples are obtained by drawing from a simple base distribution and applying the inverse transformation.
  • Latent codes for observed images are recovered exactly rather than approximated.
  • The latent space supports direct arithmetic operations that produce semantically meaningful changes in the generated images.
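
A hedged sketch of how those four operations fall out of a stack of such layers, reusing the coupling functions above; `layers` is an assumed list of (mask, scale_net, translate_net) triples and the base distribution is a standard Gaussian:

```python
import numpy as np

def flow_log_prob(x, layers):
    """Exact log p(x): map x to the latent space through every coupling
    layer, accumulating the change-of-variables correction, then score
    the result under the standard-normal base distribution."""
    z, log_det = x, np.zeros(x.shape[0])
    for mask, s_net, t_net in layers:
        z, ldj = affine_coupling_forward(z, mask, s_net, t_net)
        log_det += ldj
    base = -0.5 * (z ** 2 + np.log(2.0 * np.pi)).sum(axis=-1)
    return base + log_det

def flow_sample(n, dim, layers, seed=0):
    """Exact sampling: draw z ~ N(0, I) and apply the layer inverses in
    reverse order."""
    z = np.random.default_rng(seed).standard_normal((n, dim))
    for mask, s_net, t_net in reversed(layers):
        z = affine_coupling_inverse(z, mask, s_net, t_net)
    return z

# Exact latent recovery is the forward loop of flow_log_prob without the
# log-det term; latent arithmetic then maps edited codes back through the
# inverse loop of flow_sample.
```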

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same coupling-layer construction could be adapted to sequential or graph-structured data if the conditioner networks are replaced by appropriate architectures.
  • Exact inference removes the need for variational bounds, which may simplify training objectives in other generative settings.
  • Because the volume change under the transformations is tracked exactly by the Jacobian determinant, they might be combined with other invertible flows to trade off expressivity against computational cost.

Load-bearing premise

The neural-network-parameterized affine coupling layers are expressive enough to capture the structure of natural images without needing impractically many layers.

What would settle it

If a real NVP model trained on the same image datasets produces samples that bear no visual resemblance to the data or reports log-likelihood values far below those of other published density estimators, the practical utility claim would be refuted.

read the original abstract

Unsupervised learning of probabilistic models is a central yet challenging problem in machine learning. Specifically, designing models with tractable learning, sampling, inference and evaluation is crucial in solving this task. We extend the space of such models using real-valued non-volume preserving (real NVP) transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space. We demonstrate its ability to model natural images on four datasets through sampling, log-likelihood evaluation and latent variable manipulations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces real-valued non-volume preserving (Real NVP) transformations based on affine coupling layers. These yield invertible maps whose Jacobians are triangular, allowing exact log-likelihood evaluation via the change-of-variables formula, exact sampling by inversion, and exact latent inference. The model is demonstrated on four image datasets (CIFAR-10, ImageNet at 32×32 and 64×64, LSUN, CelebA) with reported log-likelihoods, samples, and latent-space manipulations.

Significance. If the central construction holds, the work is significant: it supplies a flow-based generative model that simultaneously achieves exact likelihood, exact sampling, and competitive performance on high-dimensional natural images, addressing a key limitation of contemporaneous methods such as VAEs and GANs. The multi-scale architecture and neural-network parameterizations for the scale and translation functions are shown to be sufficiently expressive for the reported tasks.

minor comments (3)
  1. [§3.2] Eq. (6): the multi-scale architecture description would benefit from an explicit statement of how the checkerboard and channel-wise masks are alternated across layers to ensure full mixing (a toy mask construction is sketched after these comments).
  2. [Table 1] The log-likelihood numbers are given without standard errors across multiple runs; adding these would strengthen the quantitative comparison to NICE and other baselines.
  3. [Figure 4] The latent-space arithmetic examples are visually informative, but the paper does not report a quantitative measure (e.g., reconstruction error after manipulation) to support the claim of an interpretable latent space.
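
On comment 1, a toy construction of the two mask families may be useful. This is an assumption-laden sketch (the paper's squeeze operation between resolutions is omitted), with the `invert` flag flipping which partition is held fixed from layer to layer:

```python
import numpy as np

def checkerboard_mask(h, w, invert=False):
    """Spatial checkerboard: 1 where (row + col) is even, 0 elsewhere."""
    mask = (np.indices((h, w)).sum(axis=0) % 2 == 0).astype(np.float32)
    return 1.0 - mask if invert else mask

def channel_mask(c, invert=False):
    """Channel-wise split: first half of the channels fixed, rest transformed."""
    mask = np.zeros(c, dtype=np.float32)
    mask[: c // 2] = 1.0
    return 1.0 - mask if invert else mask

# Alternating invert from layer to layer ensures every coordinate is
# eventually transformed, so no dimension passes through the flow untouched.
masks = [checkerboard_mask(32, 32, invert=(i % 2 == 1)) for i in range(4)]
```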

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading and positive evaluation of the manuscript. The provided summary accurately reflects the core contributions of Real NVP, including the use of affine coupling layers for invertible transformations with tractable Jacobians, enabling exact likelihood, sampling, and inference. We are pleased that the significance for flow-based generative modeling on high-dimensional image data is recognized.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The central construction defines affine coupling layers whose Jacobian is triangular by direct substitution (scale factors on one partition, identity on the other), yielding an exactly computable determinant via the change-of-variables formula. Log-likelihood, sampling, and latent inference follow immediately from this definition without fitted parameters or self-referential predictions. Prior work (NICE) is cited for context but is not load-bearing for the new real NVP properties or reported results. Empirical log-likelihoods on image datasets are external benchmarks, not internal fits renamed as predictions. No self-definitional, uniqueness-imported, or ansatz-smuggled steps appear.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the standard change-of-variables formula for densities under diffeomorphisms and on the assumption that neural networks can parameterize sufficiently flexible coupling functions; no ad-hoc constants or new entities are introduced.

axioms (1)
  • standard math · Change of variables formula for probability densities under invertible differentiable transformations
    Invoked to obtain exact log-likelihood from the Jacobian determinant of the coupling layers; the formula is written out below.
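
For reference, the invoked formula and the step that makes it tractable, in schematic notation (the partition into the first d of D coordinates follows the paper's coupling-layer convention):

```latex
% Change of variables for an invertible, differentiable f : X -> Z
\[
  \log p_X(x) = \log p_Z\!\big(f(x)\big)
              + \log\left|\det\frac{\partial f(x)}{\partial x^{\top}}\right|
\]
% Affine coupling layer (first d of D coordinates pass through unchanged):
\[
  y_{1:d} = x_{1:d}, \qquad
  y_{d+1:D} = x_{d+1:D} \odot \exp\!\big(s(x_{1:d})\big) + t(x_{1:d})
\]
% The Jacobian is block-triangular, so the log-determinant is a plain sum:
\[
  \log\left|\det\frac{\partial y}{\partial x^{\top}}\right|
    = \sum_{j} s(x_{1:d})_{j}
\]
```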

pith-pipeline@v0.9.0 · 5396 in / 1202 out tokens · 60648 ms · 2026-05-11T23:49:35.826368+00:00 · methodology

discussion (0)


Forward citations

Cited by 30 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Generative Modeling with Flux Matching

    cs.LG 2026-05 unverdicted novelty 8.0

    Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices be...

  2. Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

    cs.LG 2022-09 unverdicted novelty 8.0

    Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.

  3. Denoising Diffusion Probabilistic Models

    cs.LG 2020-06 accept novelty 8.0

    Denoising diffusion probabilistic models generate high-quality images by learning to reverse a fixed forward diffusion process, achieving FID 3.17 on CIFAR10.

  4. DriftXpress: Faster Drifting Models via Projected RKHS Fields

    cs.LG 2026-05 unverdicted novelty 7.0

    DriftXpress approximates drifting kernels via projected RKHS fields to lower training cost of one-step generative models while matching original FID scores.

  5. Normalizing Trajectory Models

    cs.CV 2026-05 unverdicted novelty 7.0

    NTM uses per-step conditional normalizing flows plus a trajectory-wide predictor to achieve exact-likelihood 4-step sampling that matches or exceeds baselines on text-to-image tasks.

  6. On the Invariance and Generality of Neural Scaling Laws

    cs.LG 2026-05 unverdicted novelty 7.0

    Neural scaling laws are invariant under bijective data transformations and change predictably with information resolution ρ under non-bijective transformations, enabling cross-domain transport of fitted exponents.

  7. TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models

    stat.ML 2026-05 unverdicted novelty 7.0

    TRACE creates valid conformal prediction sets for complex generative models by scoring outputs via averaged denoising or velocity errors along stochastic transport paths instead of likelihoods.

  8. TMDs in the Lens of Generative AI: A Pixel-Based Approach to Partonic Imaging

    hep-ph 2026-05 unverdicted novelty 7.0

    A nonparametric pixel-based Bayesian method integrates TMD evolution with generative AI and SVD to image parton distributions and reveal null TMDs unconstrained by observables.

  9. Risk-Controlled Post-Processing of Decision Policies

    stat.ML 2026-05 unverdicted novelty 7.0

    Risk-controlled post-processing yields a threshold-structured policy that follows the baseline except where an oracle fallback sharply reduces conditional violation risk, achieving O(log n/n) expected excess risk in i...

  10. Bayesian Rain Field Reconstruction using Commercial Microwave Links and Diffusion Model Priors

    cs.LG 2026-05 unverdicted novelty 7.0

    Diffusion model priors enable training-free Bayesian sampling for more accurate rain field reconstruction from path-integrated commercial microwave link measurements than Gaussian process baselines.

  11. Personalized Multi-Interest Modeling for Cross-Domain Recommendation to Cold-Start Users

    cs.IR 2026-04 unverdicted novelty 7.0

    NF-NPCDR enhances neural processes with normalizing flows to model personalized multi-interest preferences and uses a preference pool plus adaptive decoder to improve cross-domain recommendations for cold-start users.

  12. Probing the 3D Structures of Supernovae through IR Signatures of CO and SiO

    astro-ph.HE 2026-04 unverdicted novelty 7.0

    MOFAT applied to SN2024ggi shows CO triggering inner SiO formation with a receding edge, order-of-magnitude mass drop, clumping signatures, and no dust formation.

  13. MorphoFlow: Sparse-Supervised Generative Shape Modeling with Adaptive Latent Relevance

    cs.CV 2026-04 unverdicted novelty 7.0

    MorphoFlow learns compact probabilistic 3D shape representations from sparse annotations using neural implicits, autodecoders, autoregressive flows, and adaptive sparsity priors on latent dimensions.

  14. Differentiable free energy surface: a variational approach to directly observing rare events using generative deep-learning models

    physics.comp-ph 2026-04 unverdicted novelty 7.0

    VaFES constructs a latent space from reversible collective variables and variationally optimizes a tractable-density generative model to produce a continuous free energy surface from which rare events are directly sampled.

  15. Operator Spectroscopy of Trained Lattice Samplers

    hep-lat 2026-05 unverdicted novelty 6.0

    Operator projections of trained sampler functions in 2D phi^4 lattice theory decompose residuals into zero-mode Binder and finite-k correlator components, distinguishing flow-matching, diffusion, and normalizing-flow models.

  16. CONTRA: Conformal Prediction Region via Normalizing Flow Transformation

    stat.ML 2026-05 unverdicted novelty 6.0

    CONTRA generates sharp multi-dimensional conformal prediction regions by defining nonconformity scores as distances from the center in the latent space of a normalizing flow.

  17. STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation

    cs.CV 2026-05 unverdicted novelty 6.0

    STARFlow2 presents an autoregressive flow-based architecture for unified multimodal text-image generation by interleaving a VLM stream with a TarFlow stream via residual skips and a unified latent space.

  18. Accelerating the Simulation of Ordinary Differential Equations Through Physics-Preserving Neural Networks

    math.NA 2026-05 unverdicted novelty 6.0

    A neural network maps ODE states to a slow-evolving latent space with dynamics derived from the original equations via the chain rule, enabling accelerated simulations with fewer function calls.

  19. Conservative Flows: A New Paradigm of Generative Models

    cs.LG 2026-05 unverdicted novelty 6.0

    Conservative flows generate by running probability-preserving stochastic dynamics initialized at data points rather than noise, using corrected Langevin or predictor-corrector mechanisms on top of any pretrained flow ...

  20. Robust Conditional Conformal Prediction via Branched Normalizing Flow

    cs.LG 2026-05 unverdicted novelty 6.0

    Branched Normalizing Flow improves conditional coverage robustness of conformal prediction under distribution shift by normalizing test inputs to the calibration distribution and mapping prediction sets back.

  21. Normalizing Flows with Iterative Denoising

    cs.CV 2026-04 unverdicted novelty 6.0

    iTARFlow augments normalizing flows with diffusion-style iterative denoising during sampling while preserving end-to-end likelihood training, reaching competitive results on ImageNet 64/128/256.

  22. OLLM: Options-based Large Language Models

    cs.AI 2026-04 unverdicted novelty 6.0

    OLLM models next-token generation as a latent-indexed set of options, enabling up to 70% math reasoning correctness versus 51% baselines and structure-based alignment via a compact latent policy.

  23. Lookahead Drifting Model

    cs.LG 2026-04 unverdicted novelty 6.0

    The lookahead drifting model improves upon the drifting model by sequentially computing multiple drifting terms that incorporate higher-order gradient information, leading to better performance on toy examples and CIFAR10.

  24. Dartmouth Stellar Evolution Emulator (DSEE) 1: Generative Stellar Evolution Model Database

    astro-ph.SR 2026-04 unverdicted novelty 6.0

    DSEE is a flow-based emulator that generates stellar evolution tracks and isochrones as probabilistic outputs from a single model trained on millions of simulations, enabling fast interpolation and uncertainty-aware analyses.

  25. Jeffreys Flow: Robust Boltzmann Generators for Rare Event Sampling via Parallel Tempering Distillation

    cs.LG 2026-04 unverdicted novelty 6.0

    Jeffreys Flow distills Parallel Tempering trajectories via Jeffreys divergence to produce robust Boltzmann generators that suppress mode collapse and correct sampling inaccuracies for rare event sampling.

  26. VideoGPT: Video Generation using VQ-VAE and Transformers

    cs.CV 2021-04 accept novelty 6.0

    VideoGPT generates competitive natural videos by learning discrete latents with VQ-VAE and modeling them autoregressively with a transformer.

  27. To Use AI as Dice of Possibilities with Timing Computation

    cs.AI 2026-05 unverdicted novelty 5.0

    Proposes verb-based paradigm with timing computation to enable data-driven discovery of patient trajectories and counterfactual timing from EHR data without domain knowledge.

  28. Pre-localization of Massive Black Hole Binaries in the Millihertz Band

    gr-qc 2026-04 unverdicted novelty 5.0

    A neural spline flow pipeline performs amortized inference on millihertz MBHB signals, delivering ~20 deg² pre-merger sky localizations in ~1 minute while matching PTMCMC sky modes and parameter uncertainties.

  29. Generative Design of a Gas Turbine Combustor Using Invertible Neural Networks

    cs.AI 2026-04 unverdicted novelty 5.0

    Invertible Neural Networks are used to generate gas turbine combustor designs that meet specified performance criteria from a training database of parameterized designs and simulations.

  30. Scalable DDPM-Polycube: An Extended Diffusion-Based Method for Hexahedral Mesh and Volumetric Spline Construction

    cs.CE 2026-04 unverdicted novelty 3.0

    Scalable DDPM-Polycube adds a blind-hole cube primitive, enlarges the grid to 3D, and introduces genus-guided hierarchical verification to improve diffusion-based polycube generation for complex geometries.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · cited by 30 Pith papers · 8 internal anchors

  1. [1]

    TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

    Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016

  2. [2]

    Understanding symmetries in deep networks

    Vijay Badrinarayanan, Bamdev Mishra, and Roberto Cipolla. Understanding symmetries in deep networks. arXiv preprint arXiv:1511.01029, 2015

  3. [3]

    Density modeling of images using a generalized normalization transformation

    Johannes Ballé, Valero Laparra, and Eero P Simoncelli. Density modeling of images using a generalized normalization transformation. arXiv preprint arXiv:1511.06281, 2015

  4. [4]

    An information-maximization approach to blind separation and blind deconvolution

    Anthony J Bell and Terrence J Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural computation, 7(6):1129–1159, 1995

  5. [5]

    Artificial neural networks and their application to sequence recognition

    Yoshua Bengio. Artificial neural networks and their application to sequence recognition. 1991

  6. [6]

    Modeling high-dimensional discrete data with multi-layer neural networks

    Yoshua Bengio and Samy Bengio. Modeling high-dimensional discrete data with multi-layer neural networks. In NIPS, volume 99, pages 400–406, 1999

  7. [7]

    Stochastic gradient estimate variance in contrastive divergence and persistent contrastive divergence

    Mathias Berglund and Tapani Raiko. Stochastic gradient estimate variance in contrastive divergence and persistent contrastive divergence. arXiv preprint arXiv:1312.6002, 2013

  8. [8]

    Generating sentences from a continuous space

    Samuel R Bowman, Luke Vilnis, Oriol Vinyals, Andrew M Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. arXiv preprint arXiv:1511.06349, 2015

  9. [9]

    Super-resolution with deep convolutional sufficient statistics

    Joan Bruna, Pablo Sprechmann, and Yann LeCun. Super-resolution with deep convolutional sufficient statistics. arXiv preprint arXiv:1511.05666, 2015

  10. [10]

    Importance weighted autoencoders

    Yuri Burda, Roger Grosse, and Ruslan Salakhutdinov. Importance weighted autoencoders. arXiv preprint arXiv:1509.00519, 2015

  11. [11]

    Gaussianization

    Scott Shaobing Chen and Ramesh A Gopinath. Gaussianization. In Advances in Neural Information Processing Systems, 2000

  12. [12]

    A recurrent latent variable model for sequential data

    Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron C Courville, and Yoshua Bengio. A recurrent latent variable model for sequential data. In Advances in neural information processing systems, pages 2962–2970, 2015

  13. [13]

    The helmholtz machine

    Peter Dayan, Geoffrey E Hinton, Radford M Neal, and Richard S Zemel. The helmholtz machine. Neural computation, 7(5):889–904, 1995

  14. [14]

    Higher order statistical decorrelation without information loss

    Gustavo Deco and Wilfried Brauer. Higher order statistical decorrelation without information loss. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors,Advances in Neural Information Processing Systems 7, pages 247–254. MIT Press, 1995

  15. [15]

    Deep generative image models using a Laplacian pyramid of adversarial networks

    Emily L. Denton, Soumith Chintala, Arthur Szlam, and Rob Fergus. Deep generative image models using a Laplacian pyramid of adversarial networks. In Advances in Neural Information Processing Systems 28, 2015

  16. [16]

    Sample-based non-uniform random variate generation

    Luc Devroye. Sample-based non-uniform random variate generation. In Proceedings of the 18th conference on Winter simulation, pages 260–265. ACM, 1986

  17. [17]

    NICE: Non-linear Independent Components Estimation

    Laurent Dinh, David Krueger, and Yoshua Bengio. Nice: non-linear independent components estimation. arXiv preprint arXiv:1410.8516, 2014

  18. [18]

    Graphical models for machine learning and digital communication

    Brendan J Frey. Graphical models for machine learning and digital communication. MIT press, 1998

  19. [19]

    Texture synthesis using convolutional neural networks

    Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. Texture synthesis using convolutional neural networks. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 262–270, 2015

  20. [20]

    MADE: masked autoencoder for distribution estimation

    Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. MADE: masked autoencoder for distribution estimation. CoRR, abs/1502.03509, 2015

  21. [21]

    Generative adversarial nets

    Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 2672–2680, 2014

  22. [22]

    Towards conceptual compression

    Karol Gregor, Frederic Besse, Danilo Jimenez Rezende, Ivo Danihelka, and Daan Wierstra. Towards conceptual compression. arXiv preprint arXiv:1604.08772, 2016

  23. [23]

    Continuous deep q-learning with model-based acceleration

    Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, and Sergey Levine. Continuous deep q-learning with model-based acceleration. arXiv preprint arXiv:1603.00748, 2016

  24. [24]

    Deep Residual Learning for Image Recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015

  25. [25]

    Identity mappings in deep residual networks

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. CoRR, abs/1603.05027, 2016

  26. [26]

    Long short-term memory

    Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory.Neural Computation, 9(8):1735–1780, 1997

  27. [27]

    Stochastic variational inference

    Matthew D Hoffman, David M Blei, Chong Wang, and John Paisley. Stochastic variational inference. The Journal of Machine Learning Research, 14(1):1303–1347, 2013

  28. [28]

    Independent component analysis, volume 46

    Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. Independent component analysis, volume 46. John Wiley & Sons, 2004

  29. [29]

    Nonlinear independent component analysis: Existence and uniqueness results

    Aapo Hyvärinen and Petteri Pajunen. Nonlinear independent component analysis: Existence and uniqueness results. Neural Networks, 12(3):429–439, 1999

  30. [30]

    Generating images with recurrent adversarial networks

    Daniel Jiwoong Im, Chris Dongjoo Kim, Hui Jiang, and Roland Memisevic. Generating images with recurrent adversarial networks. arXiv preprint arXiv:1602.05110, 2016

  31. [31]

    Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

    Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015

  32. [32]

    Exploring the limits of language modeling

    Rafal Józefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. Exploring the limits of language modeling. CoRR, abs/1602.02410, 2016

  33. [33]

    Adam: A Method for Stochastic Optimization

    Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

  34. [34]

    Improving variational inference with inverse autoregressive flow

    Diederik P Kingma, Tim Salimans, and Max Welling. Improving variational inference with inverse autoregressive flow. arXiv preprint arXiv:1606.04934, 2016

  35. [35]

    Auto-Encoding Variational Bayes

    Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013

  36. [36]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images, 2009

  37. [37]

    The neural autoregressive distribution estimator

    Hugo Larochelle and Iain Murray. The neural autoregressive distribution estimator. In AISTATS, 2011

  38. [38]

    Autoencoding beyond pixels using a learned similarity metric

    Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, and Ole Winther. Autoencoding beyond pixels using a learned similarity metric. CoRR, abs/1512.09300, 2015

  39. [39]

    Efficient backprop

    Yann A LeCun, Léon Bottou, Genevieve B Orr, and Klaus-Robert Müller. Efficient backprop. In Neural networks: Tricks of the trade, pages 9–48. Springer, 2012

  40. [40]

    Deeply-supervised nets

    Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu. Deeply-supervised nets. arXiv preprint arXiv:1409.5185, 2014

  41. [41]

    Deep learning face attributes in the wild

    Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015

  42. [42]

    Auxiliary deep generative models

    Lars Maaløe, Casper Kaae Sønderby, Søren Kaae Sønderby, and Ole Winther. Auxiliary deep generative models. arXiv preprint arXiv:1602.05473, 2016

  43. [43]

    Neural variational inference and learning in belief networks

    Andriy Mnih and Karol Gregor. Neural variational inference and learning in belief networks. arXiv preprint arXiv:1402.0030, 2014

  44. [44]

    Human-level control through deep reinforcement learning

    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015

  45. [45]

    A view of the em algorithm that justifies incremental, sparse, and other variants

    Radford M Neal and Geoffrey E Hinton. A view of the em algorithm that justifies incremental, sparse, and other variants. In Learning in graphical models, pages 355–368. Springer, 1998

  46. [46]

    Pixel recurrent neural networks

    Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759, 2016

  47. [47]

    Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

    Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434, 2015

  48. [48]

    Variational inference with normalizing flows

    Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770, 2015

  49. [49]

    Stochastic backpropagation and approximate inference in deep generative models

    Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082, 2014

  50. [50]

    High-Dimensional Probability Estimation with Deep Density Models

    Oren Rippel and Ryan Prescott Adams. High-dimensional probability estimation with deep density models. arXiv preprint arXiv:1302.5125, 2013

  51. [51]

    Learning representations by backpropagating errors

    David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by backpropagating errors. Cognitive modeling, 5(3):1, 1988

  52. [52]

    Imagenet large scale visual recognition challenge

    Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015

  53. [53]

    Deep boltzmann machines

    Ruslan Salakhutdinov and Geoffrey E Hinton. Deep boltzmann machines. In International conference on artificial intelligence and statistics, pages 448–455, 2009

  54. [54]

    Weight normalization: A simple reparameterization to accelerate training of deep neural networks

    Tim Salimans and Diederik P Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. arXiv preprint arXiv:1602.07868, 2016

  55. [55]

    Markov chain monte carlo and variational inference: Bridging the gap

    Tim Salimans, Diederik P Kingma, and Max Welling. Markov chain monte carlo and variational inference: Bridging the gap. arXiv preprint arXiv:1410.6460, 2014

  56. [56]

    Mean field theory for sigmoid belief networks

    Lawrence K Saul, Tommi Jaakkola, and Michael I Jordan. Mean field theory for sigmoid belief networks. Journal of artificial intelligence research, 4(1):61–76, 1996

  57. [57]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014

  58. [58]

    Information processing in dynamical systems: Foundations of harmony theory

    Paul Smolensky. Information processing in dynamical systems: Foundations of harmony theory. Technical report, DTIC Document, 1986

  59. [59]

    Deep unsupervised learning using nonequilibrium thermodynamics

    Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, pages 2256–2265, 2015

  60. [60]

    Resnet in resnet: Generalizing residual architectures

    Sasha Targ, Diogo Almeida, and Kevin Lyman. Resnet in resnet: Generalizing residual architectures. CoRR, abs/1603.08029, 2016

  61. [61]

    Generative image modeling using spatial lstms

    Lucas Theis and Matthias Bethge. Generative image modeling using spatial lstms. In Advances in Neural Information Processing Systems, pages 1918–1926, 2015

  62. [62]

    A note on the evaluation of generative models

    Lucas Theis, Aäron Van Den Oord, and Matthias Bethge. A note on the evaluation of generative models. CoRR, abs/1511.01844, 2015

  63. [63]

    Variational gaussian process

    Dustin Tran, Rajesh Ranganath, and David M Blei. Variational gaussian process. arXiv preprint arXiv:1511.06499, 2015

  64. [64]

    RNADE: The real-valued neural autoregressive density-estimator

    Benigno Uria, Iain Murray, and Hugo Larochelle. RNADE: The real-valued neural autoregressive density-estimator. In Advances in Neural Information Processing Systems, pages 2175–2183, 2013

  65. [65]

    Learning functions across many orders of magnitudes

    Hado van Hasselt, Arthur Guez, Matteo Hessel, and David Silver. Learning functions across many orders of magnitudes. arXiv preprint arXiv:1602.07714, 2016

  66. [66]

    Order matters: Sequence to sequence for sets

    Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. Order matters: Sequence to sequence for sets. arXiv preprint arXiv:1511.06391, 2015

  67. [67]

    Embed to control: A locally linear latent dynamics model for control from raw images

    Manuel Watter, Jost Springenberg, Joschka Boedecker, and Martin Riedmiller. Embed to control: A locally linear latent dynamics model for control from raw images. In Advances in Neural Information Processing Systems, pages 2728–2736, 2015

  68. [68]

    Simple statistical gradient-following algorithms for connectionist reinforcement learning

    Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992

  69. [69]

    Multi-scale context aggregation by dilated convolutions

    Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015

  70. [70]

    LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

    Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015

  71. [71]

    Colorful image colorization

    Richard Zhang, Phillip Isola, and Alexei A Efros. Colorful image colorization. arXiv preprint arXiv:1603.08511, 2016