Recognition: 2 theorem links
· Lean Theorem
Generative Adversarial Networks
Pith reviewed 2026-05-13 03:56 UTC · model grok-4.3
The pith
An adversarial minimax game between a generator and a discriminator yields a unique equilibrium where the generator recovers the training data distribution and the discriminator outputs 1/2 everywhere.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation.
What carries the argument
The minimax two-player game between generator G and discriminator D, in which G is optimized to fool D into classifying its outputs as real.
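To make the game concrete, here is a minimal training sketch in PyTorch. The MLP widths, learning rates, batch size, and the 1-D Gaussian toy data are illustrative assumptions, not details from the paper; the generator update uses the non-saturating log D(G(z)) form, which the paper also suggests for stronger early gradients than the literal minimax term.

```python
# Minimal adversarial training sketch (illustrative; all hyperparameters assumed).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))                 # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())   # discriminator
opt_g = torch.optim.SGD(G.parameters(), lr=1e-3)
opt_d = torch.optim.SGD(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(10_000):
    real = torch.randn(64, 1) * 0.5 + 2.0   # toy data distribution: N(2, 0.5^2)
    z = torch.randn(64, 8)                  # noise prior p_z

    # Discriminator step: ascend V(G, D) by labeling real as 1, generated as 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(G(z).detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: fool D into classifying its outputs as real.
    g_loss = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

At the ideal fixed point of this loop, D's outputs drift toward 1/2 on both real and generated batches.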
If this is right
- Samples can be generated directly without running Markov chains or unrolled inference networks.
- The full system trains end-to-end using standard backpropagation.
- The generator learns an implicit density model that matches the data distribution at equilibrium.
- Qualitative and quantitative evaluation of generated samples can demonstrate the framework's effectiveness.
Where Pith is reading between the lines
- The same adversarial objective might be applied to other model classes beyond perceptrons, such as convolutional networks for images.
- Success in reaching equilibrium could depend on careful balancing of the two networks' capacities and learning rates.
- The approach provides an alternative to maximum-likelihood training that avoids computing intractable partition functions.
Load-bearing premise
That the theoretical minimax equilibrium can be reached or closely approximated when G and D are restricted to multilayer perceptrons trained by backpropagation.
What would settle it
A training run on multilayer perceptrons where the generated samples fail to match the training distribution statistics or where the discriminator outputs deviate persistently from 1/2 at equilibrium.
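One cheap empirical probe of that condition, assuming the trained toy G and D from the sketch above (both hypothetical, not from the paper):

```python
# Probe the equilibrium condition D(x) ≈ 1/2 (illustrative; G, D, and the toy
# data distribution are assumed to exist as in the earlier sketch).
import torch

with torch.no_grad():
    real = torch.randn(1024, 1) * 0.5 + 2.0   # same toy data distribution
    fake = G(torch.randn(1024, 8))
    d_real = D(real).mean().item()
    d_fake = D(fake).mean().item()

# At the ideal equilibrium both averages sit near 0.5; a persistent gap
# (e.g. |d_real - 0.5| staying large) signals that p_g has not matched p_data.
print(f"mean D(real) = {d_real:.3f}, mean D(fake) = {d_fake:.3f}")
```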
Original abstract
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Generative Adversarial Networks, a framework for training generative models by simultaneously optimizing a generator G (to capture the data distribution) and a discriminator D (to distinguish real from generated samples) in a minimax two-player game. It proves that in the space of arbitrary functions G and D a unique equilibrium exists where G recovers the training data distribution exactly and D outputs 1/2 everywhere; the proof first finds the optimal D for fixed G and then shows that the resulting objective reduces to the Jensen-Shannon divergence between p_data and p_g. When G and D are multilayer perceptrons, the system is trainable end-to-end via backpropagation, with no Markov chains or inference networks required. Experiments on small datasets support the framework through qualitative and quantitative evaluation of the generated samples.
Significance. If the central claims hold, the work is highly significant: it introduces a new, computationally efficient paradigm for generative modeling that sidesteps many limitations of prior approaches. A clear strength is the parameter-free theoretical derivation of the unique equilibrium using only standard properties of the Jensen-Shannon divergence and expectations, without reliance on fitted parameters or self-referential loops. This provides a clean, falsifiable characterization of the optimum that has proven foundational for subsequent research, even though the manuscript itself focuses on the initial framework and small-scale demonstrations.
major comments (2)
- [Theoretical results] Theoretical results section: the proof establishes a unique global equilibrium only in the space of arbitrary functions G and D; the subsequent claim that multilayer perceptrons 'can be trained with backpropagation' to reach or closely approximate this equilibrium lacks any convergence analysis or guarantees, leaving the practical viability dependent on an unproven assumption about gradient descent behavior (a toy illustration of this failure mode follows this list).
- [Experiments] Experiments section: while the abstract states that both qualitative and quantitative evaluations are provided, the reported results consist primarily of visual inspection of generated samples on small datasets (e.g., MNIST); this provides only weak support for the claim that the framework works in practice when G and D are restricted to multilayer perceptrons.
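The convergence worry in the first comment is easy to exhibit in miniature. As a toy illustration that is not from the paper, simultaneous gradient steps on the bilinear minimax game min_x max_y xy orbit away from the saddle point at the origin rather than converging to it:

```python
# Toy illustration (not from the paper): simultaneous gradient steps on the
# bilinear game min_x max_y x*y fail to converge to the saddle point (0, 0).
x, y, lr = 1.0, 1.0, 0.1
for t in range(100):
    gx, gy = y, x                     # dV/dx = y, dV/dy = x for V(x, y) = x*y
    x, y = x - lr * gx, y + lr * gy   # simultaneous descent/ascent step
    # Each step multiplies x^2 + y^2 by (1 + lr^2), so the iterates spiral
    # outward instead of settling at the equilibrium.
print(x * x + y * y)                  # ~5.41 after 100 steps, up from 2.0
```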
minor comments (2)
- [Adversarial nets] The value function V(G,D) and its relation to the Jensen-Shannon divergence could be introduced with an additional sentence of intuition in the main text to improve accessibility for readers unfamiliar with the derivation.
- [Experiments] Figure captions for generated samples would benefit from explicit mention of the dataset, model architecture details, and any preprocessing steps used, to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the positive recommendation and constructive comments on our manuscript. We address the major comments point by point below, providing clarifications where appropriate.
Point-by-point responses
- Referee: [Theoretical results] Theoretical results section: the proof establishes a unique global equilibrium only in the space of arbitrary functions G and D; the subsequent claim that multilayer perceptrons 'can be trained with backpropagation' to reach or closely approximate this equilibrium lacks any convergence analysis or guarantees, leaving the practical viability dependent on an unproven assumption about gradient descent behavior.
Authors: We agree that the unique global equilibrium is proven only in the nonparametric setting of arbitrary functions G and D. For the case of multilayer perceptrons, the manuscript states that the system can be trained end-to-end via backpropagation because the value function is differentiable with respect to the parameters of both models, allowing direct application of the chain rule without Markov chains or inference networks. We do not provide (and the manuscript does not claim) any convergence analysis or guarantees that gradient-based optimization will reach the global equilibrium; this remains an open question dependent on the optimization dynamics. The practical viability is instead supported by the empirical results. We will revise the text to explicitly distinguish the nonparametric equilibrium result from the parametric training procedure and to avoid any implication of convergence guarantees; a minimal autograd check of the differentiability point follows these responses. revision: yes
- Referee: [Experiments] Experiments section: while the abstract states that both qualitative and quantitative evaluations are provided, the reported results consist primarily of visual inspection of generated samples on small datasets (e.g., MNIST); this provides only weak support for the claim that the framework works in practice when G and D are restricted to multilayer perceptrons.
Authors: The experiments section provides both qualitative samples and quantitative elements, including performance metrics on MNIST and comparisons demonstrating that the generated samples are coherent and competitive with prior approaches on these datasets. We acknowledge that the evaluations are conducted on small-scale datasets and that visual inspection plays a prominent role, which is typical for an initial demonstration of a new generative framework. These results suffice to illustrate that the adversarial training procedure functions in practice with multilayer perceptrons and avoids the need for Markov chains or unrolled inference. More extensive quantitative benchmarks on larger datasets are left to future work. We do not believe additional revisions are required, as the current experiments align with the claims of demonstrating the framework's potential. revision: no
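As a minimal check of the differentiability point raised in the first response, and assuming the toy G and D defined in the earlier sketch, ordinary autograd carries gradients of the minimax generator term through D into G's parameters:

```python
# Chain-rule check (illustrative): gradients of log(1 - D(G(z))) reach G's
# parameters through D via ordinary autograd, with no inference network.
import torch

for p in G.parameters():
    p.grad = None                                  # clear any stale gradients
z = torch.randn(16, 8)
loss = torch.log(1 - D(G(z)) + 1e-8).mean()        # minimax generator term
loss.backward()
print(next(G.parameters()).grad.norm())            # nonzero: chain rule applied
```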
Circularity Check
No significant circularity in the GAN equilibrium derivation
Full rationale
The paper's central claim derives the unique minimax equilibrium for arbitrary functions G and D by first obtaining the optimal D* for fixed G as D*(x) = p_data(x) / (p_data(x) + p_g(x)), then substituting to yield C(G) = -log(4) + 2 JSD(p_data || p_g), which is minimized exactly when p_g = p_data (with D = 1/2). This follows directly from the definitions of the value function and standard properties of the Jensen-Shannon divergence; it involves no fitted parameters renamed as predictions, no load-bearing self-citations, and no ansatz or uniqueness imported from prior author work. The derivation is self-contained and does not reduce to its inputs by construction.
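In compact form, the derivation the rationale restates (with p_z denoting the generator's noise prior) reads:

```latex
% Value function of the two-player minimax game:
V(G, D) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
        + \mathbb{E}_{z \sim p_z}\left[\log\bigl(1 - D(G(z))\bigr)\right]

% For fixed G inducing sample distribution p_g, pointwise maximization of
% p_data(x) log D(x) + p_g(x) log(1 - D(x)) gives the optimal discriminator
D_G^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}

% Substituting D_G^* yields the virtual training criterion
C(G) = \max_D V(G, D) = -\log 4 + 2\,\mathrm{JSD}\left(p_{\text{data}} \,\middle\|\, p_g\right)

% JSD is nonnegative and vanishes iff its arguments coincide, so C(G) attains
% its global minimum -log 4 exactly when p_g = p_data, where D_G^* \equiv 1/2.
```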
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: A unique solution to the minimax game exists in the space of arbitrary functions G and D.
- domain assumption: The equilibrium can be approximated by training multilayer perceptrons with backpropagation.
invented entities (2)
- Generative model G · no independent evidence
- Discriminative model D · no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · match: unclear
  "In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. ... C(G) = -log(4) + 2·JSD(p_data ∥ p_g)"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · absolute_floor_iff_bare_distinguishability · match: unclear
  "Theorem 1. The global minimum of the virtual training criterion C(G) is achieved if and only if p_g = p_data."
Forward citations
Cited by 29 Pith papers
- NICE: Non-linear Independent Components Estimation
  NICE learns a composition of invertible neural-network layers that transform data into independent latent variables, enabling exact log-likelihood training and sampling for density estimation.
- VLTI/PIONIER imaging of post-AGB binaries. An INSPIRING hunt for inner rim substructures in circumbinary discs
  High-resolution interferometric imaging of eight post-AGB circumbinary discs reveals diverse inner-rim substructures including azimuthal brightness enhancements and arc-like features not explained by inclination alone.
- Sampling two-dimensional spin systems with transformers
  Transformer networks sample up to 180x180 2D Ising systems and 64x64 Edwards-Anderson systems by generating spin groups with probability approximations, yielding ~20x higher effective sample size than prior neural sam...
- Physics-informed, Generative Adversarial Design of Funicular Shells
  A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.
- Differentiable free energy surface: a variational approach to directly observing rare events using generative deep-learning models
  VaFES constructs a latent space from reversible collective variables and variationally optimizes a tractable-density generative model to produce a continuous free energy surface from which rare events are directly sampled.
- FlowGuard: Towards Lightweight In-Generation Safety Detection for Diffusion Models via Linear Latent Decoding
  FlowGuard detects unsafe content during diffusion image generation via linear latent decoding and curriculum learning, outperforming prior methods by over 30% F1 while reducing GPU memory by 97% and projection time to...
- Hierarchical Text-Conditional Image Generation with CLIP Latents
  A hierarchical prior-decoder model using CLIP latents generates more diverse text-conditional images than direct methods while preserving photorealism and caption fidelity.
- Diffusion Models Beat GANs on Image Synthesis
  Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
- Separate Universe Super-Resolution Emulator
  A generative adversarial network emulator upscales low-resolution N-body simulations with non-zero curvature to high resolution, recovering most large-scale power but with up to 10% small-scale suppression and altered...
- Energy-based models for diagnostic reconstruction and analysis in a laboratory plasma device
  A single energy-based model trained on LAPD plasma data enables diagnostic reconstruction, inverse inference of probe position, conditional trend sampling, and unconditional mode reproduction for potential anomaly detection.
- CASCADE: Context-Aware Relaxation for Speculative Image Decoding
  CASCADE formalizes semantic interchangeability and convergence in target model representations to enable context-aware acceptance relaxation in tree-based speculative decoding, delivering up to 3.6x speedup on text-to...
- HyperEvoGen: Exploring deep phylogeny using non-Euclidean variational inference
  HyperEvoGen uses hyperbolic variational inference to learn phylogenetic representations from protein alignments that preserve hierarchy and scale with evolutionary divergence, outperforming baselines in ancestral reco...
- Diffusion-based Galaxy Simulations for the Roman High Latitude Survey
  A denoising diffusion model trained on transformed JWST observations generates multi-band galaxy images that match key statistical properties of real galaxies for Roman weak lensing simulations.
- COMPASS: A Unified Decision-Intelligence System for Navigating Performance Trade-off in HPC
  COMPASS formalizes HPC configuration questions as ML tasks on traces, quantifies recommendation trustworthiness, and delivers 65.93% lower average job turnaround time plus 80.93% lower node usage versus prior methods ...
- AnomalyAgent: Agentic Industrial Anomaly Synthesis via Tool-Augmented Reinforcement Learning
  AnomalyAgent uses tool-augmented reinforcement learning with self-reflection to generate realistic industrial anomalies, achieving better metrics than zero-shot methods on MVTec-AD.
- Dartmouth Stellar Evolution Emulator (DSEE) 1: Generative Stellar Evolution Model Database
  DSEE is a flow-based emulator that generates stellar evolution tracks and isochrones as probabilistic outputs from a single model trained on millions of simulations, enabling fast interpolation and uncertainty-aware analyses.
- Mitigating Data Scarcity in Spaceflight Applications for Offline Reinforcement Learning Using Physics-Informed Deep Generative Models
  MI-VAE generates physics-constrained synthetic trajectories from scarce real data to improve offline RL policy performance on planetary lander tasks over standard VAEs.
- MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
  MetaGPT embeds human SOPs into LLM prompts to create role-specialized agent teams that produce more coherent solutions on collaborative software engineering tasks than prior chat-based multi-agent systems.
- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
  Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
- Machine Learning for neutron source distributions
  Generative models including VAEs, normalizing flows, GANs, and diffusion models can learn neutron source distributions from Monte Carlo lists for fast, memory-free sampling after training.
- On the Tradeoffs of On-Device Generative Models in Federated Predictive Maintenance Systems
  Experiments on real industrial time series show that partial model sharing improves diffusion model performance in bandwidth-limited non-IID settings, while full sharing stabilizes GAN training but offers less robustn...
- Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
  Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemph...
- Removing Motion Artifact in MRI by Using a Perceptual Loss Driven Deep Learning Framework
  PERCEPT-Net uses motion perceptual loss in a residual U-Net with attention and multi-scale modules to remove MRI motion artifacts more effectively than prior methods on clinical data.
- From Perception to Autonomous Computational Modeling: A Multi-Agent Approach
  A multi-agent LLM framework autonomously completes the full computational mechanics pipeline from a photograph to a code-compliant engineering report on a steel L-bracket example.
- CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
  CogVideo is a large-scale transformer pretrained for text-to-video generation that outperforms public models in evaluations.
- Cross-Domain Adversarial Augmentation: Stabilizing GANs for Medical and Handwriting Data Scarcity
  Stabilized GANs generate synthetic data that boosts sample diversity and classifier accuracy on scarce Bangla handwriting and chest X-ray datasets.
- IncepDeHazeGAN: Novel Satellite Image Dehazing
  IncepDeHazeGAN is a GAN with Inception blocks and multi-layer feature fusion that claims state-of-the-art single-image dehazing performance on satellite datasets.
- Discrete Meanflow Training Curriculum
  A DMF curriculum initialized from pretrained flow models achieves one-step FID 3.36 on CIFAR-10 after only 2000 epochs by exploiting a discretized consistency property in the Meanflow objective.
- SAGE-GAN: Towards Realistic and Robust Segmentation of Spatially Ordered Nanoparticles via Attention-Guided GANs
  SAGE-GAN integrates a self-attention U-Net into a CycleGAN framework to generate realistic synthetic electron microscopy image-mask pairs that augment training data for nanoparticle segmentation without human labeling.
Reference graph
Works this paper leans on
- [1] Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I. J., Bergeron, A., Bouchard, N., and Bengio, Y. (2012). Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop.
- [2] Bengio, Y. (2009). Learning deep architectures for AI. Now Publishers.
- [3] Bengio, Y., Mesnil, G., Dauphin, Y., and Rifai, S. (2013a). Better mixing via deep representations. In ICML'13.
- [4] Bengio, Y., Yao, L., Alain, G., and Vincent, P. (2013b). Generalized denoising auto-encoders as generative models. In NIPS26. NIPS Foundation.
- [5] Bengio, Y., Thibodeau-Laufer, E., and Yosinski, J. (2014a). Deep generative stochastic networks trainable by backprop. In ICML'14.
- [6] Bengio, Y., Thibodeau-Laufer, E., Alain, G., and Yosinski, J. (2014b). Deep generative stochastic networks trainable by backprop. In Proceedings of the 30th International Conference on Machine Learning (ICML'14).
- [7] Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., and Bengio, Y. (2010). Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy). Oral presentation.
- [8] Breuleux, O., Bengio, Y., and Vincent, P. (2011). Quickly generating representative samples from an RBM-derived process. Neural Computation, 23(8), 2053–2073.
- [9] Glorot, X., Bordes, A., and Bengio, Y. (2011). Deep sparse rectifier neural networks. In AISTATS'2011.
- [10] Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y. (2013a). Maxout networks. In ICML'2013.
- [11] Goodfellow, I. J., Mirza, M., Courville, A., and Bengio, Y. (2013b). Multi-prediction deep Boltzmann machines. In NIPS'2013.
- [12] Goodfellow, I. J., Warde-Farley, D., Lamblin, P., Dumoulin, V., Mirza, M., Pascanu, R., Bergstra, J., Bastien, F., and Bengio, Y. (2013c). Pylearn2: a machine learning research library. arXiv preprint arXiv:1308.4214.
- [13] Gutmann, M. and Hyvärinen, A. (2010). Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In AISTATS'2010.
- [14] Hinton, G., Deng, L., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., and Kingsbury, B. (2012a). Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29(6), 82–97.
- [15] Hinton, G. E., Dayan, P., Frey, B. J., and Neal, R. M. (1995). The wake-sleep algorithm for unsupervised neural networks. Science, 268, 1158–1161.
- [16] Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.
- [17] Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2012b). Improving neural networks by preventing co-adaptation of feature detectors. Technical report, arXiv:1207.0580.
- [18] Hyvärinen, A. (2005). Estimation of non-normalized statistical models using score matching. J. Machine Learning Res., 6.
- [19] Jarrett, K., Kavukcuoglu, K., Ranzato, M., and LeCun, Y. (2009). What is the best multi-stage architecture for object recognition? In Proc. International Conference on Computer Vision (ICCV'09), pages 2146–2153. IEEE.
- [20] Kingma, D. P. and Welling, M. (2014). Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR).
- [21] Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.
- [22] Krizhevsky, A., Sutskever, I., and Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In NIPS'2012.
- [23] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
- [24] Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. Technical report, arXiv:1401.4082.
- [25] Rifai, S., Bengio, Y., Dauphin, Y., and Vincent, P. (2012). A generative process for sampling contractive auto-encoders. In ICML'12.
- [26] Salakhutdinov, R. and Hinton, G. E. (2009). Deep Boltzmann machines. In AISTATS'2009, pages 448–455.
- [27] Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, volume 1, chapter 6, pages 194–281. MIT Press, Cambridge.
- [28] Susskind, J., Anderson, A., and Hinton, G. E. (2010). The Toronto face dataset. Technical Report UTML TR 2010-001, U. Toronto.
- [29] Tieleman, T. (2008). Training restricted Boltzmann machines using approximations to the likelihood gradient. In W. W. Cohen, A. McCallum, and S. T. Roweis, editors, ICML 2008, pages 1064–1071. ACM.
- [30] Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. (2008). Extracting and composing robust features with denoising autoencoders. In ICML 2008.
- [31] Younes, L. (1999). On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates. Stochastics and Stochastic Reports, 65(3), 177–228.