pith. machine review for the scientific record.

arxiv: 1406.2661 · v1 · submitted 2014-06-10 · 📊 stat.ML · cs.LG

Recognition: 2 theorem links

Generative Adversarial Networks

Aaron Courville, Bing Xu, David Warde-Farley, Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Sherjil Ozair, Yoshua Bengio

Pith reviewed 2026-05-13 03:56 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords generative adversarial networks · minimax game · generative models · discriminator · backpropagation · data distribution · unsupervised learning

The pith

An adversarial minimax game between a generator and a discriminator yields a unique equilibrium where the generator recovers the training data distribution and the discriminator outputs 1/2 everywhere.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that trains two models at once: a generator G that produces samples meant to match the data distribution, and a discriminator D that estimates whether a sample is real or generated. G is trained to maximize the chance that D makes a mistake, which frames the process as a two-player minimax game. In the space of arbitrary functions, this game has a unique solution with G exactly matching the training distribution and D constant at 1/2. When G and D are implemented as multilayer perceptrons, the whole system trains end-to-end with backpropagation and requires no Markov chains or approximate inference networks. A reader would care because the method offers a direct optimization route to generative modeling that bypasses explicit likelihood calculations.
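The equilibrium value of the minimax game can be checked numerically. A minimal sketch, not from the paper: the 1-D Gaussian data, sample sizes, and the `value_fn` helper are illustrative. When the discriminator is the constant 1/2 predicted at equilibrium, the value is exactly −log 4, independent of the data.

```python
import numpy as np

rng = np.random.default_rng(0)

def value_fn(d, x_real, x_fake):
    """Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    return np.mean(np.log(d(x_real))) + np.mean(np.log(1.0 - d(x_fake)))

# Illustrative 1-D setup: the generator is assumed to already match p_data.
x_real = rng.normal(0.0, 1.0, size=10_000)
x_fake = rng.normal(0.0, 1.0, size=10_000)

# At the claimed equilibrium D is constant at 1/2, so
# V = log(1/2) + log(1/2) = -log 4, with no dependence on the samples.
v_eq = value_fn(lambda x: np.full_like(x, 0.5), x_real, x_fake)
print(v_eq)  # ≈ -1.3863 = -log 4
```

The constant-1/2 discriminator makes both expectations collapse to log(1/2), so the Monte Carlo estimate is exact here.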

Core claim

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation.

What carries the argument

The minimax two-player game between generator G and discriminator D, in which G is optimized to fool D into classifying its outputs as real.

If this is right

  • Samples can be generated directly without running Markov chains or unrolled inference networks.
  • The full system trains end-to-end using standard backpropagation.
  • The generator learns an implicit density model that matches the data distribution at equilibrium.
  • Qualitative and quantitative evaluation of generated samples can demonstrate the framework's effectiveness.
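The end-to-end training claim can be illustrated with a toy adversarial pair. A hedged sketch under strong simplifying assumptions not in the paper: an affine generator, a logistic discriminator, hand-derived gradients, and the non-saturating generator objective (maximizing log D(G(z))) that the paper suggests as a practical alternative.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Hypothetical toy problem (not from the paper): data x ~ N(2, 1),
# generator g(z) = a*z + b with z ~ N(0, 1), discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0      # generator parameters
w, c = 0.1, 0.0      # discriminator parameters
lr, batch = 0.05, 128

for step in range(2000):
    x = rng.normal(2.0, 1.0, batch)     # real samples
    z = rng.normal(0.0, 1.0, batch)
    g = a * z + b                       # generated samples

    # Discriminator: gradient ascent on E[log D(x)] + E[log(1 - D(g))].
    dx, dg = sigmoid(w * x + c), sigmoid(w * g + c)
    w += lr * (np.mean((1 - dx) * x) - np.mean(dg * g))
    c += lr * (np.mean(1 - dx) - np.mean(dg))

    # Generator: gradient ascent on the non-saturating objective E[log D(g)].
    dg = sigmoid(w * (a * z + b) + c)
    a += lr * np.mean((1 - dg) * w * z)
    b += lr * np.mean((1 - dg) * w)

print(b)  # b should drift toward the data mean 2; a may shrink (a known collapse mode)
```

Whether this reaches the equilibrium depends on balancing the two players, which is exactly the unproven convergence question flagged below.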

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same adversarial objective might be applied to other model classes beyond perceptrons, such as convolutional networks for images.
  • Success in reaching equilibrium could depend on careful balancing of the two networks' capacities and learning rates.
  • The approach provides an alternative to maximum-likelihood training that avoids computing intractable partition functions.

Load-bearing premise

That the theoretical minimax equilibrium can be reached or closely approximated when G and D are restricted to multilayer perceptrons trained by backpropagation.

What would settle it

A training run on multilayer perceptrons where the generated samples fail to match the training distribution statistics or where the discriminator outputs deviate persistently from 1/2 at equilibrium.

read the original abstract

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Generative Adversarial Networks, a framework for training generative models by simultaneously optimizing a generator G (to capture the data distribution) and a discriminator D (to distinguish real from generated samples) in a minimax two-player game. It proves that in the space of arbitrary functions G and D, a unique equilibrium exists where G recovers the training data distribution exactly and D outputs 1/2 everywhere, derived by first finding the optimal D for fixed G and then showing that the resulting objective reduces to the Jensen-Shannon divergence between p_data and p_g. When G and D are multilayer perceptrons, the system is trainable end-to-end via backpropagation with no Markov chains or inference networks required. Experiments on small datasets provide qualitative and quantitative support for the generated samples.

Significance. If the central claims hold, the work is highly significant: it introduces a new, computationally efficient paradigm for generative modeling that sidesteps many limitations of prior approaches. A clear strength is the parameter-free theoretical derivation of the unique equilibrium using only standard properties of the Jensen-Shannon divergence and expectations, without reliance on fitted parameters or self-referential loops. This provides a clean, falsifiable characterization of the optimum that has proven foundational for subsequent research, even though the manuscript itself focuses on the initial framework and small-scale demonstrations.

major comments (2)
  1. [Theoretical results] Theoretical results section: the proof establishes a unique global equilibrium only in the space of arbitrary functions G and D; the subsequent claim that multilayer perceptrons 'can be trained with backpropagation' to reach or closely approximate this equilibrium lacks any convergence analysis or guarantees, leaving the practical viability dependent on an unproven assumption about gradient descent behavior.
  2. [Experiments] Experiments section: while the abstract states that both qualitative and quantitative evaluations are provided, the reported results consist primarily of visual inspection of generated samples on small datasets (e.g., MNIST); this provides only weak support for the claim that the framework works in practice when G and D are restricted to multilayer perceptrons.
minor comments (2)
  1. [Adversarial nets] The value function V(G,D) and its relation to the Jensen-Shannon divergence could be introduced with an additional sentence of intuition in the main text to improve accessibility for readers unfamiliar with the derivation.
  2. [Experiments] Figure captions for generated samples would benefit from explicit mention of the dataset, model architecture details, and any preprocessing steps used, to aid reproducibility.
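The intuition the first minor comment asks for can be made concrete on a discrete grid. A small sketch (the grid distributions are invented for illustration): for fixed G, V(G, D) decomposes pointwise as a·log y + b·log(1 − y), which is maximized at y = a/(a + b), i.e. at the paper's D*(x) = p_data(x)/(p_data(x) + p_g(x)).

```python
import numpy as np

rng = np.random.default_rng(2)

# Two invented discrete distributions on a common support (illustration only).
p_data = np.array([0.1, 0.4, 0.3, 0.2])
p_g = np.array([0.25, 0.25, 0.25, 0.25])

def V(d):
    """Discrete analogue of V(G, D) for fixed G."""
    return np.sum(p_data * np.log(d)) + np.sum(p_g * np.log(1.0 - d))

# The paper's optimal discriminator: the pointwise argmax of
# a*log(y) + b*log(1 - y) is y = a / (a + b).
d_star = p_data / (p_data + p_g)

# No perturbation of D* scores higher -- the pointwise objective is concave.
for _ in range(1_000):
    d = np.clip(d_star + rng.normal(0.0, 0.05, d_star.shape), 1e-6, 1 - 1e-6)
    assert V(d) <= V(d_star) + 1e-12
```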

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive recommendation and constructive comments on our manuscript. We address the major comments point by point below, providing clarifications where appropriate.

read point-by-point responses
  1. Referee: [Theoretical results] Theoretical results section: the proof establishes a unique global equilibrium only in the space of arbitrary functions G and D; the subsequent claim that multilayer perceptrons 'can be trained with backpropagation' to reach or closely approximate this equilibrium lacks any convergence analysis or guarantees, leaving the practical viability dependent on an unproven assumption about gradient descent behavior.

    Authors: We agree that the unique global equilibrium is proven only in the nonparametric setting of arbitrary functions G and D. For the case of multilayer perceptrons, the manuscript states that the system can be trained end-to-end via backpropagation because the value function is differentiable with respect to the parameters of both models, allowing direct application of the chain rule without Markov chains or inference networks. We do not provide (and the manuscript does not claim) any convergence analysis or guarantees that gradient-based optimization will reach the global equilibrium; this remains an open question dependent on the optimization dynamics. The practical viability is instead supported by the empirical results. We will revise the text to explicitly distinguish the nonparametric equilibrium result from the parametric training procedure and to avoid any implication of convergence guarantees. revision: yes

  2. Referee: [Experiments] Experiments section: while the abstract states that both qualitative and quantitative evaluations are provided, the reported results consist primarily of visual inspection of generated samples on small datasets (e.g., MNIST); this provides only weak support for the claim that the framework works in practice when G and D are restricted to multilayer perceptrons.

    Authors: The experiments section provides both qualitative samples and quantitative elements, including performance metrics on MNIST and comparisons demonstrating that the generated samples are coherent and competitive with prior approaches on these datasets. We acknowledge that the evaluations are conducted on small-scale datasets and that visual inspection plays a prominent role, which is typical for an initial demonstration of a new generative framework. These results suffice to illustrate that the adversarial training procedure functions in practice with multilayer perceptrons and avoids the need for Markov chains or unrolled inference. More extensive quantitative benchmarks on larger datasets are left to future work. We do not believe additional revisions are required, as the current experiments align with the claims of demonstrating the framework's potential. revision: no
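The quantitative evaluation the rebuttal refers to fits a Gaussian Parzen window to generated samples and reports the log-likelihood of held-out data under it. A 1-D sketch of that metric (the sample counts and bandwidth below are illustrative; the paper cross-validates σ):

```python
import numpy as np

def parzen_log_likelihood(samples, test, sigma):
    """Mean log-likelihood of `test` under a Gaussian Parzen window
    centred on generated `samples` (1-D case for illustration)."""
    d2 = (test[:, None] - samples[None, :]) ** 2
    log_k = -d2 / (2.0 * sigma**2) - 0.5 * np.log(2.0 * np.pi * sigma**2)
    m = log_k.max(axis=1, keepdims=True)  # stabilised log-mean-exp
    return float(np.mean(m[:, 0] + np.log(np.mean(np.exp(log_k - m), axis=1))))

rng = np.random.default_rng(3)
test = rng.normal(0.0, 1.0, 500)     # held-out "data"
good = rng.normal(0.0, 1.0, 2000)    # samples matching p_data
bad = rng.normal(5.0, 1.0, 2000)     # samples from the wrong distribution
print(parzen_log_likelihood(good, test, 0.2),
      parzen_log_likelihood(bad, test, 0.2))  # matching samples score far higher
```

The metric's known weakness, high variance in high dimensions, is part of why the referee calls this support weak.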

Circularity Check

0 steps flagged

No significant circularity in the GAN equilibrium derivation

full rationale

The paper's central claim derives the unique minimax equilibrium for arbitrary functions G and D by first obtaining the optimal D* for fixed G as D*(x) = p_data(x) / (p_data(x) + p_g(x)), then substituting to yield C(G) = -log(4) + 2 JSD(p_data || p_g), which is minimized exactly when p_g = p_data (with D = 1/2). This follows directly from the definitions of the value function and standard properties of the Jensen-Shannon divergence; it involves no fitted parameters renamed as predictions, no load-bearing self-citations, and no ansatz or uniqueness imported from prior author work. The derivation is self-contained and does not reduce to its inputs by construction.
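The substitution step in this derivation is easy to verify numerically. A sketch on a discrete grid (the distributions are illustrative): C(G) computed directly from D* coincides with −log 4 + 2·JSD(p_data, p_g), and equals −log 4 exactly when p_g = p_data.

```python
import numpy as np

def jsd(p, q):
    """Jensen-Shannon divergence between discrete distributions (natural log)."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(np.where(a > 0, a * np.log(a / b), 0.0))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def C(p_data, p_g):
    """V(G, D*) after substituting the optimal discriminator D*."""
    d_star = p_data / (p_data + p_g)
    return np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1.0 - d_star))

p_data = np.array([0.1, 0.4, 0.3, 0.2])
for p_g in (np.array([0.25, 0.25, 0.25, 0.25]), p_data.copy()):
    lhs = C(p_data, p_g)
    rhs = -np.log(4.0) + 2.0 * jsd(p_data, p_g)
    print(lhs, rhs)  # identical; both equal -log 4 only when p_g = p_data
```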

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on the existence of a unique Nash equilibrium in the space of arbitrary functions G and D, the ability to approximate it with multilayer perceptrons, and the use of backpropagation to optimize the resulting objective. No numerical parameters are fitted to data in the theoretical statement.

axioms (2)
  • domain assumption A unique solution to the minimax game exists in the space of arbitrary functions G and D.
    Invoked in the abstract and theoretical analysis section to establish that G recovers the data distribution.
  • domain assumption The equilibrium can be approximated by training multilayer perceptrons with backpropagation.
    Stated as the practical training procedure without additional justification for convergence.
invented entities (2)
  • Generative model G no independent evidence
    purpose: To capture and sample from the data distribution via the adversarial game.
    New component introduced as one player in the minimax framework.
  • Discriminative model D no independent evidence
    purpose: To estimate the probability that a sample is real rather than generated.
    New component introduced as the opposing player in the minimax framework.

pith-pipeline@v0.9.0 · 5472 in / 1465 out tokens · 110383 ms · 2026-05-13T03:56:00.345297+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Forward citations

Cited by 29 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. NICE: Non-linear Independent Components Estimation

    cs.LG 2014-10 accept novelty 8.0

    NICE learns a composition of invertible neural-network layers that transform data into independent latent variables, enabling exact log-likelihood training and sampling for density estimation.

  2. VLTI/PIONIER imaging of post-AGB binaries. An INSPIRING hunt for inner rim substructures in circumbinary discs

    astro-ph.SR 2026-05 unverdicted novelty 7.0

    High-resolution interferometric imaging of eight post-AGB circumbinary discs reveals diverse inner-rim substructures including azimuthal brightness enhancements and arc-like features not explained by inclination alone.

  3. Sampling two-dimensional spin systems with transformers

    cond-mat.dis-nn 2026-04 unverdicted novelty 7.0

    Transformer networks sample up to 180x180 2D Ising systems and 64x64 Edwards-Anderson systems by generating spin groups with probability approximations, yielding ~20x higher effective sample size than prior neural sam...

  4. Physics-informed, Generative Adversarial Design of Funicular Shells

    cs.CE 2026-04 unverdicted novelty 7.0

    A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.

  5. Differentiable free energy surface: a variational approach to directly observing rare events using generative deep-learning models

    physics.comp-ph 2026-04 unverdicted novelty 7.0

    VaFES constructs a latent space from reversible collective variables and variationally optimizes a tractable-density generative model to produce a continuous free energy surface from which rare events are directly sampled.

  6. FlowGuard: Towards Lightweight In-Generation Safety Detection for Diffusion Models via Linear Latent Decoding

    cs.CV 2026-04 unverdicted novelty 7.0

    FlowGuard detects unsafe content during diffusion image generation via linear latent decoding and curriculum learning, outperforming prior methods by over 30% F1 while reducing GPU memory by 97% and projection time to...

  7. Hierarchical Text-Conditional Image Generation with CLIP Latents

    cs.CV 2022-04 accept novelty 7.0

    A hierarchical prior-decoder model using CLIP latents generates more diverse text-conditional images than direct methods while preserving photorealism and caption fidelity.

  8. Diffusion Models Beat GANs on Image Synthesis

    cs.LG 2021-05 accept novelty 7.0

    Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.

  9. Separate Universe Super-Resolution Emulator

    astro-ph.CO 2026-05 unverdicted novelty 6.0

    A generative adversarial network emulator upscales low-resolution N-body simulations with non-zero curvature to high resolution, recovering most large-scale power but with up to 10% small-scale suppression and altered...

  10. Energy-based models for diagnostic reconstruction and analysis in a laboratory plasma device

    physics.plasm-ph 2026-05 unverdicted novelty 6.0

    A single energy-based model trained on LAPD plasma data enables diagnostic reconstruction, inverse inference of probe position, conditional trend sampling, and unconditional mode reproduction for potential anomaly detection.

  11. CASCADE: Context-Aware Relaxation for Speculative Image Decoding

    cs.CV 2026-05 unverdicted novelty 6.0

    CASCADE formalizes semantic interchangeability and convergence in target model representations to enable context-aware acceptance relaxation in tree-based speculative decoding, delivering up to 3.6x speedup on text-to...

  12. HyperEvoGen: Exploring deep phylogeny using non-Euclidean variational inference

    q-bio.QM 2026-04 unverdicted novelty 6.0

    HyperEvoGen uses hyperbolic variational inference to learn phylogenetic representations from protein alignments that preserve hierarchy and scale with evolutionary divergence, outperforming baselines in ancestral reco...

  13. Diffusion-based Galaxy Simulations for the Roman High Latitude Survey

    astro-ph.CO 2026-04 unverdicted novelty 6.0

    A denoising diffusion model trained on transformed JWST observations generates multi-band galaxy images that match key statistical properties of real galaxies for Roman weak lensing simulations.

  14. COMPASS: A Unified Decision-Intelligence System for Navigating Performance Trade-off in HPC

    cs.PF 2026-04 conditional novelty 6.0

    COMPASS formalizes HPC configuration questions as ML tasks on traces, quantifies recommendation trustworthiness, and delivers 65.93% lower average job turnaround time plus 80.93% lower node usage versus prior methods ...

  15. AnomalyAgent: Agentic Industrial Anomaly Synthesis via Tool-Augmented Reinforcement Learning

    cs.CV 2026-04 unverdicted novelty 6.0

    AnomalyAgent uses tool-augmented reinforcement learning with self-reflection to generate realistic industrial anomalies, achieving better metrics than zero-shot methods on MVTec-AD.

  16. Dartmouth Stellar Evolution Emulator (DSEE) 1: Generative Stellar Evolution Model Database

    astro-ph.SR 2026-04 unverdicted novelty 6.0

    DSEE is a flow-based emulator that generates stellar evolution tracks and isochrones as probabilistic outputs from a single model trained on millions of simulations, enabling fast interpolation and uncertainty-aware analyses.

  17. Mitigating Data Scarcity in Spaceflight Applications for Offline Reinforcement Learning Using Physics-Informed Deep Generative Models

    cs.LG 2026-04 unverdicted novelty 6.0

    MI-VAE generates physics-constrained synthetic trajectories from scarce real data to improve offline RL policy performance on planetary lander tasks over standard VAEs.

  18. MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

    cs.AI 2023-08 unverdicted novelty 6.0

    MetaGPT embeds human SOPs into LLM prompts to create role-specialized agent teams that produce more coherent solutions on collaborative software engineering tasks than prior chat-based multi-agent systems.

  19. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

    cs.LG 2021-04 accept novelty 6.0

    Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.

  20. Machine Learning for neutron source distributions

    physics.ins-det 2026-05 unverdicted novelty 5.0

    Generative models including VAEs, normalizing flows, GANs, and diffusion models can learn neutron source distributions from Monte Carlo lists for fast, memory-free sampling after training.

  21. On the Tradeoffs of On-Device Generative Models in Federated Predictive Maintenance Systems

    cs.LG 2026-05 unverdicted novelty 5.0

    Experiments on real industrial time series show that partial model sharing improves diffusion model performance in bandwidth-limited non-IID settings, while full sharing stabilizes GAN training but offers less robustn...

  22. Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

    cs.CV 2026-04 unverdicted novelty 5.0

    Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemph...

  23. Removing Motion Artifact in MRI by Using a Perceptual Loss Driven Deep Learning Framework

    cs.CV 2026-04 unverdicted novelty 5.0

    PERCEPT-Net uses motion perceptual loss in a residual U-Net with attention and multi-scale modules to remove MRI motion artifacts more effectively than prior methods on clinical data.

  24. From Perception to Autonomous Computational Modeling: A Multi-Agent Approach

    cs.CE 2026-04 unverdicted novelty 5.0

    A multi-agent LLM framework autonomously completes the full computational mechanics pipeline from a photograph to a code-compliant engineering report on a steel L-bracket example.

  25. CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers

    cs.CV 2022-05 unverdicted novelty 5.0

    CogVideo is a large-scale transformer pretrained for text-to-video generation that outperforms public models in evaluations.

  26. Cross-Domain Adversarial Augmentation: Stabilizing GANs for Medical and Handwriting Data Scarcity

    cs.CV 2026-05 unverdicted novelty 4.0

    Stabilized GANs generate synthetic data that boosts sample diversity and classifier accuracy on scarce Bangla handwriting and chest X-ray datasets.

  27. IncepDeHazeGAN: Novel Satellite Image Dehazing

    cs.CV 2026-04 unverdicted novelty 4.0

    IncepDeHazeGAN is a GAN with Inception blocks and multi-layer feature fusion that claims state-of-the-art single-image dehazing performance on satellite datasets.

  28. Discrete Meanflow Training Curriculum

    cs.LG 2026-04 unverdicted novelty 4.0

    A DMF curriculum initialized from pretrained flow models achieves one-step FID 3.36 on CIFAR-10 after only 2000 epochs by exploiting a discretized consistency property in the Meanflow objective.

  29. SAGE-GAN: Towards Realistic and Robust Segmentation of Spatially Ordered Nanoparticles via Attention-Guided GANs

    cs.CV 2026-04 unverdicted novelty 4.0

    SAGE-GAN integrates a self-attention U-Net into a CycleGAN framework to generate realistic synthetic electron microscopy image-mask pairs that augment training data for nanoparticle segmentation without human labeling.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · cited by 29 Pith papers

  1. [1] Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I. J., Bergeron, A., Bouchard, N., and Bengio, Y. (2012). Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop.

  2. [2] Bengio, Y. (2009). Learning deep architectures for AI. Now Publishers.

  3. [3] Bengio, Y., Mesnil, G., Dauphin, Y., and Rifai, S. (2013a). Better mixing via deep representations. In ICML'13.

  4. [4] Bengio, Y., Yao, L., Alain, G., and Vincent, P. (2013b). Generalized denoising auto-encoders as generative models. In NIPS26. NIPS Foundation.

  5. [5] Bengio, Y., Thibodeau-Laufer, E., and Yosinski, J. (2014a). Deep generative stochastic networks trainable by backprop. In ICML'14.

  6. [6] Bengio, Y., Thibodeau-Laufer, E., Alain, G., and Yosinski, J. (2014b). Deep generative stochastic networks trainable by backprop. In Proceedings of the 30th International Conference on Machine Learning (ICML'14).

  7. [7] Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., and Bengio, Y. (2010). Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy). Oral presentation.

  8. [8] Breuleux, O., Bengio, Y., and Vincent, P. (2011). Quickly generating representative samples from an RBM-derived process. Neural Computation, 23(8), 2053–2073.

  9. [9] Glorot, X., Bordes, A., and Bengio, Y. (2011). Deep sparse rectifier neural networks. In AISTATS'2011.

  10. [10] Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y. (2013a). Maxout networks. In ICML'2013.

  11. [11] Goodfellow, I. J., Mirza, M., Courville, A., and Bengio, Y. (2013b). Multi-prediction deep Boltzmann machines. In NIPS'2013.

  12. [12] Goodfellow, I. J., Warde-Farley, D., Lamblin, P., Dumoulin, V., Mirza, M., Pascanu, R., Bergstra, J., Bastien, F., and Bengio, Y. (2013c). Pylearn2: a machine learning research library. arXiv preprint arXiv:1308.4214.

  13. [13] Gutmann, M. and Hyvärinen, A. (2010). Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In AISTATS'2010.

  14. [14] Hinton, G., Deng, L., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., and Kingsbury, B. (2012a). Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29(6), 82–97.

  15. [15] Hinton, G. E., Dayan, P., Frey, B. J., and Neal, R. M. (1995). The wake-sleep algorithm for unsupervised neural networks. Science, 268, 1158–1161.

  16. [16] Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.

  17. [17] Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2012b). Improving neural networks by preventing co-adaptation of feature detectors. Technical report, arXiv:1207.0580.

  18. [18] Hyvärinen, A. (2005). Estimation of non-normalized statistical models using score matching. J. Machine Learning Res., 6.

  19. [19] Jarrett, K., Kavukcuoglu, K., Ranzato, M., and LeCun, Y. (2009). What is the best multi-stage architecture for object recognition? In Proc. International Conference on Computer Vision (ICCV'09), pages 2146–2153. IEEE.

  20. [20] Kingma, D. P. and Welling, M. (2014). Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR).

  21. [21] Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.

  22. [22] Krizhevsky, A., Sutskever, I., and Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In NIPS'2012.

  23. [23] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

  24. [24] Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. Technical report, arXiv:1401.4082.

  25. [25] Rifai, S., Bengio, Y., Dauphin, Y., and Vincent, P. (2012). A generative process for sampling contractive auto-encoders. In ICML'12.

  26. [26] Salakhutdinov, R. and Hinton, G. E. (2009). Deep Boltzmann machines. In AISTATS'2009, pages 448–455.

  27. [27] Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, volume 1, chapter 6, pages 194–281. MIT Press, Cambridge.

  28. [28] Susskind, J., Anderson, A., and Hinton, G. E. (2010). The Toronto face dataset. Technical Report UTML TR 2010-001, U. Toronto.

  29. [29] Tieleman, T. (2008). Training restricted Boltzmann machines using approximations to the likelihood gradient. In W. W. Cohen, A. McCallum, and S. T. Roweis, editors, ICML 2008, pages 1064–1071. ACM.

  30. [30] Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. (2008). Extracting and composing robust features with denoising autoencoders. In ICML 2008.

  31. [31] Younes, L. (1999). On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates. Stochastics and Stochastic Reports, 65(3), 177–228.