pith. machine review for the scientific record.

arxiv: 1206.4635 · v1 · submitted 2012-06-18 · 💻 cs.LG · stat.ML

Recognition: unknown

Deep Mixtures of Factor Analysers

Geoffrey Hinton (University of Toronto), Ruslan Salakhutdinov (University of Toronto), Yichuan Tang (University of Toronto)

classification 💻 cs.LG stat.ML
keywords layer, factor, learning, models, analysers, deep, learn, MFAs
read the original abstract

An efficient way to learn deep density models that have many layers of latent variables is to learn one layer at a time using a model that has only one layer of latent variables. After learning each layer, samples from the posterior distributions for that layer are used as training data for learning the next layer. This approach is commonly used with Restricted Boltzmann Machines, which are undirected graphical models with a single hidden layer, but it can also be used with Mixtures of Factor Analysers (MFAs), which are directed graphical models. In this paper, we present a greedy layer-wise learning algorithm for Deep Mixtures of Factor Analysers (DMFAs). Even though a DMFA can be converted to an equivalent shallow MFA by multiplying together the factor loading matrices at different levels, learning and inference are much more efficient in a DMFA, and the sharing of each lower-level factor loading matrix by many different higher-level MFAs prevents overfitting. We demonstrate empirically that DMFAs learn better density models than both MFAs and two types of Restricted Boltzmann Machine on a wide variety of datasets.
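The greedy layer-wise procedure in the abstract can be sketched in NumPy. This is a simplified illustration, not the authors' implementation: each layer here is a single factor analyser fit by EM rather than a mixture, and the helper names `fit_fa`, `fa_posterior`, `sample_posterior`, and `gauss_loglik`, along with the synthetic data, are hypothetical. Layer 1 is fit to the data, posterior samples of its factors become the training data for layer 2, and the two layers are then collapsed into one Gaussian over x by composing their linear maps.

```python
import numpy as np


def gauss_loglik(X, mean, cov):
    """Average log-density of the rows of X under N(mean, cov)."""
    d = X.shape[1]
    Xc = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(Xc * np.linalg.solve(cov, Xc.T).T, axis=1)  # x^T C^{-1} x per row
    return float(np.mean(-0.5 * (d * np.log(2 * np.pi) + logdet + maha)))


def fa_posterior(X, W, mu, psi):
    """Posterior N(z | m, G) over factors for each row of X; G is shared by all rows."""
    k = W.shape[1]
    Wp = W / psi[:, None]                      # Psi^{-1} W
    G = np.linalg.inv(np.eye(k) + W.T @ Wp)    # (I + W^T Psi^{-1} W)^{-1}
    M = (X - mu) @ Wp @ G                      # rows are E[z | x]
    return M, G


def fit_fa(X, k, iters=100, seed=0, floor=1e-6):
    """Fit one factor analyser x = W z + mu + noise, noise ~ N(0, diag(psi)), by EM."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)
    Xc = X - mu
    W = 0.1 * rng.standard_normal((d, k))
    psi = Xc.var(axis=0) + floor
    for _ in range(iters):
        M, G = fa_posterior(X, W, mu, psi)     # E-step
        Szz = n * G + M.T @ M                  # sum_n E[z z^T | x_n]
        Sxz = Xc.T @ M                         # sum_n (x_n - mu) E[z | x_n]^T
        W = Sxz @ np.linalg.inv(Szz)           # M-step: factor loadings
        psi = np.maximum(np.mean(Xc ** 2, axis=0) - np.sum(W * Sxz, axis=1) / n, floor)
    return W, mu, psi


def sample_posterior(X, W, mu, psi, rng):
    """Draw one posterior sample of the factors for every row of X."""
    M, G = fa_posterior(X, W, mu, psi)
    L = np.linalg.cholesky(0.5 * (G + G.T))   # symmetrise before the Cholesky factor
    return M + rng.standard_normal(M.shape) @ L.T


# Hypothetical synthetic data: 6-D observations driven by 3 latent factors.
rng = np.random.default_rng(42)
A = rng.standard_normal((6, 3))
X = rng.standard_normal((2000, 3)) @ A.T + 0.3 * rng.standard_normal((2000, 6))

# Layer 1: a 3-factor analyser on the data.
W1, mu1, psi1 = fit_fa(X, k=3)
# Posterior samples of the layer-1 factors are the training data for layer 2.
H1 = sample_posterior(X, W1, mu1, psi1, rng)
W2, mu2, psi2 = fit_fa(H1, k=2)

# Collapse: layer 2 replaces the N(0, I) prior on the layer-1 factors.
mean_x = W1 @ mu2 + mu1
cov_x = W1 @ (W2 @ W2.T + np.diag(psi2)) @ W1.T + np.diag(psi1)

ll_layer1 = gauss_loglik(X, mu1, W1 @ W1.T + np.diag(psi1))
ll_deep = gauss_loglik(X, mean_x, cov_x)
print(f"layer-1 FA: {ll_layer1:.3f}  collapsed two-layer model: {ll_deep:.3f}")
```

With single-component layers the collapsed model is still one Gaussian, so this sketch shows the mechanics of layer-wise training and collapsing rather than the density gains reported in the abstract, which come from using mixtures at each level.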

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. NICE: Non-linear Independent Components Estimation

    cs.LG 2014-10 accept novelty 8.0

    NICE learns a composition of invertible neural-network layers that transform data into independent latent variables, enabling exact log-likelihood training and sampling for density estimation.