Machine learns as data speak: 30 years of a Swiss army knife: Restricted Boltzmann machines

I read somewhere, but cannot recall exactly who said so, that in ancient worlds, 30 years are long enough for the new generation to settle down with a new system, regime or ideology. As there are only a few days away from 2017, I would like to look back the history of a 30-year old model which has captured my research attention for the past 10 years.

To some of you, restricted Boltzmann machine (RBM) may be a familiar name, especially for those who follow the current deep learning literature since the beginning. But RBM has also passed its prime time, so you may have heard about it in passing.

I was attracted to RBM for several reasons. When I was studying conditional random fields in 2004 and was looking for a fast way to train with arbitrary structures, Contrastive Divergence (CD) appears to be an interesting one. While CD is a generic technique, it was derived especially for RBMs. Second, RBM has "Boltzmann" in the name, which is kind of interesting, because physicists are kind of sexy :)

Needless too say, another big reason is that RBM, together with its cousin, Autoencoder are building blocks of unsupervised deep nets, which started the current revolution -- deep learning.

The greatest reason is that I think RBM is one of the most important classes of data models known to date, perhaps comparable to PCA in dimensionality-reduction and k-means in clustering in terms of usefulness.

First introduced in 1986 by Paul Smolensky under the name Harmonium in a classic two-volume book known as PDP (Parallel Distributed Processing), co-edited by Rumelhart and McLelland. RBMs were subsequently popularised by Geoff Hinton in the 2000s, especially in 2001 with the introduction of Contrastive Divergence (CD), and in 2006 with the introduction of a deep version known as Deep Belief Nets (DBN).

Statistically, RBM is a probabilistic model of data, i.e., it assigns a probability (or density) to a multivariate data. Initially, RBMs are limited to binary data (known as Bernoulli-Bernoulli RBM), but subsequently extended to Gaussian data (known as Gaussian-Bernoulli RBM), and mixed-types (known as Mixed-variate RBM, or Thurstonian Boltzmann machine).

Source: http://deeplearning.net/tutorial/_images/rbm.png

RBM is a special case of Boltzmann machine, which is in turn a special case of Markov random field. It has two layers, one for observed data, the other for latent representation. Due to its special bipartite structure, MCMC inference can be implemented in a block-wise fashion. Learning is relatively fast with CD or its Persistent version. Estimating of latent representation is very fast with a single matrix operation. RBM is also a powerful model in the sense that it can represent any distribution given enough hidden units. As a Markov random field, it has log-linear paramerization which makes it easy to incorporate a variety of domain knowledge.

With all of these advantages, RBMs have been used successfully in many applications, ranging from density modelling, feature extraction, dimensional reduction, clustering, topic modeling, imputation, classification, retrieval and anomaly detection.

Some bias selection of developments

1986: first introduced as Harmonium.
2001: fast approximate biased learning introduced as Contrastive Divergence (CD)
2004: generalized Harmonium introduced
2006: used successfully in Deep Belief Networks
2007: demonstrated with great success on a very large-scale task within the Netflix challenge
2007: temporal RBM
2008: recurrent temporal RBM
2008: classification RBM
2008: persistent CD introduced, essentially another variant of Young's.
2008: convolutional RBMs
2008: universality property proved
2009: topic models with Replicated Softmax
2009: matrix modelling with non i.i.d. RBMs, ordinal data, semi-restricted RBM
2009: implicit mixtures of RBMs
2010: factored high-order RBM
2010: mean-covariance RBM
2010: rectifier linear units RBM
2010: deep BM
2011: mixed-variate RBM
2012: a proper modeling of ordinal matrix data
2013: Thurstonian BM for joint modeling of most known data types
2013: nonnegative RBMs for parts-based representation
2015: trained with graph priors, demonstrating better generalization
2015: extended to tensor-objects
2016: infinite RBM

In short, most of the work has been on extending the representational power of RBM to suit problem structures. The rest is about analysing theoretical properties of RBMs, making deep nets out of RBMs, and improving training speed & accuracy. For the past few years, research about RBMs has slowed down significantly, mostly because the superb accuracy of supervised deep nets, and the ease of deployment of deterministic nets on large-scale problems.

Some of our own work

Statistical Latent Space Approach for Mixed Data Modelling and Applications, T Nguyen, T Tran, D Phung, S Venkatesh, arXiv preprint arXiv:1708.05594
Nonnegative Restricted Boltzmann Machines for Parts-based Representations Discovery and Predictive Model Stabilization, T Nguyen, T Tran, D Phung, S Venkatesh, arXiv preprint arXiv:1708.05603
Multilevel Anomaly Detection for Mixed Data, K Do, T Tran, S Venkatesh, arXiv preprint arXiv: 1610.06249.
Learning deep representation of multityped objects and tasks, Truyen Tran, D. Phung, and S. Venkatesh, arXiv preprint arXiv: 1603.01359.
Outlier Detection on Mixed-Type Data: An Energy-based Approach, K Do, T Tran, D Phung, S Venkatesh, International Conference on Advanced Data Mining and Applications (ADMA 2016).
Graph-induced restricted Boltzmann machines for document modeling, Tu D. Nguyen, Truyen Tran, D. Phung, and S. Venkatesh, Information Sciences. doi:10.1016/j.ins.2015.08.023.
Learning vector representation of medical objects via EMR-driven nonnegative restricted Boltzmann machines (e-NRBM), Truyen Tran, Tu D. Nguyen, D. Phung, and S. Venkatesh, Journal of Biomedical Informatics, 2015, pii: S1532-0464(15)00014-3. doi: 10.1016/j.jbi.2015.01.012.
Tensor-variate Restricted Boltzmann Machines, Tu D. Nguyen, Truyen Tran, D. Phung, and S. Venkatesh, AAAI 2015.
Thurstonian Boltzmann machines: Learning from multiple inequalities, Truyen Tran, D. Phung, and S. Venkatesh, In Proc. of 30th International Conference in Machine Learning (ICML’13), Atlanta, USA, June, 2013.
Learning parts-based representations with Nonnegative Restricted Boltzmann Machine, Tu D. Nguyen, Truyen Tran, D. Phung, and S. Venkatesh, Journal of Machine Learning Research (JMLR) Workshop and Conference Proceedings, Vol. 29, Proc. of 5th Asian Conference on Machine Learning, Nov 2013.
Latent patient profile modelling and applications with Mixed-Variate Restricted Boltzmann Machine, Tu D. Nguyen, Truyen Tran, D. Phung, and S. Venkatesh, In Proc. of 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’13), Gold Coast, Australia, April 2013.
Learning sparse latent representation and distance metric for image retrieval, Tu D. Nguyen, Truyen Tran, D. Phung, and S. Venkatesh, In Proc. of IEEE International Conference on Multimedia and Expo (ICME), San Jose, California, USA, July 2013.
Learning from Ordered Sets and Applications in Collaborative Ranking, Truyen Tran, Dinh Phung and Svetha Venkatesh, in Proc. of. the 4th Asian Conference on Machine Learning (ACML2012), Singapore, Nov 2012.
Cumulative Restricted Boltzmann Machines for Ordinal Matrix Data Analysis, Truyen Tran, Dinh Phung and Svetha Venkatesh, in Proc. of. the 4th Asian Conference on Machine Learning (ACML2012), Singapore, Nov 2012.
Embedded Restricted Boltzmann Machines for Fusion of Mixed Data Types and Applications in Social Measurements Analysis, Truyen Tran, Dinh Phung, Svetha Venkatesh, in Proc. of 15-th International Conference on Information Fusion (FUSION-12), Singapore, July 2012.
Learning Boltzmann Distance Metric for Face Recognition, Truyen Tran, Dinh Phung, Svetha Venkatesh, in Proc. of IEEE International Conference on Multimedia & Expo (ICME-12), Melbourne, Australia, July 2012.
Mixed-Variate Restricted Boltzmann Machines, Truyen Tran, Dinh Phung and Svetha Venkatesh, in Proc. of. the 3rd Asian Conference on Machine Learning (ACML2011), Taoyuan, Taiwan, Nov 2011.
Ordinal Boltzmann Machines for Collaborative Filtering. Truyen Tran, Dinh Q. Phung and Svetha Venkatesh. In Proc. of 25th Conference on Uncertainty in Artificial Intelligence, June, 2009, Montreal, Canada. Runner-up for the best paper award.

Machine learns as data speak

Sunday, 25 December 2016

30 years of a Swiss army knife: Restricted Boltzmann machines

No comments:

Post a Comment