Monday 19 December 2016

Everything old is new again: Deep statistical relational learning


In the age of combinatorial innovation, old things are given a shiny new face, even when nothing really new happens. The same holds for Statistical Relational Learning (SRL), a sub-field of machine learning for characterizing the relational structure of the world.

Starting in the late 1990s, SRL went through a fruitful period of about 10 years, reaching its peak in 2007 with the publication of the book "Introduction to Statistical Relational Learning", co-edited by Lise Getoor and the late Ben Taskar (who died unexpectedly in 2013, at the age of 36 and at his academic peak). Many significant models appeared in the first half of the 2000s, including Conditional Random Fields (CRFs, 2001), Relational Markov Networks (2002) and Markov Logic Networks (2006). Despite being more powerful than their non-relational alternatives, these SRL models still rely on manual feature engineering, which will soon reach the limit of its utility.

Developed largely in parallel is Deep Learning (DL), whose current wave officially started in 2006 with the publication of Deep Belief Networks in Science. Deep learning is concerned with learning data abstractions (a.k.a. features), favoring end-to-end learning through multiple steps of non-linear computation.

Combinatorial thinking naturally leads to the question of whether these two sub-fields can work together. The answer is a big YES, because SRL and DL are largely complementary. For example, over the past 3 years there have been many papers marrying CRFs and deep nets. While CRFs offer a semi-formal framework for joint learning and inference, deep nets offer feature learning (with feedforward nets), deterministic dynamics (with recurrent nets), and translation invariance (with convolutional nets). The marriage should be a happy one. But like any marriage of convenience, it won't go very far; some genuine blending is needed.
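
To see what this marriage looks like in code, here is a minimal numpy sketch of the standard neural-CRF recipe: a deep net produces per-position label scores (emissions), and a linear-chain CRF couples the labels through a transition matrix. The function and variable names are illustrative, not taken from any particular library.

    import numpy as np

    def logsumexp(x, axis=-1):
        # Numerically stable log-sum-exp.
        m = np.max(x, axis=axis, keepdims=True)
        return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

    def crf_nll(emissions, transitions, labels):
        # Negative log-likelihood of one label sequence under a linear-chain CRF.
        # emissions:   (T, K) per-position label scores produced by the deep net
        # transitions: (K, K) score of moving from label i to label j
        # labels:      (T,) gold label sequence
        T, K = emissions.shape
        # Score of the gold path: emissions plus transitions along the chain.
        score = emissions[0, labels[0]]
        for t in range(1, T):
            score += transitions[labels[t - 1], labels[t]] + emissions[t, labels[t]]
        # Forward algorithm for the log partition function log Z.
        alpha = emissions[0]
        for t in range(1, T):
            alpha = logsumexp(alpha[:, None] + transitions, axis=0) + emissions[t]
        return logsumexp(alpha) - score  # loss to minimize jointly with the net

Training the marriage then amounts to back-propagating this loss through the emission scores into the deep net, which is, in essence, what the recent CRF-plus-deep-net papers do in their various flavors.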

Our recent work, Column Networks, scheduled to appear in AAAI'17, blends SRL and DL more deeply, so that learning and inference can be carried out naturally. The term "column" refers to the famous columnar structure of the neocortex in mammals. Interestingly, Column Networks share design features with all three main deep net architectures (see the sketch after the list):

  • A column is a feedforward net.
  • Parameters are tied across layers, which is essentially the idea behind recurrent nets.
  • The network between columns is designed so that relations are treated identically regardless of which columns they connect, hence the translation-invariance property of convolutional nets.
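
To make the three points concrete, here is a minimal numpy sketch of a stack of column layers, written from the description above rather than from the paper's exact equations; the names and the choice of mean-pooling over neighbors are illustrative assumptions.

    import numpy as np

    def column_network(X, adj, W_in, W, V, b, n_layers):
        # X:    (N, D) raw features, one row per column (entity)
        # adj:  list of (N, N) 0/1 adjacency matrices, one per relation type
        # W_in (D, H), W (H, H), V list of (H, H), b (H,): the same parameters
        # are reused at every layer (the recurrent-net flavor), and each V_r
        # is shared by all columns (the convolutional-net flavor).
        H = X @ W_in                              # initial hidden state per column
        for _ in range(n_layers):                 # parameters tied across layers
            ctx = np.zeros_like(H)
            for A, V_r in zip(adj, V):
                deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
                ctx += (A @ H) / deg @ V_r        # relation-r neighborhood context
            H = np.maximum(0.0, b + H @ W + ctx)  # feedforward step with ReLU
        return H                                  # feed into a per-column softmax

Each column ingests its own features through the feedforward path and its neighbors' states through the relation-specific weights, which is the sense in which learning and inference over the graph are carried out naturally, in one pass of ordinary back-propagation.
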
As Column Networks are very generic, expect more to come in the next few months. Stay tuned.

Updated references

  • Column Networks for Collective Classification, T. Pham, T. Tran, D. Phung, S. Venkatesh, AAAI'17.
  • Graph-induced restricted Boltzmann machines for document modeling, Tu D. Nguyen, Truyen Tran, Dinh Phung and Svetha Venkatesh, Information Sciences. doi: 10.1016/j.ins.2015.08.023
  • Neural Choice by Elimination via Highway Networks, Truyen Tran, Dinh Phung and Svetha Venkatesh, PAKDD workshop on Biologically Inspired Techniques for Data Mining (BDM'16), April 19-22, 2016, Auckland, NZ.
  • Predicting delays in software projects using networked classification, Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran, Aditya Ghose, 30th IEEE/ACM International Conference on Automated Software Engineering (ASE 2015), November 9-13, 2015, Lincoln, Nebraska, USA.
  • Learning vector representation of medical objects via EMR-driven nonnegative restricted Boltzmann machines (e-NRBM), Truyen Tran, Tu D. Nguyen, Dinh Phung and Svetha Venkatesh, Journal of Biomedical Informatics, 2015. doi: 10.1016/j.jbi.2015.01.012
  • Cumulative Restricted Boltzmann Machines for Ordinal Matrix Data Analysis, Truyen Tran, Dinh Phung and Svetha Venkatesh, in Proc. of the 4th Asian Conference on Machine Learning (ACML 2012), Singapore, Nov 2012.
  • Ordinal Boltzmann Machines for Collaborative Filtering, Truyen Tran, Dinh Q. Phung and Svetha Venkatesh, in Proc. of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), June 2009, Montreal, Canada.
  • Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data, Truyen Tran, Dinh Q. Phung, Hung H. Bui and Svetha Venkatesh, in Proc. of the 21st Annual Conference on Neural Information Processing Systems (NIPS), Dec 2008, Vancouver, Canada.
  • AdaBoost.MRF: Boosted Markov random forests and application to multilevel activity recognition, Truyen Tran, Dinh Quoc Phung, Hung Hai Bui and Svetha Venkatesh, in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Volume 2, pages 1686-1693, New York, USA, June 2006.
