Machine learns as data speak: Everything old is new again: Nested sequential models

Tuesday, 20 December 2016

Recently, multi-layer RNN architectures have been shown to work better than single-layer versions. Google's Neural Machine Translation system, for example, has 8 layers of LSTMs as of Dec 2016.

The idea goes back to the earlier days of multi-layer HMMs in the 1990s, which are special cases of Dynamic Bayesian Networks. These were followed by multi-layer Conditional Random Fields (CRFs), which are in turn special cases of Dynamic CRFs.

The idea is that higher layers represent more abstract semantics. In temporal sequences, one would expect that the "clock" of the upper layers is slower than that of the lower layers. But most existing work has to explicitly design the temporal resolution by hand.
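To make the "slower clock" concrete, here is a minimal sketch of a two-layer tanh RNN whose upper layer updates only every `stride` steps. It is not taken from any of the systems mentioned above: the function name, the random untrained weights, and the `stride` parameter are all assumptions for illustration. The point is that `stride` is fixed by hand rather than learnt.

```python
import numpy as np

def two_layer_rnn(xs, stride, hidden=4, seed=0):
    """Run a toy two-layer tanh RNN; the upper layer ticks every `stride` steps."""
    rng = np.random.default_rng(seed)
    d = xs.shape[1]
    # Random, untrained weights (illustrative assumption only).
    W1x = rng.normal(scale=0.5, size=(hidden, d))
    W1h = rng.normal(scale=0.5, size=(hidden, hidden))
    W2x = rng.normal(scale=0.5, size=(hidden, hidden))
    W2h = rng.normal(scale=0.5, size=(hidden, hidden))
    h1 = np.zeros(hidden)
    h2 = np.zeros(hidden)
    upper_states = []
    for t, x in enumerate(xs):
        # The lower layer sees every input.
        h1 = np.tanh(W1x @ x + W1h @ h1)
        # The upper layer updates on a hand-chosen, fixed schedule.
        if (t + 1) % stride == 0:
            h2 = np.tanh(W2x @ h1 + W2h @ h2)
        upper_states.append(h2.copy())
    return np.array(upper_states)

xs = np.random.default_rng(1).normal(size=(12, 3))
upper = two_layer_rnn(xs, stride=4)
```

With `stride=4`, the upper state stays constant within each block of four steps, so it changes at a quarter of the rate of the lower layer regardless of what the data look like.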

Learning the temporal resolution automatically is an attractive idea. In 1998, the Hierarchical HMM (HHMM) was introduced, in which each parent state is assumed to generate a child sequence, each child in turn generates a grandchild subsequence, and so forth. The network becomes nested. Learning and inference take time cubic in the sequence length, which is prohibitive for long sequences.
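The nested generation process can be sketched roughly as follows. This is a toy, not the HHMM of the original paper: the state names, the uniform choice of children (a real HHMM uses a Markov chain among siblings), and the termination probability `P_END` are all made up for illustration.

```python
import random

# Hypothetical toy hierarchy: each internal state lists the children it may generate.
CHILDREN = {
    "day": ["morning", "evening"],
    "morning": ["wake", "eat"],
    "evening": ["eat", "sleep"],
}
P_END = 0.4  # assumed chance that a child sequence ends after each child

def generate(state, rng, t=0):
    """Return (start, end, state) segments, parent first; a leaf spans one time step."""
    if state not in CHILDREN:
        return [(t, t + 1, state)]
    segments = []
    cursor = t
    while True:
        # Simplification: children are drawn uniformly, not from a transition matrix.
        child = rng.choice(CHILDREN[state])
        sub = generate(child, rng, cursor)
        segments.extend(sub)
        cursor = sub[0][1]  # the child's own segment comes first; take its end time
        if rng.random() < P_END:
            break
    # The parent lasts exactly as long as the child sequence it generated.
    return [(t, cursor, state)] + segments

segments = generate("day", random.Random(0))
```

Each parent segment covers exactly the span of its children, so the length of a high-level state is not set by hand but falls out of how long its subsequence runs.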

The CRF counterpart is the Hierarchical Semi-Markov CRF (HSCRF), which we introduced in 2008.

Both HHMMs and HSCRFs are members of the Stochastic Context-Free Grammar family, which is known for its cubic time complexity. Besides being slow, HHMMs and HSCRFs are hopeless in large-scale tasks that require many bits to represent the world.
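To see where the cubic cost comes from, here is a small counting sketch. It assumes an inside-style dynamic program that visits every span (i, j) of the sequence and every split point k inside it; the function name is hypothetical, and the per-state factors are ignored.

```python
def inside_work(T):
    """Count the (i, k, j) triples visited by a span-based dynamic program."""
    count = 0
    for length in range(2, T + 1):           # span length
        for i in range(0, T - length + 1):   # span start
            j = i + length                   # span end (exclusive)
            for k in range(i + 1, j):        # split point inside the span
                count += 1
    return count

short = inside_work(50)
longer = inside_work(100)
```

The count equals (T+1)T(T-1)/6, so doubling the sequence length multiplies the work by roughly 8.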

Given the recent successes of RNNs (mostly LSTM and GRU) on sequential tasks, one would naturally ask whether they can achieve the same feat as HHMMs, that is, learning the hierarchy automatically from data. This proved to be a difficult task until very recently. Check this paper by Bengio's group for more detail. I'm very curious to see how the idea plays out in practice. Let's wait and see.

Work by us:

Hierarchical semi-Markov conditional random fields for deep recursive sequential data, Truyen Tran, Dinh Phung, Hung Bui, Svetha Venkatesh, Artificial Intelligence, 2017. (Extension of the NIPS'08 paper).
MCMC for Hierarchical Semi-Markov Conditional Random Fields, Truyen Tran, Dinh Q. Phung, Svetha Venkatesh and Hung H. Bui. In NIPS'09 Workshop on Deep Learning for Speech Recognition and Related Applications. December, 2009, Whistler, BC, Canada.
Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data, Truyen Tran, Dinh Q. Phung, Hung H. Bui, and Svetha Venkatesh. In Proc. of 21st Annual Conference on Neural Information Processing Systems, Dec 2008, Vancouver, Canada.
AdaBoost.MRF: Boosted Markov random fields and application to multilevel activity recognition, Truyen Tran, Dinh Quoc Phung, Hung Hai Bui, and Svetha Venkatesh. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 1686-1693, New York, USA, June 2006.