Wednesday, 8 March 2017

The Matrix Encoded: Matrices as first-class citizens



One thing as popular as hydrogen in the universe is the vector. Most mathematical and data-analytic methods call for this fundamental structure of the world: PCA, ICA, SVM, GMM, t-SNE and neural nets, to name a few, all implicitly assume a vector representation of data. The power of vectors should not be underestimated. The so-called distributed representation, which is rocking the machine learning and cognitive science worlds, is nothing but a vector representation of thought (in Geoff Hinton's words, referring to Skip-Thought vectors).

The current love for distributed representations of things (yes, THINGS, as in the Internet-of-Things) has gone really far. There is a huge line of work on [X]2vec, where you can substitute [X] with [word], [sentence], [paragraph], [document], [node], [tweet], [edge] and [subgraph]. I won't be surprised to see thing2vec very soon.

But can you really compress structured things like sentences into vectors? I bet you can, provided the vector is long enough. After all, although the space of all possible sentences in a language is theoretically infinite, most language usage is tightly packed, and in practice the sentence space can be mapped into a linear space of thousands of dimensions.

However, compressing data raises the question of decompressing it, e.g., to generate a target sentence in another language, as in machine translation. Surprisingly, the simplistic seq2seq trick works well in translation. But since linguistic structure has been lost to vectorization, generating language from a single vector is more difficult. A better way is to treat each sentence as a matrix, where each column is a word embedding. This gives rise to the attention scheme in machine translation, which has turned out to be a huge success, as in Google's current Neural Machine Translation system.
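
To make the matrix view concrete, here is a minimal numpy sketch of attention over a sentence matrix. The bilinear scoring and the names (attention_read, S, h, W) are my own illustrative choices, not the exact formulation of any particular NMT system:

import numpy as np

def attention_read(S, h, W):
    # S: (d, n) sentence matrix, one word embedding per column
    # h: (d_h,) current decoder state
    # W: (d_h, d) bilinear scoring matrix (one simple way to score columns)
    scores = h @ W @ S                  # (n,) one relevance score per word
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()            # softmax over the n columns
    context = S @ weights               # (d,) weighted mix of word embeddings
    return context, weights

# toy usage
d, d_h, n = 4, 3, 5
S = np.random.randn(d, n)
h = np.random.randn(d_h)
W = np.random.randn(d_h, d)
context, alpha = attention_read(S, h, W)

The point is that the decoder never squashes the sentence into one vector: at every step it re-reads the columns of the matrix.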

Indeed, it is now well recognized that vectors alone are not enough to memorize events far back in the past. The fix is to augment a vector-based RNN with an external memory, giving rise to the recent Memory-augmented RNNs. The external memory is nothing but a matrix.
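
For illustration, a bare-bones content-based read from such a memory matrix might look like the following (a sketch in the spirit of Neural Turing Machine-style addressing; memory_read, M, k and beta are assumed names, not taken from any cited paper):

import numpy as np

def memory_read(M, k, beta=1.0):
    # M: (n_slots, d) external memory matrix, one memory slot per row
    # k: (d,) query key emitted by the controller
    # beta: sharpening factor for the soft address
    sims = M @ k / (np.linalg.norm(M, axis=1) * np.linalg.norm(k) + 1e-8)  # cosine similarity per slot
    w = np.exp(beta * (sims - sims.max()))
    w /= w.sum()                        # soft address over the slots
    return w @ M                        # (d,) read vector: weighted mix of slots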

Enter the world of matrices

In a vector space, a matrix implements a linear transformation, that is, it maps a vector in one space to a vector in another space. As mathematical objects, matrices have a life of their own, just like vectors, e.g., matrix calculus.
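
In symbols, this is just the familiar picture:

A \in \mathbb{R}^{m \times n}, \qquad x \in \mathbb{R}^{n}, \qquad y = A x \in \mathbb{R}^{m}.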

In NLP, it has been suggested that a noun is a vector while an adjective is really a matrix. The idea is cute, because an adjective "acts" on a noun, transforming its meaning.
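
In this view, composing "big dog" is just a matrix-vector product (a toy sketch; the numbers here are random placeholders, not trained embeddings):

import numpy as np

d = 4                                    # toy embedding size
dog = np.random.randn(d)                 # noun as a vector (random placeholder)
big = np.random.randn(d, d)              # adjective as a matrix (random placeholder)
big_dog = big @ dog                      # the adjective "acts" on the noun vector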

Matrices also form the basis for parameterizing neural layers. Hence the space of multilayered neural nets is a joint space of matrices.
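
A tiny sketch makes the point: the whole network below is specified by nothing more than its list of weight matrices Ws (plus biases), which is exactly the "joint space of matrices" above (names and sizes are illustrative):

import numpy as np

def mlp(x, Ws, bs):
    # A multilayered net is, at heart, a list of weight matrices (and biases).
    h = x
    for W, b in zip(Ws, bs):
        h = np.tanh(W @ h + b)
    return h

# toy usage: a 3-layer net is a point in the joint space of three matrices
Ws = [np.random.randn(8, 4), np.random.randn(8, 8), np.random.randn(2, 8)]
bs = [np.zeros(8), np.zeros(8), np.zeros(2)]
y = mlp(np.random.randn(4), Ws, bs)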

Our recent paper titled "Matrix-centric neural networks" (co-authored with my PhD student, Kien Do, and my boss, Professor Svetha Venkatesh) pushes the line of matrix thinking to the extreme. That is, matrices are first-class citizens. They are no longer a collection of vectors. The input, the hidden layers, and the output are all matrices. The RNN is now a model of a sequence of input matrices and a sequence of output matrices. The internal memory (as in the LSTM) is also a matrix, making the model resemble Memory-augmented RNNs.
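
To give a flavour of what "matrices all the way down" means, here is a deliberately simplified bilinear matrix recurrence. This is my own illustration, not the exact update rule of the paper:

import numpy as np

def matrix_rnn_step(H, X, U, V, A, B):
    # One step of a toy matrix-valued recurrence: the hidden state H and the
    # input X are matrices, and so are all the parameters (illustrative only).
    return np.tanh(U @ H @ V + A @ X @ B)

# toy usage with a (3 x 4) hidden matrix and a (5 x 6) input matrix
H = np.zeros((3, 4))
X = np.random.randn(5, 6)
U, V = np.random.randn(3, 3), np.random.randn(4, 4)
A, B = np.random.randn(3, 5), np.random.randn(6, 4)
H = matrix_rnn_step(H, X, U, V, A, B)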

To rephrase Geoff Hinton, we want a matrix representation of thought. Somehow, our neocortex looks like a matrix -- it is really a huge thin sheet of grey matter.

Maybe one day we will live in the space created by matrices.

References
  • Learning Deep Matrix Representations, Kien Do, Truyen Tran, Svetha Venkatesh. arXiv preprint arXiv:1703.01454. The updated version (as of August 2017) covers graphs as a special case!