Sunday 3 June 2012

Collaborative ranking I: Problem setting

As I mentioned in the previous post, there are two important aspects in modelling collaborative filtering (CF) data: one is the ordinal nature of the ratings, and the other is the end goal of CF, which is to produce a ranked list (of new items for a particular user, or of new users for a particular item) rather than a set of ratings. This post is about the latter aspect. In fact, I plan a series of posts, so stay tuned.

For clarity, we shall focus exclusively on producing a ranked item list for a particular user. We leave to another post the orthogonal problem of producing a ranked user list for a particular item.

There are really two steps involved in producing an item list: (i) identifying a candidate set, and (ii) ranking the candidates. Let us look at each in turn.

Identifying a candidate set

This seems trivial but it is not. One might think that we could simply use all the remaining items, but that is not very effective. First, each user is really interested in only some items. Second, including all the good items is not necessarily satisfactory to the user: users often want a mix of items of different tastes, even though some of them may not be of high quality. And third, how can you measure the quality of the candidate set? What if all the candidate items are of high quality but none of them will actually appear in the user's future list?

There are a number of ways to produce a candidate set. The most intuitive way is to look at items that similar users have seen. More specifically, we first identify the top K most similar users and aggregate all the items seen by them, minus those already seen by the particular user. This poses two questions: (a) what is the similarity measure, and (b) what is the best K? Adequate answers to each question deserve a research paper in themselves, so we will not pursue them here.
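To make this concrete, here is a minimal Python sketch of user-based candidate generation. The dense rating matrix R, the cosine similarity, and the default K are assumptions for illustration only, not recommendations:

    import numpy as np

    def candidate_set(R, user, K=10):
        """User-based candidate generation (a minimal sketch).

        R    : (n_users, n_items) rating matrix, 0 = unseen (assumed dense)
        user : index of the target user
        K    : number of similar users to aggregate over (assumed)
        """
        # Cosine similarity between the target user and everyone else
        norms = np.linalg.norm(R, axis=1) + 1e-12
        sims = R @ R[user] / (norms * norms[user])
        sims[user] = -np.inf                    # exclude the user themselves

        # Top-K most similar users
        neighbours = np.argsort(sims)[-K:]

        # Items seen by the neighbours but not by the target user
        seen_by_neighbours = (R[neighbours] > 0).any(axis=0)
        seen_by_user = R[user] > 0
        return np.flatnonzero(seen_by_neighbours & ~seen_by_user)

In practice one would use a sparse matrix and a carefully chosen similarity measure, which is exactly where questions (a) and (b) above come in.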

There are of course other system criteria -- one might look to maximise the purchase rate, the total revenue per user, the reduction of inventory, the degree of diversity, and so on.

Producing the ranked list

Given the candidate set, a straightforward way to produce the ranked list is to first predict the rating for each item and then sort the items by predicted rating. For this we do not need any specialised ranking algorithm, but it is less exciting: one might argue that since our objective is ranking, we should learn a ranking function in the first place.

What is a rank function? The simplest rank function takes a pair (user, item) and returns a real score; these scores are then used to sort the items. Ranking is then easy -- the cost is typically O(N log N) for standard sorting algorithms. The drawback of this method is that it implicitly assumes all items are independent, when in fact they are not: after all, they are liked by the same user (and not just one user). Nevertheless, this method is by far the most popular, dating back to Luce's choice axiom in the 1950s and continuing through the recent rise of learning-to-rank in information retrieval and label ranking in machine learning.
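As a hedged illustration of this point-wise approach (which also covers the predict-and-sort recipe above), here is a sketch where the scores come from a hypothetical matrix factorisation model, with assumed latent factor matrices U and V; ranking is then just a sort:

    import numpy as np

    def pointwise_rank(U, V, user, candidates):
        """Point-wise ranking: score each (user, item) pair independently,
        then sort. U and V are assumed latent factor matrices, e.g. from
        a matrix factorisation model fitted elsewhere.
        """
        scores = V[candidates] @ U[user]      # one real score per item
        order = np.argsort(-scores)           # O(N log N) sort
        return [candidates[i] for i in order]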

There is another way: a rank function can take a triple (user, item1, item2) and return the ordering between item1 and item2. The ranking over all items is then based on voting, e.g., the number of times a particular item is ranked higher than the others. The cost of this method is N(N-1)/2 pairwise comparisons. One drawback is that it does not enforce transitivity, i.e., if item1 is ranked higher than item2, and item2 higher than item3, then item1 should be ranked higher than item3. It also does not take higher-order interactions among items into account. Despite these shortcomings, this method is widely practised; one of the earliest references is the Bradley-Terry model.
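Here is a sketch of the pair-wise scheme. The function prefers(user, a, b) stands in for a hypothetical learned pairwise preference model in the Bradley-Terry spirit, and the final order is obtained by counting wins, a Copeland-style vote:

    import itertools

    def pairwise_rank(prefers, user, candidates):
        """Pair-wise ranking by voting. `prefers(user, a, b)` is a
        hypothetical learned function returning True when the user is
        predicted to prefer item a over item b; it need not be transitive.
        """
        wins = {item: 0 for item in candidates}
        for a, b in itertools.combinations(candidates, 2):  # N(N-1)/2 calls
            if prefers(user, a, b):
                wins[a] += 1
            else:
                wins[b] += 1
        # Order by number of pairwise wins (a Copeland-style vote count)
        return sorted(candidates, key=lambda item: -wins[item])

Because the predicted preferences need not be transitive, it is the vote count that resolves any cycles.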

This leaves the most informative method: think of the rank function as a way to produce a permutation whose utility for the user is maximal. In other words, we must solve a set-wise problem rather than a point-wise or pair-wise one. However, there is a big problem here: the computational complexity is bounded by N!, which is too expensive for even moderate N. Thus we will not address this method for the moment.
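To see where the N! comes from, consider this toy brute-force sketch; the utility function here is a made-up, DCG-like position-discounted score, assumed purely for illustration:

    import itertools
    import math

    def listwise_rank(utility, user, candidates):
        """Set-wise ranking by brute force: evaluate a hypothetical
        set-wise utility on every permutation of the candidates and
        return the best permutation. Cost is N! evaluations.
        """
        best = max(itertools.permutations(candidates),
                   key=lambda perm: utility(user, perm))
        return list(best)

    # A made-up, DCG-like utility: position-discounted sum of predicted
    # relevances, with `predict` assumed to come from some fitted model.
    def toy_utility(user, perm, predict=lambda u, i: 0.0):
        return sum(predict(user, item) / math.log2(pos + 2)
                   for pos, item in enumerate(perm))

Even at N = 10 this already means evaluating 3,628,800 permutations, which is why practical set-wise models need some tractable structure.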

Updated references:

  • Probabilistic Models over Ordered Partitions with Applications in Document Ranking and Collaborative Filtering, T. Truyen, D. Phung, and S. Venkatesh, in Proc. of SIAM Int. Conf. on Data Mining (SDM11), April, Arizona, USA, 2011.
  • Learning from Ordered Sets and Applications in Collaborative Ranking, Truyen Tran, Dinh Phung and Svetha Venkatesh, in Proc. of the 4th Asian Conference on Machine Learning (ACML2012), Singapore, Nov 2012.
  • Preference Relation-based Markov Random Fields in Recommender Systems, Shaowu Liu, Gang Li, Truyen Tran, Jiang Yuan, ACML'15, November 20-22, 2015, Hong Kong.


[Part 2]
