Machine learns as data speak: Ranking by estimating a cone

Geometry is a fascinating world, and in fact many machine learning algorithms can be expressed in geometric language: hyperplanes as in SVMs, space partitioning as in k-d trees, decision trees and non-parametric Bayesian methods, manifold learning and information geometry, to name a few.

Here we will be concerned with a particular geometric object known as a cone (for a more cultural take, have a look at the so-called conical Asian hat). The cones we use have a special property: any non-negative combination of vectors that belong to the cone also belongs to it (strictly speaking, this property defines a convex cone). Mathematically, for a cone \( C \), if \( x,y \in C \) then \( \alpha x + \beta y \in C \) for any \( \alpha,\beta \ge 0 \). On its own, this may not seem very interesting.

What interests us more is that a cone allows us to define a more general notion of inequality, one that compares vectors rather than just real numbers: a vector is said to be greater than or equal to another if their difference belongs to the cone. Thus we write
\[ x \succeq_C y \]
to denote that vector \( x \) is larger than or equal to vector \( y \) with respect to cone \( C \). By definition, this means
\[ x-y \in C. \]
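The simplest example is the non-negative orthant \( \mathbb{R}^n_+ \), for which \( x \succeq_C y \) just means that every component of \( x \) is at least the corresponding component of \( y \). A minimal Python sketch (the function name is ours, for illustration only) shows that, unlike the ordering of real numbers, this is only a partial order: some pairs of vectors are not comparable in either direction.

```python
def orthant_geq(x, y):
    """Return True if x - y lies in the non-negative orthant,
    i.e. x >= y componentwise (generalized inequality w.r.t. R^n_+)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return all(xi - yi >= 0 for xi, yi in zip(x, y))


a = [3.0, 2.0]
b = [1.0, 2.0]
c = [0.0, 5.0]

print(orthant_geq(a, b))                     # True: a - b = (2, 0) is in the orthant
print(orthant_geq(b, a))                     # False
print(orthant_geq(a, c), orthant_geq(c, a))  # False False: a and c are incomparable
```

Other cones give other orderings; the rest of this post is about learning which cone fits the data.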

It turns out that this concept of generalized inequality can be exploited in learning to rank (LTR) and collaborative ranking. Let's see how.

Recall that in LTR, one must order objects by preference, perceived utility, or relevance. A standard approach is to estimate a ranking function that takes a (query, object) pair and returns a real-valued score. As we have mentioned before, this is not the only way: we can instead take a pair of objects and return their ordering.
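To make the two views concrete, here is a hedged Python sketch. The feature vectors, the linear weights `w`, and the helper names are hypothetical, chosen only for illustration: the first approach sorts objects by a real-valued score, while the second sorts them using only a pairwise comparator that says which of two objects should come first.

```python
from functools import cmp_to_key

# Hypothetical feature vectors for three (query, object) pairs.
docs = {
    "d1": [0.2, 0.9],
    "d2": [0.8, 0.4],
    "d3": [0.5, 0.5],
}

# Approach 1: a scoring function that maps a (query, object) vector to a real number.
# Here it is a linear model with hypothetical weights w.
w = [1.0, 0.5]

def score(x):
    return sum(wi * xi for wi, xi in zip(w, x))

by_score = sorted(docs, key=lambda d: score(docs[d]), reverse=True)

# Approach 2: a pairwise comparator that only returns an ordering.
# Negative means "a comes first". Any pairwise model could be plugged in;
# this one happens to compare scores, so both approaches agree.
def compare(a, b):
    sa, sb = score(docs[a]), score(docs[b])
    return -1 if sa > sb else (1 if sa < sb else 0)

by_pairs = sorted(docs, key=cmp_to_key(compare))

print(by_score)  # ['d2', 'd3', 'd1']
print(by_pairs)  # ['d2', 'd3', 'd1']
```

Sorting with a comparator assumes the pairwise decisions are consistent (transitive); a learned pairwise model does not automatically guarantee this.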

Now suppose we are given two objects and their true ordering (as is always the case in training data). For convenience, we assume that each (query, object) pair is represented by a vector. How to construct this vector is an interesting problem in its own right, and a largely domain-specific one. For example, in Web search engines the vector can contain relevance measures from different criteria, such as TF-IDF, Okapi BM25 and title matching, as well as quality measures such as domain authority, structural design, number of in-links, PageRank and timeliness.

What interests us here is that we have an ordering between the two vectors, that is, an inequality between them. This is where the generalized inequality comes in: if \( x \) is ranked above \( y \), we model this as \( x \succeq_C y \). What is missing is the cone itself: we don't know it in advance, so it must be estimated. We shall assume that a single cone suffices for the problem at hand, although some problems may need more than one.
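In training data, each ordered pair gives one constraint on the cone. A minimal Python sketch, assuming each training list is already sorted from most to least preferred (the function name and the toy vectors are hypothetical): every pair \( (x, y) \) with \( x \) ranked above \( y \) yields a difference vector \( x - y \) that the unknown cone \( C \) should contain.

```python
from itertools import combinations

def pairwise_differences(ranked_vectors):
    """Given feature vectors sorted from most to least preferred,
    return x - y for every pair where x is ranked above y.
    Under the cone model, each of these should lie in C."""
    diffs = []
    for x, y in combinations(ranked_vectors, 2):
        # combinations preserves input order, so x is always ranked above y
        diffs.append([xi - yi for xi, yi in zip(x, y)])
    return diffs


ranked = [[3.0, 1.0], [2.0, 1.0], [0.0, 2.0]]
print(pairwise_differences(ranked))
# [[1.0, 0.0], [3.0, -1.0], [2.0, -1.0]]
```

A list of \( n \) objects produces \( n(n-1)/2 \) such pairs, so the number of constraints grows quadratically with list length.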

First, we need a parametric representation of the cone. Recall that any non-negative combination of two vectors in a cone stays in the cone. By induction, this extends to any finite number of vectors: any non-negative combination of vectors in a cone also stays in the cone. This suggests a representation: a cone can be generated by a small set of basis vectors, with every vector in the cone being a non-negative combination of them. Restricting ourselves to cones generated by finitely many vectors in this way, we obtain a polyhedral cone, in the sense that it is both a polyhedron and a cone.
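As a small illustration, assume the number of basis vectors equals the dimension and the basis vectors are linearly independent (an assumption made only to keep the sketch short). Then every vector has a unique set of coefficients with respect to the basis, and it lies in the polyhedral cone exactly when all coefficients are non-negative. With more basis vectors than dimensions the coefficients are no longer unique, and membership would instead be checked with a non-negative least-squares fit. The function name `in_polyhedral_cone` is hypothetical.

```python
import numpy as np

def in_polyhedral_cone(U, v, tol=1e-9):
    """Check whether v lies in the cone generated by the columns of U.

    Assumes U is square and invertible, so the coefficients w with
    U @ w = v are unique; v is in the cone iff all of them are >= 0.
    Returns (is_member, w).
    """
    w = np.linalg.solve(U, v)
    return bool(np.all(w >= -tol)), w

# Two basis vectors in 2-D, stored as the columns of U.
U = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # u1 = (1, 0), u2 = (1, 1)

print(in_polyhedral_cone(U, np.array([3.0, 1.0]))[0])   # True:  2*u1 + 1*u2
print(in_polyhedral_cone(U, np.array([0.0, 1.0]))[0])   # False: -1*u1 + 1*u2
```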

Let us denote by \( u_1, u_2, \ldots, u_K \) the \( K \) basis vectors. An ordering \( x \succeq_C y \) can then be represented as
\[ x-y = \sum_{k=1}^K w_k^{(xy)} u_k \]
where \( w_k^{(xy)} \ge 0 \) are the coefficients for the pair \( (x,y) \). Thus we are left with two sets of unknown parameters to estimate: the shared basis vectors and the pairwise coefficients. We will not cover the details of the estimation for now, but refer to our recent work (in progress).
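The estimation procedure itself is left to the referenced work. As a hedged sketch of just one sub-step, suppose the basis vectors are already known and stacked as the columns of a matrix \( U \). The coefficients for a single pair can then be fitted by non-negative least squares, minimising \( \| (x-y) - U w \|^2 \) subject to \( w \ge 0 \). Below we use plain projected gradient descent in NumPy rather than a dedicated solver; this is our own choice for illustration, not necessarily the method of the original work, and the function name is hypothetical.

```python
import numpy as np

def nnls_projected_gradient(U, d, n_iter=5000):
    """Approximately solve min_w ||d - U w||^2 subject to w >= 0.

    Projected gradient descent: take a gradient step on the
    least-squares loss, then clip negative coefficients to zero.
    """
    # Step size 1/L, where L = ||U||_2^2 (largest singular value squared)
    # is the Lipschitz constant of the gradient of 0.5 * ||U w - d||^2.
    step = 1.0 / np.linalg.norm(U, 2) ** 2
    w = np.zeros(U.shape[1])
    for _ in range(n_iter):
        grad = U.T @ (U @ w - d)              # gradient of 0.5 * ||U w - d||^2
        w = np.maximum(0.0, w - step * grad)  # project onto w >= 0
    return w


U = np.array([[1.0, 1.0],
              [0.0, 1.0]])                    # columns: u1 = (1, 0), u2 = (1, 1)
x, y = np.array([4.0, 2.0]), np.array([1.0, 1.0])
w = nnls_projected_gradient(U, x - y)
print(np.round(w, 6))                         # [2. 1.]

# A difference vector outside the cone cannot be fitted exactly:
# the best non-negative fit here is close to w = (0, 0.5).
w_out = nnls_projected_gradient(U, np.array([0.0, 1.0]))
```

A non-zero residual after fitting signals that \( x - y \) lies outside the current cone; this fit yields \( w_k \ge 0 \), with coefficients that may be exactly zero.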

Now suppose we have estimated the basis vectors. How do we predict the order of a new pair? Since we do not know which order is correct, it seems we need to try both directions, fit the coefficients for each, and compare the resulting fitting errors. This is inconvenient because of the non-negativity constraints. Fortunately, we found an easier way: run an unconstrained regression of \( x - y \) on the basis vectors in one direction only, and check the sign of the sum of the coefficients. If the sum is positive, \( x \) is ranked above \( y \); otherwise, the opposite order is predicted.
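The prediction rule can be sketched as follows, assuming the basis vectors are stored as the columns of a NumPy array `U` (the function name and the toy numbers are illustrative). We fit \( x - y \approx U w \) by ordinary least squares, with no sign constraints, and rank \( x \) above \( y \) when \( \sum_k w_k > 0 \). Because the least-squares solution is linear in its right-hand side, swapping \( x \) and \( y \) simply negates \( w \), so one regression is enough to decide between the two directions.

```python
import numpy as np

def predict_order(U, x, y):
    """Return +1 if x is predicted to rank above y, -1 otherwise.

    Fits x - y ~ U @ w by unconstrained least squares and uses the
    sign of sum(w) to pick the direction (a sum of exactly zero
    is treated as -1 here).
    """
    w, *_ = np.linalg.lstsq(U, x - y, rcond=None)
    return 1 if w.sum() > 0 else -1

U = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # columns: u1 = (1, 0), u2 = (1, 1)
x = np.array([4.0, 2.0])
y = np.array([1.0, 1.0])

print(predict_order(U, x, y))       # 1: x - y = 2*u1 + 1*u2, sum = 3 > 0
print(predict_order(U, y, x))       # -1: the coefficients flip sign
```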

Tuesday, 12 June 2012