Semiring Rank Matrix Factorisation

INGI Seminar

Event Speaker: Thanh Le Van
Event Place: Paul Otlet (Reaumur Building)
Event Date: 02/22/2017 - 12:00

Abstract. Rank data, in which each row is a complete or partial ranking of the available items (columns), is ubiquitous. It can represent, for instance, user preferences, gene expression levels, and the outcomes of sports events. While rank data has been analysed in the data mining literature, mining patterns in such data has so far received little attention.

In this talk, I will discuss matrix-factorisation-based methods for pattern set mining in rank data. First, I will present a general framework called Semiring Rank Matrix Factorisation. The framework employs semiring theory rather than traditional linear algebra for matrix factorisation, which results in a more elegant way of aggregating rankings. Subsequently, I will introduce two instantiations of the framework: Sparse RMF and ranked tiling. Sparse RMF mines a set of sparse rank vectors that succinctly summarise a given rank matrix and show the main categories of rankings. Ranked tiling discovers a set of regions in a rank matrix that have high ranks. Such regions are interesting because they can reveal local associations between subsets of the rows and subsets of the columns of the matrix. Finally, I will discuss how to use ranked tiling to formally define the concept of driver pathways, from which we can find cancer subtypes, i.e., groups of tumour samples that share the same molecular mechanism driving tumorigenesis.
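
To illustrate what replacing linear algebra with a semiring can mean for aggregating rankings, here is a minimal Python sketch of a matrix product over a max-times semiring. The abstract does not say which semiring the framework uses, so the max-times choice, the function name `semiring_matmul`, and the toy matrices `C` (row-to-pattern indicators) and `F` (rank patterns) are all assumptions made for illustration only.

```python
# Illustrative sketch only: a matrix product over a generic semiring.
# The max-times semiring and the toy data are assumptions, not taken
# from the talk.

def semiring_matmul(A, B, add=max, mul=lambda x, y: x * y, zero=0):
    """Multiply A (n x k) by B (k x m), using `add` in place of +
    and `mul` in place of *; `zero` is the identity element of `add`."""
    n, k, m = len(A), len(B), len(B[0])
    out = []
    for i in range(n):
        row = []
        for j in range(m):
            acc = zero
            for t in range(k):
                acc = add(acc, mul(A[i][t], B[t][j]))
            row.append(acc)
        out.append(row)
    return out

# Hypothetical indicator matrix: which rank patterns each row uses.
C = [[1, 0],
     [0, 1],
     [1, 1]]

# Hypothetical rank patterns over four items (0 = item not in pattern).
F = [[3, 0, 1, 2],
     [0, 2, 3, 1]]

# Max-times aggregation: overlapping patterns are combined by taking
# the highest rank, so every entry stays on the original rank scale.
max_times = semiring_matmul(C, F)

# Ordinary sum-times product for comparison: overlapping patterns are
# added, which can produce values that are no longer valid ranks.
sum_times = semiring_matmul(C, F, add=lambda x, y: x + y)

print(max_times)
print(sum_times)
```

In this toy case the third row uses both patterns. Under max-times it keeps the larger rank per item, whereas the ordinary product sums the two patterns and yields the value 4, which is outside the rank scale 0 to 3 used in `F`.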