Models, Features and Dimension Reduction for Biological Data
Exploring high-dimensional data in biology recalls the beautiful statistical ideas behind the multivariate techniques of the 20th century. Principal components analysis (PCA) has long been the workhorse for multivariate data visualisation. PCA is a straightforward algorithm: a linear projection of the data onto a lower-dimensional subspace that optimally preserves the information (variation) in the data. In a 1999 paper, Tipping and Bishop suggested re-imagining PCA in a model-based framework, a statistical model being, after all, a way of representing the relevant and important signals in the data. Model-based PCA opens up more possibilities for dimension reduction when working with “difficult” data: missing values, mixtures of Gaussians, and non-normal data. In this talk, we’ll explore PCA from a dual perspective (sample space and feature space), and introduce the main results from model-based, or probabilistic, PCA.
Reference: Tipping, M. E. and Bishop, C. M. (1999), Probabilistic Principal Component Analysis, Journal of the Royal Statistical Society. Series B (Statistical Methodology), Vol. 61, No. 3, pp. 611-622
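For readers who would like a concrete picture before the talk, the two views of PCA described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not material from the talk: the function names pca_project and ppca_ml are made up for this example, the data matrix is assumed to have samples in rows and features in columns, and the second function uses the closed-form maximum-likelihood estimates from the 1999 paper with the arbitrary rotation set to the identity.

```python
import numpy as np


def pca_project(X, q):
    """Classical PCA: project the rows of X (n samples x d features)
    onto the q directions of greatest variance."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    axes = Vt[:q]
    scores = Xc @ axes.T                         # coordinates in the subspace
    explained = (s[:q] ** 2).sum() / (s ** 2).sum()
    return scores, axes, explained


def ppca_ml(X, q):
    """Probabilistic PCA: closed-form ML estimates of the loading matrix W
    and the isotropic noise variance sigma2 (requires q < d).
    Assumption: the rotation matrix R in the paper is taken as the identity."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                            # ML sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)         # ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                  # average discarded variance
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    W = eigvecs[:, :q] * scale                   # U_q (Lambda_q - sigma2 I)^(1/2)
    return W, sigma2
```

Under these assumptions the columns of W span the same subspace as the leading principal axes, so the model-based fit recovers classical PCA while also giving an explicit estimate of the noise left out of the projection.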
MuseOmics: Unlocking the Vaults holding Historical Genetic and Gene Expression Data
Seminars are held fortnightly on Mondays from 2-3pm at various locations. Light refreshments will follow.
All welcome to attend.