Statistics

Research in Multivariate Statistics and Regression

Overview

This theme encompasses both theory and applications. The theoretical work develops new models and analyses them using classical, likelihood and Bayesian methodologies. New computational methods are often the key to analysing complex big data problems.

Areas of research

Bayesian multivariate models

We work on applications of linear and mixed models to genomic data and drug discovery. One recent application in drug discovery has led to new designs for screening when very many potential compounds can be synthesised. This work has been undertaken in conjunction with computational chemists in the pharmaceutical industry.

Other applications involve data in the form of curves in mass spectroscopy proteomics, for which Bayesian shrinkage priors have been developed.

Variable selection in regression models

Variable selection in regression models is increasingly recognised as an important problem in statistics.

A Bayesian approach assumes that some (or many) regression coefficients are zero (or values that are very close to zero).

Theoretical developments include correlated prior distributions capable of tackling sparsity in high dimensional data and giving teeth to ideas of hierarchies of interactions and grouping in linear models.

It can be hard to define efficient computational methods for Bayesian variable selection with many variables. We have been developing efficient adaptive Monte Carlo methods that can be applied to regression problems with thousands of variables.
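To make the computational difficulty concrete, the sketch below computes posterior inclusion probabilities by enumerating every subset of variables. It is only an illustration under stated assumptions: it uses Zellner's g-prior with g = n and an independent prior inclusion probability of 0.5 for each variable, a standard textbook setup rather than the priors or samplers developed in this theme. The function name `inclusion_probabilities` and its parameters are hypothetical.

```python
import itertools

import numpy as np


def inclusion_probabilities(X, y, g=None, prior_inclusion=0.5):
    """Posterior inclusion probabilities by exhaustive enumeration.

    Assumes Zellner's g-prior on the coefficients (default g = n) and an
    independent Bernoulli(prior_inclusion) prior on each variable.
    Illustrative only: the loop visits all 2**p models.
    """
    n, p = X.shape
    g = float(n) if g is None else float(g)

    # Centre the data so the intercept drops out of every model.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    total_ss = yc @ yc

    models, log_weights = [], []
    for gamma in itertools.product([0, 1], repeat=p):
        idx = [j for j, included in enumerate(gamma) if included]
        k = len(idx)
        if k == 0:
            r2 = 0.0
        else:
            beta, *_ = np.linalg.lstsq(Xc[:, idx], yc, rcond=None)
            resid = yc - Xc[:, idx] @ beta
            r2 = 1.0 - (resid @ resid) / total_ss

        # Log Bayes factor of this model against the null model (g-prior).
        log_bf = (0.5 * (n - 1 - k) * np.log1p(g)
                  - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2)))
        log_prior = (k * np.log(prior_inclusion)
                     + (p - k) * np.log1p(-prior_inclusion))
        models.append(gamma)
        log_weights.append(log_bf + log_prior)

    # Normalise on the log scale to avoid overflow.
    log_weights = np.array(log_weights)
    weights = np.exp(log_weights - log_weights.max())
    weights /= weights.sum()

    # Inclusion probability of variable j = total weight of models containing j.
    return weights @ np.array(models, dtype=float)
```

The loop runs over 2^p models, so it is feasible for a handful of variables but not for thousands. Sampling-based methods such as MCMC explore the model space without visiting every model.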

Functional neuroimaging

This area involves ill-posed inverse problems in spatial-temporal models. We have developed new methodologies and theories for filtering sparse features in infinite dimensional space with magnetoencephalographic data.

The image (right) shows orthogonal slices of a brain activity map through an estimated source.

We are also working on Bayesian approaches to data fusion between different imaging modalities and across different subjects.

Statistical genomics

We have developed Bayesian plaid models for bi-clustering, regularised non-Gaussian mixture regression models for simultaneous clustering and variable selection with high dimensional data, and epistatic mixture models for modelling interactions between clusters.

Nonparametric smoothing and robust regression

Work in this area aims to provide a comprehensive rather than a partial picture of the relationship between the response variable and the covariates. Quantile regression, which models conditional quantiles of the response rather than only its mean, is also robust against extreme values.

We have developed general theories on the asymptotic representations of the regression quantiles and their derivatives. These are useful in establishing the asymptotics of a broad class of estimators of quantities that are themselves functionals of the conditional quantiles.
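For readers unfamiliar with regression quantiles, the following minimal sketch shows the check (pinball) loss that defines them, in the simplest case with no covariates. It is not the theory described above. The function names are hypothetical, and the data are made up purely to illustrate robustness to an extreme value.

```python
def check_loss(u, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_risk(q, ys, tau):
    """Average check loss of the constant prediction q for the sample ys."""
    return sum(check_loss(y - q, tau) for y in ys) / len(ys)


def best_constant(ys, tau):
    """Constant minimising the empirical check loss.

    The risk is convex and piecewise linear in q, so a minimiser can be
    found among the observed values themselves.
    """
    return min(sorted(ys), key=lambda q: empirical_risk(q, ys, tau))


# Made-up data: replacing the largest value by an extreme one moves the
# mean a long way but leaves the tau = 0.5 minimiser (the median) alone.
clean = [1, 2, 3, 4, 5]
contaminated = [1, 2, 3, 4, 5000]
median_clean = best_constant(clean, 0.5)
median_contaminated = best_constant(contaminated, 0.5)
mean_contaminated = sum(contaminated) / len(contaminated)
```

In quantile regression proper, the constant q is replaced by a function of the covariates and the same loss is minimised over that function.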

Modelling the shape change of mouse skulls

Another important application of such results is in dimension reduction, i.e. the identification of linear combinations of the covariate vector that completely determine the conditional distribution of the response.
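In symbols, and using notation that is assumed here rather than taken from the original text (Y for the response, X for the p-dimensional covariate vector, B for a p x d matrix with d < p whose columns give the linear combinations), the dimension reduction requirement can be written as:

```latex
% Notation is an assumption made for illustration only.
Y \perp\!\!\!\perp X \mid B^{\top} X
\quad\Longleftrightarrow\quad
F_{Y \mid X}(y \mid x) = F_{Y \mid B^{\top} X}\bigl(y \mid B^{\top} x\bigr)
\quad \text{for all } y \text{ and } x .
```

That is, once the d linear combinations are known, the remaining information in the covariates tells us nothing further about the response.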

Multi-regression models for shape data

This methodology ensures invariance with respect to the rotation, location and scale of the objects of interest.

This is particularly useful for images of biological objects, where size (related to age) must be excluded from the inference, as must the orientation of the object and the position of the origin of coordinates (rotation and location).
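As an illustration of what invariance to location, scale and rotation means in practice, here is a minimal sketch of ordinary Procrustes alignment of two landmark configurations. It is a standard textbook construction, not necessarily the shape coordinates used in this work. The function names and the layout of the landmark arrays (one row per landmark) are assumptions.

```python
import numpy as np


def preshape(landmarks):
    """Remove location and scale from a (k, m) array of k landmarks."""
    centred = landmarks - landmarks.mean(axis=0)   # remove location
    return centred / np.linalg.norm(centred)       # remove scale


def align(reference, target):
    """Rotate the pre-shape of `target` onto the pre-shape of `reference`."""
    a, b = preshape(reference), preshape(target)
    # The rotation minimising ||b R - a|| comes from the SVD of b^T a.
    u, _, vt = np.linalg.svd(b.T @ a)
    # Flip the last axis if needed so R is a proper rotation (det = +1),
    # i.e. reflections are not allowed.
    signs = np.ones(vt.shape[0])
    signs[-1] = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag(signs) @ vt
    return b @ rotation


def shape_distance(reference, target):
    """Distance between two configurations after removing location,
    scale and rotation; zero when they have the same shape."""
    return np.linalg.norm(preshape(reference) - align(reference, target))
```

A configuration that has been translated, rescaled and rotated is at distance zero (up to rounding) from the original, while a genuinely different shape is not. A regression model for shape would then be fitted to coordinates that have this invariance built in.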

The images above, right, show how we could model the shape change of mouse skulls using regression models on the shape coordinates.

Researchers working in this theme

Name: Keywords
Prof Phil Brown: Multi-objective Bayesian variable selection, functional mixed models, screening designs, sparsity and structured priors
Prof Jim Griffin: Sparsity shrinkage priors, MCMC methods for Bayesian variable selection, structured priors
Dr Efang Kong: Nonparametric smoothing, quantile regression
Dr Alfred Kume: Shape analysis, shape change, species classification, molecule matching
Cristiano Villa: Variable selection for regression models
Prof Jian Zhang: Spatial-temporal models, mixture regression models, neuroimaging, statistical genomics

Orthogonal slices of a brain activity map through an estimated source

School of Mathematics, Statistics and Actuarial Science (SMSAS), Sibson Building, Parkwood Road, Canterbury, CT2 7FS


Last Updated: 10/10/2014