Research in Biological Applications


Research in this theme focuses on statistical modelling and inference in biology and genetics, with applications in complex disease studies. Over the past few decades, high-throughput biotechnologies have produced large amounts of complex data. The grand challenges posed to statisticians include developing scalable statistical methods for extracting useful information from these data, modelling biological systems with the data, and fostering innovation in global health research.

Areas of research

Flexible Bayesian non-parametric models for the analysis of array CGH data

Structural variations in DNA sequences, such as inheritable copy number alterations, have been reported to be associated with numerous diseases. It has also been observed that somatic chromosomal aberrations (i.e. amplifications and deletions) in tumour samples are associated with different clinical or pathological features in different cancer types or subtypes. Given the remarkable capacity of current technologies to assess copy number variants (CNVs), there has recently been a surge of interest in the research community in investigating both inheritable and somatic CNVs.

Broadly speaking, there are three technological platforms for detecting copy number variation: array-based technology (including array comparative genomic hybridization (aCGH) and variants such as oligonucleotide arrays and bacterial artificial chromosome arrays), SNP genotyping technology, and next-generation sequencing technology. Our work has focused on the analysis of array CGH data to detect chromosomal aberrations in breast cancer, using a new Bayesian non-parametric approach.

Dr Fabrizio Leisen is working in collaboration with researchers at MD Anderson Cancer Center, Harvard University and Università di Pavia. He has recently obtained grant funding (FP7 Marie Curie CIG) for work on flexible Bayesian non-parametric priors, and the prior used in the analysis described above is part of this project.

Screening for active compounds in drug discovery

A major thrust of research in pharmaceutical companies is the search for active compounds with a variety of efficacious properties, and a number of technologies are brought to bear on this. At an early stage, large numbers of compounds are synthesised and tested for desirable features. This process involves mass automated screening and can be supplemented by other forms of data, such as compound similarity matrices calculated in a variety of ways. Prof Philip Brown is working with collaborators at GSK (Stevenage) and ECLT (Venice) on designing and modelling such data streams.

Statistical Genetics for Complex Disease Studies

Developing novel models and methods for analysing human genetic variation in relation to complex diseases and individual differences in drug response has long been the focus of our research.

Recent meta-analyses of genome-wide association studies (GWAS) with sample sizes of over ten thousand have uncovered a large number of genetic variants underlying complex traits. Despite this success, the identified variants usually have modest effect sizes and account for only a small proportion of the trait variation. This has led to the suggestion that phenotypic variation arises from many genetic variants, both common and rare, each of which contributes only a little.

Indeed, many recent resequencing-based studies of candidate genes have identified collections of rare variants associated with phenotypic variation. Although rare variants may individually make only tiny contributions to phenotypic variation, collectively they may account for a substantial proportion of the missing heritability.
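To illustrate how rare variants can be analysed collectively, the sketch below implements a simple burden (collapsing) approach. This is a standard idea from the wider literature and not the heterogeneous regression models with clustered haplotypes described here. It assumes genotypes are coded as minor-allele counts (0, 1 or 2), uses a hypothetical frequency cut-off `maf_threshold` to decide which variants count as rare, sums each individual's rare alleles into one burden score, and then estimates the least-squares slope of a quantitative trait on that score. All function names are illustrative.

```python
from statistics import mean


def burden_scores(genotypes, maf_threshold=0.01):
    """Collapse rare variants into a single burden score per individual.

    Assumes genotypes[i][j] is the number of minor alleles (0, 1 or 2)
    carried by individual i at variant j.
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    # Minor-allele frequency of each variant: allele count / (2 * n).
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(n_variants)]
    rare = [j for j, f in enumerate(freqs) if f < maf_threshold]
    # Each individual's burden is the total number of rare alleles carried.
    burden = [sum(row[j] for j in rare) for row in genotypes]
    return burden, rare


def burden_slope(burden, trait):
    """Least-squares slope of a quantitative trait on the burden score."""
    bx, by = mean(burden), mean(trait)
    sxx = sum((b - bx) ** 2 for b in burden)
    if sxx == 0:
        raise ValueError("burden score does not vary; slope is undefined")
    sxy = sum((b - bx) * (y - by) for b, y in zip(burden, trait))
    return sxy / sxx
```

When every carrier has a burden of exactly one, the slope reduces to the difference in mean trait value between carriers and non-carriers, which makes clear how several individually rare variants are pooled into a single, better-powered comparison.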

Our current research focuses on developing heterogeneous regression-type models with clustered haplotypes, together with scalable algorithms, to improve the analysis of genome-wide association study data.

Integrative analysis of 'omics' data

The last decade has witnessed tremendous advances in our knowledge of systems biology based on the so-called 'omics' technologies. Omics spans a wide range of fields, including genomics, transcriptomics, proteomics, metabolomics, pharmacogenomics, nutrigenomics, phylogenomics and interactomics, among others, and these technologies have already generated large amounts of data.

One of the challenges posed to statisticians is how to model these data integratively in order to understand the underlying biological systems. Current research interests lie in developing mixture and Markov switching models for some of these data.
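As a small illustration of the Markov switching idea, the following minimal sketch runs the forward (filtering) recursion for a hidden Markov model with Gaussian emissions: a latent state switches according to a transition matrix, and each observation is normally distributed with a state-specific mean and standard deviation. This is a textbook formulation, not the group's own models, and the function and parameter names are hypothetical. Setting every row of `trans` to the same weight vector reduces it to an ordinary finite mixture, because the state then no longer depends on its predecessor.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and s.d. sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def forward_filter(obs, init, trans, means, sds):
    """Forward recursion for a Gaussian hidden Markov (Markov switching) model.

    init[s]     : probability that the chain starts in state s
    trans[r][s] : probability of moving from state r to state s
    means, sds  : state-specific emission means and standard deviations
    Returns the filtered probabilities P(S_t = s | y_1..y_t) for each t
    and the log-likelihood log p(y_1..y_T).
    """
    k = len(init)
    pred = list(init)  # P(S_t = s | y_1..y_{t-1}); the prior at t = 1
    filtered, loglik = [], 0.0
    for y in obs:
        # Combine the one-step prediction with the emission density.
        joint = [pred[s] * normal_pdf(y, means[s], sds[s]) for s in range(k)]
        c = sum(joint)  # p(y_t | y_1..y_{t-1})
        loglik += math.log(c)
        post = [j / c for j in joint]
        filtered.append(post)
        # Propagate through the transition matrix to predict the next state.
        pred = [sum(post[r] * trans[r][s] for r in range(k)) for s in range(k)]
    return filtered, loglik
```

With a "sticky" transition matrix such as `[[0.95, 0.05], [0.05, 0.95]]`, the filtered probabilities switch state only after observations consistently favour the other regime, which is the behaviour that distinguishes a Markov switching model from an independent mixture.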

Prof Jian Zhang has collaborated on the above topics with researchers at the Max Planck Institute for Molecular Genetics (Berlin), Utrecht University Medical Centre and the University of Maastricht. For details of related publications in the American Journal of Human Genetics, Human Heredity and BMC Medical Genetics, please contact Prof Zhang.

Shape analysis in Biology

Many biological applications require the theory of shape analysis. For example, biologists often wish to describe the mean shape of a population, or how the shape of certain biological objects evolves. These objects can be cell outlines, skull measurements, facial features or general images used for species classification.
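For two-dimensional landmark data, a population mean shape can be estimated by generalised Procrustes analysis. The sketch below is one simple version, assuming each configuration is a list of landmarks stored as complex numbers x + iy. It centres every configuration, repeatedly rotates and rescales each one onto the current mean by least squares, and renormalises the average to unit size; this iteration is a power method whose fixed point is the full Procrustes mean. Names such as `procrustes_mean` are illustrative and not taken from any particular shape-analysis package.

```python
def center(z):
    """Translate a configuration so that its centroid is at the origin."""
    m = sum(z) / len(z)
    return [p - m for p in z]


def norm(z):
    """Euclidean norm of a configuration (its centroid size once centred)."""
    return sum(abs(p) ** 2 for p in z) ** 0.5


def procrustes_fit(z, mu):
    """Rotate and rescale centred z to match centred mu by least squares.

    For complex landmarks the optimal similarity is multiplication by
    beta = (z^H mu) / (z^H z), which combines a rotation and a scaling.
    """
    num = sum(p.conjugate() * q for p, q in zip(z, mu))
    den = sum(abs(p) ** 2 for p in z)
    beta = num / den
    return [beta * p for p in z]


def procrustes_mean(configs, iters=50):
    """Full Procrustes mean shape of 2-D landmark configurations (unit size)."""
    zs = [center(z) for z in configs]
    # Start from the first configuration, scaled to unit size.
    s0 = norm(zs[0])
    mu = [p / s0 for p in zs[0]]
    for _ in range(iters):
        fitted = [procrustes_fit(z, mu) for z in zs]
        # Average landmark by landmark, then rescale the average to unit size.
        avg = [sum(col) / len(fitted) for col in zip(*fitted)]
        s = norm(avg)
        mu = [p / s for p in avg]
    return mu
```

Because shape is defined only up to location, scale and rotation, the returned mean is itself determined only up to a rotation; comparing it with another configuration therefore requires a further `procrustes_fit`.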

Shape analysis also has interesting applications in bioinformatics, in particular in molecule-matching algorithms. For example, biologists need to identify whether certain parts of a DNA molecule match parts of another. We are applying likelihood theory to this high-dimensional problem in order to find the optimal matching between two or more molecules, which requires both the geometry of shape spaces and MCMC methodology.

Dr Alfred Kume collaborates in these areas with colleagues from the University of Nottingham and the University of Pescara.

Stochastic Models in Yeast Prion Research

Yeast prions are proteins that can exist in two states: a normal soluble state and an insoluble 'prion' state, which forms polymers that grow and fragment within the cell. We mainly study the [PSI+] prion in the budding yeast Saccharomyces cerevisiae, and are interested in modelling the dynamics of polymer growth and fragmentation. These processes take place within yeast cells that are themselves growing and dividing asymmetrically: mother cells produce buds that break off to form daughter cells.

The biological objective is to better understand the prion's ability to self-replicate. Because these processes are so complex, they must be studied primarily by simulation. Current research interests lie in developing fast approximate simulation techniques and in inferring underlying parameters from limited and indirect experimental data.
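The growth and fragmentation of prion polymers can be simulated exactly with Gillespie's stochastic simulation algorithm. The sketch below follows the polymers in a single cell under deliberately simplified, hypothetical assumptions: each polymer gains one monomer at rate `grow_rate`, each of its internal bonds breaks independently at rate `frag_rate`, and monomer depletion and cell division are ignored. It is meant only to show the structure of such a simulation, not the models used in this research.

```python
import random


def simulate_polymers(lengths, grow_rate, frag_rate, t_end, seed=0):
    """Gillespie simulation of polymer elongation and fragmentation.

    Hypothetical simplified model: every polymer gains one monomer at rate
    grow_rate, and each of its (n - 1) internal bonds breaks at rate
    frag_rate. Monomer depletion and cell division are ignored.
    Returns the list of polymer lengths at time t_end.
    """
    rng = random.Random(seed)
    polymers = list(lengths)
    t = 0.0
    while True:
        # Total rate of each event type summed over all polymers.
        n_bonds = sum(n - 1 for n in polymers)
        total_grow = grow_rate * len(polymers)
        total_frag = frag_rate * n_bonds
        total = total_grow + total_frag
        if total == 0:
            break  # nothing can happen any more
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        if t > t_end:
            break
        if rng.random() * total < total_grow:
            # Elongation: every polymer is equally likely to grow.
            polymers[rng.randrange(len(polymers))] += 1
        else:
            # Fragmentation: pick one bond uniformly among all bonds.
            k = rng.randrange(n_bonds)
            for i, n in enumerate(polymers):
                if k < n - 1:
                    left = k + 1  # bond k splits n into k + 1 and n - k - 1
                    polymers[i] = left
                    polymers.append(n - left)
                    break
                k -= n - 1
    return polymers
```

Because every event is simulated individually, the cost grows with the number of events, which is one reason fast approximate simulation techniques are attractive for realistic polymer numbers and time scales.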

Prof Byron Morgan and Prof Martin Ridout have interests in stochastic modelling in cell biology. In particular, they have been co-investigators on two BBSRC grants, 'Stochastic models for yeast prions' and 'Modelling prion dynamics in the living (yeast) cell', in collaboration with the School of Biosciences, University of Kent.

Researchers working in this theme

Name: Keywords
Prof Phil Brown: Drug screening designs, sequential Bayesian designs, Bayesian variable selection with multiple objectives
Dr Diana Cole, Prof Byron Morgan and Prof Martin Ridout: Cellular processes, polymerisation, prion, stochastic modelling
Prof Jim Griffin: Multiple dataset integration
Dr Alfred Kume: Shape analysis, shape change, species classification, molecule matching
Dr Fabrizio Leisen: Array CGH data, chromosomal aberrations, copy number variations, Beta-GOS prior, Bayesian non-parametrics
Prof Jian Zhang: Genome-wide association studies on complex diseases, genetic variants, penalised regression models, 'omics' data, mixture and Markov switching models, integrative analysis

Yeast prion colonies

School of Mathematics, Statistics and Actuarial Science (SMSAS), Sibson Building, Parkwood Road, Canterbury, CT2 7FS


Last Updated: 10/10/2014