Research in Bayesian Statistics
Bayesian statistics is the branch of statistics in which initial beliefs about unknown quantities are expressed as a probability distribution. Research in this area at Kent focuses mainly on Bayesian variable selection, Bayesian model fitting, Bayesian nonparametric methods, Markov chain Monte Carlo methods, and applications in areas including biology, economics, finance and engineering.
Areas of research
Data sets with many variables (perhaps thousands or millions) are becoming routinely available in areas such as genomics and healthcare. An important problem is the selection of a small subset of these variables which accurately predict one or more other variables, often using linear regression or generalized linear models.
The Bayesian approach to this problem involves putting a prior distribution on the regression coefficients with substantial probability of values close to zero. This leads to a posterior distribution that favours models in which only a few regression coefficients are far from zero.
Work at Kent has looked at the effect of different priors and the development of structured extensions for problems such as variable selection in generalized additive models or time-varying variable selection in time series regression models.
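The spike-and-slab idea behind this kind of prior can be illustrated with a minimal sketch in Python. This is a generic textbook construction, not the specific priors developed at Kent: each coefficient is either exactly zero (the "spike") or drawn from a diffuse normal "slab", and the posterior probability of inclusion follows from comparing marginal likelihoods, here one predictor at a time with the noise variance fixed at one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: only the first two of ten predictors truly matter.
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [2.0, -1.5]
y = X @ beta_true + rng.standard_normal(n)

def log_marginal(xj, y, tau2=10.0):
    """Log marginal likelihood of y on a single predictor, with the slab
    beta_j ~ N(0, tau2) integrated out (normal-normal conjugacy)."""
    s = xj @ xj
    v = 1.0 + tau2 * s                       # Sherman-Morrison / det. lemma
    quad = y @ y - tau2 * (xj @ y) ** 2 / v
    return -0.5 * (np.log(v) + quad)

def inclusion_prob(xj, y, w=0.5):
    """Posterior probability that beta_j is nonzero under a spike-and-slab
    prior with prior inclusion probability w."""
    log_in = np.log(w) + log_marginal(xj, y)
    log_out = np.log(1.0 - w) - 0.5 * (y @ y)   # spike: beta_j = 0 exactly
    m = max(log_in, log_out)
    return np.exp(log_in - m) / (np.exp(log_in - m) + np.exp(log_out - m))

probs = [inclusion_prob(X[:, j], y) for j in range(p)]
```

With data like these, the two truly nonzero coefficients receive inclusion probabilities essentially equal to one; a full analysis would of course assess all predictors jointly rather than marginally.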
Fitting Bayesian models to data involves simulation-based algorithms such as Markov chain Monte Carlo methods or sequential Monte Carlo. Adaptive Monte Carlo methods allow parameters controlling the performance of these algorithms to change whilst the algorithm is running to give optimal performance.
Recent work at Kent has centred on the use of multiple chains for adaptive Monte Carlo and the application of adaptive Monte Carlo methods to variable selection problems. Computational methods for Bayesian nonparametric models have also been developed.
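A minimal single-chain sketch of the adaptive idea, in Python, assuming a one-dimensional standard normal target as a stand-in for a posterior: the random-walk proposal scale is tuned while the chain runs, using a Robbins-Monro update toward the classic 0.44 acceptance rate for one-dimensional targets, with a diminishing step size so the adaptation settles down. This is a generic illustration, not the multiple-chain methods developed at Kent.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(x):
    """Log density of the target (standard normal, up to a constant)."""
    return -0.5 * x * x

n_iter = 20000
x, log_scale = 0.0, 0.0
samples = np.empty(n_iter)
n_accept = 0
for i in range(n_iter):
    prop = x + np.exp(log_scale) * rng.standard_normal()
    if np.log(rng.random()) < log_target(prop) - log_target(x):
        x = prop
        accepted = 1.0
        n_accept += 1
    else:
        accepted = 0.0
    # Diminishing adaptation: the tuning step shrinks as the chain runs.
    log_scale += (accepted - 0.44) / (i + 1) ** 0.6
    samples[i] = x

accept_rate = n_accept / n_iter
```

After a burn-in period the chain's acceptance rate sits near the target value and the samples reproduce the mean and variance of the target distribution.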
A very active sub-group has been working on Bayesian nonparametric methods. Work in this area is challenging since a prior must be placed on an infinite-dimensional space (such as a space of distributions).
Work at Kent has looked at the development of general priors, computational methods for these models, clustering and density estimation, and applications in many areas including financial and survival data.
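The best-known prior on a space of distributions is the Dirichlet process, which can be sampled (to a truncation) by the stick-breaking construction. A short Python sketch, with a standard normal base measure chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def dp_stick_breaking(alpha, n_atoms):
    """Truncated stick-breaking draw from a Dirichlet process DP(alpha, G0),
    with a standard normal base measure G0 (an arbitrary choice here)."""
    sticks = rng.beta(1.0, alpha, size=n_atoms)
    # Weight k is the k-th stick times what remains of the unit interval.
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - sticks[:-1])))
    weights = sticks * leftover
    atoms = rng.standard_normal(n_atoms)    # i.i.d. draws from G0
    return weights, atoms

weights, atoms = dp_stick_breaking(alpha=2.0, n_atoms=500)
```

The result is a discrete random probability distribution: the weights are non-negative and (up to the truncation) sum to one, and smaller concentration parameters `alpha` put more mass on fewer atoms, which is what makes the Dirichlet process useful for clustering and density estimation.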
A copula offers a flexible tool that allows an experimenter to decompose a joint cumulative distribution function into two parts: the marginal distributions and a copula function. The copula completely characterises the statistical dependence between the variables.
The focus of this research area is to investigate a general Bayesian nonparametric approach for estimating a copula function and to investigate its application in financial risk and the insurance industry.
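The decomposition can be made concrete with the simplest parametric example, a Gaussian copula (shown here only as an illustration; the research described above concerns nonparametric copula estimation). All the dependence comes from a correlated bivariate normal, and arbitrary marginals are imposed afterwards by inverse-CDF transforms:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)

rho, n = 0.7, 50000
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)

def norm_cdf(t):
    """Standard normal CDF, vectorised via math.erf."""
    return 0.5 * (1.0 + np.vectorize(erf)(t / sqrt(2.0)))

u = norm_cdf(z)                  # a sample from the copula: uniform marginals
x1 = -np.log(1.0 - u[:, 0])      # inverse CDF of an Exponential(1) marginal
x2 = u[:, 1]                     # second marginal kept uniform on (0, 1)
```

The transformed variables keep the dependence induced by `rho` while having completely different marginal distributions, which is exactly the separation of dependence from marginals that the copula formalism provides.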
For some statistical problems it is difficult to express initial beliefs about parameters, due to a lack of prior information or the large number of parameters. In other scenarios, prior information is available but is deliberately set aside so that the inferential results do not appear to be guided by the scientist.
Objective Bayesian analysis encompasses a range of techniques and procedures that allow the Bayesian paradigm to be applied when there is little or no prior information: once a model has been chosen, priors on the parameters are determined automatically. Objective Bayes also offers a useful way to introduce Bayesian ideas in applied settings, where the notion of “subjectivity” is often viewed with scepticism.
Recent work at Kent has led to the development of a new class of objective priors based on information theory and loss functions. In particular, an approach has been developed which can be applied to any discrete parameter (an area where objective Bayesian methods have not previously been applied).
In addition, these ideas have led to contributions in Bayesian model selection and have been applied in other disciplines, such as finance and ecology.
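The flavour of an automatically determined prior can be shown with the most classical example, the Jeffreys prior for a Bernoulli success probability; this is standard material, not the loss-based construction for discrete parameters developed at Kent. The Jeffreys prior is proportional to the square root of the Fisher information, which for a Bernoulli parameter p gives Beta(1/2, 1/2), and conjugacy makes the posterior update a one-liner:

```python
# Jeffreys prior for Bernoulli(p): proportional to sqrt(1 / (p * (1 - p))),
# i.e. a Beta(1/2, 1/2) distribution, derived from the model alone.
trials = [1, 1, 0, 1, 0, 1, 1, 1]          # hypothetical Bernoulli data
successes = sum(trials)
failures = len(trials) - successes

# Conjugate update: posterior is Beta(1/2 + successes, 1/2 + failures).
a_post = 0.5 + successes
b_post = 0.5 + failures
post_mean = a_post / (a_post + b_post)
```

No subjective input was needed: the prior followed from the model, which is the sense in which such analyses are called objective.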
Bayesian hierarchical models have been used to make estimates of risk factors for non-communicable diseases. These risk factors include obesity, high blood pressure, high cholesterol, and diabetes. The objective is to make estimates for 200 countries and territories, over several decades. Although large datasets are available, they are sparse in time and space, particularly in certain parts of the world, which makes Bayesian hierarchical models an ideal solution.
Estimates for most of the single risk factors have been published, and the focus of this research is moving towards modelling multiple risk factors simultaneously. These models are challenging to specify and fit, but will give us a much better understanding of who is at risk of these diseases.
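The reason hierarchical models cope well with sparse data is partial pooling: estimates for data-poor countries borrow strength from the global pattern. A minimal Python sketch of this mechanism for a normal hierarchical model, with entirely hypothetical numbers and the within-country variance `sigma2`, between-country variance `tau2` and global mean `mu` treated as known:

```python
import numpy as np

y_bar = np.array([27.1, 24.3, 30.2])   # observed mean per country (toy values)
n_obs = np.array([400, 25, 4])         # very unequal amounts of data
sigma2, tau2, mu = 20.0, 4.0, 26.0     # within-var, between-var, global mean

# Posterior mean of each country effect is a precision-weighted average
# of that country's own data and the global mean.
w = (n_obs / sigma2) / (n_obs / sigma2 + 1.0 / tau2)
post_mean = w * y_bar + (1.0 - w) * mu
```

The weight `w` grows with sample size, so the well-observed country keeps an estimate close to its data while the country with only four observations is pulled strongly toward the global mean, exactly the behaviour needed when data are sparse in time and space.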
A local company requested an advanced statistical method to reduce manufacturing costs. The evidence from the historical data is that the distribution of the parameter estimates is non-normal, exhibiting multimodality and skewness. A novel Bayesian calibration procedure was developed to improve on the existing one.
See, for example, the Bayesian calibration illustrated in the figures to the right.
Dr James Bentham: Bayesian hierarchical models, MCMC, large health-related datasets
Prof Phil Brown: multi-objective Bayesian variable selection, functional mixed models, screening designs, sparsity and structured priors
Prof Jim Griffin: adaptive Monte Carlo methods, variable selection and regression, Bayesian nonparametrics
Dr Fabrizio Leisen: dependent and non-exchangeable nonparametric priors
Cristiano Villa: model selection, Objective Bayes and copulas
Dr Xue Wang: copulas, industrial applications
Prof Jian Zhang: empirical Bayes and biclustering
The topmost image shows that the combination of an informative prior (the black dotted line) and likelihood (dashed blue line) gives more certainty about the parameter value than the likelihood alone. The errors in predicting the performance of various products are shown underneath, and show that the Bayesian method performs much better than the likelihood method. Products above the black line are deemed to have failed the calibration process; the higher accuracy of the Bayesian method means that fewer products fail.
The second set of five plots shows details of the predictions of 31 measurements for the failed products. The large errors in the middle measurements for the likelihood method are due to the lack of data in that region; the errors are much smaller for the Bayesian method because the prior supplies information there.
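The extra certainty gained by combining a prior with the likelihood can be made precise in the simplest conjugate case. For a normal model with known noise variance, posterior precision is the sum of prior precision and data precision, so the posterior is never less certain than either source alone. A sketch with toy numbers, not the company's actual calibration model:

```python
# Likelihood summarised by the mean and variance it implies for the parameter.
prior_mean, prior_var = 0.0, 1.0
data_mean, data_var = 1.2, 2.0

# Precisions (inverse variances) add; the posterior mean is their
# precision-weighted average of prior mean and data mean.
post_prec = 1.0 / prior_var + 1.0 / data_var
post_var = 1.0 / post_prec
post_mean = post_var * (prior_mean / prior_var + data_mean / data_var)
```

Here the posterior variance (2/3) is smaller than both the prior variance and the likelihood variance, which is the effect visible in the topmost figure.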
The chart below shows how Bayesian hierarchical models can be used to estimate risk factors for non-communicable diseases.