# Dr Cristiano Villa

Senior Lecturer in Statistics
Director of Studies, Financial Mathematics programme

Cristiano joined the School of Mathematics, Statistics and Actuarial Science in July 2014. He has more than 12 years' experience in industry. At KPMG he was involved in financial and IT audit, risk advisory services and process-control implementation.

## Research interests

• Objective Bayesian analysis
• Bayesian model selection and change point analysis
• Bayesian nonparametric modelling for 'big data' problems and cyber-security

## Professional

• Chair of the Royal Statistical Society East Kent Group (RSS)
• Member of the Royal Statistical Society (RSS)
• Member of the International Society for Bayesian Analysis (ISBA)
• Member of the Information Systems Audit and Control Association (ISACA)
• Certified in Risk and Information Systems Control (CRISC, ISACA)

## Publications

### Article

• Villa, C., Leisen, F. and Rossini, L. (2019). Loss-based approach to two-piece location-scale distributions with applications to dependent data. Statistical Methods & Applications [Online]. Available at: https://dx.doi.org/10.1007/s10260-019-00481-x.
Two-piece location-scale models are used for modeling data presenting departures from symmetry. In this paper, we propose an objective Bayesian methodology for the tail parameter of two particular distributions of the above family: the skewed exponential power distribution and the skewed generalised logistic distribution. We apply the proposed objective approach to time series models and linear regression models where the error terms follow the distributions object of study. The performance of the proposed approach is illustrated through simulation experiments and real data analysis. The methodology yields improvements in density forecasts, as shown by the analysis we carry out on the electricity prices in Nordpool markets.
• Villa, C. and Lee, J. (2019). A loss-based prior for variable selection in linear regression methods. Bayesian Analysis [Online]. Available at: http://dx.doi.org/10.1214/19-BA1162.
In this work we propose a novel model prior for variable selection in linear regression. The idea is to determine the prior mass by considering the *worth* of each of the regression models, given the number of possible covariates under consideration. The worth of a model consists of the information loss and the loss due to model complexity. While the information loss is determined objectively, the loss expression due to model complexity is flexible, and the penalty on model size can even be customised to include some prior knowledge. Some versions of the loss-based prior are proposed and compared empirically. Through simulation studies and real data analyses, we compare the proposed prior to the Scott and Berger prior, for noninformative scenarios, and with the Beta-Binomial prior, for informative scenarios.
• Villa, C. and Rubio, F. (2018). Objective priors for the number of degrees of freedom of a multivariate t distribution and the t-copula. Computational Statistics and Data Analysis [Online] 124:197-217. Available at: https://doi.org/10.1016/j.csda.2018.03.010.
An objective Bayesian approach to estimate the number of degrees of freedom $(\nu)$ for the multivariate $t$ distribution and for the $t$-copula, when the parameter is considered discrete, is proposed. Inference on this parameter has been problematic for the multivariate $t$ and, owing to the absence of any method, for the $t$-copula. An objective criterion based on loss functions, which allows one to overcome the issue of defining objective probabilities directly, is employed. The support of the prior for $\nu$ is truncated, which derives from the property of both the multivariate $t$ and the $t$-copula of convergence to normality for a sufficiently large number of degrees of freedom. The performance of the priors is tested on simulated scenarios and on real data: daily logarithmic returns of IBM and of the Center for Research in Security Prices Database.
• Villa, C. and Walker, S. (2017). On The Mathematics of The Jeffreys-Lindley Paradox. Communications in Statistics – Theory and Methods [Online] 46:12290-12298. Available at: http://dx.doi.org/10.1080/03610926.2017.1295073.
This paper is concerned with the well-known Jeffreys-Lindley paradox. In a Bayesian setup, the so-called paradox arises when a point null hypothesis is tested and an objective prior is sought for the alternative hypothesis. In particular, the posterior for the null hypothesis tends to one when the uncertainty, i.e. the variance, for the parameter value goes to infinity. We argue that the appropriate way to deal with the paradox is to use simple mathematics, and that any philosophical argument is to be regarded as irrelevant.

### Thesis

• Oftadeh, E. (2017). Complex Modelling of Multi-Outcome Data With Applications to Cancer Biology.
In applied scientific areas such as economics, finance, biology, and medicine, it is often required to find the relationship between a set of independent variables (predictors) and a set of response variables (i.e., outcomes of an experiment). If we model individual outcomes separately, we potentially miss information about the correlation among outcomes. Therefore, it is desirable to model these outcomes simultaneously by multivariate linear regressions.
With the advent of high-throughput technology, there is an enormous amount of high dimensional multivariate regression data being generated at an extraordinary speed. However, only a small proportion of them are informative. This has imposed a challenge on modern statistics because of this high dimensionality. In this work, we propose methods and algorithms for modelling high-dimensional multivariate regression data. The contributions of this thesis are as follows.

Firstly, we propose two variable screening techniques to reduce the high dimension of predictors. One is a beamforming-based screening method based on a signal-to-noise ratio (SNR) statistic. The second approach is a mixture-based screening where the screening is conducted through the so-called likelihood fusion.

Secondly, we propose a variable selection method called principal variable analysis (PVA). In PVA we take into account the correlation between response variables in the process of variable selection. We compare PVA with several well-known variable selection methods by simulation studies, showing that PVA can substantially enhance the selection accuracy.

Thirdly, we develop a method for clustering and variable selection simultaneously, using the likelihood fusion. We demonstrate the features of the proposed method through simulation studies.

Fourthly, we study a Bayesian clustering problem through the mixture of normal distributions where we propose mixing-proportion dependent priors for component parameters.

Finally, we apply the proposed methods to cancer drug data. These data contain expression levels of 13321 genes across 42 cell lines and the responses of these cell lines to 131 drugs, recorded as fifty per cent inhibitory concentration (IC50) values. We identify 37 genes that are important for predicting IC50 values. We find that although the expressions of these genes are weakly correlated, they are highly correlated in terms of their regression coefficients. We also identify a regression-coefficient-based network between genes, and we show that 34 of the 37 selected genes play a role in at least one type of cancer.
Moreover, by applying the likelihood fusion model to the real data, we classify the drugs into five groups.