Dr Diana Cole
Diana is an alumna of the University of Kent, having obtained her PhD in Statistics in 2003 for the thesis Stochastic Branching Processes in Biology. She then worked as a Research Associate on Stochastic models for yeast prion propagation and then on Parameter redundancy in ecological models. She became a Lecturer in Statistics in 2007.
Ecological statistics, Integrated Population Modelling, Identifiability, Parameter redundancy, Generalised linear mixed models.
Part of the SE@K (Statistical Ecology at Kent) research group and a member of the National Centre for Statistical Ecology (NCSE).
- Anita Jeyam (with Dr Rachel McCrea)
- Marina Jimenez-Munoz (with Dr Eleni Matechou)
Zhou, M. et al. (2019). Removal models accounting for temporary emigration. Biometrics [Online] 75:24-35. Available at: https://doi.org/10.1111/biom.12961. Removal of protected species from sites scheduled for development is often a legal requirement in order to minimize the loss of biodiversity. The assumption of closure in the classic removal model will be violated if individuals become temporarily undetectable, a phenomenon commonly exhibited by reptiles and amphibians. Temporary emigration can be modeled using a multievent framework with a partial hidden process, where the underlying state process describes the movement pattern of animals between the survey area and an area outside of the study. We present a multievent removal model within a robust design framework which allows for individuals becoming temporarily unavailable for detection. We demonstrate how to investigate parameter redundancy in the model. Results suggest the use of the robust design and certain forms of constraints overcome issues of parameter redundancy. We show which combinations of parameters are estimable when the robust design reduces to a single secondary capture occasion within each primary sampling period. Additionally, we explore the benefit of the robust design on the precision of parameters using simulation. We demonstrate that the use of the robust design is highly recommended when sampling removal data. We apply our model to removal data of common lizards, Zootoca vivipara, and for this application precision of parameter estimates is further improved using an integrated model.
Jimenez-Munoz, M. et al. (2019). Estimating age-dependent survival from age-aggregated ringing data - extending the use of historical records. Ecology and Evolution [Online] 9:769-779. Available at: https://doi.org/10.1002/ece3.4820. Bird ring-recovery data have been widely used to estimate demographic parameters such as survival probabilities since the mid-twentieth century. However, while the total number of birds ringed each year is usually known, historical information on age at ringing is often not available. A standard ring-recovery model, for which information on age at ringing is required, cannot be used when historical data are incomplete. We develop a new model to estimate age-dependent survival probabilities from such historical data when age at ringing is not recorded; we call this the historical data model. This new model provides an extension to the model of Robinson (2010) by estimating the proportion of the ringed birds marked as juveniles as an additional parameter. We conduct a simulation study to examine the performance of the historical data model and compare it with other models including the standard and conditional ring-recovery models. Simulation studies show that the approach of Robinson (2010) can cause bias in parameter estimates. In contrast, the historical data model yields similar parameter estimates to the standard model. Parameter redundancy results show that the newly developed historical data model is comparable to the standard ring-recovery model, in terms of which parameters can be estimated, and has fewer identifiability issues than the conditional model. We illustrate the new proposed model using Blackbird and Sandwich Tern data. The new historical data model allows us to make full use of historical data and estimate the same parameters as the standard model with incomplete data and in doing so, detect potential changes in demographic parameters further back
Allen, S. et al. (2017). Temporally varying natural mortality: Sensitivity of a virtual population analysis and an exploration of alternatives. Fisheries Research [Online] 185:185-197. Available at: http://dx.doi.org/10.1016/j.fishres.2016.09.002. Cohort reconstructions (CR) currently applied in Pacific salmon management estimate temporally variant exploitation, maturation, and juvenile natural mortality rates but require an assumed (typically invariant) adult natural mortality rate (dA), resulting in unknown biases in the remaining vital rates. We explored the sensitivity of CR results to misspecification of the mean and/or variability of dA, as well as the potential to estimate dA directly using models that assumed separable year and age/cohort effects on vital rates (separable cohort reconstruction, SCR). For CR, given the commonly assumed dA = 0.2, the error (RMSE) in estimated vital rates is generally small (≤ 0.05) when annual values of dA are low to moderate (≤ 0.4). The greatest absolute errors are in maturation rates, with large relative error in the juvenile survival rate. The ability of CR estimates to track temporal trends in the juvenile natural mortality rate is adequate (Pearson's correlation coefficient > 0.75) except for high dA (≥ 0.6) and high variability (CV > 0.35). The alternative SCR models allowing estimation of time-varying dA by assuming additive effects in natural mortality, fishing mortality, and/or maturation rates did not outperform CR across all simulated scenarios, and are less accurate when additivity assumptions are violated. Nevertheless, an SCR model assuming additive effects on fishing and natural (juvenile and adult) mortality rates led to nearly unbiased estimates of all quantities estimated using CR, along with borderline acceptable estimates of the mean dA under multiple sets of conditions conducive to CR. Adding an assumption of additive effects on the maturation rates allowed nearly unbiased estimates of the mean dA as well. The SCR models performed slightly better than CR when the vital rates covaried as assumed. These separable models could serve as a partial check on the validity of CR assumptions about the adult natural mortality rate, or even a preferred alternative if there is strong reason to believe the vital rates, including juvenile and adult natural mortality rates, covary strongly across years or age classes as assumed.
Cole, D. and McCrea, R. (2016). Parameter Redundancy in Discrete State-Space and Integrated Models. Biometrical Journal [Online] 58:1071-1090. Available at: http://dx.doi.org/10.1002/bimj.201400239. Discrete state-space models are used in ecology to describe the dynamics of wild animal populations, with parameters, such as the probability of survival, being of ecological interest. For a particular parametrisation of a model it is not always clear which parameters can be estimated. This inability to estimate all parameters is known as parameter redundancy or a model is described as non-identifiable. In this paper we develop methods that can be used to detect parameter redundancy in discrete state-space models. An exhaustive summary is a combination of parameters that fully specify a model. To use general methods for detecting parameter redundancy a suitable exhaustive summary is required. This paper proposes two methods for the derivation of an exhaustive summary for discrete state-space models using discrete analogues of methods for continuous state-space models. We also demonstrate that combining multiple data sets, through the use of an integrated population model, may result in a model in which all parameters are estimable, even though models fitted to the separate data sets may be parameter redundant.
Cole, D. (2016). Reply to determining structural identifiability of parameter learning machines. Neurocomputing 173:2039-2040. The paper Ran and Hu (2014, Neurocomputing) examines identifiability and parameter redundancy in classes of models used in machine learning. This note discusses the results on global identifiability and also clarifies that the paper's results on parameter redundancy already exist in the paper Cole et al. (2010, Mathematical Biosciences).
Cole, D. et al. (2014). Does Your Species Have Memory? Analysing Capture-Recapture Data with Memory Models. Ecology and Evolution [Online] 4:2124-2133. Available at: http://dx.doi.org/10.1002/ece3.1037.
1. We examine memory models for multi-site capture-recapture data. This is an important topic, as animals may exhibit behaviour that is more complex than simple first-order Markov movement between sites, when it is necessary to devise and fit appropriate models to data.
2. We consider the Arnason-Schwarz model for multi-site capture-recapture data, which incorporates just first-order Markov movement, and also two alternative models that allow for memory, the Brownie model and the Pradel model. We use simulation to compare two alternative tests which may be undertaken to determine whether models for multi-site capture-recapture data need to incorporate memory.
3. Increasing the complexity of models runs the risk of introducing parameters that cannot be estimated, irrespective of how much data are collected, a feature which is known as parameter redundancy. Rouan et al (JABES, 2009, pp 338-355) suggest a constraint that may be applied to overcome parameter redundancy when it is present in multi-site memory models. For this case, we apply symbolic methods to derive a simpler constraint, which allows more parameters to be estimated, and give general results not limited to a particular configuration. We also consider the effect sparse data can have on parameter redundancy, and recommend minimum sample sizes.
4. Memory models for multi-site capture-recapture data can be highly complex, and difficult to fit to data. We emphasise the importance of a structured approach to modelling such data, by considering a priori which parameters can be estimated, which constraints are needed in order for estimation to take place, and how much data need to be collected. We also give guidance on the amount of data needed to use two alternative families of tests for whether models for multi-site capture-recapture data need to incorporate memory.
Hubbard, B., Cole, D. and Morgan, B. (2014). Parameter Redundancy in Capture-Recapture-Recovery Models. Statistical Methodology 17:17-29. In principle it is possible to use recently-derived procedures to determine whether or not all the parameters of particular complex ecological models can be estimated using classical methods of statistical inference. If it is not possible to estimate all the parameters, a model is parameter redundant. Furthermore, one can investigate whether derived results hold for such models for all lengths of study, and also how the results might change for specific data sets. In this paper we show how to apply these approaches to entire families of capture-recapture and capture-recapture-recovery models. This results in comprehensive tables, providing the definitive parameter redundancy status for such models. Parameter redundancy can also be caused by the data rather than the model, and how to investigate this is demonstrated through two applications, one to recapture data on dippers, and one to recapture-recovery data on great cormorants.
McCrea, R., Morgan, B. and Cole, D. (2013). Age-dependent mixture models for recovery data on animals marked at unknown age. Journal of the Royal Statistical Society: Series C (Applied Statistics) [Online] 62:101-113. Available at: http://dx.doi.org/10.1111/j.1467-9876.2012.01043.x. Data are often collected from wild animals that have been marked at unknown age. As a result, standard probability models, fitted by maximum likelihood, cannot incorporate age dependence in probabilities of annual survival. We propose and fit new mixture models to ring–recovery data on birds ringed of unknown age, in which it is possible to incorporate age dependence in survival. It is shown that it is important to analyse simultaneously data on animals marked as young, and of known age, as otherwise the mixture model is parameter redundant. The potential of the approach is illustrated by a new analysis of data on mallards, Anas platyrhynchos, and the wider performance of the approach is demonstrated through simulation. The models provide a way of analysing correctly large numbers of historical data sets.
Choquet, R. and Cole, D. (2012). A Hybrid Symbolic-Numerical Method for Determining Model Structure. Mathematical Biosciences [Online] 236:117-125. Available at: http://dx.doi.org/10.1016/j.mbs.2012.02.002. In this article, we present a method for determining whether a model is at least locally identifiable and in the case of non-identifiable models whether any of the parameters are individually at least locally identifiable. This method combines symbolic and numeric methods to create an algorithm that is extremely accurate compared to other numeric methods and computationally inexpensive. A series of generic computational steps are developed to create a method that is ideal for practitioners to use. The algorithm is compared to symbolic methods for two capture-recapture models and a compartment model.
Cole, D. et al. (2012). Parameter redundancy in mark-recovery models. Biometrical Journal [Online] 54:507-523. Available at: http://dx.doi.org/10.1002/bimj.201100210. We provide a definitive guide to parameter redundancy in mark-recovery models, indicating, for a wide range of models, in which all the parameters are estimable, and in which models they are not. For these parameter-redundant models, we identify the parameter combinations that can be estimated. Simple, general results are obtained, which hold irrespective of the duration of the studies. We also examine the effect real data have on whether or not models are parameter redundant, and show that results can be robust even with very sparse data. Covariates, as well as time- or age-varying trends, can be added to models to overcome redundancy problems. We show how to determine, without further calculation, whether or not parameter-redundant models are still parameter redundant after the addition of covariates or trends.
Cole, D. (2012). Determining Parameter Redundancy of Multi-state Mark-recapture Models for Sea Birds. Journal of Ornithology [Online] 152:S305-S315. Available at: http://dx.doi.org/10.1007/s10336-010-0574-0. Multi-state mark–recapture models are structurally complex models, and in particular the complexity increases when there are unobservable states. Until recently, determining whether or not such models were parameter redundant was only possible numerically. In this paper, we show how it is now possible to examine parameter redundancy of such models symbolically. The advantage of this approach is that you can determine exactly how many parameters can be estimated in a model for any number of years of marking and recovery, as well as which combinations of parameters can be estimated. Here, we illustrate how the new methodology works for multi-state models. We further develop rules for determining the parameter redundancy status of a whole family of multi-state mark–recapture models.
Cole, D., Morgan, B. and Titterington, D. (2010). Determining the parametric structure of models. Mathematical Biosciences [Online] 228:16-30. Available at: http://dx.doi.org/10.1016/j.mbs.2010.08.004. In this paper we develop a comprehensive approach to determining the parametric structure of models. This involves considering whether a model is parameter redundant or not and investigating model identifiability. The approach adopted makes use of exhaustive summaries, quantities that uniquely define the model. We review and generalise previous work on evaluating the symbolic rank of an appropriate derivative matrix to detect parameter redundancy, and then develop further tools for use within this framework, based on a matrix decomposition. Complex models, where the symbolic rank is difficult to calculate, may be simplified structurally using reparameterisation and by finding a reduced-form exhaustive summary. The approach of the paper is illustrated using examples from ecology, compartment modelling and Bayes networks. This work is topical as models in the biosciences and elsewhere are becoming increasingly complex.
Cole, D. and Morgan, B. (2010). Parameter Redundancy with Covariates. Biometrika [Online] 97:1002-1005. Available at: http://dx.doi.org/10.1093/biomet/asq041. We show how to determine the parameter redundancy status of a model with covariates from that of the same model without covariates, thereby simplifying the calculation considerably. A matrix decomposition is necessary to ensure that the symbolic computation computer programmes return correct results. The paper is illustrated by mark-recovery and latent-class models, with associated Maple code.
Cole, D. and Morgan, B. (2010). A note on determining parameter redundancy in age-dependent tag return models for estimating fishing mortality, natural mortality and selectivity. Journal of Agricultural, Biological, and Environmental Statistics [Online] 15:431-434. Available at: http://dx.doi.org/10.1007/s13253-010-0026-6. Jiang et al. (JABES 12:177-194, 2007) present models for tag return data on fish. They examine whether the models are parameter redundant, but need to resort to numerical methods as symbolic methods were sometimes found to be intractable. Also, their results are only applicable for a specified number of years of tagging data and age-classes. Here we show how symbolic methods can in fact be used and also how conclusions apply to any number of years of tagging data and age-classes.
Byrne, L. et al. (2009). The Number and Transmission of [PSI+] Prion Seeds (Propagons) in the Yeast Saccharomyces Cerevisiae. PLoS ONE [Online] Online. Available at: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0004670.
Yeast (Saccharomyces cerevisiae) prions are efficiently propagated and the on-going generation and transmission of prion seeds (propagons) to daughter cells during cell division ensures a high degree of mitotic stability. The reversible inhibition of the molecular chaperone Hsp104p by guanidine hydrochloride (GdnHCl) results in cell division-dependent elimination of yeast prions due to a block in propagon generation and the subsequent dilution out of propagons by cell division.
Analysing the kinetics of the GdnHCl-induced elimination of the yeast [PSI+] prion has allowed us to develop novel statistical models that aid our understanding of prion propagation in yeast cells. Here we describe the application of a new stochastic model that allows us to estimate more accurately the mean number of propagons in a [PSI+] cell. To achieve this accuracy we also experimentally determine key cell reproduction parameters and show that the presence of the [PSI+] prion has no impact on these key processes. Additionally, we experimentally determine the proportion of propagons transmitted to a daughter cell and show this reflects the relative cell volume of mother and daughter cells at cell division.
While propagon generation is an ATP-driven process, the partition of propagons to daughter cells occurs by passive transfer via the distribution of cytoplasm. Furthermore, our new estimates of n0, the number of propagons per cell (500–1000), are some five times higher than our previous estimates and this has important implications for our understanding of the inheritance of the [PSI+] and the spontaneous formation of prion-free cells.
Byrne, L. et al. (2007). Cell division is essential for elimination of the yeast [PSI+] prion by guanidine hydrochloride. Proceedings of the National Academy of Sciences of the United States of America [Online] 104:11688-11693. Available at: http://dx.doi.org/10.1073/pnas.0701392104. Guanidine hydrochloride (Gdn·HCl) blocks the propagation of yeast prions by inhibiting Hsp104, a molecular chaperone that is absolutely required for yeast prion propagation. We had previously proposed that ongoing cell division is required for Gdn·HCl-induced loss of the [PSI+] prion. Subsequently, Wu et al. [Wu Y, Greene LE, Masison DC, Eisenberg E (2005) Proc Natl Acad Sci USA 102:12789-12794] claimed to show that Gdn·HCl can eliminate the [PSI+] prion from alpha-factor-arrested cells, leading them to propose that in Gdn·HCl-treated cells the prion aggregates are degraded by an Hsp104-independent mechanism. Here we demonstrate that the results of Wu et al. can be explained by an unusually high rate of alpha-factor-induced cell death in the [PSI+] strain (780-1D) used in their studies. What appeared to be no growth in their experiments was actually no increase in total cell number in a dividing culture through a counterbalancing level of cell death. Using media-exchange experiments, we provide further support for our original proposal that elimination of the [PSI+] prion by Gdn·HCl requires ongoing cell division and that prions are not destroyed during or after the evident curing phase.
Cole, D. et al. (2007). Approximations for expected generation number. Biometrics [Online] 63:1023-1030. Available at: http://dx.doi.org/10.1111/j.1541-0420.2007.00780.x. A deterministic formula is commonly used to approximate the expected generation number of a population of growing cells. However, this can give misleading results because it does not allow for natural variation in the times that individual cells take to reproduce. Here we present more accurate approximations for both symmetric and asymmetric cell division. Based on the first two moments of the generation time distribution, these approximations are also robust. We illustrate the improved approximations using data that arise from monitoring individual yeast cells under a microscope and also demonstrate how the approximations can be used when such detailed data are not available.
Ridout, M. et al. (2006). New approximations to the Malthusian parameter. Biometrics [Online] 62:1216-1223. Available at: http://dx.doi.org/10.1111/j.1541-0420.2006.00564.x. Approximations to the Malthusian parameter of an age-dependent branching process are obtained in terms of the moments of the lifetime distribution, by exploiting a link with renewal theory. In several examples, the new approximations are more accurate than those currently in use, even when based on only the first two moments. The new approximations are extended to include a form of asymmetric cell division that occurs in some species of yeast. When used for inference, the new approximations are shown to have high efficiency.
Cole, D., Morgan, B. and Ridout, M. (2005). Models for strawberry inflorescence data. Journal of Agricultural, Biological, and Environmental Statistics [Online] 10:411-423. Available at: http://dx.doi.org/10.1198/108571105X80761. The flowers of strawberry plants grow on very variable branched structures called inflorescences, in which each branch gives rise to 0, 1, or 2 offspring branches. We extend previous modeling of the number of strawberry flowers at each individual level in the inflorescence structure conditional on the number of strawberry flowers at the previous level. We consider a range of logistic regression models, including models that incorporate inflorescence effects and random effects. The models can be used to summarize the overall structure of any particular variety and to indicate the main differences between varieties. For the data of the article, we show that models based on convolutions of correlated Bernoulli random variables outperform binomial regression models.
Cole, D. et al. (2004). Estimating the number of prions in yeast cells. Mathematical Medicine and Biology [Online] 21:369-395. Available at: http://dx.doi.org/doi:10.1093/imammb/21.4.369. Certain yeast cells contain proteins that behave like the mammalian prion PrP and are called yeast prions. The yeast prion protein Sup35p can exist in one of two stable forms, giving rise to phenotypes [PSI+] and [psi(-)]. If the chemical guanidine hydrochloride (GdnHCl) is added to a culture of growing [PSI+] cells, the proportion of [PSI+] cells decreases over time. This process is called curing and is due to a failure to propagate the prion form of Sup35p. We describe how curing can be modelled, and improve upon previous models for the underlying processes of cell division and prion segregation; the new model allows for asymmetric cell division and unequal prion segregation. We conclude by outlining plans for future experimentation and modelling.
Cole, D., Morgan, B. and Ridout, M. (2003). Generalized linear mixed models for strawberry inflorescence data. Statistical Modelling [Online] 3:273-290. Available at: http://dx.doi.org/10.1191/1471082X03st060oa. Strawberry inflorescences have a variable branching structure. This paper demonstrates how the inflorescence structure can be modelled concisely using binomial logistic generalized linear mixed models. Many different procedures exist for estimating the parameters of generalized linear mixed models, including penalized likelihood, EM, Bayesian techniques, and simulated maximum likelihood. The main methods are reviewed and compared for fitting binomial logistic generalized linear mixed models to strawberry inflorescence data. Simulations matched to the original data are used to show that a modified EM method due to Steele (1996) is clearly the best, in terms of speed and mean-squared-error performance, for data of this kind.
Newman, K. et al. (2014). Modelling Population Dynamics: model formulation, fitting and assessment using state-space methods. Springer.
- Provides a unifying framework for estimating the abundance of open populations that are subject to births, deaths and movement in and out of the population
- Going beyond the estimation of abundance, teaches ways of determining the reasons for variation in abundance over time and survival probabilities
- Ecologists and wildlife managers will learn to model dynamics in annual cycles for populations of large vertebrates, including discrete time models
This book gives a unifying framework for estimating the abundance of open populations: populations subject to births, deaths and movement, given imperfect measurements or samples of the populations. The focus is primarily on populations of vertebrates for which dynamics are typically modelled within the framework of an annual cycle, and for which stochastic variability in the demographic processes is usually modest. Discrete-time models are developed in which animals can be assigned to discrete states such as age class, gender, maturity, population (within a metapopulation), or species (for multi-species models).
The book goes well beyond estimation of abundance, allowing inference on underlying population processes such as birth or recruitment, survival and movement. This requires the formulation and fitting of population dynamics models. The resulting fitted models yield both estimates of abundance and estimates of parameters characterizing the underlying processes.
Cole, D. and Choquet, R. (2012). Parameter Redundancy in Models with Individual Random Effects. University of Kent. In parameter redundant models it is not possible to estimate all the parameters regardless of the amount of data collected. Introducing individual random effects may result in models that are no longer parameter redundant. Here we show how it is possible to determine whether or not a wide class of models with individual random effects are parameter redundant.
Cole, D. and Freeman, S. (2012). Estimating age-specific survival rates from historical ringing data: Comment. University of Kent Technical Report. Robinson (2010) describes a model for recoveries of Sandwich Terns Sterna sandvicensis. As is often the case in the UK, the numbers of birds ringed each year, at least until recently, are not fully computerised. The model assumes a fixed proportion of birds in different age classes. We show that the proportion can be estimated from the data, improving accuracy in estimates of survival.
Cole, D. (2009). A note on the identifiability of certain latent class models. University of Kent. Wiering (2005, Statistics and Probability Letters, 75, 211-218) provides conditions for the identifiability of a class of latent models. Here we derive an alternative more general method of proving this result, which is based on standard identifiability methods involving forming Jacobians.
Ridout, M., Woodcock, C. and Cole, D. (2006). Generation number in a pure birth process. University of Kent.