Integrated population modelling is widely used in statistical ecology. It allows data from population time series and independent surveys to be analysed simultaneously. In classical analysis the time-series likelihood component can be conveniently approximated using Kalman filter methodology. However, the natural way to model systems which have a discrete state space is to use hidden Markov models (HMMs). The proposed HMM-based method avoids Kalman filter approximations and Monte Carlo simulation. Subject to possible numerical sensitivity analysis, it is exact, flexible, and allows the use of standard techniques of classical inference. We apply the approach to data on little owls, where the model is shown to require a one-dimensional state space, and on northern lapwings, with a two-dimensional state space. In the former example the method identifies a parameter redundancy which changes the perception of the data needed to estimate immigration in integrated population modelling. The latter example may be analysed using either first- or second-order HMMs, describing numbers of one-year-olds and adults, or adults only, respectively. The use of first-order chains is found to be more efficient, mainly because there are fewer one-year-olds than adults in this application. For the lapwing modelling it is necessary to group the states in order to reduce the large dimension of the state space. Results agree with Bayesian and Kalman filter analyses, and avenues for future research are identified.
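The abstract above rests on computing an HMM likelihood exactly. As a rough illustration only, the sketch below evaluates a first-order HMM log-likelihood with the standard scaled forward algorithm. The toy model (four hidden population sizes, a hypothetical transition matrix, and a hypothetical Poisson observation model `poisson_count`) is an assumption made for the example and is not the model fitted to the owl or lapwing data.

```python
import math


def hmm_log_likelihood(init, trans, emission, observations):
    """Log-likelihood of an observation sequence under a first-order HMM.

    init[i]        -- probability that the hidden state at the first time is i
    trans[i][j]    -- probability of moving from hidden state i to state j
    emission(i, y) -- probability of observing y when the hidden state is i

    The forward probabilities are rescaled at every step so that long
    series do not underflow; the log of each scale factor is accumulated.
    """
    n_states = len(init)
    log_lik = 0.0
    alpha = []
    for t, y in enumerate(observations):
        if t == 0:
            alpha = [init[i] * emission(i, y) for i in range(n_states)]
        else:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emission(j, y)
                for j in range(n_states)
            ]
        scale = sum(alpha)  # P(y_t | y_1, ..., y_{t-1})
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical toy example: hidden population size 0..3, Poisson-distributed counts.
def poisson_count(state, y):
    mean = state + 0.5  # hypothetical observation model, not from the paper
    return math.exp(-mean) * mean ** y / math.factorial(y)


init = [0.25, 0.25, 0.25, 0.25]
trans = [
    [0.6, 0.4, 0.0, 0.0],
    [0.2, 0.5, 0.3, 0.0],
    [0.0, 0.2, 0.5, 0.3],
    [0.0, 0.0, 0.4, 0.6],
]
print(hmm_log_likelihood(init, trans, poisson_count, [1, 2, 2, 3, 1]))
```

Because the recursion sums over hidden states rather than over whole paths, its cost grows linearly with series length, which is why grouping states matters when the state space is large.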
Cowen, L. et al. (2017). Hidden Markov Models for Extended Batch Data. Biometrics [Online]. Available at: http://dx.doi.org/10.1111/biom.12701.
Batch marking provides an important and efficient way to estimate the survival probabilities and population sizes of wild animals. It is particularly useful when dealing with animals that are difficult to mark individually. For the first time, we provide the likelihood for extended batch-marking experiments. It is often the case that samples contain individuals that remain unmarked, due to time and other constraints, and this information has not previously been analyzed. We provide ways of modeling such information, including an open N-mixture approach. We demonstrate that models for both marked and unmarked individuals are hidden Markov models; this provides a unified approach, and is the key to developing methods for fast likelihood computation and maximization. Likelihoods for marked and unmarked individuals can easily be combined using integrated population modeling. This allows the simultaneous estimation of population size and immigration, in addition to survival, as well as efficient estimation of standard errors and methods of model selection and evaluation, using standard likelihood techniques. Alternative methods for estimating population size are presented and compared. An illustration is provided by a weather-loach data set, previously analyzed by means of a complex procedure of constructing a pseudo-likelihood, the formation of estimating equations, the use of sandwich estimates of variance, and piecemeal estimation of population size. Simulation provides general validation of the hidden Markov model methods developed and demonstrates their excellent performance and efficiency. This is especially notable given the large numbers of hidden states that may typically be required.
Besbeas, P. and Morgan, B. (2017). Variance estimation for integrated population models. Advances in Statistical Analysis [Online]. Available at: http://dx.doi.org/10.1007/s10182-017-0304-5.
State-space models are widely used in ecology. However, it is well known that in practice it can be difficult to estimate both the process and observation variances that occur in such models. We consider this issue for integrated population models, which incorporate state-space models for population dynamics. To some extent, the mechanism of integrated population models protects against this problem, but it can still arise, and two illustrations are provided, in each of which the observation variance is estimated as zero. In the context of an extended case study involving data on British Grey herons, we consider alternative approaches for dealing with the problem when it occurs. In particular, we consider penalised likelihood, a method based on fitting splines, and a method of pseudo-replication, which is undertaken via a simple bootstrap procedure. For the case study of the paper, it is shown that when it occurs, an estimate of zero observation variance is unimportant for inference relating to the model parameters of primary interest. This unexpected finding is supported by a simulation study.
Besbeas, P., McCrea, R. and Morgan, B. (2017). Integrated population model selection in ecology. In prep.
Cowen, L. et al. (2014). A comparison of abundance estimates from extended batch-marking and Jolly-Seber type experiments. Ecology and Evolution [Online] 4:210-218. Available at: http://dx.doi.org/10.1002/ece3.899.
Little attention has been paid to the use of multi-sample batch-marking studies, as it is generally assumed that an individual's capture history is necessary for fully efficient estimates. However, Huggins et al. (2010) recently presented a pseudo-likelihood for a multi-sample batch-marking study, in which they used estimating equations to solve for survival and capture probabilities and then derived abundance estimates using a Horvitz–Thompson-type estimator. We have developed and maximized the likelihood for batch-marking studies. We use data simulated from a Jolly–Seber-type study and convert these to what would have been obtained from an extended batch-marking study. We compare our abundance estimates obtained from the Crosbie–Manly–Arnason–Schwarz (CMAS) model with those of the extended batch-marking model to determine the efficiency of collecting and analyzing batch-marking data. We found that estimates of abundance were similar for all three estimators: CMAS, Huggins, and our likelihood. Gains in precision are made when using unique identifiers and employing the CMAS model; however, the likelihood typically had lower mean square error than the pseudo-likelihood method of Huggins et al. (2010). When faced with designing a batch-marking study, researchers can be confident in obtaining unbiased abundance estimators. Furthermore, they can design studies to reduce mean square error by manipulating capture probabilities and sample size.
Besbeas, P. and Morgan, B. (2014). Goodness-of-fit of integrated population models using calibrated simulation. Methods in Ecology and Evolution [Online] 5:1373-1382. Available at: http://dx.doi.org/10.1111/2041-210X.12279.
1. Integrated population modelling is proving to be an important and useful technique in statistical ecology. However, there is currently no simple formal method for judging how well models fit data when several different data sets, described by different structured models, are being analysed in combination.
2. We propose and evaluate a new approach, calibrated simulation. Here, comparative data sets are obtained by simulating data from the model, with parameter values drawn from the assumed asymptotic normal distribution of the maximum-likelihood estimators from the real data. The approach is motivated and justified by Bayesian P-values. Calibration of the resulting statistics is achieved because repeated data sets are easily simulated from the fitted model. The method requires the specification of model discrepancy measures, and we show how different measures can highlight different aspects of fit.
3. Calibration is only strictly necessary if the proposed statistics appear to be extreme.
4. The approach of using calibrated simulation to check the goodness-of-fit of integrated population models is demonstrated by application to data sets on lapwings and herons. In each case, two data sets are involved in the integrated analysis, and discrepancy measures of goodness-of-fit are obtained for each component data set. For the lapwing application, as replication is efficient, it is possible to calibrate the procedure simply by using additional simulations. The heron application is shown to be feasible but is substantially harder to calibrate, due to the presence of productivity thresholds that need to be estimated using profile likelihood methods. We demonstrate the importance of taking more than one discrepancy measure for time-series data. Avenues for future research are outlined. This article has supplementary materials online.
Besbeas, P. and Morgan, B. (2012). A threshold model for heron productivity. Journal of Agricultural, Biological, and Environmental Statistics [Online] 17:128-141. Available at: http://dx.doi.org/10.1007/s13253-011-0080-8.
We demonstrate the potential of conditionally Gaussian state-space models in integrated population modeling, when certain model parameters may be functions of previous observations. The approach is applied to a heron census, and the data are best described by a model with three population-size thresholds which determine the population productivity. The model provides an explanation of how the population rebounds rapidly after major falls in size, which are characteristic of the data. By contrast, a simple logarithmic regression of productivity on population size was not significant. The results are of ecological interest, and suggest hypotheses for further investigation.
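A step function of this kind can be sketched as follows. The thresholds and productivity levels are hypothetical numbers chosen for illustration (in the paper the thresholds are estimated from the data), and the assumption that productivity is highest when the previous count is low is one simple way to produce rapid rebounds after falls, not a reported result.

```python
import bisect


def threshold_productivity(previous_count, thresholds, levels):
    """Productivity in year t as a step function of the count in year t - 1.

    thresholds -- increasing population sizes at which productivity changes
    levels     -- one productivity value per interval (len(thresholds) + 1)
    A count equal to a threshold is assigned to the interval above it.
    """
    if len(levels) != len(thresholds) + 1:
        raise ValueError("need exactly one more level than thresholds")
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    return levels[bisect.bisect_right(thresholds, previous_count)]


# Hypothetical values, chosen only for illustration.
THRESHOLDS = [4000, 5500, 7000]
LEVELS = [1.4, 1.0, 0.8, 0.6]
print(threshold_productivity(3500, THRESHOLDS, LEVELS))  # prints 1.4
```

Because productivity depends on a previous observation rather than on the hidden state, the model stays Gaussian conditional on the past, which is what keeps a Kalman filter applicable.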
Viallefont, A. et al. (2012). Estimating survival and transition rates from aggregate sightings of animals. Journal of Ornithology [Online] 152:381-391. Available at: http://dx.doi.org/10.1007/s10336-010-0588-7.
We compare and contrast two methods for fitting probability models to data which arise when animals are marked in batches, without individual identification, and live in several different sites or states. The methods are suitable for populations in which animals are marked at birth and then resighted over several sites/states, for small animals going through several growth stages (insects, amphibians, etc.), as well as for the follow-up of animals released after laboratory colour-marking, for example. The methods we consider include a multi-state model for resightings of batch-marked animals, allowing us to estimate survival, transitions, and sighting probabilities. One method is based on the EM algorithm, and the second uses the Kalman filter for computing likelihoods. The methods are illustrated on real data from a cohort of Great Cormorants Phalacrocorax carbo, and their performance is evaluated using simulation. We recommend identifying the batches, for instance in the case of sites, by using a different colour on each site at the time of marking, and in general the use of the Kalman filter rather than the EM-based approach.
Besbeas, P. and Morgan, B. (2012). Kalman filter initialization for integrated population modelling. Journal of the Royal Statistical Society: Series C (Applied Statistics) [Online] 61:151-162. Available at: http://dx.doi.org/10.1111/j.1467-9876.2011.01012.x.
In integrated population modelling in ecology, where data from multiple surveys are analysed simultaneously, the Kalman filter may be used to approximate a component likelihood for a state space model of population count data. We evaluate a new method for initiating this Kalman filter, based on a stable age distribution. The new method is illustrated and compared with alternative approaches by application to data on the grey heron. The new method is simple to use, extends naturally to the case of multivariate time series of count data and performs well in a simulation study.
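One standard way to obtain a stable age distribution is power iteration on a population projection matrix: the normalised dominant eigenvector gives the stable proportions in each class, which could then be scaled by a total count to give an initial state mean. The sketch below assumes a hypothetical two-class matrix and does not reproduce the paper's initialisation procedure in detail, for example how the initial state variance is chosen.

```python
def stable_age_distribution(matrix, tol=1e-12, max_iter=10_000):
    """Dominant eigenvalue and normalised eigenvector of a projection matrix.

    Uses power iteration, which converges for a non-negative primitive
    matrix; the eigenvector, scaled to sum to one, is the stable age
    distribution and the eigenvalue is the asymptotic growth rate.
    """
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(max_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        growth = sum(w)  # equals the eigenvalue once v has converged (v sums to 1)
        w = [x / growth for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return growth, w
        v = w
    raise RuntimeError("power iteration did not converge")


# Hypothetical two-class model: row 0 gives one-year-olds, row 1 adults.
leslie = [[0.0, 0.8],
          [0.5, 0.7]]
growth, proportions = stable_age_distribution(leslie)
total_count = 1000.0  # hypothetical first census count
initial_state_mean = [total_count * p for p in proportions]
```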
McCrea, R. et al. (2010). Multi-site integrated population modelling. Journal of Agricultural, Biological, and Environmental Statistics [Online] 15:539-561. Available at: http://dx.doi.org/10.1007/s13253-010-0027-5.
We examine the performance of a method of integrated population modelling for the joint analysis of different types of demographic data on individuals that exist in, and move between, different sites. The value of the approach is demonstrated by a simulation study which shows substantial improvement in parameter estimation when site-specific census data are combined with demographic data. The multivariate normal approximation to a multi-state mark-recapture likelihood is evaluated, and the performance of a diagonal variance-covariance matrix for the approximation is also examined. The work is motivated by a study of great cormorants. Analysis of the cormorant data suggests that breeders survive better than non-breeders, and also that probabilities of recruitment to breeding have been declining over time for all the colonies of the study. Supplementary material, including notes on the computation of standard errors and extended simulation results, is available online.
Tavecchia, G. et al. (2009). Estimating population size and hidden demographic parameters with state-space modelling. American Naturalist [Online] 173:722-733. Available at: http://dx.doi.org/10.1086/598499.
Recent research has shown how process variability and measurement error in ecological time series can be separated using state-space modeling techniques to combine individual-based data with population counts. We extend the current maximum likelihood approaches to allow the incorporation of sex- and age-dependent counts and provide an application to data from a population of Soay sheep living on the St. Kilda archipelago. We then empirically evaluate the performance and potential of the method by sequentially omitting portions of the data available. We show that the use of multivariate time series extends the power of the state-space modeling approach. The variance of measurement error was found to be smaller for males, and the sex ratio of lambs to be skewed toward females and constant over time. Our results indicated that demographic parameters estimated using state-space modeling without relevant individual-based data were in close agreement with those obtained from mark-recapture-recovery analyses alone. Similarly, estimates of population size obtained when the corresponding count observations were unavailable were close to those from the entire data set. We conclude that the approach illustrated here has great potential for estimating hidden demographic parameters, planning studies on population monitoring, and estimating both historical and future population size.
Besbeas, P. and Morgan, B. (2008). Improved estimation of the stable laws. Statistics and Computing [Online] 18:219-231. Available at: http://dx.doi.org/10.1007/s11222-008-9050-6.
Fitting general stable laws to data by maximum likelihood is important but difficult. This is why much research has considered alternative procedures based on empirical characteristic functions. Two problems then are how many values of the characteristic function to select, and how to position them. We provide recommendations for both of these topics. We propose an arithmetic spacing of transform variables, coupled with a recommendation for the location of the variables. It is shown that arithmetic spacing, which is far simpler to implement, closely approximates optimum spacing. The new methods that result are compared in simulation studies with existing methods, including maximum-likelihood. The main conclusion is that arithmetic spacing of the values of the characteristic function, coupled with appropriately limiting the range for these values, improves the overall performance of the regression-type method of Koutrouvelis, which is the standard procedure for estimating general stable law parameters.
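The regression idea can be sketched for the symmetric case. The code below evaluates the empirical characteristic function at arithmetically spaced values of t and fits a straight line to log(-log|phi_n(t)|^2) against log t, giving estimates of the index alpha and scale c. It omits the standardisation of the data, the location and skewness steps, and the paper's recommended range; the range 0.5 to 1.5 and ten points are assumptions suited to roughly unit-scale data.

```python
import math


def ecf_modulus_squared(data, t):
    """|phi_n(t)|^2 for the empirical characteristic function of the data."""
    n = len(data)
    re = sum(math.cos(t * x) for x in data) / n
    im = sum(math.sin(t * x) for x in data) / n
    return re * re + im * im


def regress_alpha_scale(data, n_points=10, t_min=0.5, t_max=1.5):
    """Estimate stable index alpha and scale c for a symmetric stable law.

    For such a law |phi(t)| = exp(-(c|t|)^alpha), so
        log(-log|phi(t)|^2) = log 2 + alpha*log c + alpha*log t,
    a straight line in log t.  The ECF is evaluated at arithmetically
    spaced t values and the line is fitted by ordinary least squares.
    """
    step = (t_max - t_min) / (n_points - 1)
    xs, ys = [], []
    for k in range(n_points):
        t = t_min + k * step            # arithmetic spacing of the t values
        u = ecf_modulus_squared(data, t)
        if not 0.0 < u < 1.0:
            raise ValueError("ECF too close to 0 or 1; rescale data or change t range")
        xs.append(math.log(t))
        ys.append(math.log(-math.log(u)))
    x_bar, y_bar = sum(xs) / n_points, sum(ys) / n_points
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    alpha = sxy / sxx
    intercept = y_bar - alpha * x_bar
    scale = math.exp((intercept - math.log(2.0)) / alpha)
    return alpha, scale
```

For normal data the slope should be close to 2 and for Cauchy data close to 1, since these are the stable laws with index 2 and 1.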
Gauthier, G. et al. (2007). Population growth in snow geese: A modeling approach integrating demographic and survey information. Ecology [Online] 88:1420-1429. Available at: http://dx.doi.org/10.1890/06-0953.
There are few analytic tools available to formally integrate information coming from population surveys and demographic studies. The Kalman filter is a procedure that facilitates such integration. Based on a state-space model, we can obtain a likelihood function for the survey data using a Kalman filter, which we may then combine with a likelihood for the demographic data. In this paper, we used this combined approach to analyze the population dynamics of a hunted species, the Greater Snow Goose (Chen caerulescens atlantica), and to examine the extent to which it can improve previous demographic population models. The state equation of the state-space model was a matrix population model with fecundity and regression parameters relating adult survival and harvest rate estimated in a previous capture-recapture study. The observation equation combined the output from this model with estimates from an annual spring photographic survey of the population. The maximum likelihood estimates of the regression parameters from the combined analysis differed little from the values of the original capture-recapture analysis, though their precision improved. The model output was found to be insensitive to a wide range of coefficients of variation (CV) in fecundity parameters. We found a close match between the surveyed and smoothed population size estimates generated by the Kalman filter over an 18-year period, and the estimated CV of the survey (0.078-0.150) was quite compatible with its assumed value (approximately 0.10). When we used the updated parameter values to predict future population size, the model underestimated the surveyed population size by 18% over a three-year period. However, this could be explained by a concurrent change in the survey method. We conclude that the Kalman filter is a promising approach to forecast population change because it incorporates survey information in a formal way, compared with ad hoc approaches that either neglect this information or require some parameter or model tuning.
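The prediction-error decomposition behind such Kalman filter likelihoods can be sketched for a univariate model. Assumptions: a single hypothetical count series with linear growth, Gaussian process and observation errors, and a normal prior for the initial state. Real integrated models use a matrix population model as the state equation, but the recursion has the same shape.

```python
import math


def kalman_log_likelihood(counts, growth, process_var, obs_var, m0, p0):
    """Exact Gaussian log-likelihood of a univariate linear state-space model.

    State:       x_t = growth * x_{t-1} + w_t,  w_t ~ N(0, process_var)
    Observation: y_t = x_t + v_t,               v_t ~ N(0, obs_var)
    Prior:       x_0 ~ N(m0, p0)
    The likelihood is built from one-step-ahead prediction errors.
    """
    m, p = m0, p0
    log_lik = 0.0
    for y in counts:
        m_pred = growth * m                      # predicted state mean
        p_pred = growth * growth * p + process_var
        f = p_pred + obs_var                     # prediction-error variance
        e = y - m_pred                           # prediction error
        log_lik -= 0.5 * (math.log(2.0 * math.pi * f) + e * e / f)
        gain = p_pred / f                        # Kalman gain
        m = m_pred + gain * e                    # filtered state mean
        p = (1.0 - gain) * p_pred                # filtered state variance
    return log_lik
```

In an integrated analysis this value would be added to the log-likelihood of the demographic data, with shared parameters such as survival entering both terms, and the sum maximised jointly.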
Besbeas, P. and Freeman, S. (2006). Methods for joint inference from panel survey and demographic data. Ecology 87:1138-1145.
A number of methods for joint inference from animal abundance and demographic data have been proposed in recent years, each with its own advantages. A new approach to analyzing panel survey and demographic data simultaneously is described. The approach fits population-dynamics models to the survey data, rather than to a single index of abundance derived from them, and thus avoids disadvantages inherent in analyzing such an index. The methodology is developed and illustrated with British Lapwing data, and the results are compared with those obtained from existing approaches. The estimates of demographic parameters and population indices are similar for all methods. The results of a simulation study show that the new method performs well in terms of mean squared error.
Besbeas, P., Freeman, S. and Morgan, B. (2005). The potential of integrated population modelling. Australian & New Zealand Journal of Statistics [Online] 47:35-48. Available at: http://dx.doi.org/10.1111/j.1467-842X.2005.00370.x.
Recent work has shown how the Kalman filter can be used to provide a simple framework for the integrated analysis of wild animal census and mark-recapture-recovery data. The approach has been applied to data on a range of bird species, on Soay sheep and on grey seals. This paper reviews the basic ideas, and then indicates the potential of the method through a series of new applications to data on the northern lapwing, a species of conservation interest that has been in decline in Britain for the past 20 years. The paper analyses a national index, as well as data from individual sites; it looks for a change-point in productivity, corresponding to the start of the decline in numbers, considers how to select appropriate covariates, and compares productivity between different habitats. The new procedures can be applied singly or in combination.
Besbeas, P. and Morgan, B. (2004). Efficient and robust estimation for the one-sided stable distribution of index 1/2. Statistics & Probability Letters [Online] 66:251-257. Available at: http://dx.doi.org/10.1016/j.spl.2003.10.013.
We present a novel distribution for modelling count data that are underdispersed relative to the Poisson distribution. The distribution is a form of weighted Poisson distribution and is shown to have advantages over other weighted Poisson distributions that have been proposed to model underdispersion. One key difference is that the weights in our distribution are centred on the mean of the underlying Poisson distribution. Several examples are presented that illustrate the consistently good performance of the distribution.
Besbeas, P. and Morgan, B. (2004). Integrated squared error estimation of normal mixtures. Computational Statistics and Data Analysis [Online] 44:517-526. Available at: http://dx.doi.org/10.1016/S0167-9473(02)00251-7.
Based on the empirical characteristic function, the integrated squared error criterion for normal mixtures is shown to have a simple form for a particular weight function. When the parameter of that function is chosen as the smoothed cross-validation selector in kernel density estimation, the estimator which minimises the criterion is shown to perform well in a simulation study. In comparison with maximum likelihood and a recently proposed new method, the method of this paper gives better bias and standard deviation results. Furthermore, the new estimator is less likely to fail and is appreciably more robust.
Besbeas, P., De Feis, I. and Sapatinas, T. (2004). A comparative simulation study of wavelet shrinkage estimators for Poisson counts. International Statistical Review [Online] 72:209-237. Available at: http://dx.doi.org/10.1111/j.1751-5823.2004.tb00234.x.
Using computer simulations, the finite sample performance of a number of classical and Bayesian wavelet shrinkage estimators for Poisson counts is examined. For the purpose of comparison, a variety of intensity functions, background intensity levels, sample sizes, primary resolution levels, wavelet filters and performance criteria are employed. A demonstration is given of the use of some of the estimators to analyse a data set arising in high-energy astrophysics. Following the philosophy of reproducible research, the MATLAB programs and real-life data example used in this study are made freely available.
Besbeas, P., Lebreton, J. and Morgan, B. (2003). The efficient integration of abundance and demographic data. Journal of the Royal Statistical Society: Series C (Applied Statistics) [Online] 52:95-102. Available at: http://dx.doi.org/10.1111/1467-9876.00391.
A drawback of a new method for integrating abundance and mark-recapture-recovery data is the need to combine likelihoods describing the different data sets. Often these likelihoods will be formed by using specialist computer programs, which is an obstacle to the joint analysis. This difficulty is easily circumvented by the use of a multivariate normal approximation. We show that it is only necessary to make the approximation for the parameters of interest in the joint analysis. The approximation is evaluated on data sets for two bird species and is shown to be efficient and accurate.
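A minimal sketch of the idea, assuming a single shared parameter: one component log-likelihood (here a hypothetical binomial survival likelihood) is computed exactly, while the other is replaced by a quadratic term built from its maximum-likelihood estimate and standard error, and the sum is maximised by golden-section search. All data values and names are hypothetical.

```python
import math


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximise a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0


def combined_estimate(survivors, released, approx_mean, approx_se):
    """Maximise an exact binomial log-likelihood plus a normal approximation
    to a second component log-likelihood for the same survival probability."""
    def log_lik(phi):
        exact = survivors * math.log(phi) + (released - survivors) * math.log(1.0 - phi)
        approx = -0.5 * ((phi - approx_mean) / approx_se) ** 2
        return exact + approx
    return golden_section_max(log_lik, 1e-9, 1.0 - 1e-9)
```

With the approximated component summarised only by an estimate and a standard error (a variance-covariance matrix in the multi-parameter case), the specialist program that produced it is not needed during the joint fit.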
Abramovich, F., Besbeas, P. and Sapatinas, T. (2002). Empirical Bayes approach to block wavelet function estimation. Computational Statistics and Data Analysis 39:435-451.
Wavelet methods have demonstrated considerable success in function estimation through term-by-term thresholding of the empirical wavelet coefficients. However, it has been shown that grouping the empirical wavelet coefficients into blocks and making simultaneous threshold decisions about all the coefficients in each block has a number of advantages over term-by-term wavelet thresholding, including asymptotic optimality and better mean squared error performance in finite sample situations. An empirical Bayes approach to incorporating information on neighbouring empirical wavelet coefficients into function estimation that results in block wavelet shrinkage and block wavelet thresholding estimators is considered. Simulated examples are used to illustrate the performance of the resulting estimators, and to compare these estimators with several existing non-Bayesian block wavelet thresholding estimators. It is observed that the proposed empirical Bayes block wavelet shrinkage and block wavelet thresholding estimators outperform the non-Bayesian block wavelet thresholding estimators in finite sample situations. An application to a data set that was collected in an anaesthesiological study is also presented.
Besbeas, P. et al. (2002). Integrating mark-recapture-recovery and census data to estimate animal abundance and demographic parameters. Biometrics [Online] 58:540-547. Available at: http://dx.doi.org/10.1111/j.0006-341X.2002.00540.x.
In studies of wild animals, one frequently encounters both census and mark-recapture-recovery data. We show how a state-space model for census data, in combination with the usual multinomial-based models for ring-recovery data, provides estimates of productivity not available from either type of data alone. The approach is illustrated on two British bird species. For the lapwing, we calibrate how its recent decline could be due to a decrease in productivity. For the heron, there is no evidence for a decline in productivity, and the combined analysis significantly increases the strength of logistic regressions of survival on winter severity.
Fearn, T., Brown, P. and Besbeas, P. (2002). A Bayesian decision theory approach to variable selection for discrimination. Statistics and Computing [Online] 12:253-260. Available at: http://dx.doi.org/10.1023/A:1020702927247.
Motivated by examples in spectroscopy, we study variable selection for discrimination in problems with very many predictor variables. Assuming multivariate normal distributions with common variance for the predictor variables within groups, we develop a Bayesian decision theory approach that balances costs for variables against a loss due to classification errors. The approach is computationally intensive, requiring a simulation to approximate the intractable expected loss and a search, using simulated annealing, over a large space of possible subsets of variables. It is illustrated by application to a spectroscopic example with 3 groups, 100 variables, and 71 training cases, where the approach finds subsets of between 5 and 14 variables whose discriminatory power is comparable with that of linear discriminant analysis using principal components derived from the full 100 variables. We study both the evaluation of expected loss and the tuning of the simulated annealing for the example, and conclude that computational effort should be concentrated on the search.
Besbeas, P. and Morgan, B. (2001). Integrated Squared Error Estimation of Cauchy Parameters. Statistics & Probability Letters [Online] 55:397-401. Available at: http://dx.doi.org/10.1016/S0167-7152(01)00153-5.
We show that integrated squared error estimation of the parameters of a Cauchy distribution, based on the empirical characteristic function, is simple, robust and efficient. The k-L estimator of Koutrouvelis (Biometrika 69 (1982) 205) is more difficult to use, less robust and at best only marginally more efficient.
Besbeas, P., Borysiewicz, R. and Morgan, B. (2008). Completing the ecological jigsaw. In: Thomson, D. L., Cooch, E. G. and Conroy, M. J. eds. Modeling Demographic Processes in Marked Populations. New York: Springer, pp. 513-539. Available at: http://dx.doi.org/10.1007/978-0-387-78151-8.
State-space models are widely used in ecology. However, it is well known that in practice it can be difficult to separate process and observation variances. We consider this issue for integrated population modelling. To some extent the mechanism of such models protects against this problem, but it can still arise, and two illustrations are provided, in each of which the measurement variance is estimated as zero. In the context of an extended case study involving data on British Grey herons, we consider alternative approaches for dealing with the problem when it occurs. In particular, we consider penalised likelihood and introduce a method of pseudo-replication. We recommend the use of pseudo-replication, which is undertaken via a simple bootstrap procedure. For the case study of the paper it is shown that, when it occurs, an estimate of zero measurement variance may not be important for inference relating to the model parameters of primary interest. The wider implications of the work are discussed.
Conference or workshop item
Morgan, B., Besbeas, P. and Lebreton, J. (2001). New Methodology for Integrated Monitoring of Wild Animal Populations. In: 53rd Session of the International Statistical Institute. International Statistical Institute, pp. 361-364.
Provides a unifying framework for estimating the abundance of open populations that are subject to births, deaths and movement in and out of the population
Going beyond the estimation of abundance, teaches ways of determining the reasons for variation in abundance over time and survival probabilities
Ecologists and wildlife managers will learn to model dynamics in annual cycles for populations of large vertebrates, including discrete time models
This book gives a unifying framework for estimating the abundance of open populations: populations subject to births, deaths and movement, given imperfect measurements or samples of the populations. The focus is primarily on populations of vertebrates for which dynamics are typically modelled within the framework of an annual cycle, and for which stochastic variability in the demographic processes is usually modest. Discrete-time models are developed in which animals can be assigned to discrete states such as age class, gender, maturity, population (within a metapopulation), or species (for multi-species models).
The book goes well beyond estimation of abundance, allowing inference on underlying population processes such as birth or recruitment, survival and movement. This requires the formulation and fitting of population dynamics models. The resulting fitted models yield both estimates of abundance and estimates of parameters characterizing the underlying processes.