This paper introduces the R package drpop for flexibly estimating total population size from incomplete lists. Total population estimation, also called capture-recapture, is an important problem in many biological and social sciences. A typical dataset consists of incomplete lists of individuals from the population of interest, along with some covariate information. The goal is to estimate the number of unobserved individuals and, equivalently, the total population size. drpop flexibly models heterogeneity using the covariate information, under the assumption that two lists are conditionally independent given the covariates. This can be a much weaker assumption than the full marginal independence often required by classical methods. Moreover, drpop can incorporate complex and high-dimensional covariates and, unlike other popular methods, does not require parametric models. In particular, our estimator is doubly robust and has fast convergence rates even in flexible nonparametric settings. drpop lets the user choose the model used to estimate the intermediate parameters, and it returns the estimated population size, a confidence interval, and other related quantities. In this paper, we illustrate applications of drpop in different scenarios and present some performance summaries.