Doubly robust capture-recapture methods for estimating population size

Estimation of population size using incomplete lists (also called the capture-recapture problem) has a long history across many biological and social sciences. For example, human rights and other groups often construct partial and overlapping lists of victims of armed conflicts, with the hope of using this information to estimate the total number of victims. Earlier statistical methods for this setup either use potentially restrictive parametric assumptions, or else rely on typically suboptimal plug-in-type nonparametric estimators; however, both approaches can lead to substantial bias, the former via model misspecification and the latter via smoothing. Under an identifying assumption that two lists are conditionally independent given measured covariate information, we make several contributions. First we derive the nonparametric efficiency bound for estimating the capture probability, which indicates the best possible performance of any estimator, and sheds light on the statistical limits of capture-recapture methods. Then we present a new estimator, and study its finite-sample properties, showing that it has a double robustness property new to capture-recapture, and that it is near-optimal in a non-asymptotic sense, under relatively mild nonparametric conditions. Next, we give a method for constructing confidence intervals for total population size from generic capture probability estimators, and prove non-asymptotic near-validity. Finally, we study our methods in simulations, and apply them to estimate the number of killings and disappearances attributable to different groups in Peru during its internal armed conflict between 1980 and 2000.
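To make the identifying assumption concrete, here is a minimal sketch of a plug-in estimator for a single discrete covariate. It is not the doubly robust estimator proposed in the paper; it is the simple plug-in type that the abstract describes as typically suboptimal, shown only to illustrate how conditional independence of the two lists identifies the capture probability. Write q1(x), q2(x), and q12(x) for the probabilities, among captured units with covariate value x, of appearing on list 1, on list 2, and on both. Under conditional independence, the capture probability within stratum x is q12(x) / (q1(x) q2(x)), and the inverse of the overall capture probability is the average of the inverse stratum-level values over captured units. The function name `plugin_population_size`, the record layout, and the toy counts are all assumptions made for this illustration.

```python
from collections import defaultdict


def plugin_population_size(records):
    """Plug-in estimate of capture probability and population size.

    records: iterable of (stratum, on_list_1, on_list_2) for captured units,
    where the two indicators are 0 or 1 and at least one of them is 1.
    This layout is hypothetical and chosen only for the illustration.
    """
    # Per-stratum counts: [captured, on list 1, on list 2, on both lists].
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for stratum, y1, y2 in records:
        if not (y1 or y2):
            raise ValueError("a captured unit must appear on at least one list")
        c = counts[stratum]
        c[0] += 1
        c[1] += y1
        c[2] += y2
        c[3] += y1 * y2

    n = sum(c[0] for c in counts.values())
    inv_psi = 0.0
    for n_x, n1, n2, n12 in counts.values():
        if n12 == 0:
            raise ValueError("each stratum needs at least one unit on both lists")
        q1, q2, q12 = n1 / n_x, n2 / n_x, n12 / n_x
        gamma = q12 / (q1 * q2)   # stratum-level capture probability
        inv_psi += n_x / gamma    # sum of 1 / gamma over captured units
    inv_psi /= n

    psi_hat = 1.0 / inv_psi       # overall capture probability
    n_hat = n / psi_hat           # estimated total population size
    return psi_hat, n_hat


# Toy data (invented): two strata with different overlap patterns.
records = (
    [("A", 1, 0)] * 20 + [("A", 0, 1)] * 10 + [("A", 1, 1)] * 10
    + [("B", 1, 0)] * 5 + [("B", 0, 1)] * 15 + [("B", 1, 1)] * 5
)
psi_hat, n_hat = plugin_population_size(records)
print(round(psi_hat, 3), round(n_hat, 1))
```

With a discrete covariate this reduces to a stratified Lincoln-Petersen estimate: each stratum contributes n1 * n2 / n12, so the toy data give 60 + 40 = 100 estimated individuals from 65 captured, for a capture probability of about 0.65. With continuous or high-dimensional covariates the q functions must be estimated by smoothing or regression, which is where the bias mentioned in the abstract arises.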
