We propose a modern method to estimate population size based on capture-recapture designs of K samples. The observed data are formulated as a sample of n i.i.d. K-dimensional vectors of binary indicators, where the k-th component of each vector indicates whether the subject is caught by the k-th sample, such that only subjects with nonzero capture vectors are observed. The target quantity is the unconditional probability of the vector being nonzero across both observed and unobserved subjects. We cover models assuming a single constraint (identification assumption) on the K-dimensional distribution such that the target quantity is identified and the statistical model is unrestricted. We present solutions for linear and non-linear constraints commonly assumed to identify capture-recapture models, including no K-way interaction in linear and log-linear models, independence, and conditional independence. We demonstrate that the choice of constraint has a dramatic impact on the value of the estimand, showing that it is crucial that the constraint is known to hold by design. For the commonly assumed constraint of no K-way interaction in a log-linear model, the statistical target parameter is only defined when each of the $2^K - 1$ observable capture patterns is present, and therefore suffers from the curse of dimensionality. We propose a targeted maximum likelihood estimator (MLE) based on an undersmoothed lasso model to smooth across the cells while targeting the fit towards the single-valued target parameter of interest. For each identification assumption, we provide simulated inference and confidence intervals to assess the performance of the estimator under correct and incorrect identifying assumptions. We apply the proposed method, alongside existing estimators, to estimate the prevalence of a parasitic infection using multi-source surveillance data from a region in southwestern China, under the four identification assumptions.
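To make the log-linear constraint concrete, the following is a minimal sketch of a simple plug-in calculation, not the targeted MLE proposed here. It assumes the standard consequence of setting the highest-order interaction term of a saturated log-linear model to zero: the unobserved cell count equals the product of observed cell counts with an odd number of captures divided by the product of those with an even number. The function name `plugin_no_kway_interaction` and the dictionary input format are hypothetical choices made for illustration.

```python
from itertools import product
from math import prod


def plugin_no_kway_interaction(counts, K):
    """Plug-in estimate under no K-way interaction in a log-linear model.

    counts: dict mapping each nonzero capture pattern (a K-tuple of 0/1)
            to the number of observed subjects with that pattern.
    Returns (psi, N): the estimated probability of a nonzero capture
    vector and the implied population size n / psi.
    Illustrative sketch only; this is not the targeted MLE.
    """
    # All 2^K - 1 observable (nonzero) capture patterns.
    patterns = [b for b in product((0, 1), repeat=K) if any(b)]

    # The plug-in parameter is undefined if any observable cell is empty.
    if any(counts.get(b, 0) == 0 for b in patterns):
        raise ValueError("all 2^K - 1 observable patterns must be present")

    n = sum(counts[b] for b in patterns)

    # Cells with an odd number of captures go in the numerator,
    # cells with an even number of captures go in the denominator.
    odd = prod(counts[b] for b in patterns if sum(b) % 2 == 1)
    even = prod(counts[b] for b in patterns if sum(b) % 2 == 0)
    n0 = odd / even  # estimated count of never-captured subjects

    psi = n / (n + n0)
    return psi, n / psi


# Two-sample example: 30 caught only by sample 1, 20 only by sample 2,
# 10 by both.
psi, N = plugin_no_kway_interaction({(1, 0): 30, (0, 1): 20, (1, 1): 10}, K=2)
```

For K = 2 this reduces to the familiar two-sample ratio of the single-capture counts to the double-capture count. The explicit check for empty cells mirrors the point above: with $2^K - 1$ cells, some are likely to be empty even for moderate K, which is what motivates smoothing across cells.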