The problem of estimating the size of a population based on a subset of individuals observed across multiple data sources is often referred to as capture-recapture or multiple-systems estimation. This is fundamentally a missing data problem, where the number of unobserved individuals represents the missing data. As with any missing data problem, multiple-systems estimation requires users to make an untestable identifying assumption in order to estimate the population size from the observed data. Approaches to multiple-systems estimation often do not emphasize the role of the identifying assumption during model specification, which makes it difficult to decouple the specification of the model for the observed data from the identifying assumption. We present a re-framing of the multiple-systems estimation problem that decouples the specification of the observed-data model from the identifying assumptions, and discuss how log-linear models and the associated no-highest-order interaction assumption fit into this framing. We present an approach to computation in the Bayesian setting which takes advantage of existing software and facilitates various sensitivity analyses. We demonstrate our approach in a case study of estimating the number of civilian casualties in the Kosovo war. Code used to produce this manuscript is available at https://github.com/aleshing/revisiting-identifying-assumptions.
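To make the no-highest-order interaction assumption concrete, here is a minimal Python sketch of the plug-in estimate it implies when the observed-data model is saturated. It is not the Bayesian approach described above. The function name `nhoi_estimate` and the example counts are hypothetical and serve only as illustration. Under the assumption, the count of individuals missed by every list equals the product of the observed cell counts with an odd number of captures divided by the product of those with an even number of captures. With two lists this reduces to the familiar independence estimate n10 * n01 / n11.

```python
from itertools import product


def nhoi_estimate(counts):
    """Plug-in estimate under the no-highest-order interaction assumption.

    counts: dict mapping each observed capture history (a tuple of 0/1 flags,
    one per list, excluding the all-zero history) to its observed count.
    Returns (estimated unobserved count, estimated population size).
    Assumes every observed cell count is positive; a zero count in an
    even-capture cell would cause a division by zero.
    """
    num_lists = len(next(iter(counts)))
    odd_product, even_product = 1.0, 1.0
    for history in product((0, 1), repeat=num_lists):
        captures = sum(history)
        if captures == 0:
            continue  # the unobserved cell is what we are estimating
        if captures % 2 == 1:
            odd_product *= counts[history]
        else:
            even_product *= counts[history]
    n_unobserved = odd_product / even_product
    return n_unobserved, sum(counts.values()) + n_unobserved


# Hypothetical two-list example: 30 seen only on list 1, 20 only on list 2,
# 10 on both.
two_list_counts = {(1, 0): 30, (0, 1): 20, (1, 1): 10}
print(nhoi_estimate(two_list_counts))  # (60.0, 120.0)
```

The estimate depends on the data only through the observed cell counts, so any change in the estimated population size under a different identifying assumption comes from the assumption alone, not from the observed-data model.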