The central challenges in missing data models concern the identifiability of two distributions: the target law and the full law. The target law refers to the joint distribution of the data variables, whereas the full law refers to the joint distribution of the data variables and their corresponding response indicators. However, the relationship between the identifiability of these two distributions and the feasibility of multiple imputation has not been clearly established. We show that imputations can be drawn from the correct conditional distributions for all possible missing data patterns if and only if the full law is identifiable. This result implies that standard multiple imputation methods -- which keep observed values unchanged and replace missing values with imputed values -- are invalid when the target law is identifiable but the full law is not. We demonstrate that alternative imputation strategies, in which certain observed values are also imputed, can enable the estimation of the target law in such cases.
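To make the two laws concrete, the sketch below writes them out in notation that is assumed here rather than taken from the abstract: $X = (X_1, \dots, X_p)$ denotes the data variables, $R = (R_1, \dots, R_p)$ the response indicators with $R_j = 1$ meaning that $X_j$ is observed, and, for a missing data pattern $r$, $x_{o(r)}$ and $x_{m(r)}$ denote the observed and missing parts of $x$. It is an informal reading of why imputation is tied to the full law, not a reproduction of the paper's proof.

```latex
% Assumed notation: X data variables, R response indicators,
% o(r) / m(r) index the observed / missing components under pattern r.
\begin{align*}
\text{target law:}\quad & p(x) = \sum_{r} p(x, r) \\
\text{full law:}\quad & p(x, r) = p(x)\, p(r \mid x) \\
\text{imputation distribution for pattern } r\text{:}\quad
  & p\bigl(x_{m(r)} \mid x_{o(r)}, R = r\bigr)
    = \frac{p(x, R = r)}{p\bigl(x_{o(r)}, R = r\bigr)}
\end{align*}
```

The denominator $p(x_{o(r)}, R = r)$ involves only quantities that are observed under pattern $r$, so it is determined by the observed data distribution. Wherever it is positive, knowing the imputation distribution for every pattern $r$ therefore amounts to knowing $p(x, r)$ for every $r$, which is the full law. The target law is only the sum over $r$, which is consistent with it being identifiable in cases where the individual terms are not.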