The Effect of Sample Size and Missingness on Inference with Missing Data

When are inferences (whether Direct-Likelihood, Bayesian, or Frequentist) obtained from partial data valid? This paper answers the question by offering a new asymptotic theory of inference with missing data that is more general than existing theories. It proves that, as the sample size increases and the extent of missingness decreases, the average loglikelihood function that is generated by partial data and ignores the missingness mechanism converges in probability to the one that would have been generated by complete data; if the data are Missing at Random, this convergence depends only on sample size. Thus, inferences from partial data, such as posterior modes, confidence intervals, likelihood ratios, test statistics, and indeed all quantities or features derived from the partial-data loglikelihood function, will be consistently estimated. Additionally, if the data are Missing at Random, the missing data mechanism has asymptotically no effect on parameter estimation and hypothesis testing. This extends previous research, which has proved only the consistency and asymptotic normality of the posterior mode. Practical implications are discussed, and the theory is illustrated through simulation using a previous study of International Human Rights Law.
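
The convergence claim can be illustrated with a small simulation. The sketch below is not the paper's own simulation; it is a toy example under stated assumptions: the data are drawn from N(1, 1), the model is N(theta, 1), each value is deleted completely at random with a fixed probability of 0.3 (a special case of Missing at Random), and the partial-data loglikelihood is averaged over the observed values only. The abstract does not state the normalization used in the paper, so that choice, along with the function names `avg_loglik`, `max_gap`, and `mean_max_gap`, is hypothetical.

```python
import math
import random


def avg_loglik(xs, theta):
    """Average N(theta, 1) loglikelihood of the values in xs."""
    mean_sq_dev = sum((x - theta) ** 2 for x in xs) / len(xs)
    return -0.5 * math.log(2 * math.pi) - 0.5 * mean_sq_dev


def max_gap(n, p_missing, rng, thetas):
    """Largest |partial - complete| average loglikelihood over a theta grid."""
    complete = [rng.gauss(1.0, 1.0) for _ in range(n)]
    # Assumed mechanism: each value is dropped independently of the data.
    observed = [x for x in complete if rng.random() >= p_missing]
    if not observed:
        raise ValueError("every value was deleted; increase n")
    return max(
        abs(avg_loglik(observed, t) - avg_loglik(complete, t)) for t in thetas
    )


def mean_max_gap(n, p_missing=0.3, reps=10, seed=0):
    """Average of max_gap over several simulated data sets of size n."""
    rng = random.Random(seed)
    thetas = [-1.0 + 0.5 * i for i in range(9)]  # grid from -1.0 to 3.0
    return sum(max_gap(n, p_missing, rng, thetas) for _ in range(reps)) / reps


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, mean_max_gap(n))
```

Because the missingness probability is held fixed while `n` grows, any shrinkage of the printed gap is driven by sample size alone, which is the behavior the abstract describes for data that are Missing at Random. A mechanism that depends on the unobserved values themselves would need a different deletion rule and is not covered by this sketch.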
