We derive an information criterion for selecting a parametric model of the complete-data distribution when only incomplete or partially observed data are available. Compared with AIC, our new criterion has an additional penalty term for missing data, expressed in terms of the Fisher information matrices of the complete data and the incomplete data. We prove that our criterion is an asymptotically unbiased estimator of the complete-data divergence, namely, the expected Kullback-Leibler divergence between the true distribution and the estimated distribution for the complete data, whereas AIC is such an estimator for the incomplete data. The information criteria PDIO (Shimodaira 1994) and AICcd (Cavanaugh and Shumway 1998) were previously proposed to estimate the complete-data divergence, and they share the same penalty term. The additional penalty term of our criterion for missing data turns out to be only half the value of that in PDIO and AICcd. This difference in the penalty term is attributed to the fact that our criterion is derived under a weaker assumption. A simulation study under the weaker assumption shows that our criterion is unbiased while the other two criteria are biased. In addition, we review the geometrical view of alternating minimizations in the EM algorithm, which plays an important role in deriving our new criterion.