The masking-one-out (MOO) procedure, which masks an observed entry and compares it against its imputed value, is a common method for comparing imputation models. We study the optimum of this procedure, generalize it under a missing data assumption, and establish the corresponding semi-parametric efficiency theory. However, MOO is a measure of prediction accuracy, which is not ideal for evaluating an imputation model. To address this issue, we introduce three modified MOO criteria, based on rank transformation, energy distance, and the likelihood principle, that allow us to select an imputation model that properly accounts for the stochastic nature of the data. The likelihood approach further enables an elegant framework for learning an imputation model from the data, and we derive its statistical and computational learning theories as well as the consistency of BIC model selection. We also show how MOO relates to the missing-at-random assumption. Finally, we introduce the prediction-imputation diagram, a two-dimensional diagram that visually compares both the prediction and imputation utilities of various imputation models.