Including a large number of predictors in the imputation model underlying a multiple imputation (MI) procedure is one of the most challenging tasks imputers face. A variety of high-dimensional MI techniques can help, but there has been limited research on their relative performance. In this study, we investigated a wide range of extant high-dimensional MI techniques that can handle a large number of predictors in the imputation models and general missing data patterns. We assessed the relative performance of seven high-dimensional MI methods with a Monte Carlo simulation study and a resampling study based on real survey data. The performance of the methods was defined by the degree to which they facilitate unbiased and confidence-valid estimates of the parameters of complete-data analysis models. We found that using regularized regression to select the predictors used in the MI model and using principal component analysis to reduce the dimensionality of auxiliary data produce the best results.
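As a minimal sketch of the second of these ideas, the auxiliary variables can be condensed into a few principal component scores, which then serve as the predictors in the imputation model. The code below is a generic illustration, not the implementation evaluated in the study. It assumes a single incomplete variable `y`, a fully observed auxiliary matrix `aux`, a normal linear imputation model, and a bootstrap to approximate parameter uncertainty. The function names `pca_scores` and `impute_with_pcs` and all default values are hypothetical.

```python
import numpy as np


def pca_scores(aux, n_components):
    """Return the first n_components principal component scores of aux."""
    centered = aux - aux.mean(axis=0)
    scaled = centered / centered.std(axis=0, ddof=1)
    # SVD of the standardized data; the columns of u * s are the PC scores.
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def impute_with_pcs(y, aux, n_components=5, n_imputations=5, seed=0):
    """Create n_imputations completed copies of y, using PC scores as predictors.

    Assumptions (illustrative only): aux has no missing values and a normal
    linear model is adequate for y.
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(y)
    design = np.column_stack([np.ones(len(y)), pca_scores(aux, n_components)])
    x_obs, y_obs = design[~missing], y[~missing]
    completed = []
    for _ in range(n_imputations):
        # Bootstrap the observed cases to approximate parameter uncertainty.
        idx = rng.integers(0, len(y_obs), len(y_obs))
        beta, *_ = np.linalg.lstsq(x_obs[idx], y_obs[idx], rcond=None)
        resid = y_obs[idx] - x_obs[idx] @ beta
        sigma = np.sqrt(resid @ resid / (len(y_obs) - design.shape[1]))
        # Fill the missing entries with predictions plus residual noise.
        y_imp = y.copy()
        y_imp[missing] = design[missing] @ beta + rng.normal(0, sigma, missing.sum())
        completed.append(y_imp)
    return completed


# Example with simulated data: 50 auxiliary variables, about 30% of y missing.
rng = np.random.default_rng(1)
aux = rng.normal(size=(200, 50))
y = aux[:, :3].sum(axis=1) + rng.normal(size=200)
y[rng.random(200) < 0.3] = np.nan
imputations = impute_with_pcs(y, aux)
```

Each element of `imputations` is one completed copy of `y`; the analysis model would be fitted to each copy and the estimates pooled. The number of components retained is a tuning choice that this sketch leaves to the user.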