Missing data is a common problem that has consistently plagued statisticians and applied analytical researchers. While replacement methods such as mean-based or hot deck imputation have been well researched, emerging imputation techniques enabled by improved computational resources have received limited formal assessment. This study formally considers five more recently developed imputation methods (Amelia, Mice, mi, Hmisc and missForest) and compares their performance, measured by root mean square error (RMSE) against actual values, with that of the well-established mean-based replacement approach. The RMSE results were consolidated by method using a ranking approach. Our results indicate that the missForest algorithm performed best and the mi algorithm performed worst.
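The evaluation described above can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: that RMSE is computed only over the cells that were artificially removed, and that the "ranking approach" means ranking methods within each dataset and averaging those ranks. The function names and toy numbers are hypothetical and are not taken from the study.

```python
import math

def mean_impute(column):
    """Replace None entries with the mean of the observed entries (baseline)."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def rmse_on_missing(actual, masked, imputed):
    """RMSE between actual and imputed values, over the masked cells only."""
    errors = [(a - i) ** 2
              for a, m, i in zip(actual, masked, imputed)
              if m is None]
    return math.sqrt(sum(errors) / len(errors))

def average_ranks(rmse_by_dataset):
    """Rank methods within each dataset (1 = lowest RMSE), then average the ranks.

    Assumed consolidation scheme; ties are not given special treatment here.
    """
    totals = {}
    for scores in rmse_by_dataset:
        ordered = sorted(scores, key=scores.get)
        for rank, method in enumerate(ordered, start=1):
            totals[method] = totals.get(method, 0) + rank
    return {m: t / len(rmse_by_dataset) for m, t in totals.items()}

# Toy example: hide two known values, impute them, and score the result.
actual = [1.0, 2.0, 3.0, 4.0]
masked = [1.0, None, 3.0, None]
imputed = mean_impute(masked)
baseline_rmse = rmse_on_missing(actual, masked, imputed)

# Made-up RMSE values for two hypothetical methods on two datasets.
ranks = average_ranks([
    {"method_a": 0.5, "method_b": 0.9},
    {"method_a": 0.4, "method_b": 0.6},
])
```

Under this scheme a lower average rank indicates a method that more often produced the smaller error. Averaging ranks rather than raw RMSE keeps datasets with large value scales from dominating the comparison.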