Missing covariates in regression or classification problems can prevent the direct use of advanced tools for further analysis. Recent research shows a growing trend towards using modern Machine Learning algorithms for imputation, motivated by their favourable prediction accuracy across different learning problems. In this work, we analyze through simulation the interaction between imputation accuracy and prediction accuracy in regression learning problems with missing covariates when Machine Learning based methods are used for both imputation and prediction. In addition, we explore imputation performance when statistical inference procedures are used in prediction settings, such as the coverage rates of (valid) prediction intervals. Our analysis is based on empirical datasets provided by the UCI Machine Learning repository and an extensive simulation study.