In this paper we present the practical benefits of a new random forest algorithm for dealing with missing values in the sample. The purpose of this work is to compare the different solutions for handling missing values with random forests, and to describe the performance of our new algorithm as well as its algorithmic complexity. A variety of missing-value mechanisms (such as missing completely at random, MCAR; missing at random, MAR; and missing not at random, MNAR) are considered and simulated. We study the quadratic errors and the bias of our algorithm and compare it to the most popular random forest algorithms for missing values in the literature. In particular, we compare these techniques for both regression and prediction purposes. This work follows a first paper, Gomez-Mendez and Joly (2020), on the consistency of this new algorithm.
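To make the three missing-value mechanisms concrete, here is a minimal sketch of how such missingness can be simulated. It is an illustration only, not the simulation design used in this paper. The setting is an assumption: two correlated Gaussian covariates `x1` and `x2`, a logistic missingness model with arbitrary coefficients, and a hypothetical function `simulate`. Under MCAR the probability that `x2` is missing is constant. Under MAR it depends only on the always-observed `x1`. Under MNAR it depends on the value of `x2` itself.

```python
import math
import random


def logistic(z: float) -> float:
    """Standard logistic function, used here as an illustrative missingness model."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n: int, mechanism: str, seed: int = 0):
    """Hypothetical helper: draw n rows (x1, x2_obs), where x2_obs is None if missing.

    x1 is always observed; only x2 can be missing. The coefficients below are
    arbitrary assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = 0.5 * x1 + rng.gauss(0.0, 1.0)  # x2 is correlated with x1
        if mechanism == "MCAR":
            p = 0.2  # constant probability, independent of the data
        elif mechanism == "MAR":
            p = logistic(-1.5 + 2.0 * x1)  # depends only on the observed x1
        elif mechanism == "MNAR":
            p = logistic(-1.5 + 2.0 * x2)  # depends on the unobserved x2 itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((x1, None if rng.random() < p else x2))
    return rows


if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "MNAR"):
        rows = simulate(10_000, mech, seed=1)
        rate = sum(x2 is None for _, x2 in rows) / len(rows)
        print(f"{mech}: fraction of missing x2 = {rate:.3f}")
```

Under MAR and MNAR the observed rows are no longer a random subsample, which is why these settings are useful for probing the bias of a method that handles missing values.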