This paper presents a new ensemble learning method for classification problems called projection pursuit random forest (PPF). PPF uses the PPtree algorithm introduced in Lee et al. (2013). In PPF, trees are constructed by splitting on linear combinations of randomly chosen variables, and projection pursuit is used to choose the projection of those variables that best separates the classes. Because these linear combinations take the correlation between variables into account, PPF can outperform a traditional random forest when the separation between groups occurs in combinations of variables rather than in individual variables. The method can be used in multi-class problems and is implemented in an R (R Core Team, 2018) package, PPforest, which is available on CRAN, with development versions at https://github.com/natydasilva/PPforest.
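To make the splitting idea concrete, here is a minimal Python (NumPy) sketch of a single two-class node split on a linear combination of randomly chosen variables. It rests on assumptions that do not come from PPforest. The projection is Fisher's linear discriminant direction computed in closed form, which for two classes maximizes an LDA-type projection pursuit index. The cutoff is the midpoint of the projected class means. The names `lda_direction`, `pp_split`, and `predict_split` are hypothetical. PPtree's own split rules, its multi-class handling, and the forest's bootstrap aggregation are not reproduced here.

```python
import numpy as np


def lda_direction(X, y):
    """Unit-length 1-D projection for two classes (labels 0/1) using
    Fisher's linear discriminant: a is proportional to S_w^{-1}(mu1 - mu0).
    (Stand-in for a projection pursuit optimizer; an assumption.)"""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix
    S_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    # Tiny ridge keeps S_w invertible if the selected variables are collinear
    S_w = S_w + 1e-8 * np.eye(X.shape[1])
    a = np.linalg.solve(S_w, mu1 - mu0)
    return a / np.linalg.norm(a)


def pp_split(X, y, n_vars, rng):
    """Hypothetical node split on a linear combination of n_vars randomly
    chosen variables. Returns (variable indices, coefficients, cutoff)."""
    vars_ = rng.choice(X.shape[1], size=n_vars, replace=False)
    a = lda_direction(X[:, vars_], y)
    z = X[:, vars_] @ a  # 1-D projected data
    # a points from mu0 toward mu1 in the S_w metric, so class 1 projects
    # above class 0; cut halfway between the projected class means
    cut = 0.5 * (z[y == 0].mean() + z[y == 1].mean())
    return vars_, a, cut


def predict_split(X, vars_, a, cut):
    """Assign class 1 when an observation's projection exceeds the cutoff."""
    return (X[:, vars_] @ a > cut).astype(int)


# Usage: two classes that differ along x1 - x2, with x1 and x2 highly correlated
rng = np.random.default_rng(0)
cov = [[1.0, 0.95], [0.95, 1.0]]
X = np.vstack([rng.multivariate_normal([0.0, 0.0], cov, 200),
               rng.multivariate_normal([0.5, -0.5], cov, 200)])
y = np.repeat([0, 1], 200)

vars_, a, cut = pp_split(X, y, n_vars=2, rng=rng)
pp_acc = (predict_split(X, vars_, a, cut) == y).mean()

# Axis-aligned comparison: threshold x1 (column 0) alone at the midpoint of class means
axis_cut = 0.5 * (X[y == 0, 0].mean() + X[y == 1, 0].mean())
axis_acc = ((X[:, 0] > axis_cut).astype(int) == y).mean()
print(f"projection split accuracy: {pp_acc:.2f}, single-variable split: {axis_acc:.2f}")
```

In this toy example the class difference lies along x1 - x2, a direction in which the correlated variables vary little, so the projected split separates the classes far better than a threshold on either variable alone. In a full forest, each tree would typically be grown on a bootstrap sample, `n_vars` would be smaller than the number of variables, and tree predictions would be combined by majority vote.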