Random forests perform bootstrap aggregation by sampling the training set with replacement. This enables the evaluation of the out-of-bag error, which serves as an internal cross-validation mechanism. Our motivation lies in using the unsampled training examples to improve each decision tree in the ensemble. We study the effect of using the out-of-bag samples to improve the generalization error, first of the individual decision trees and then of the random forest, by post-pruning. A preliminary empirical study on four UCI repository datasets shows a consistent decrease in the size of the forests without a considerable loss in accuracy.
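As a concrete illustration of the idea, the following is a minimal sketch of OOB-guided post-pruning, not the paper's implementation. It assumes scikit-learn's cost-complexity pruning as the pruning criterion (the abstract does not name one), and the iris dataset, the ensemble size of 25, and max_features="sqrt" are illustrative choices. Each tree is grown on its own bootstrap sample, and its pruning level ccp_alpha is selected by accuracy on that tree's out-of-bag samples.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)   # illustrative dataset
n = len(X)

pruned_trees = []
for _ in range(25):                              # illustrative forest size
    boot = rng.integers(0, n, size=n)            # bootstrap indices
    oob = np.setdiff1d(np.arange(n), boot)       # out-of-bag indices

    # Grow a full tree on the bootstrap sample and enumerate its
    # cost-complexity pruning levels.
    base = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    base.fit(X[boot], y[boot])
    alphas = base.cost_complexity_pruning_path(X[boot], y[boot]).ccp_alphas

    # Keep the pruned tree that scores best on the OOB samples.
    best = max(
        (DecisionTreeClassifier(max_features="sqrt", random_state=0,
                                ccp_alpha=a).fit(X[boot], y[boot])
         for a in alphas),
        key=lambda t: t.score(X[oob], y[oob]),
    )
    pruned_trees.append(best)

# Aggregate the pruned trees by majority vote.
votes = np.stack([t.predict(X) for t in pruned_trees])
forest_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("ensemble accuracy on training data:", (forest_pred == y).mean())
print("mean nodes per pruned tree:",
      np.mean([t.tree_.node_count for t in pruned_trees]))

Selecting ccp_alpha per tree on that tree's own OOB samples keeps the pruning step internal to bagging, so no separate validation split is needed; the reduction in node count per tree is what drives the decrease in forest size reported above.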