We show that underneath the training process of a random forest lies not only the well-known and almost computationally free out-of-bag point estimate of its generalization error, but also a path to computing a confidence interval for that generalization error which requires neither retraining the forest nor any form of data splitting. Besides the low computational cost of its construction, this confidence interval is shown through simulations to have good coverage and an appropriate rate of shrinkage of its width as the training sample size grows.
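As an illustration of the out-of-bag point estimate the abstract refers to, the following is a minimal sketch using scikit-learn; the dataset and hyperparameters are arbitrary, and the paper's confidence-interval construction itself is not reproduced here.

```python
# Minimal sketch: the out-of-bag (OOB) point estimate of a random
# forest's generalization error, obtained as a byproduct of training.
# Assumes scikit-learn; data and hyperparameters are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# oob_score=True scores each observation with only the trees that did
# not see it in their bootstrap sample -- no retraining of the forest
# and no train/test split are needed.
forest = RandomForestClassifier(
    n_estimators=500, oob_score=True, random_state=0
).fit(X, y)

oob_error = 1.0 - forest.oob_score_  # point estimate of generalization error
print(f"OOB estimate of generalization error: {oob_error:.4f}")
```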