Random forests remain among the most popular off-the-shelf supervised learning algorithms. Despite their well-documented empirical success, however, until recently few theoretical results were available to describe their performance and behavior. In this work we move beyond recent results on consistency and asymptotic normality by establishing rates of convergence for random forests and other supervised learning ensembles. We develop the notion of generalized U-statistics and show that, within this framework, random forest predictions can remain asymptotically normal for larger subsample sizes than previously established. We also provide Berry-Esseen bounds that quantify the rate at which this convergence occurs, making explicit the roles of the subsample size and the number of trees in determining the distribution of random forest predictions.
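To make the U-statistic structure concrete, the following is a minimal sketch of a subsample-and-average ensemble: each of B base learners is fit on a random subsample of size k, and the prediction is their average. The nearest-neighbor base learner here is a hypothetical stand-in for a tree, chosen only to keep the example self-contained; the subsample size `k` and ensemble size `B` play the roles of the quantities whose interplay the abstract's Berry-Esseen bounds quantify.

```python
import random

def base_learner(subsample, x0):
    # Hypothetical stand-in for a regression tree: predict the response of
    # the nearest neighbor (in x) within the subsample.
    xs, ys = zip(*subsample)
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]

def ensemble_predict(data, x0, k, B, seed=0):
    """Average B base learners, each fit on a random subsample of size k.

    Averaging a fixed base learner over random subsamples of the data is
    what gives the prediction a (generalized) U-statistic form; results of
    the kind described above say such predictions are approximately normal,
    with the quality of the approximation governed by k and B.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(B):
        subsample = rng.sample(data, k)
        total += base_learner(subsample, x0)
    return total / B
```

For example, on noiseless data `[(x, 2*x) for x in range(100)]`, averaging many subsample predictions at a query point concentrates around the true regression value, illustrating the aggregation that the distributional results describe.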