We give examples of data-generating models under which Breiman's random forest may be extremely slow to converge to the optimal predictor, or may even fail to be consistent. The evidence for these properties rests mostly on intuitive arguments, similar to those used earlier with simpler examples, and on numerical experiments. Although one can always choose models under which random forests perform very badly, we show that simple methods based on statistics of "variable use" and "variable importance" can often be used to construct a much better predictor: a "many-armed" random forest obtained by forcing initial splits on variables that the default version of the algorithm tends to ignore.
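The abstract does not spell out how the "variable use" statistic is computed, so the following is only a minimal sketch under stated assumptions. It takes "variable use" to mean the share of all splits in a fitted forest that are made on each variable, and it treats variables whose share falls below a chosen cutoff as the ones the default algorithm "tends to ignore", that is, the candidates on which initial splits would be forced. The nested-dict tree representation, the function names `variable_use` and `rarely_used`, and the `cutoff` parameter are all hypothetical illustrations, not the authors' procedure.

```python
from collections import Counter

# Hypothetical tree representation (an assumption for illustration):
#   internal node: {"var": j, "thr": t, "left": node, "right": node}
#   leaf:          {"value": v}


def variable_use(forest, n_vars):
    """Share of all splits in the forest that are made on each variable."""
    counts = Counter()

    def walk(node):
        # Only internal nodes carry a split variable; leaves are skipped.
        if "var" in node:
            counts[node["var"]] += 1
            walk(node["left"])
            walk(node["right"])

    for tree in forest:
        walk(tree)
    total = sum(counts.values()) or 1  # avoid division by zero for stump-free forests
    return [counts[j] / total for j in range(n_vars)]


def rarely_used(forest, n_vars, cutoff):
    """Variables whose split share is below a (hypothetical) cutoff.

    These would be the candidates for forced initial splits when
    building a "many-armed" forest.
    """
    use = variable_use(forest, n_vars)
    return [j for j, u in enumerate(use) if u < cutoff]
```

For example, in a two-tree forest where variable 0 is split on three times, variable 1 once, and variable 2 never, `variable_use` returns `[0.75, 0.25, 0.0]`, and a cutoff of `0.3` flags variables 1 and 2. A split-count share is used here only because it is the simplest reading of "variable use"; a permutation- or impurity-based importance measure could be substituted in the same selection step.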