Is rotation forest the best classifier for problems with continuous features?

In short, our experiments suggest that yes, on average, rotation forest is better than the most common alternatives when all the attributes are real-valued. Rotation forest is a tree-based ensemble that performs transforms on subsets of attributes prior to constructing each tree. We present an empirical comparison of classifiers for problems with only real-valued features. We evaluate classifiers from three families of algorithms: support vector machines; tree-based ensembles; and neural networks tuned with a large grid search. We compare classifiers on unseen data based on the quality of the decision rule (using classification error), the ability to rank cases (using area under the receiver operating characteristic curve), and the probability estimates (using negative log likelihood). We conclude that, in answer to the question posed in the title, yes, rotation forest is significantly more accurate on average than competing techniques when compared on three distinct sets of datasets. Further, we assess the impact of the design features of rotation forest through an ablative study that transforms random forest into rotation forest. We identify the major limitation of rotation forest as its scalability, particularly in the number of attributes. To overcome this problem, we develop a model to predict the train time of the algorithm and hence propose a contract version of rotation forest where a run time cap is imposed a priori. We demonstrate that on large problems rotation forest can be made an order of magnitude faster without significant loss of accuracy. We also show that there is no real benefit (on average) from tuning rotation forest. We maintain that, without any domain knowledge to indicate an algorithm preference, rotation forest should be the default algorithm of choice for problems with continuous attributes.
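
The three evaluation criteria named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names are hypothetical, the AUROC is restricted to the binary case, and the negative log likelihood is assumed to use natural logarithms averaged over cases, which may differ from the conventions used in the paper.

```python
import math


def classification_error(y_true, y_pred):
    """Fraction of cases whose predicted label differs from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def auroc(y_true, scores):
    """Binary AUROC: chance that a random positive case is scored above a
    random negative case, with ties counted as one half (assumed convention)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def negative_log_likelihood(y_true, probs, eps=1e-15):
    """Mean negative natural log of the probability given to the true class.
    Probabilities are clipped at eps (an assumption) to avoid log(0)."""
    total = 0.0
    for t, row in zip(y_true, probs):
        total -= math.log(min(max(row[t], eps), 1.0))
    return total / len(y_true)


# Toy data (made up for illustration): four cases, two classes.
y_true = [0, 0, 1, 1]
probs = [[0.9, 0.1], [0.4, 0.6], [0.35, 0.65], [0.2, 0.8]]
y_pred = [row.index(max(row)) for row in probs]  # decision rule: argmax
scores = [row[1] for row in probs]               # ranking score: P(class 1)

error = classification_error(y_true, y_pred)
area = auroc(y_true, scores)
nll = negative_log_likelihood(y_true, probs)
```

On this toy data one case is misclassified even though every positive case is ranked above every negative case, which shows why the three criteria can disagree and are reported separately.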
