Random forest is an ensemble of decision trees built on bagging and the random subspace method. As Breiman suggested, the core strength of an ensemble comes from the strength of its unstable base learners and the diversity among them. In this paper, we propose two approaches: rotation-based double random forests and oblique double random forests. In the first approach, the rotation-based double random forest, a transformation (rotation) of the feature space is generated at each node. Because a different random feature subspace is chosen for evaluation at each node, the transformation differs from node to node. These differing transformations increase the diversity among the base learners and hence improve generalization performance. With the double random forest as the base learner, the data at each node are transformed via two different transformations, namely principal component analysis and linear discriminant analysis. In the second approach, we propose the oblique double random forest. Decision trees in the random forest and the double random forest are univariate, so they generate axis-parallel splits that fail to capture the geometric structure of the data. In addition, the standard random forest may not grow sufficiently large decision trees, resulting in suboptimal performance. To capture the geometric properties of the data and to grow decision trees of sufficient depth, we propose the oblique double random forest, whose base learners are multivariate decision trees. At each non-leaf node, a multisurface proximal support vector machine generates the optimal plane for better generalization performance. Different regularization techniques are also employed to tackle the small-sample-size problem in the decision trees of the oblique double random forest.
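To make the node-level rotation idea concrete, the following is a minimal sketch of a single node of a rotation-based tree, not the paper's implementation. It assumes a random subspace of exactly two features so that the PCA rotation has a closed form, and it assumes Gini impurity as the split criterion. The function names (`pca_rotation_2d`, `rotate`, `gini`, `best_rotated_split`) are hypothetical. The linear discriminant analysis variant, the multisurface proximal support vector machine splits, and the regularization schemes are not covered here.

```python
import math
import random


def pca_rotation_2d(points):
    """Return the mean and the principal-axis angle of a set of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Angle of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), theta


def rotate(point, mean, theta):
    """Center a 2-D point and express it in the principal-axis coordinates."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    c, s = math.cos(theta), math.sin(theta)
    return (c * dx + s * dy, -s * dx + c * dy)


def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_rotated_split(X, y, rng):
    """Rotate a random 2-feature subspace of the node data, then search
    for the best axis-parallel threshold in the rotated coordinates."""
    i, j = rng.sample(range(len(X[0])), 2)  # node-specific random subspace
    sub = [(row[i], row[j]) for row in X]
    mean, theta = pca_rotation_2d(sub)
    Z = [rotate(p, mean, theta) for p in sub]
    best = None  # (weighted impurity, rotated axis, threshold)
    for axis in (0, 1):
        values = sorted(set(z[axis] for z in Z))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [lab for z, lab in zip(Z, y) if z[axis] <= t]
            right = [lab for z, lab in zip(Z, y) if z[axis] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, axis, t)
    return {"features": (i, j), "mean": mean, "theta": theta, "split": best}
```

Calling `best_rotated_split(X, y, random.Random(0))` on the samples that reach a node returns the chosen feature pair, the rotation parameters, and the best threshold in the rotated coordinates. Because the subspace is redrawn and the rotation recomputed from the node's own data, each node in a tree would see a different transformation, which is the source of diversity described above. A threshold on a rotated axis corresponds to an oblique boundary in the original feature space.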