A random forest is an ensemble of decision trees. As Breiman observed, the performance of such an ensemble rests on the strength of its unstable base learners and the diversity among them. In this paper, we propose two approaches for generating ensembles of double random forests. In the first approach, we propose rotation-based ensembles of double random forest, in which a transformation (rotation) of the feature space is generated at each node. Because a different random feature subspace is chosen for evaluation at each node, the transformation differs from node to node. These differing transformations increase the diversity among the base learners and hence improve generalization performance. With the double random forest as the base learner, the data at each node are transformed via two different transformations, namely principal component analysis and linear discriminant analysis. In the second approach, we propose oblique ensembles of double random forest. Decision trees in random forest and double random forest are univariate, so they generate axis-parallel splits, which fail to capture the geometric structure of the data. In addition, the standard random forest may not grow sufficiently large decision trees, resulting in suboptimal performance. To capture the geometric properties of the data and to grow decision trees of sufficient depth, the oblique ensembles of double random forest use multivariate decision trees. At each non-leaf node, a multisurface proximal support vector machine generates the optimal plane for better generalization performance. Different regularization techniques (Tikhonov regularization and axis-parallel split regularization) are also employed to tackle the small-sample-size problem in the decision trees of the oblique ensembles of double random forest.
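To make the oblique split concrete, the following is a minimal sketch of how a single non-leaf node could compute a multisurface proximal support vector machine (MPSVM) hyperplane with Tikhonov regularization. It is an illustration under stated assumptions, not the authors' implementation. The function names (`mpsvm_plane`, `oblique_split`, `split_impurity`), the regularization constant `delta`, the restriction to two classes, and the rule of keeping whichever angle bisector of the two proximal planes has the lower Gini impurity are all assumptions of this sketch.

```python
import numpy as np


def mpsvm_plane(A, B, delta=1e-3):
    """Return (w, b) for a plane w.x + b = 0 close to rows of A, far from rows of B."""
    # Append a column of ones so the unknown is z = [w; b].
    Ae = np.hstack([A, np.ones((A.shape[0], 1))])
    Be = np.hstack([B, np.ones((B.shape[0], 1))])
    d = Ae.shape[1]
    # Tikhonov regularization (assumed form): add delta * I so that both
    # matrices stay positive definite even when a node holds few samples.
    G = Ae.T @ Ae + delta * np.eye(d)
    H = Be.T @ Be + delta * np.eye(d)
    # Minimising z'Gz / z'Hz is the generalized eigenproblem G z = lam H z.
    # With H = L L', it reduces to a symmetric problem for y = L' z.
    L = np.linalg.cholesky(H)
    L_inv = np.linalg.inv(L)
    _, vecs = np.linalg.eigh(L_inv @ G @ L_inv.T)  # eigenvalues ascending
    z = L_inv.T @ vecs[:, 0]  # eigenvector of the smallest eigenvalue
    w, b = z[:-1], z[-1]
    scale = np.linalg.norm(w)
    return w / scale, b / scale


def gini(y):
    """Gini impurity of an integer label vector."""
    if y.size == 0:
        return 0.0
    p = np.bincount(y) / y.size
    return 1.0 - np.sum(p ** 2)


def split_impurity(X, y, w, b):
    """Size-weighted Gini impurity of the two sides of the plane w.x + b = 0."""
    left = X @ w + b <= 0
    n_left = left.sum()
    n_right = y.size - n_left
    return (n_left * gini(y[left]) + n_right * gini(y[~left])) / y.size


def oblique_split(X, y, delta=1e-3):
    """Pick an oblique split for binary labels y in {0, 1} (assumed selection rule)."""
    A, B = X[y == 0], X[y == 1]
    w1, b1 = mpsvm_plane(A, B, delta)  # plane hugging class 0
    w2, b2 = mpsvm_plane(B, A, delta)  # plane hugging class 1
    # The two angle bisectors of the proximal planes are the candidate splits.
    candidates = [(w1 + w2, b1 + b2), (w1 - w2, b1 - b2)]
    return min(candidates, key=lambda c: split_impurity(X, y, *c))
```

Calling `oblique_split(X, y)` on the samples reaching a node returns a weight vector and offset; samples with `X @ w + b <= 0` go to one child and the rest to the other. Because the split is a linear combination of all features, it can follow class boundaries that are not aligned with any single axis, which is the motivation the abstract gives for moving beyond univariate trees.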