We present a novel analysis of the expected risk of the weighted majority vote in multiclass classification. The analysis takes into account the correlation between the predictions of ensemble members and yields a bound that is amenable to efficient minimization; minimizing it produces an improved weighting of the majority vote. We also provide a specialized version of our bound for binary classification, which makes it possible to exploit additional unlabeled data for tighter risk estimation. In experiments, we apply the bound to improve the weighting of trees in random forests and show that, in contrast to the commonly used first-order bound, minimization of the new bound typically does not degrade the test error of the ensemble.
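To make the first-order versus correlation-aware (second-order) distinction concrete, here is a minimal sketch of the empirical quantities involved. It assumes the first-order bound has the form L(MV_ρ) ≤ 2·E_ρ[L(h)], and that the second-order bound is built on the tandem loss L(h, h'), the probability that two members err on the same example, in the form L(MV_ρ) ≤ 4·E_{ρ²}[L(h, h')]. It also assumes that in the binary case the tandem term can be rewritten using the label-free disagreement D(h, h'). The PAC-Bayesian complexity terms and the optimization of ρ are omitted. All function names (`gibbs_loss`, `tandem_loss`, `disagreement`, `majority_vote_loss`) are hypothetical illustrations, not the authors' code.

```python
from itertools import product

# Hypothetical helpers (not from the paper): preds[k][i] is the label that
# ensemble member k predicts for example i; rho[k] is its weight (sums to 1).

def gibbs_loss(preds, labels, rho):
    """First-order term: empirical E_rho[L(h)], the rho-weighted 0-1 loss."""
    n = len(labels)
    return sum(w * sum(p != y for p, y in zip(h, labels)) / n
               for w, h in zip(rho, preds))

def tandem_loss(preds, labels, rho):
    """Second-order term: empirical E_{rho^2}[L(h, h')], the weighted rate
    at which two members err on the same example (captures correlation)."""
    n = len(labels)
    total = 0.0
    for (wi, hi), (wj, hj) in product(list(zip(rho, preds)), repeat=2):
        both_wrong = sum(a != y and b != y for a, b, y in zip(hi, hj, labels))
        total += wi * wj * both_wrong / n
    return total

def disagreement(preds, rho):
    """Empirical E_{rho^2}[D(h, h')]; uses predictions only, no labels,
    so unlabeled examples could also be used to estimate it."""
    n = len(preds[0])
    total = 0.0
    for (wi, hi), (wj, hj) in product(list(zip(rho, preds)), repeat=2):
        total += wi * wj * sum(a != b for a, b in zip(hi, hj)) / n
    return total

def majority_vote_loss(preds, labels, rho):
    """Empirical 0-1 loss of the rho-weighted majority vote (multiclass)."""
    errors = 0
    for i, y in enumerate(labels):
        votes = {}
        for w, h in zip(rho, preds):
            votes[h[i]] = votes.get(h[i], 0.0) + w
        errors += max(votes, key=votes.get) != y
    return errors / len(labels)

# Toy binary example: three members whose errors fall on different points.
labels = [0, 1, 1, 0]
preds = [[0, 1, 0, 0], [0, 1, 1, 1], [1, 1, 1, 0]]
rho = [1 / 3] * 3

mv = majority_vote_loss(preds, labels, rho)
first_order = 2 * gibbs_loss(preds, labels, rho)
second_order = 4 * tandem_loss(preds, labels, rho)
print("MV loss:", mv, "first-order:", first_order, "second-order:", second_order)

# Binary case: tandem loss = Gibbs loss - disagreement / 2.
print(gibbs_loss(preds, labels, rho) - disagreement(preds, rho) / 2)
```

In this toy ensemble each member errs on a different point. The tandem term therefore falls well below the squared-looking worst case, and the second-order quantity (1/3) is smaller than the first-order one (1/2). This illustrates why rewarding decorrelated errors can lead to better weightings.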