A common approach to aggregating classification estimates in an ensemble of decision trees is to either use voting or to average the probabilities for each class. The latter takes uncertainty into account, but not the reliability of the uncertainty estimates (the "uncertainty about the uncertainty", so to speak). More generally, much remains unknown about how best to combine probabilistic estimates from multiple sources. In this paper, we investigate a number of alternative prediction methods. Our methods are inspired by the theories of probability, belief functions, and reliable classification, as well as a principle that we call evidence accumulation. Our experiments on a variety of data sets are based on random decision trees, which guarantee high diversity in the predictions to be combined. Somewhat unexpectedly, we found that taking the average over the probabilities is actually hard to beat. However, evidence accumulation showed consistently better results on all but very small leaves.
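To make the two baseline aggregation schemes concrete, the following minimal sketch contrasts majority voting over hard class predictions with averaging the per-class probability estimates. The probability values are hypothetical, chosen only to illustrate how the two schemes can disagree when trees differ in confidence; this is not the paper's implementation.

```python
import numpy as np

# Hypothetical per-tree class-probability estimates, shape (n_trees, n_classes).
# Tree 1 is very confident in class 0; trees 2 and 3 weakly favor class 1.
tree_probs = np.array([
    [0.90, 0.10],
    [0.45, 0.55],
    [0.45, 0.55],
])

# Voting: each tree casts one hard vote for its most probable class.
votes = np.bincount(tree_probs.argmax(axis=1), minlength=tree_probs.shape[1])
vote_prediction = votes.argmax()

# Averaging: pool the probability estimates, then pick the most probable class.
avg_probs = tree_probs.mean(axis=0)
avg_prediction = avg_probs.argmax()

print(f"voting:    votes={votes}, prediction={vote_prediction}")      # class 1
print(f"averaging: probs={avg_probs}, prediction={avg_prediction}")   # class 0
```

Here voting predicts class 1 (two votes to one), while averaging predicts class 0, because the averaged estimates retain the first tree's high confidence. This is the sense in which averaging "takes uncertainty into account", while neither scheme weighs how reliable each tree's probability estimate is.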