We present a theory of ensemble diversity, explaining the nature and effect of diversity for a wide range of supervised learning scenarios. Understanding ensemble diversity has been referred to as the holy grail of ensemble learning and has remained an open question for over 30 years. Our framework reveals that diversity is in fact a hidden dimension in the bias-variance decomposition of an ensemble. In particular, we prove a family of exact bias-variance-diversity decompositions for both classification and regression losses, e.g., squared and cross-entropy. The framework provides a methodology to automatically identify the combiner rule that enables such a decomposition, specific to the loss. The formulation of diversity is therefore dependent on just two design choices: the loss and the combiner. For certain choices (e.g., 0-1 loss with majority voting) the effect of diversity is necessarily dependent on the target label. Experiments illustrate how we can use our framework to understand the diversity-encouraging mechanisms of popular ensemble methods: Bagging, Boosting, and Random Forests.
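To make the role of diversity concrete for the simplest case mentioned above (squared loss), the following is a minimal sketch of the classical ambiguity decomposition for an arithmetic-mean combiner: the ensemble's squared error equals the average member error minus the average spread of members around the ensemble prediction. This is offered as an illustration of how a diversity term can appear in an exact identity, not as the decomposition proved in the paper; the function name `ambiguity_decomposition` and the synthetic data are assumptions made for the example.

```python
import numpy as np


def ambiguity_decomposition(preds, y):
    """Return (ensemble_loss, avg_member_loss, diversity) under squared loss.

    preds: array of shape (M, N), predictions of M members on N examples.
    y:     array of shape (N,), regression targets.
    Hypothetical helper for illustration only; assumes an arithmetic-mean combiner.
    """
    preds = np.asarray(preds, dtype=float)
    y = np.asarray(y, dtype=float)

    ensemble = preds.mean(axis=0)                  # arithmetic-mean combiner
    ensemble_loss = np.mean((ensemble - y) ** 2)   # squared error of the ensemble
    avg_member_loss = np.mean((preds - y) ** 2)    # averaged over members and examples
    diversity = np.mean((preds - ensemble) ** 2)   # spread of members around the ensemble
    return ensemble_loss, avg_member_loss, diversity


# Synthetic example (assumed data): five noisy predictors of the same target.
rng = np.random.default_rng(0)
y = rng.normal(size=200)
preds = y + rng.normal(scale=0.5, size=(5, 200))

ens, avg, div = ambiguity_decomposition(preds, y)
# The identity holds exactly, up to floating-point error.
assert np.isclose(ens, avg - div)
```

Under these assumptions, a larger diversity term lowers the ensemble loss for a fixed average member loss, which is the intuition the abstract generalises to other losses and combiners.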