Many real-world classification problems are cost-sensitive: the misclassification costs vary between data instances. Cost-sensitive learning adapts classification algorithms to account for these differences in misclassification costs. Stacking is an ensemble method that uses the predictions of several classifiers as the training data for another classifier, which in turn makes the final classification decision. Although a large body of empirical work applies stacking in various domains, very few of these studies take misclassification costs into account. In fact, there is no consensus in the literature as to what cost-sensitive stacking is. In this paper, we perform extensive experiments to establish the appropriate setup for a cost-sensitive stacking ensemble. Our experiments, conducted on twelve datasets from a number of application domains and using real, instance-dependent misclassification costs, show that for best performance, both levels of stacking require a cost-sensitive classification decision.
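To make the idea of a cost-sensitive decision at both stacking levels concrete, the following is a minimal sketch and not the setup evaluated in the paper. It assumes binary classification, zero cost for correct decisions, and a standard minimum-expected-cost decision rule. The names `Costs`, `min_expected_cost_decision`, `level0_features`, and `stacked_decision` are hypothetical, and the base and meta models are fixed stand-in functions; training of either level is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Costs:
    """Hypothetical per-instance cost record; correct decisions are assumed to cost 0."""
    false_positive: float
    false_negative: float


def min_expected_cost_decision(p_positive: float, costs: Costs) -> int:
    """Return 1 if predicting the positive class has the lower expected cost, else 0."""
    cost_if_predict_pos = (1.0 - p_positive) * costs.false_positive
    cost_if_predict_neg = p_positive * costs.false_negative
    return 1 if cost_if_predict_pos < cost_if_predict_neg else 0


def level0_features(
    x: float,
    base_models: Sequence[Callable[[float], float]],
    costs: Costs,
) -> List[int]:
    """Level 0: each base model's cost-sensitive decision becomes one meta-feature."""
    return [min_expected_cost_decision(model(x), costs) for model in base_models]


def stacked_decision(
    x: float,
    base_models: Sequence[Callable[[float], float]],
    meta_model: Callable[[Sequence[int]], float],
    costs: Costs,
) -> int:
    """Level 1: the meta-model scores the level-0 outputs, then decides cost-sensitively."""
    z = level0_features(x, base_models, costs)
    return min_expected_cost_decision(meta_model(z), costs)


# Stand-in models (assumptions for illustration only): each base model returns a
# fixed positive-class probability, and the meta-model averages the level-0 votes.
base_models = [lambda x: 0.3, lambda x: 0.2, lambda x: 0.4]
meta_model = lambda z: sum(z) / len(z)

symmetric = Costs(false_positive=1.0, false_negative=1.0)
costly_miss = Costs(false_positive=1.0, false_negative=10.0)

decision_symmetric = stacked_decision(0.0, base_models, meta_model, symmetric)
decision_costly_miss = stacked_decision(0.0, base_models, meta_model, costly_miss)
```

Under these assumptions the rule is equivalent to predicting the positive class when the estimated probability exceeds `false_positive / (false_positive + false_negative)`, so the same probability estimates can lead to different decisions for instances with different costs.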