Ensemble learning combines multiple classifiers in the hope of obtaining better predictive performance. Empirical studies have shown that ensemble pruning, that is, choosing an appropriate subset of the available classifiers, can lead to predictions that are comparable to or better than those of the full ensemble. In this paper, we consider a binary classification problem and propose an integer programming (IP) approach for selecting optimal classifier subsets. We propose a flexible objective function that can be adapted to the desired criteria of different datasets, together with constraints that enforce a minimum level of diversity in the ensemble. Although IP is NP-hard in the general case, state-of-the-art solvers quickly obtain good solutions for datasets with up to 60,000 data points. Our approach yields competitive results compared with some of the best-known and most widely used pruning methods in the literature.
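The selection problem described above can be illustrated with a minimal sketch: pick the subset of classifiers that maximizes majority-vote accuracy subject to a pairwise-diversity floor. The sketch below solves a toy instance by brute-force enumeration rather than with an IP solver, and the prediction matrix, the diversity measure (pairwise disagreement rate), and the threshold are all illustrative assumptions, not the paper's actual formulation.

```python
from itertools import combinations

# Toy correctness matrix: rows are classifiers, columns are data points
# (1 = classifier is correct on that point). Illustrative data only.
correct = [
    [1, 1, 0, 1, 0, 1],  # classifier 0
    [1, 0, 1, 1, 0, 1],  # classifier 1
    [0, 1, 1, 0, 1, 1],  # classifier 2
    [1, 1, 1, 0, 0, 0],  # classifier 3
    [0, 0, 1, 1, 1, 1],  # classifier 4
]
n_clf, n_pts = len(correct), len(correct[0])

def disagreement(i, j):
    """Fraction of points on which classifiers i and j differ
    (a simple pairwise diversity measure)."""
    return sum(a != b for a, b in zip(correct[i], correct[j])) / n_pts

def ensemble_accuracy(subset):
    """Majority-vote accuracy: a point counts as correct when a strict
    majority of the subset's members are correct on it."""
    hits = 0
    for p in range(n_pts):
        votes = sum(correct[i][p] for i in subset)
        if 2 * votes > len(subset):
            hits += 1
    return hits / n_pts

def prune(min_diversity=0.3, size=3):
    """Enumerate all subsets of the given size, keep those meeting the
    pairwise-diversity floor, and return the most accurate one.
    An IP model would encode the same objective and constraints
    declaratively instead of enumerating subsets."""
    best, best_acc = None, -1.0
    for subset in combinations(range(n_clf), size):
        if all(disagreement(i, j) >= min_diversity
               for i, j in combinations(subset, 2)):
            acc = ensemble_accuracy(subset)
            if acc > best_acc:
                best, best_acc = subset, acc
    return best, best_acc

subset, acc = prune()
print(subset, acc)
```

Enumeration is exponential in the number of classifiers, which is exactly why the paper formulates the problem as an IP and hands it to a solver; the diversity constraint here plays the role of the minimum-diversity constraints mentioned in the abstract.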