Ensemble models refer to methods that combine a typically large number of classifiers into a compound prediction. The output of an ensemble method is the result of fitting a base-learning algorithm to a given data set and obtaining diverse answers by reweighting the observations or by resampling them according to a given probabilistic selection scheme. A key challenge of using ensembles on large-scale multidimensional data lies in their complexity and the associated computational burden: the models created by ensembles are often difficult, if not impossible, to interpret, and their implementation requires more computational power than single classifiers. Recent research in the field has concentrated on reducing ensemble size while maintaining predictive accuracy. We propose a method to prune an ensemble solution by optimizing its margin distribution while increasing its diversity. The proposed algorithm results in an ensemble that uses only a fraction of the original classifiers, with similar or improved generalization performance. We analyze and test our method on both synthetic and real data sets. The simulations show that the proposed method compares favorably to the original ensemble solutions and to other existing ensemble pruning methodologies.
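To make the idea of margin-based pruning concrete, the following is a minimal, hedged sketch (not the paper's exact algorithm): a bagged ensemble of decision stumps is pruned greedily, at each step removing the classifier whose removal most improves (or least hurts) the average voting margin on the training sample. All names (`fit_stump`, `mean_margin`, the budget of 7 classifiers) are illustrative assumptions.

```python
# Hedged sketch of greedy margin-based ensemble pruning; illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary data: label is the sign of a noisy linear score.
X = rng.normal(size=(200, 5))
y = np.sign(X @ np.array([1.0, -0.5, 0.3, 0.0, 0.2]) + 0.3 * rng.normal(size=200))

def fit_stump(Xb, yb):
    """Pick the (feature, threshold, sign) stump with lowest error on the bag."""
    best = None
    for j in range(Xb.shape[1]):
        for t in np.quantile(Xb[:, j], [0.25, 0.5, 0.75]):
            for s in (1.0, -1.0):
                err = np.mean(s * np.sign(Xb[:, j] - t) != yb)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best[1:]

def predict(stump, X):
    j, t, s = stump
    return s * np.sign(X[:, j] - t)

# Bagging: fit T stumps on bootstrap resamples of the data.
T = 25
stumps = []
for _ in range(T):
    idx = rng.integers(0, len(X), len(X))
    stumps.append(fit_stump(X[idx], y[idx]))

def mean_margin(subset):
    """Average signed voting margin y * (mean vote) over the sample."""
    votes = np.mean([predict(s, X) for s in subset], axis=0)
    return np.mean(y * votes)

# Greedy backward pruning: drop, one at a time, the classifier whose
# removal yields the largest mean margin, down to a fixed budget.
pruned = list(stumps)
while len(pruned) > 7:
    gains = [mean_margin(pruned[:i] + pruned[i + 1:]) for i in range(len(pruned))]
    pruned.pop(int(np.argmax(gains)))

print(len(pruned), round(mean_margin(pruned), 3))
```

The sketch keeps only 7 of the 25 classifiers; the greedy criterion here uses only the mean margin, whereas the abstract's method optimizes the margin distribution while also encouraging diversity among the retained classifiers.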