Ensemble pruning is the process of selecting a subset of component classifiers from an ensemble that performs at least as well as the original ensemble while reducing storage and computational costs. Ensemble pruning in data streams is a largely unexplored area of research. It requires analysis of ensemble components as they are running on the stream, and differentiation of useful classifiers from redundant ones. We present CCRP, an on-the-fly ensemble pruning method for multi-class data stream classification empowered by an imbalance-aware fusion of class-wise component rankings. CCRP aims to ensure that the resulting pruned ensemble contains the best-performing classifier for each target class and, hence, reduces the effects of class imbalance. The conducted experiments on real-world and synthetic data streams demonstrate that different types of ensembles that integrate CCRP as their pruning scheme consistently yield on-par or superior performance with 20% to 90% less average memory consumption. Lastly, we validate the proposed pruning scheme by comparing our approach against pruning schemes based on ensemble weights and basic rank fusion methods.