Imbalanced learning (IL) is an important and widespread problem in data mining applications. Typical IL methods use intuitive class-wise resampling or reweighting to balance the training set directly. However, recent research in specific domains shows that class-imbalanced learning can also be achieved without class-wise manipulation. This prompts us to examine the relationship between these two IL strategies and the nature of class imbalance. Fundamentally, they correspond to two essential imbalances in IL: the difference in quantity between examples from different classes (inter-class imbalance), and the difference in quantity between easy and hard examples within a single class (intra-class imbalance). Existing work fails to take both imbalances into account explicitly and thus suffers from suboptimal performance. In light of this, we present Duple-Balanced Ensemble (DUBE), a versatile ensemble learning framework. Unlike prevailing methods, DUBE performs inter-class and intra-class balancing directly, without relying on heavy distance-based computation, which allows it to achieve competitive performance while remaining computationally efficient. We also present a detailed discussion and analysis of the pros and cons of different inter-class and intra-class balancing strategies based on DUBE. Extensive experiments validate the effectiveness of the proposed method. Code and examples are available at https://github.com/ICDE2022Sub/duplebalance.
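To make the two notions of balance concrete, the following is a minimal sketch of one possible resampling step, not the DUBE algorithm itself (see the repository above for the authors' implementation). It assumes a per-example hardness score in [0, 1], for instance the absolute difference between the label and a classifier's predicted probability. The function name `balanced_sample`, the equal-width hardness bins, and the choice to undersample every class to the minority-class size are all illustrative assumptions.

```python
import numpy as np


def balanced_sample(y, hardness, n_bins=5, seed=0):
    """Return indices of a resampled training set (illustrative sketch).

    Inter-class balance: every class is undersampled to the minority size.
    Intra-class balance: within a class, each non-empty hardness bin
    receives the same total sampling probability.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    hardness = np.asarray(hardness, dtype=float)

    classes, counts = np.unique(y, return_counts=True)
    n_target = counts.min()  # assumption: undersample to the smallest class
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]

    picked = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        # Assign each example of this class to a hardness bin in [0, n_bins - 1].
        bins = np.digitize(hardness[idx], inner_edges)
        bin_sizes = np.bincount(bins, minlength=n_bins)
        # Examples in crowded bins get proportionally smaller probability,
        # so every non-empty bin carries equal total mass.
        prob = 1.0 / bin_sizes[bins]
        prob /= prob.sum()
        picked.append(rng.choice(idx, size=n_target, replace=False, p=prob))
    return np.concatenate(picked)


# Toy data: 90 majority examples, 10 minority examples, random hardness
# standing in for |label - predicted probability|.
demo_rng = np.random.default_rng(42)
y = np.array([0] * 90 + [1] * 10)
hardness = demo_rng.random(100)
idx = balanced_sample(y, hardness)
print(np.bincount(y[idx]))  # [10 10]
```

Note that this sketch needs only a sort-free binning of scalar scores, with no pairwise distance computation between examples, which is the kind of cost the abstract contrasts against distance-based methods.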