Class imbalance is a common problem in machine learning practice. Typical Imbalanced Learning (IL) methods balance the data via intuitive class-wise resampling or reweighting. However, previous studies suggest that, beyond class imbalance, intrinsic data difficulty factors such as class overlap, noise, and small disjuncts also play critical roles. Many solutions have been proposed to handle them (e.g., noise removal, borderline sampling, hard example mining), but each is confined to a specific factor and does not generalize to broader scenarios. This raises an interesting question: how can we handle both class-agnostic difficulties and class imbalance in a unified way? To answer it, we consider both class imbalance and its orthogonal counterpart, intra-class imbalance, i.e., the imbalanced distribution over easy and hard samples. This distribution naturally reflects the complex influence of class-agnostic intrinsic data difficulties, and thus provides a new unified view for identifying and handling these factors during learning. From this perspective, we discuss the pros and cons of existing IL solutions and propose new balancing techniques for more robust and efficient IL. Finally, we wrap up all solutions into a generic ensemble IL framework named DuBE (Duple-Balanced Ensemble). It features explicit and efficient inter- and intra-class balancing, as well as easy extension through standardized APIs. Extensive experiments validate the effectiveness of DuBE. Code, examples, and documentation are available at https://github.com/AnonAuthorAI/duplebalance and https://duplebalance.readthedocs.io.
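To make the idea of balancing along both axes concrete, the following is a minimal illustrative sketch and not the DuBE implementation or its API. It assumes that each sample already carries a hardness score in [0, 1] (for instance the absolute difference between the label and a current model's predicted probability, which is an assumption made here for illustration). The function name `duple_balanced_indices` and its parameters are hypothetical. The sketch draws the same number of samples from every class (inter-class balance) and spreads each class's quota evenly over its non-empty hardness bins (intra-class balance).

```python
import random
from collections import defaultdict


def duple_balanced_indices(labels, hardness, n_per_class, n_bins=5, seed=0):
    """Return resampled indices balanced across classes and hardness bins.

    labels   : list of class labels
    hardness : list of floats in [0, 1] (assumed to be given, e.g. |y - p|)
    """
    rng = random.Random(seed)
    # Group sample indices by class, then by equal-width hardness bin.
    bins = defaultdict(lambda: defaultdict(list))
    for i, (y, h) in enumerate(zip(labels, hardness)):
        b = min(int(h * n_bins), n_bins - 1)  # h == 1.0 falls in the last bin
        bins[y][b].append(i)

    chosen = []
    for y in sorted(bins):
        nonempty = [bins[y][b] for b in sorted(bins[y])]
        # Inter-class balance: every class contributes n_per_class samples.
        # Intra-class balance: split that quota evenly over non-empty bins.
        base, extra = divmod(n_per_class, len(nonempty))
        for k, members in enumerate(nonempty):
            quota = base + (1 if k < extra else 0)
            # Sample with replacement so that small bins can be over-sampled.
            chosen.extend(rng.choices(members, k=quota))
    return chosen


# Toy data: 90 majority samples (mostly easy) and 10 minority samples.
labels = [0] * 90 + [1] * 10
hardness = [0.05] * 80 + [0.9] * 10 + [0.1] * 5 + [0.95] * 5
idx = duple_balanced_indices(labels, hardness, n_per_class=20, n_bins=5)
```

In this toy setting, each class contributes 20 indices, and within each class the easy and hard bins contribute 10 each, even though easy majority samples dominate the original data. Whether uniform balancing over hardness bins is desirable in practice depends on the amount of noise among the hardest samples.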