Learning classifiers from skewed or imbalanced datasets is a serious and common problem. In many applications, one class contains the large majority of examples while the other, which is frequently the more important class, is represented by only a small proportion. Trained on such data, many otherwise well-designed machine-learning systems become ineffective: they can achieve high training accuracy simply by favoring the majority class and biasing predictions against minority-class instances. Most remedies therefore aim to improve learning from the minority class. This article surveys the most widely used methods for learning under class imbalance, including data-level, algorithm-level, hybrid, cost-sensitive, and deep-learning approaches, together with their advantages and limitations. Classifier efficiency and performance are assessed using a range of evaluation metrics.
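As a concrete illustration of the data-level remedies the survey discusses, the following is a minimal sketch of random oversampling, one of the simplest such techniques: minority-class examples are duplicated (sampled with replacement) until all classes are equally represented. The function name and toy data are illustrative, not taken from the article.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate minority-class examples until all classes are balanced.

    A minimal data-level remedy: resample (with replacement) each
    smaller class until it matches the size of the largest class.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)       # pick a random existing example
            X_out.append(X[j])        # and duplicate it
            y_out.append(y[j])
    return X_out, y_out

# Toy 90/10 imbalance: 9 majority examples, 1 minority example.
X = [[i] for i in range(10)]
y = [0] * 9 + [1]
Xb, yb = random_oversample(X, y)
print(Counter(yb))  # → Counter({0: 9, 1: 9})
```

Duplicating examples changes nothing about the feature space, which is why more elaborate data-level methods (e.g. synthetic-sample generation) are also covered in the literature the article reviews.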
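To make concrete why specialized evaluation metrics matter under imbalance, here is a small sketch (function name and data are illustrative) computing per-class precision, recall, and F1. A degenerate classifier that always predicts the majority class can look accurate while being useless on the minority class:

```python
def minority_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the (minority) positive class.

    Under class imbalance, plain accuracy is misleading, which is why
    per-class metrics such as these are the usual evaluation choice.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# A classifier that always predicts the majority class is 90% accurate
# on this toy set, yet its recall on the minority class is zero.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(minority_metrics(y_true, y_pred))  # → (0.0, 0.0, 0.0)
```

Metrics such as these (along with G-mean and area under the ROC or precision-recall curve) are the kinds of measures typically used to assess classifiers on imbalanced data.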