Predictive modeling for classification involves accurately assigning the observations in a dataset to target classes or categories. Real-world classification problems with severely imbalanced class distributions are increasingly common. In such problems, the minority classes have far fewer observations to learn from than the majority classes. Despite this sparsity, the minority class is often the class of greater interest, yet developing a learning algorithm suited to such data presents many challenges. In this article, we propose a novel multi-class classification algorithm, which we refer to as SAMME.C2, specialized to handle severely imbalanced classes. It blends the flexible boosting mechanics of two algorithms: SAMME, a multi-class classifier, and Ada.C2, a cost-sensitive binary classifier designed for highly imbalanced classes. We present not only the resulting algorithm but also the scientific and statistical formulation of SAMME.C2. Through numerical experiments that vary the degree of classification difficulty, we demonstrate the consistently superior performance of our proposed model.
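The abstract does not spell out the SAMME.C2 update rules, so the following is only a minimal Python sketch of how a SAMME-style multi-class booster might incorporate Ada.C2-style per-class costs. It assumes decision stumps as weak learners, uniform initial weights, at least two classes, and a user-supplied `costs` mapping from class label to misclassification cost. The function names `fit_stump`, `fit_cost_sensitive_samme`, and `predict` are hypothetical. The learner coefficient keeps SAMME's `log(K - 1)` term but replaces its `(1 - err) / err` ratio with the cost-weighted ratio of correctly to incorrectly classified weight used in Ada.C2 (dropping Ada.C2's factor of 1/2). The weight update multiplies each sample by its class cost before up-weighting mistakes. This is an illustration under those assumptions, not the authors' SAMME.C2.

```python
import math
from collections import defaultdict


def fit_stump(X, y, w, classes):
    """Weighted decision stump: choose the (feature, threshold) pair whose
    weighted-majority class on each side gives the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left, right = defaultdict(float), defaultdict(float)
            for xi, yi, wi in zip(X, y, w):
                (left if xi[j] <= t else right)[yi] += wi
            cl = max(classes, key=lambda c: left[c])   # majority class, left side
            cr = max(classes, key=lambda c: right[c])  # majority class, right side
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if yi != (cl if xi[j] <= t else cr))
            if best is None or err < best[0]:
                best = (err, j, t, cl, cr)
    _, j, t, cl, cr = best
    return lambda x: cl if x[j] <= t else cr


def fit_cost_sensitive_samme(X, y, costs, n_rounds=10, eps=1e-10):
    """Hypothetical SAMME-style booster with Ada.C2-style per-class costs.
    `costs` maps each class label to an assumed misclassification cost."""
    classes = sorted(set(y))
    K = len(classes)                     # assumes K >= 2
    n = len(y)
    w = [1.0 / n] * n                    # uniform initial weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        g = fit_stump(X, y, w, classes)
        miss = [g(xi) != yi for xi, yi in zip(X, y)]
        # Cost-weighted mass of correct vs. incorrect samples (Ada.C2-style ratio).
        good = sum(costs[yi] * wi for yi, wi, m in zip(y, w, miss) if not m)
        bad = sum(costs[yi] * wi for yi, wi, m in zip(y, w, miss) if m)
        # SAMME coefficient with (1 - err) / err replaced by the cost-weighted ratio.
        alpha = math.log(max(good, eps) / max(bad, eps)) + math.log(K - 1)
        if alpha <= 0:                   # weak learner no better than chance: stop
            break
        learners.append(g)
        alphas.append(alpha)
        # Ada.C2-style update: scale by class cost, then up-weight mistakes.
        w = [costs[yi] * wi * math.exp(alpha * m) for yi, wi, m in zip(y, w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
    return classes, learners, alphas


def predict(model, x):
    """Weighted vote: each stump adds its alpha to the class it predicts."""
    classes, learners, alphas = model
    score = {c: 0.0 for c in classes}
    for g, a in zip(learners, alphas):
        score[g(x)] += a
    return max(classes, key=lambda c: score[c])


# Toy usage: class 2 is the minority class; the costs are illustrative only.
X = [[v] for v in range(10)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
costs = {0: 1.0, 1: 1.0, 2: 5.0}
model = fit_cost_sensitive_samme(X, y, costs, n_rounds=10)
print([predict(model, xi) for xi in X])
```

Applying the cost inside every update, as Ada.C2 does, keeps shifting weight toward the costly minority class from round to round. The `alpha <= 0` check plays the role of SAMME's requirement that each weak learner do better than random guessing among the K classes.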