Imbalanced classification tasks are widespread in real-world applications. For such tasks, non-decomposable performance measures such as the Area Under the receiver operating characteristic Curve (AUC) and the $F_\beta$ measure are usually much more appropriate classification criteria than the accuracy rate, since the class distribution is imbalanced. On the other hand, the minimax probability machine, a popular method for binary classification problems, learns a linear classifier by maximizing the accuracy rate, which makes it ill-suited to imbalanced classification tasks. The purpose of this paper is to develop a new minimax probability machine for the $F_\beta$ measure, called MPMF, which can be used to deal with imbalanced classification tasks. A brief discussion is also given on how to extend the MPMF model to several other non-decomposable performance measures listed in the paper. To solve the MPMF model effectively, we derive an equivalent form that can then be solved by an alternating descent method to learn a linear classifier. Further, the kernel trick is employed to derive a nonlinear MPMF model that learns a nonlinear classifier. Several experiments on real-world benchmark datasets demonstrate the effectiveness of our new model.
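As an illustrative sketch (not part of the paper's method), the $F_\beta$ measure mentioned above can be computed from confusion-matrix counts as $F_\beta = (1+\beta^2)\,P R / (\beta^2 P + R)$, where $P$ is precision and $R$ is recall; the counts below are hypothetical:

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F_beta from confusion-matrix counts.

    beta > 1 weights recall more heavily; beta < 1 weights precision more.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical imbalanced outcome: 90 true negatives, 5 true positives,
# 3 false positives, 2 false negatives. Accuracy is 95/100 = 0.95,
# yet F_1 on the minority class is only 2/3, exposing the gap the
# accuracy rate hides on imbalanced data.
print(f_beta(tp=5, fp=3, fn=2, beta=1.0))  # -> 0.6666...
```

Unlike accuracy, $F_\beta$ ignores the (dominant) true negatives entirely, which is why it is a more informative criterion when one class is rare.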