Support vector machines (SVMs) are powerful supervised learning tools developed to solve classification problems. However, SVMs tend to perform poorly when classifying imbalanced data. Rough set theory is a mathematical tool for inference in nondeterministic cases that provides methods for removing irrelevant information from data. In this work, we propose an approach, called FRLSTSVM, that efficiently uses fuzzy rough set theory in a weighted least squares twin support vector machine for the classification of imbalanced data. The first innovation is a new fuzzy rough set-based under-sampling strategy that makes the classifier robust to class imbalance. To construct the two proximal hyperplanes in FRLSTSVM, data points from the minority class remain unchanged, while a subset of data points from the majority class is selected using a new method. In this model, we embed weight biases in the LSTSVM formulation to overcome the bias phenomenon of the original twin SVM in the classification of imbalanced data. The second innovation is a new strategy that uses fuzzy rough set theory to determine these weights. Experimental results on well-known imbalanced datasets show that the proposed FRLSTSVM model outperforms related traditional SVM-based methods in imbalanced data classification.
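To make the structure of a weighted least squares twin SVM concrete, the following is a minimal sketch and not the FRLSTSVM method itself. It assumes a linear kernel and a generic weighted LSTSVM objective in which each squared slack term is scaled by a per-sample weight. The fuzzy rough set-based steps are not specified in this abstract, so random under-sampling and an imbalance-ratio weight stand in for them here. The names `fit_weighted_lstsvm`, `predict`, `s_A`, `s_B`, and `reg` are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_weighted_lstsvm(A, B, s_A, s_B, c1=1.0, c2=1.0, reg=1e-8):
    """Fit two proximal hyperplanes (linear case).

    A: minority-class points (label +1), B: majority-class points (label -1).
    s_A, s_B: per-sample weights on the squared slack terms (placeholders for
    weights that the paper derives from fuzzy rough set theory).
    reg: small ridge term added for numerical stability (an assumption).
    """
    E = np.hstack([A, np.ones((A.shape[0], 1))])  # [A e]
    F = np.hstack([B, np.ones((B.shape[0], 1))])  # [B e]
    e_A = np.ones(A.shape[0])
    e_B = np.ones(B.shape[0])
    ridge = reg * np.eye(E.shape[1])

    # Plane 1: close to A, at (weighted) unit distance from B.
    # Minimise 0.5*||E u||^2 + 0.5*c1 * sum_j s_B[j] * (1 + F[j] @ u)^2.
    FtS = F.T * s_B  # equals F^T diag(s_B)
    u = -np.linalg.solve(E.T @ E + c1 * FtS @ F + ridge, c1 * FtS @ e_B)

    # Plane 2: close to B, at (weighted) unit distance from A.
    # Minimise 0.5*||F v||^2 + 0.5*c2 * sum_i s_A[i] * (1 - E[i] @ v)^2.
    EtS = E.T * s_A  # equals E^T diag(s_A)
    v = np.linalg.solve(F.T @ F + c2 * EtS @ E + ridge, c2 * EtS @ e_A)
    return u, v

def predict(X, u, v):
    """Assign each row of X to the class whose hyperplane is nearer."""
    d1 = np.abs(X @ u[:-1] + u[-1]) / np.linalg.norm(u[:-1])
    d2 = np.abs(X @ v[:-1] + v[-1]) / np.linalg.norm(v[:-1])
    return np.where(d1 <= d2, 1, -1)

# Toy imbalanced data: 20 minority points, 200 majority points.
rng = np.random.default_rng(0)
A = rng.normal(loc=3.0, scale=0.5, size=(20, 2))
B_full = rng.normal(loc=-3.0, scale=0.5, size=(200, 2))

# Stand-in for the paper's under-sampling: keep a random majority subset.
keep = rng.choice(len(B_full), size=60, replace=False)
B = B_full[keep]

# Stand-in weights: down-weight majority slack by the imbalance ratio.
s_A = np.ones(len(A))
s_B = np.full(len(B), len(A) / len(B))

u, v = fit_weighted_lstsvm(A, B, s_A, s_B)
labels = predict(np.vstack([A, B_full]), u, v)
```

Each hyperplane comes from a single linear solve rather than a quadratic program, which is the main practical appeal of the least squares variant. In the proposed model, the subset `B` and the weights `s_A` and `s_B` would instead be produced by the fuzzy rough set-based strategies described above.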