We aim to demonstrate experimentally that our cost-sensitive PEGASOS SVM achieves good performance on imbalanced data sets with majority-to-minority ratios ranging from 8.6:1 to 130:1, without resorting to the Synthetic Minority Over-sampling Technique (SMOTE) or under-sampling. Although many practitioners rely on SMOTE-style methods, we aim for a less computationally intensive approach. We evaluate performance by examining learning curves, which diagnose whether the model overfits or underfits and whether the training/test split is over- or under-representative. We also examine the effect of varying the hyperparameters via validation curves. We compare our cost-sensitive PEGASOS SVM against the results Ding obtained on three of the datasets he analyzed with his LINEAR SVM DECIDL method. He obtained an ROC-AUC of 0.5 on one dataset; we consider that dataset the most promising candidate for a kernel Support Vector Machine. Our work extends Ding's by incorporating kernels into the Support Vector Machine. We use Python rather than MATLAB, since Python's dictionaries can store mixed data types during multi-parameter cross-validation.
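The core idea above can be illustrated with a minimal sketch of cost-sensitive PEGASOS: standard PEGASOS performs stochastic sub-gradient descent on the regularized hinge loss with step size 1/(λt), and the cost-sensitive variant simply scales the hinge sub-gradient for minority-class examples. The `minority_weight` parameter and the specific values below are illustrative assumptions, not settings from this work.

```python
import numpy as np

def cost_sensitive_pegasos(X, y, lam=0.01, n_iters=1000, minority_weight=10.0, seed=0):
    """Sketch of cost-sensitive PEGASOS (linear, no bias term).

    Labels y must be in {-1, +1}. The positive class is assumed to be the
    minority class; its hinge loss is scaled by `minority_weight` (an
    illustrative knob, not a value from the paper).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)                  # PEGASOS step-size schedule
        cost = minority_weight if y[i] == 1 else 1.0
        if y[i] * X[i].dot(w) < 1.0:           # margin violated: hinge sub-gradient
            w = (1.0 - eta * lam) * w + eta * cost * y[i] * X[i]
        else:                                  # margin satisfied: regularizer only
            w = (1.0 - eta * lam) * w
        # Optional projection onto the ball of radius 1/sqrt(lam)
        norm = np.linalg.norm(w)
        if norm > 0:
            w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return w
```

On imbalanced data, increasing `minority_weight` pushes the decision boundary away from the minority class, trading majority-class accuracy for minority-class recall without generating any synthetic samples.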