Class imbalance problems arise in domains such as financial fraud detection and network intrusion analysis, where one class is far more prevalent than the other. Typically, practitioners are more interested in predicting the minority class than the majority class, as the minority class may carry a higher misclassification cost. However, classifier performance deteriorates in the face of class imbalance, as classifiers often predict every point as the majority class. Methods for dealing with class imbalance include cost-sensitive learning and resampling techniques. In this paper, we introduce DeepBalance, an ensemble of deep belief networks trained with balanced bootstraps and random feature selection. We demonstrate that our proposed method outperforms baseline resampling methods, such as SMOTE and random under- and over-sampling, on metrics such as AUC and sensitivity when applied to a highly imbalanced financial transaction dataset. Additionally, we explore the performance and training-time implications of various model parameters. Furthermore, we show that our model is easily parallelizable, which can reduce training times. Finally, we present an implementation of DeepBalance in R.
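The "balanced bootstraps and random feature selection" idea from the abstract can be sketched as follows. This is a minimal illustration in Python (the paper's actual implementation is in R), and the function name `balanced_bootstrap` and its parameters are illustrative assumptions, not the authors' code: each ensemble member is trained on a resample that draws an equal number of points (with replacement) from each class, over a random subset of the features.

```python
import numpy as np

def balanced_bootstrap(X, y, rng, n_features=None):
    """Draw one balanced bootstrap: sample with replacement an equal
    number of points from each class, optionally restricted to a
    random subset of the feature columns."""
    classes = np.unique(y)
    # equal per-class sample size: the size of the smallest class
    n_per_class = int(np.bincount(y).min())
    rows = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=True)
        for c in classes
    ])
    cols = np.arange(X.shape[1])
    if n_features is not None:
        cols = rng.choice(cols, size=n_features, replace=False)
    return X[np.ix_(rows, cols)], y[rows], cols

# Hypothetical imbalanced data: 95 majority points, 5 minority points.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = np.array([0] * 95 + [1] * 5)
Xb, yb, cols = balanced_bootstrap(X, y, rng, n_features=4)
# Each resample is class-balanced (5 per class) on 4 random features;
# in the ensemble, one base learner would be fit per such resample.
```

Repeating this draw once per base learner yields the diversity the ensemble relies on: each deep belief network sees a different balanced view of the data and a different feature subset.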