GAN-based Data Augmentation to Resolve Class Imbalance

The number of credit card fraud cases has grown alongside technology, as fraudsters find new ways to exploit it. It is therefore important to implement robust and effective methods for detecting such fraud. Machine learning algorithms are well suited to this task because they are trained to maximize predictive accuracy. However, this objective exposes a weakness: models may perform poorly when the class distribution of the training set is imbalanced. In many fraud-detection tasks, the datasets contain very few observed fraud cases, and sometimes only around 1 percent of instances are fraudulent. Such an imbalance can push a model toward predicting the majority class for every input. With 1 percent fraud, labelling every transaction as legitimate already yields about 99 percent accuracy while detecting no fraud at all, so the model learns nothing that generalizes. We trained a Generative Adversarial Network (GAN) to generate a large number of convincing, realistic synthetic examples of the minority class. These examples can be added to the training set to alleviate the class imbalance and help the model generalize more effectively.
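The abstract does not describe the GAN architecture or the dataset's features, so the following is only a minimal sketch under stated assumptions. It trains a deliberately tiny GAN on a single hypothetical numeric feature of the minority (fraud) class, then samples enough synthetic fraud rows to equalize the class counts. The generator is a linear map of Gaussian noise, g(z) = a·z + b, and the discriminator is a logistic model on quadratic features, so it can tell apart both the mean and the spread of real and fake values. Both models, and the helper names `train_toy_gan` and `balance_with_gan`, are illustrative assumptions rather than the authors' method; a practical version would use neural networks over all transaction features.

```python
import math
import random


def sigmoid(s: float) -> float:
    """Numerically stable logistic function (math.exp never overflows here)."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(real, steps=3000, lr=0.01, seed=0):
    """Adversarially fit a 1-D generator to `real` values; return a sampler.

    Hypothetical toy model (not the original paper's architecture):
      generator      g(z) = a*z + b, with z ~ N(0, 1)
      discriminator  D(x) = sigmoid(w2*x^2 + w1*x + w0)
    """
    rng = random.Random(seed)
    # Standardize so the quadratic discriminator features stay well scaled.
    mu = sum(real) / len(real)
    sd = math.sqrt(sum((x - mu) ** 2 for x in real) / len(real)) or 1.0
    data = [(x - mu) / sd for x in real]

    a, b = 0.1, 1.0             # generator starts off-target (mean 1, spread 0.1)
    w2, w1, w0 = 0.0, 0.0, 0.0  # discriminator starts undecided (D = 0.5)

    for _ in range(steps):
        x = rng.choice(data)    # one real minority sample
        z = rng.gauss(0.0, 1.0)
        g = a * z + b           # one synthetic sample

        # Discriminator SGD step on -log D(x) - log(1 - D(g)).
        d_real = sigmoid(w2 * x * x + w1 * x + w0)
        d_fake = sigmoid(w2 * g * g + w1 * g + w0)
        w2 -= lr * ((d_real - 1.0) * x * x + d_fake * g * g)
        w1 -= lr * ((d_real - 1.0) * x + d_fake * g)
        w0 -= lr * ((d_real - 1.0) + d_fake)

        # Generator SGD step on the non-saturating loss -log D(g).
        d_fake = sigmoid(w2 * g * g + w1 * g + w0)
        grad_g = (d_fake - 1.0) * (2.0 * w2 * g + w1)  # d(-log D(g)) / dg
        a -= lr * grad_g * z
        b -= lr * grad_g

    def sample(n):
        # Draw n synthetic values and undo the standardization.
        return [(a * rng.gauss(0.0, 1.0) + b) * sd + mu for _ in range(n)]

    return sample


def balance_with_gan(rows):
    """Add GAN samples labelled 1 (fraud, assumed minority) until classes match."""
    minority = [x for x, y in rows if y == 1]
    majority = [x for x, y in rows if y == 0]
    sample = train_toy_gan(minority)
    synthetic = [(x, 1) for x in sample(len(majority) - len(minority))]
    return rows + synthetic


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical training split: 990 legitimate rows, 10 fraud rows, one feature each.
    train = [(rng.gauss(0.0, 1.0), 0) for _ in range(990)]
    train += [(rng.gauss(3.0, 0.5), 1) for _ in range(10)]
    balanced = balance_with_gan(train)
    n_legit = sum(1 for _, y in balanced if y == 0)
    n_fraud = sum(1 for _, y in balanced if y == 1)
    print(n_legit, n_fraud)  # 990 990
```

Only the training split should be augmented this way. Evaluating on real, untouched transactions, with per-class metrics such as recall and precision on the fraud class rather than plain accuracy, shows whether the synthetic examples actually helped.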
