In binary classification problems with severe class imbalance, most standard predictive methods may fail to model the minority class accurately. We present a model based on Generative Adversarial Networks that uses additional regularization losses to map majority samples to corresponding synthetic minority samples. This translation mechanism encourages the synthesized samples to lie close to the class boundary. Furthermore, we explore a selection criterion that retains the most useful of the synthesized samples. Experimental results using several downstream classifiers on a variety of class-imbalanced tabular datasets show that the proposed method improves average precision compared with alternative re-weighting and oversampling techniques.
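The abstract reports results as average precision but does not define the metric. The sketch below assumes the standard ranking-based definition, in which precision is averaged over the ranks at which a minority sample is retrieved. The function name `average_precision`, the convention that the minority class is label 1, and the toy labels and scores are all illustrative assumptions rather than anything taken from the paper. Tied scores are not handled specially.

```python
def average_precision(labels, scores):
    """Average precision for binary labels (assumed: 1 = minority class).

    Computes the mean of precision@k over every rank k at which a
    minority sample appears, after sorting by descending score.
    Assumes no tied scores; ties are broken by the sort order.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one minority (label 1) sample")

    # Rank samples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Toy example (made-up numbers): 2 minority samples among 8.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 0]
toy_scores = [0.10, 0.40, 0.80, 0.30, 0.35, 0.20, 0.05, 0.60]
print(average_precision(toy_labels, toy_scores))  # 0.75
```

In the toy example the two minority samples are ranked first and fourth, so the score is (1/1 + 2/4) / 2 = 0.75. Unlike accuracy, this metric is not inflated when a classifier simply predicts the majority class, which is presumably why it suits imbalanced benchmarks.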