Score-based generative models (SGMs) are a recent breakthrough in synthetic image generation. SGMs are known to surpass other generative models, e.g., generative adversarial networks (GANs) and variational autoencoders (VAEs). Inspired by this success, in this work we fully customize them for generating synthetic tabular data. In particular, we focus on oversampling minority classes, since class imbalance frequently leads to sub-optimal training outcomes. To our knowledge, we are the first to present a score-based tabular data oversampling method. First, we redesign the score network so that it can process tabular data. Second, we propose two options for our generation method: the first is equivalent to style transfer for tabular data, and the second uses the standard generative policy of SGMs. Finally, we define a fine-tuning method that further enhances the oversampling quality. In our experiments with 6 datasets and 10 baselines, our method outperforms the other oversampling methods in all cases.