Tabular data synthesis has received wide attention in the literature because available data is often limited, incomplete, or hard to obtain, and because data privacy is becoming increasingly important. In this work, we present a generalized GAN framework for tabular synthesis that combines the adversarial training of GANs with the negative log-density regularization of invertible neural networks. The proposed framework can serve two distinct objectives. First, by decreasing the negative log-density of real records during adversarial training, we can further improve synthesis quality. Second, by increasing the negative log-density of real records, we can synthesize realistic fake records that are not too close to real records, reducing the chance of information leakage. We conduct experiments with real-world datasets for classification, regression, and privacy attacks. In general, the proposed method achieves the best synthesis quality (in terms of task-oriented evaluation metrics, e.g., F1) when the negative log-density is decreased during adversarial training. When the negative log-density is increased, our experimental results show that the distance between real and fake records increases, enhancing robustness against privacy attacks.
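To make the two objectives concrete, the following is a minimal sketch of how an adversarial loss could be combined with a negative log-density term. It is an illustration under stated assumptions, not the implementation used in this work: the one-dimensional affine map stands in for an invertible neural network, and the names `affine_flow_nll`, `generator_objective`, and the weight `gamma` are hypothetical.

```python
import math


def affine_flow_nll(x, mu, log_sigma):
    """Negative log-density of x under the invertible map z = (x - mu) / sigma,
    with a standard normal base density on z (a toy stand-in for a real
    invertible network)."""
    z = (x - mu) * math.exp(-log_sigma)
    # Change of variables: -log p(x) = -log N(z; 0, 1) + log(sigma)
    return 0.5 * z * z + 0.5 * math.log(2.0 * math.pi) + log_sigma


def generator_objective(adv_loss, real_records, mu, log_sigma, gamma):
    """Adversarial loss plus gamma times the mean negative log-density of
    real records (gamma is a hypothetical weight, assumed for illustration).

    gamma > 0: minimizing the objective lowers the negative log-density.
    gamma < 0: minimizing the objective raises it.
    """
    mean_nll = sum(
        affine_flow_nll(x, mu, log_sigma) for x in real_records
    ) / len(real_records)
    return adv_loss + gamma * mean_nll, mean_nll


# Toy one-dimensional "records" and a placeholder adversarial loss value.
real = [0.0, 1.0, 2.0]
quality_loss, nll = generator_objective(0.7, real, mu=1.0, log_sigma=0.0, gamma=0.1)
privacy_loss, _ = generator_objective(0.7, real, mu=1.0, log_sigma=0.0, gamma=-0.1)
```

In this sketch the sign of `gamma` is the only thing that differs between the two settings: a positive value rewards the model for assigning high density to real records, while a negative value penalizes it.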