Generative modeling has recently seen many exciting developments with the advent of deep generative architectures such as Variational Auto-Encoders (VAEs) and Generative Adversarial Networks (GANs). The ability to draw synthetic i.i.d. observations from the same joint probability distribution as a given dataset has a wide range of applications, including representation learning, compression, and imputation. It also has many applications in privacy-preserving data analysis, especially when used in conjunction with differential privacy techniques. This paper focuses on synthetic data generation models with privacy-preserving applications in mind. It introduces a novel architecture, the Composable Generative Model (CGM), that achieves state-of-the-art results in tabular data generation. Any conditional generative model can be used as a sub-component of the CGM, including CGMs themselves, which allows the generation of numerical and categorical data as well as images, text, or time series. The CGM has been evaluated on 13 datasets (6 standard and 7 simulated) and compared to 14 recent generative models. It outperforms the state of the art in tabular data generation by a significant margin.