Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders

Ciro Antonio Mami, Andrea Coser, Eric Medvet, Alexander T. P. Boudewijn, Marco Volpe, Michael Whitworth, Borut Svara, Gabriele Sgroi, Daniele Panfilo, Sebastiano Saccani

From arXiv. 8 pages, 2 figures, 2 tables. Synthetic Data 4 ML workshop of the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022).

Synthetic data generation has recently gained widespread attention as a more reliable alternative to traditional data anonymization. The underlying methods were originally developed for image synthesis, so applying them to the typically tabular and relational datasets of healthcare, finance and other industries is non-trivial. While substantial research has been devoted to generating realistic tabular datasets, the study of synthetic relational databases is still in its infancy. In this paper, we combine the variational autoencoder framework with graph neural networks to generate realistic synthetic relational databases. We then apply the resulting method to two publicly available databases in computational experiments. The results indicate that the structures of the real databases are accurately preserved in the synthetic datasets, even for large datasets with advanced data types.
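To make the graph view of a relational database concrete, here is a minimal sketch in Python. The abstract does not say how the paper builds its graph, so the construction below is an assumption: each row becomes a node and each foreign-key reference becomes an edge. The table names, column names and the `database_to_graph` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical two-table database; table and column names are made up.
tables = {
    "customers": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bo"}],
    "orders": [
        {"id": 10, "customer_id": 1, "amount": 25.0},
        {"id": 11, "customer_id": 1, "amount": 40.0},
        {"id": 12, "customer_id": 2, "amount": 15.0},
    ],
}

# Each entry: (child table, foreign-key column, parent table).
foreign_keys = [("orders", "customer_id", "customers")]


def database_to_graph(tables, foreign_keys):
    """Return (nodes, edges): one node per row, one edge per foreign-key reference.

    This is an assumed construction, not necessarily the one used in the paper.
    """
    nodes = {}
    for table_name, rows in tables.items():
        for row in rows:
            nodes[(table_name, row["id"])] = row

    edges = []
    for child, column, parent in foreign_keys:
        for row in tables[child]:
            edges.append(((child, row["id"]), (parent, row[column])))
    return nodes, edges


nodes, edges = database_to_graph(tables, foreign_keys)

# Number of child rows attached to each parent row: one structural statistic
# against which a synthetic database could be compared.
children_per_parent = Counter(parent for _, parent in edges)
```

Here `children_per_parent` is only one example of a structural property; which properties the paper actually evaluates is not stated in the abstract.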


Related content

Autoencoder


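An autoencoder is an artificial neural network used to learn efficient data encodings in an unsupervised manner. Its aim is to learn a representation (encoding) of a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise". Alongside the reduction side, a reconstruction side is learned, in which the autoencoder tries to generate, from the reduced encoding, a representation as close as possible to its original input, hence the name. Several variants of the basic model exist, each designed to force the learned representation of the input to have useful properties. Autoencoders are applied effectively to many problems, from facial recognition to capturing the semantic meaning of words.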
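The encode-then-reconstruct idea can be illustrated with a toy example in Python. This is a minimal sketch under stated assumptions, not a practical model: a linear autoencoder with tied weights that compresses 2-D points to a single number and expands them back, trained by plain gradient descent on the squared reconstruction error. The data, function names and hyperparameters are all illustrative.

```python
# Toy linear autoencoder with tied weights: 2-D input -> 1-D code -> 2-D output.
# All names, data and hyperparameters are illustrative assumptions.

def encode(w, x):
    """Compress a 2-D point to a single number (the code)."""
    return w[0] * x[0] + w[1] * x[1]


def decode(w, z):
    """Expand the 1-D code back to a 2-D point."""
    return (z * w[0], z * w[1])


def mean_reconstruction_error(w, data):
    """Mean squared distance between each point and its reconstruction."""
    total = 0.0
    for x in data:
        x_hat = decode(w, encode(w, x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)


def train(data, w=(1.0, 0.0), lr=0.05, steps=500):
    """Gradient descent on the mean squared reconstruction error."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x in data:
            z = encode(w, x)
            x_hat = decode(w, z)
            r = (x_hat[0] - x[0], x_hat[1] - x[1])  # reconstruction residual
            r_dot_w = r[0] * w[0] + r[1] * w[1]
            for k in range(2):
                # d/dw_k of ||z * w - x||^2 with z = w . x
                grad[k] += 2.0 * (r_dot_w * x[k] + z * r[k])
        for k in range(2):
            w[k] -= lr * grad[k] / len(data)
    return tuple(w)


# Points lying exactly on a line through the origin, so one number per point
# is enough to reconstruct them.
direction = (0.6, 0.8)
data = [(t * direction[0], t * direction[1]) for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]

w = train(data)
error = mean_reconstruction_error(w, data)
```

Because the points lie on a single line, the learned weight vector should align with that line and the reconstruction error should approach zero. Real autoencoders use nonlinear layers and noisy data, where the reconstruction is only approximate.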
