As the field of Graph Neural Networks (GNNs) continues to grow, so does the need for large, real-world datasets to train and test new GNN models on challenging, realistic problems. Unfortunately, such graph datasets are often generated from online, highly privacy-restricted ecosystems, which makes research and development on these datasets hard, if not impossible. This greatly reduces the number of benchmark graphs available to researchers, causing the field to rely on only a handful of publicly available datasets. To address this problem, we introduce a novel graph generative model, Computation Graph Transformer (CGT), that learns and reproduces the distribution of real-world graphs in a privacy-controlled way. More specifically, CGT (1) generates effective benchmark graphs on which GNNs show similar task performance as on the source graphs, (2) scales to process large-scale graphs, and (3) incorporates off-the-shelf privacy modules to guarantee end-user privacy of the generated graph. Extensive experiments across a vast body of graph generative models show that only our model can successfully generate privacy-controlled, synthetic substitutes of large-scale real-world graphs that can be effectively used to benchmark GNN models.