A surge of interest in Graph Convolutional Networks (GCNs) has produced thousands of GCN variants, with hundreds introduced every year. In contrast, many GCN models reuse only a handful of benchmark datasets, because many graphs of interest, such as social or commercial networks, are proprietary. We propose a new graph generation problem that enables generating a diverse set of benchmark graphs for GCNs following the distribution of a source graph (possibly proprietary), with three requirements: 1) benchmark effectiveness as a substitute for the source graph in GCN research, 2) scalability to large-scale real-world graphs, and 3) a privacy guarantee for end-users. With a novel graph encoding scheme, we reframe the large-scale graph generation problem as a medium-length sequence generation problem and apply the strong generative power of the Transformer architecture to the graph domain. Extensive experiments across a vast body of graph generative models show that our model can successfully generate benchmark graphs with the realistic graph structure, node attributes, and node labels required to benchmark GCNs on node classification tasks.