Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs). Because the evolving graph structure is entangled with the training process, vanilla GNNs usually fail to scale up, limited by GPU memory. Although numerous scalable GNN architectures have been proposed, we still lack a comprehensive survey and fair benchmark of these methods that reveals the rationale behind designing scalable GNNs. To this end, we first systematically formulate the representative methods for large-scale graph training into several branches and establish a fair and consistent benchmark for them via greedy hyperparameter search. In addition, regarding efficiency, we theoretically evaluate the time and space complexity of each branch and empirically compare them in terms of GPU memory usage, throughput, and convergence. Furthermore, we analyze the pros and cons of the various branches of scalable GNNs and then present a new ensembling training scheme, named EnGCN, to address their existing issues. Our code is available at https://github.com/VITA-Group/Large_Scale_GCN_Benchmarking.