Graph neural networks (GNNs) are powerful tools for learning from graph data and are widely used in applications such as social network recommendation, fraud detection, and graph search. The graphs in these applications are typically large, often containing hundreds of millions of nodes. Training GNN models efficiently on such large graphs remains a significant challenge. Although a number of sampling-based methods have been proposed to enable mini-batch training on large graphs, these methods have not been proven to work on truly industry-scale graphs, which require GPUs or mixed CPU-GPU training. The state-of-the-art sampling-based methods are usually not optimized for these real-world hardware setups, in which data movement between CPUs and GPUs is a bottleneck. To address this issue, we propose Global Neighborhood Sampling, which aims to train GNNs on giant graphs specifically under mixed CPU-GPU training. The algorithm periodically samples a global cache of nodes shared by all mini-batches and stores it in GPU memory. This global cache enables in-GPU importance sampling of mini-batches, which drastically reduces the number of nodes in a mini-batch, especially in the input layer; this reduces both the data copied between CPU and GPU and the mini-batch computation, without compromising the training convergence rate or model accuracy. We provide a highly efficient implementation of this method and show that it outperforms an efficient node-wise neighbor sampling baseline by a factor of 2X-4X on giant graphs. It also outperforms an efficient implementation of LADIES with small layers by a factor of 2X-14X while achieving much higher accuracy than LADIES. We further analyze the proposed algorithm theoretically and show that, with cached node data of a proper size, it enjoys a convergence rate comparable to that of the underlying node-wise sampling method.
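The core mechanism described above, periodically sampling a global node cache and then restricting mini-batch neighbor sampling to cached nodes, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the degree-proportional importance weights, the function names, and the dictionary-based adjacency representation are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_global_cache(degrees, cache_size):
    """Periodically resample a global cache of node IDs.

    Assumption: importance weights proportional to node degree; the actual
    algorithm may use a different importance distribution.
    """
    probs = degrees / degrees.sum()
    return rng.choice(len(degrees), size=cache_size, replace=False, p=probs)

def sample_neighbors_from_cache(adj, seeds, cache, fanout):
    """For each seed node, sample up to `fanout` neighbors, keeping only
    neighbors that are in the cache. In the real system this step runs on
    the GPU, so no node features need to be copied from CPU memory."""
    cache_set = set(int(c) for c in cache)
    sampled = {}
    for s in seeds:
        nbrs = [n for n in adj.get(s, []) if n in cache_set]
        if len(nbrs) > fanout:
            nbrs = rng.choice(nbrs, size=fanout, replace=False).tolist()
        sampled[s] = nbrs
    return sampled
```

Because every sampled input-layer node is guaranteed to be in the GPU-resident cache, the expensive CPU-to-GPU feature copy is avoided for those nodes, which is the source of the speedup the abstract reports.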