Graph neural networks (GNNs) have proven to be powerful models in many domains owing to their effectiveness in learning over graphs. To scale GNN training to large graphs, a widely adopted approach is distributed training, which accelerates training by using multiple computing nodes. Maximizing performance is essential, yet the execution of distributed GNN training remains only preliminarily understood. In this work, we provide an in-depth analysis of distributed GNN training on GPUs, revealing several significant observations and offering useful guidelines for both software and hardware optimization.