Despite the recent success of Graph Neural Networks (GNNs), training GNNs on large graphs remains challenging. The limited resource capacities of existing servers, the dependency between nodes in a graph, and the privacy concerns raised by centralized storage and model learning have spurred the need for an effective distributed algorithm for GNN training. However, existing distributed GNN training methods impose either excessive communication costs or large memory overheads that hinder their scalability. To overcome these issues, we propose a communication-efficient distributed GNN training technique named Learn Locally, Correct Globally (LLCG). To reduce the communication and memory overhead, each local machine in LLCG first trains a GNN on its local data, ignoring the dependency between nodes held on different machines, and then sends the locally trained model to the server for periodic model averaging. However, ignoring node dependency can result in significant performance degradation. To address this degradation, we propose applying Global Server Correction on the server to refine the locally learned models. We rigorously analyze the convergence of distributed methods with periodic model averaging for training GNNs and show that naively applying periodic model averaging while ignoring the dependency between nodes suffers from an irreducible residual error. This residual error can, however, be eliminated by the proposed global corrections, yielding a fast convergence rate. Extensive experiments on real-world datasets show that LLCG significantly improves efficiency without hurting performance.
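To make the two-phase structure concrete, here is a minimal sketch of an LLCG-style loop on a toy quadratic objective, not the paper's implementation: all names (local_grad, global_grad), step counts, and the learning rate are illustrative assumptions. Each simulated machine optimizes only its local loss (standing in for training on a partition while ignoring cut edges), the server periodically averages the local models, and a few server-side steps on globally sampled data play the role of the Global Server Correction.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_machines, n_rounds = 8, 4, 50
local_steps, corr_steps = 5, 2
lr = 0.01

# Toy "local" objectives: f_i(w) = 0.5 * ||A_i w - b_i||^2 per machine.
A = [rng.normal(size=(16, dim)) for _ in range(n_machines)]
b = [rng.normal(size=16) for _ in range(n_machines)]

def local_grad(i, w):
    # Gradient of machine i's purely local loss (cross-machine terms ignored).
    return A[i].T @ (A[i] @ w - b[i])

def global_grad(w, batch=8):
    # Server correction uses a small sample drawn across all machines,
    # standing in for mini-batches that include cross-partition dependencies.
    g = np.zeros(dim)
    for i in range(n_machines):
        idx = rng.choice(len(b[i]), size=batch // n_machines, replace=False)
        g += A[i][idx].T @ (A[i][idx] @ w - b[i][idx])
    return g

w_server = np.zeros(dim)
for _ in range(n_rounds):
    # 1) Learn Locally: each machine refines the broadcast model independently.
    local_models = []
    for i in range(n_machines):
        w = w_server.copy()
        for _ in range(local_steps):
            w -= lr * local_grad(i, w)
        local_models.append(w)
    # 2) Periodic model averaging on the server.
    w_server = np.mean(local_models, axis=0)
    # 3) Correct Globally: a few server-side steps counteract the residual
    #    error introduced by ignoring cross-machine node dependencies.
    for _ in range(corr_steps):
        w_server -= lr * global_grad(w_server)

loss = sum(0.5 * np.sum((A[i] @ w_server - b[i]) ** 2) for i in range(n_machines))
print(f"final loss: {loss:.4f}")
```

In this sketch, dropping step 3 leaves the averaged model converging to a neighborhood of the optimum rather than the optimum itself, which mirrors the irreducible residual error the analysis attributes to naive periodic averaging without global correction.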