Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs. The large data sizes of graphs and their vertex features make scalable training algorithms and distributed memory systems necessary. Since the convolution operation on graphs induces irregular memory access patterns, designing a memory- and communication-efficient parallel algorithm for GCN training poses unique challenges. We propose a highly parallel training algorithm that scales to large processor counts. In our solution, the large adjacency and vertex-feature matrices are partitioned among processors. We exploit the vertex-partitioning of the graph to use non-blocking point-to-point communication operations between processors for better scalability. To further minimize the parallelization overheads, we introduce a sparse matrix partitioning scheme based on a hypergraph partitioning model for full-batch training. We also propose a novel stochastic hypergraph model to encode the expected communication volume in mini-batch training. We show the merits of the hypergraph model, previously unexplored for GCN training, over the standard graph partitioning model, which does not accurately encode the communication costs. Experiments performed on real-world graph datasets demonstrate that the proposed algorithms achieve considerable speedups over alternative solutions. The communication-cost optimizations become even more pronounced at scale, on large processor counts. The performance benefits are preserved in deeper GCNs with more layers as well as on billion-scale graphs.
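To make the parallelization strategy concrete, the following is a minimal sketch, not the paper's implementation, of one distributed GCN layer forward pass with a 1D row-partitioning of the adjacency and feature matrices and non-blocking point-to-point communication for the required remote feature rows. It assumes mpi4py, scipy, and numpy; the function name, the request/reply exchange pattern, and the receive-buffer size are illustrative assumptions only.

```python
# Minimal sketch (not the authors' implementation) of a row-partitioned
# GCN layer forward pass using non-blocking point-to-point communication.
import numpy as np
from mpi4py import MPI
from scipy.sparse import csr_matrix

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

def gcn_layer_forward(A_local, H_local, W, row_offsets):
    """A_local: local row block of the normalized adjacency, shape (n_local, n_global), CSR.
    H_local: local row block of the vertex features, shape (n_local, f_in).
    W:       layer weight matrix, shape (f_in, f_out), replicated on all ranks.
    row_offsets: global row offset of each rank's block (length nprocs + 1)."""
    # Nonzero columns of the local adjacency block determine which feature
    # rows (possibly owned by other ranks) are needed for the aggregation.
    needed_cols = np.unique(A_local.indices)

    send_reqs, recv_reqs = [], {}
    # Tell each other rank which of its feature rows we need (non-blocking).
    for p in range(nprocs):
        if p == rank:
            continue
        lo, hi = row_offsets[p], row_offsets[p + 1]
        want = needed_cols[(needed_cols >= lo) & (needed_cols < hi)]
        send_reqs.append(comm.isend(want, dest=p, tag=0))
    # Receive the row requests addressed to us and reply with the rows.
    for p in range(nprocs):
        if p == rank:
            continue
        asked = comm.recv(source=p, tag=0)
        rows = (H_local[asked - row_offsets[rank]] if asked.size
                else np.empty((0, H_local.shape[1])))
        send_reqs.append(comm.isend((asked, rows), dest=p, tag=1))
    # Post non-blocking receives for the replies; buffer size is illustrative.
    for p in range(nprocs):
        if p == rank:
            continue
        recv_reqs[p] = comm.irecv(bytearray(1 << 24), source=p, tag=1)

    # Assemble the needed feature rows (local plus received) by global index.
    H_needed = np.zeros((A_local.shape[1], H_local.shape[1]))
    lo, hi = row_offsets[rank], row_offsets[rank + 1]
    H_needed[lo:hi] = H_local
    for p, req in recv_reqs.items():
        asked, rows = req.wait()
        if asked.size:
            H_needed[asked] = rows
    MPI.Request.Waitall(send_reqs)

    # Local sparse aggregation followed by the dense feature transformation.
    return np.maximum(A_local @ H_needed @ W, 0.0)  # ReLU activation
```

In this sketch, the communication volume is governed by how many adjacency nonzeros cross partition boundaries, which is exactly the quantity the hypergraph partitioning model is used to minimize; the graph partitioning model only approximates it.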