We present the Sequential Aggregation and Rematerialization (SAR) scheme for distributed full-batch training of Graph Neural Networks (GNNs) on large graphs. Large-scale training of GNNs has recently been dominated by sampling-based methods and by methods based on non-learnable message passing. SAR, on the other hand, is a distributed technique that can train any GNN type directly on an entire large graph. The key innovation in SAR is the distributed sequential rematerialization scheme, which sequentially reconstructs and then frees pieces of the prohibitively large GNN computational graph during the backward pass. This yields excellent memory scaling behavior: the memory consumption per worker goes down linearly with the number of workers, even for densely connected graphs. Using SAR, we report the largest applications of full-batch GNN training to date and demonstrate large memory savings as the number of workers increases. We also present a general technique based on kernel fusion and attention-matrix rematerialization to optimize both the runtime and memory efficiency of attention-based models. We show that, coupled with SAR, our optimized attention kernels lead to significant speedups and memory savings in attention-based GNNs. We have made the SAR GNN training library publicly available: \url{https://github.com/IntelLabs/SAR}.
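To illustrate the reconstruct-then-free idea behind sequential rematerialization, the following is a minimal, single-process sketch written against plain PyTorch; it is not the SAR library API. The helper names (`aggregate_from_partition`, `sar_layer`) and the dense per-partition adjacency blocks are illustrative assumptions; the actual library operates on distributed, sparse graph partitions.

```python
# Minimal sketch (assumed names, not the SAR API): each remote partition's
# contribution to the aggregation is checkpointed, so its intermediate
# tensors are freed after the forward pass and rematerialized one
# partition at a time during the backward pass.
import torch
from torch.utils.checkpoint import checkpoint


def aggregate_from_partition(adj_block, remote_feats, weight):
    # One partition's contribution: message passing followed by a linear transform.
    return adj_block @ remote_feats @ weight


def sar_layer(adj_blocks, feat_blocks, weight):
    """Sequentially accumulate contributions from each (remote) partition.

    Peak memory is dominated by a single partition's piece of the
    computational graph rather than by the whole graph.
    """
    out = None
    for adj_block, feats in zip(adj_blocks, feat_blocks):
        partial = checkpoint(
            aggregate_from_partition, adj_block, feats, weight, use_reentrant=False
        )
        out = partial if out is None else out + partial
    return torch.relu(out)
```

In the distributed setting described in the paper, each iteration of the loop would additionally fetch the remote partition's features from the corresponding worker before rematerializing that piece of the computational graph.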