Developing scalable solutions for training Graph Neural Networks (GNNs) for link prediction tasks is challenging due to high data dependencies, which entail high computational cost and a large memory footprint. We propose a new method for scaling the training of knowledge graph embedding models for link prediction to address these challenges. Towards this end, we propose the following algorithmic strategies: self-sufficient partitions, constraint-based negative sampling, and edge mini-batch training. Both the partitioning strategy and constraint-based negative sampling avoid cross-partition data transfer during training. In our experimental evaluation, we show that our scaling solution for GNN-based knowledge graph embedding models achieves a 16x speedup on benchmark datasets while maintaining model performance comparable to non-distributed methods on standard metrics.
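To illustrate the idea behind constraint-based negative sampling, the following is a minimal sketch of partition-local negative corruption, assuming each partition holds its own entity id array and positive triples as (head, relation, tail) index arrays. The function and variable names (e.g., sample_negatives, partition_entities) are illustrative assumptions, not the paper's actual API.

import numpy as np

def sample_negatives(partition_entities, pos_triples, num_negatives, rng=None):
    """Corrupt the tail of each positive triple with entities drawn only from
    the same partition, so training never needs embeddings stored on another
    partition (no cross-partition data transfer)."""
    rng = rng or np.random.default_rng()
    heads, rels, tails = pos_triples  # each an array of shape (batch,)
    batch = heads.shape[0]
    # Draw candidate negative tails uniformly from the local entity set.
    neg_tails = rng.choice(partition_entities, size=(batch, num_negatives))
    # Re-sample any candidate that accidentally reproduces the true tail.
    clashes = neg_tails == tails[:, None]
    while clashes.any():
        neg_tails[clashes] = rng.choice(partition_entities, size=int(clashes.sum()))
        clashes = neg_tails == tails[:, None]
    return neg_tails

# Toy usage on a single partition:
entities = np.arange(100)  # entity ids owned by this partition (assumed layout)
triples = (np.array([1, 5]), np.array([0, 2]), np.array([7, 42]))
negatives = sample_negatives(entities, triples, num_negatives=4)

Restricting the candidate pool to the local entity set is what keeps the sampling step communication-free; the trade-off is that negatives are drawn from a smaller, partition-specific distribution rather than the full entity vocabulary.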