Graph Neural Networks (GNNs) have emerged as a powerful model for ML over graph-structured data. Yet, scalability remains a major challenge for using GNNs over billion-edge inputs. The creation of mini-batches used for training incurs computational and data movement costs that grow exponentially with the number of GNN layers, as state-of-the-art models aggregate information from the multi-hop neighborhood of each input node. In this paper, we focus on scalable training of GNNs with an emphasis on resource efficiency. We show that out-of-core pipelined mini-batch training on a single machine outperforms resource-hungry multi-GPU solutions. We introduce Marius++, a system for training GNNs over billion-scale graphs. Marius++ provides disk-optimized training for GNNs and introduces a series of data organization and algorithmic contributions that 1) minimize the memory footprint and end-to-end time required for training and 2) ensure that models learned with disk-based training exhibit accuracy similar to those fully trained in mixed CPU/GPU settings. We evaluate Marius++ against PyTorch Geometric and Deep Graph Library using seven benchmark (model, data set) settings and find that Marius++ with one GPU can achieve the same level of model accuracy up to 8$\times$ faster than these systems when they are using up to eight GPUs. For these experiments, disk-based training allows Marius++ deployments to be up to 64$\times$ cheaper in monetary cost than those of the competing systems.
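To make the growth of mini-batch construction cost concrete, consider the standard neighborhood-sampling setup (used here only for illustration; the exact sampling scheme and fanouts are not specified in this abstract). If each layer samples a fanout of $f$ neighbors per node, then an $L$-layer GNN must gather on the order of
$$\sum_{\ell=1}^{L} f^{\ell} \;=\; \frac{f^{L+1}-f}{f-1} \;=\; O\!\left(f^{L}\right)$$
nodes per training example, so the computation and data movement needed to assemble each mini-batch grow exponentially in the number of layers $L$.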