A graph neural network (GNN) enables deep learning on structured graph data. GNN training faces two major obstacles: 1) it relies on high-end servers with many GPUs, which are expensive to purchase and maintain, and 2) the limited memory of GPUs cannot scale to today's billion-edge graphs. This paper presents Dorylus: a distributed system for training GNNs. Uniquely, Dorylus can take advantage of serverless computing to increase scalability at a low cost. The key insight guiding our design is computation separation, which makes it possible to construct a deep, bounded-asynchronous pipeline in which graph-parallel and tensor-parallel tasks fully overlap, effectively hiding the network latency incurred by Lambdas. With the help of thousands of Lambda threads, Dorylus scales GNN training to billion-edge graphs. Currently, for large graphs, CPU servers offer the best performance-per-dollar over GPU servers. Adding Lambdas on top of CPU servers offers up to 2.75x more performance-per-dollar than training with CPU servers alone. Concretely, Dorylus is 1.22x faster and 4.83x cheaper than GPU servers for massive sparse graphs. Dorylus is up to 3.8x faster and 10.7x cheaper compared to existing sampling-based systems.
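To make the idea of computation separation with bounded asynchrony concrete, the following is a minimal, hypothetical sketch (not the Dorylus implementation): graph-parallel work runs locally while tensor-parallel work is handled by a separate worker thread standing in for a pool of Lambda threads, and a bounded queue caps how far ahead the graph stage may run. The stage bodies, the staleness bound of 2, and the arithmetic stand-ins for aggregation and tensor ops are all illustrative assumptions.

```python
import queue
import threading

# Bounded-asynchronous two-stage pipeline (illustrative sketch).
# The bounded queue lets the two stages overlap while limiting how
# many mini-batches the graph stage can run ahead of the tensor stage.

STALENESS_BOUND = 2                      # assumed pipeline depth bound
handoff = queue.Queue(maxsize=STALENESS_BOUND)
results = []

def graph_stage(chunks):
    # Graph-parallel stage: aggregate each vertex chunk, then hand off.
    for chunk in chunks:
        aggregated = sum(chunk)          # stand-in for neighbor aggregation
        handoff.put(aggregated)          # blocks once the bound is reached
    handoff.put(None)                    # sentinel: no more work

def tensor_stage():
    # Tensor-parallel stage: apply "weights" to each aggregated chunk
    # (in Dorylus this work would be offloaded to Lambda threads).
    while True:
        item = handoff.get()
        if item is None:
            break
        results.append(item * 2)         # stand-in for a dense tensor op

chunks = [[1, 2], [3, 4], [5, 6], [7, 8]]
worker = threading.Thread(target=tensor_stage)
worker.start()
graph_stage(chunks)                      # runs concurrently with the worker
worker.join()
print(results)                           # prints [6, 14, 22, 30]
```

Because `handoff.put` blocks when the queue is full, the producer can never outpace the consumer by more than the staleness bound, which is the essence of a bounded-asynchronous pipeline.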