Graph neural networks (GNNs), which have emerged as an effective method for machine learning tasks on graphs, offer a new approach to building recommender systems: the recommendation task can be formulated as a link prediction problem on user-item bipartite graphs. Training GNN-based recommender systems (GNNRecSys) on large graphs incurs a large memory footprint, easily exceeding the DRAM capacity of a typical server. Existing solutions resort to distributed subgraph training, which is inefficient due to the high cost of dynamically constructing subgraphs and the significant redundancy across subgraphs. Emerging persistent memory technologies provide a significantly larger memory capacity than DRAM at an affordable cost, making single-machine GNNRecSys training feasible and thereby eliminating the inefficiencies of distributed training. One major concern with using persistent memory devices for GNNRecSys is their relatively low bandwidth compared with DRAM. This limitation can be particularly detrimental to GNNRecSys workloads because their dominant compute kernels are sparse and memory-access intensive. To understand whether persistent memory is a good fit for GNNRecSys training, we perform an in-depth characterization of GNNRecSys workloads and a comprehensive analysis of their performance on a persistent memory device, namely Intel Optane. Based on this analysis, we provide guidance on how to configure Optane for GNNRecSys workloads. Furthermore, we present techniques for large-batch training to fully realize the advantages of single-machine GNNRecSys training. Our experimental results show that with a tuned batch size and an optimal system configuration, Optane-based single-machine GNNRecSys training outperforms distributed training by a large margin, especially when handling deep GNN models.
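To make the bipartite-graph formulation concrete, the following is a minimal sketch of recommendation as link prediction: one GNN layer mean-aggregates item features into user embeddings, and a dot product scores candidate user-item edges. This is not the paper's implementation; the class and function names are illustrative, and plain PyTorch sparse operations stand in for a full GNN library.

```python
import torch
import torch.nn as nn

# Hedged sketch (illustrative names, not the system described in the paper):
# recommendation cast as link prediction on a user-item bipartite graph.

class BipartiteGNNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, adj, src_feat):
        # adj: sparse COO (num_dst, num_src) interaction matrix.
        # src_feat: dense (num_src, in_dim) features of source nodes.
        deg = torch.sparse.sum(adj, dim=1).to_dense().clamp(min=1.0)
        agg = torch.sparse.mm(adj, src_feat) / deg.unsqueeze(1)  # mean pooling
        return torch.relu(self.lin(agg))

def score_links(dst_emb, src_emb, dst_idx, src_idx):
    # Dot-product link scores for candidate (dst, src) pairs.
    return (dst_emb[dst_idx] * src_emb[src_idx]).sum(dim=1)

# Toy usage: 3 users, 4 items, four observed interactions.
edges = torch.tensor([[0, 0, 1, 2], [0, 2, 1, 3]])    # (user, item) pairs
adj = torch.sparse_coo_tensor(edges, torch.ones(4), (3, 4))
item_feat = torch.randn(4, 16)
layer = BipartiteGNNLayer(16, 8)
user_emb = layer(adj, item_feat)                      # users from item neighbors
item_emb = nn.Linear(16, 8)(item_feat)                # simple item projection
scores = score_links(user_emb, item_emb,
                     torch.tensor([0, 1]), torch.tensor([1, 3]))
```

Note that the `torch.sparse.mm` aggregation is exactly the kind of sparse, memory-access-intensive kernel whose behavior on persistent memory the characterization above is concerned with.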