Graph neural networks (GNNs), which have emerged as an effective method for machine learning tasks on graphs, bring a new approach to building recommender systems: recommendation can be formulated as a link prediction problem on user-item bipartite graphs. Training GNN-based recommender systems (GNNRecSys) on large graphs incurs a large memory footprint that easily exceeds the DRAM capacity of a typical server. Existing solutions resort to distributed subgraph training, which is inefficient because of the high cost of dynamically constructing subgraphs and the significant redundancy across subgraphs. The emerging Intel Optane persistent memory allows a single machine to have up to 6 TB of memory at an affordable cost, making single-machine GNNRecSys training feasible and eliminating the inefficiencies of distributed training. One major concern with using Optane for GNNRecSys is its relatively low bandwidth compared with DRAM. This limitation can be particularly detrimental to GNNRecSys performance because the dominant compute kernels of these workloads are sparse and memory-access-intensive. To understand whether Optane is a good fit for GNNRecSys training, we perform an in-depth characterization of GNNRecSys workloads and a comprehensive benchmarking study. Our benchmarking results show that, when properly configured, Optane-based single-machine GNNRecSys training outperforms distributed training by a large margin, especially for deep GNN models. We analyze where the speedup comes from, provide guidance on how to configure Optane for GNNRecSys workloads, and discuss opportunities for further optimization.
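To make the link prediction formulation concrete, the following is a minimal pure-Python sketch, not the method evaluated in this work. It assumes a toy user-item edge list, hand-picked 2-dimensional embeddings, mean aggregation over neighbors, and a dot-product scorer. The names `aggregate` and `score` and all numeric values are hypothetical and chosen only for illustration. The inner loop shows the data-dependent gather and scatter pattern that makes such kernels sparse and sensitive to memory bandwidth.

```python
# Toy user-item bipartite graph: each edge (u, i) is an observed interaction.
# All names and numbers are illustrative assumptions, not taken from the paper.
edges = [(0, 0), (0, 1), (1, 1), (2, 2)]
num_users, num_items, dim = 3, 3, 2

# Fixed initial embeddings so the example is deterministic.
user_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
item_emb = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]


def aggregate(edges, src_emb, num_dst, dim, dst_is_user):
    """Mean-aggregate neighbor embeddings along edges (a sparse gather/scatter)."""
    out = [[0.0] * dim for _ in range(num_dst)]
    deg = [0] * num_dst
    for u, i in edges:
        dst, src = (u, i) if dst_is_user else (i, u)
        deg[dst] += 1
        for k in range(dim):
            # Irregular, data-dependent access: which rows are read and written
            # depends on the graph structure, not on a regular stride.
            out[dst][k] += src_emb[src][k]
    # Divide by degree; nodes without neighbors keep a zero vector.
    return [[x / d if d else 0.0 for x in row] for row, d in zip(out, deg)]


def score(u_vec, i_vec):
    """Dot-product score for a candidate user-item link."""
    return sum(a * b for a, b in zip(u_vec, i_vec))


# One message-passing layer in each direction of the bipartite graph.
user_h = aggregate(edges, item_emb, num_users, dim, dst_is_user=True)
item_h = aggregate(edges, user_emb, num_items, dim, dst_is_user=False)

# Score an unobserved pair: would user 1 interact with item 0?
link_score = score(user_h[1], item_h[0])
```

In a real system the aggregation would run as a sparse matrix kernel over millions of edges, and the scores would be trained against observed and sampled negative links; this sketch only shows the shape of the computation.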