An increasing number of machine learning tasks require dealing with large graph datasets, which capture rich and complex relationships among potentially billions of elements. Graph Neural Networks (GNNs) have become an effective way to address graph learning problems: they map graph data into a low-dimensional space that preserves as much structural and property information as possible, and build a neural network on top of it for training and inference. However, it is challenging to provide efficient graph storage and computation capabilities that facilitate GNN training and enable the development of new GNN algorithms. In this paper, we present a comprehensive graph neural network system, namely AliGraph, which consists of distributed graph storage, optimized sampling operators and a runtime that efficiently support not only existing popular GNNs but also a series of in-house GNNs developed for different scenarios. The system is currently deployed at Alibaba to support a variety of business scenarios, including product recommendation and personalized search on Alibaba's E-Commerce platform. In extensive experiments on a real-world dataset with 492.90 million vertices, 6.82 billion edges and rich attributes, AliGraph builds the graph an order of magnitude faster than the state of the art (5 minutes vs. the hours reported for the PowerGraph platform). During training, AliGraph runs 40%-50% faster with the novel caching strategy and achieves a speedup of around 12 times with the improved runtime. In addition, our in-house GNN models all show statistically significant improvements in both effectiveness and efficiency (e.g., a 4.12% to 17.19% lift in F1 score).
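The abstract mentions optimized sampling operators without describing them. For orientation only, the sketch below shows generic fixed-fanout uniform neighbor sampling, a common technique for mini-batch GNN training (in the style of GraphSAGE); it is not AliGraph's operator. The in-memory adjacency-list representation and the functions `sample_neighbors` and `multi_hop_sample` are hypothetical names chosen for illustration.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical in-memory adjacency list: vertex id -> list of neighbor ids.
Graph = Dict[int, List[int]]


def sample_neighbors(graph: Graph, seeds: Sequence[int], fanout: int,
                     rng: random.Random) -> Dict[int, List[int]]:
    """Draw up to `fanout` neighbors per seed, uniformly without replacement."""
    sampled: Dict[int, List[int]] = {}
    for v in seeds:
        nbrs = graph.get(v, [])
        if len(nbrs) <= fanout:
            # Low-degree vertex: keep every neighbor.
            sampled[v] = list(nbrs)
        else:
            sampled[v] = rng.sample(nbrs, fanout)
    return sampled


def multi_hop_sample(graph: Graph, seeds: Sequence[int], fanouts: Sequence[int],
                     seed: int = 0) -> List[Dict[int, List[int]]]:
    """Expand the seeds hop by hop; fanouts[k] caps the neighbors kept at hop k."""
    rng = random.Random(seed)  # seeded for reproducible mini-batches
    layers: List[Dict[int, List[int]]] = []
    frontier = list(dict.fromkeys(seeds))  # deduplicate, keep order
    for fanout in fanouts:
        hop = sample_neighbors(graph, frontier, fanout, rng)
        layers.append(hop)
        # Next frontier: distinct vertices reached at this hop.
        frontier = list(dict.fromkeys(u for nbrs in hop.values() for u in nbrs))
    return layers


if __name__ == "__main__":
    g = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2], 4: [0]}
    for k, hop in enumerate(multi_hop_sample(g, seeds=[0], fanouts=[2, 2])):
        print(f"hop {k}: {hop}")
```

Capping the fanout bounds the size of each mini-batch's computation graph regardless of vertex degree, which is why this style of sampling is widely used on graphs with billions of edges; a production system would additionally distribute the adjacency data and cache frequently accessed neighborhoods.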