Recently, graph neural networks (GNNs), as the backbone of graph-based machine learning, have demonstrated great success in various domains (e.g., e-commerce). However, GNN performance is often unsatisfactory due to highly sparse and irregular graph-based operations. To this end, we propose TC-GNN, the first GPU Tensor Core Unit (TCU) based GNN acceleration framework. The core idea is to reconcile the "sparse" GNN computation with the "dense" TCU. Specifically, we conduct an in-depth analysis of the sparse operations in mainstream GNN computing frameworks. We introduce a novel sparse graph translation technique to facilitate TCU processing of sparse GNN workloads. We also implement an effective CUDA core and TCU collaboration design to fully utilize GPU resources. We fully integrate TC-GNN with the PyTorch framework for ease of programming. Rigorous experiments show an average of 1.70X speedup over the state-of-the-art Deep Graph Library (DGL) framework across various GNN models and dataset settings.
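To make the "sparse graph translation" idea concrete, here is a minimal, hedged sketch (not the authors' implementation; `translate_window` and `TILE_WIDTH` are illustrative names): the scattered non-zero columns of a window of adjacency-matrix rows are remapped into a compact, dense tile whose width matches what a dense engine such as a Tensor Core MMA fragment consumes.

```python
# Hedged sketch of sparse-to-dense graph translation. Assumptions
# (not from the paper's code): rows arrive as per-row lists of
# non-zero column indices, and the dense tile width is 16, a common
# Tensor Core fragment dimension.

TILE_WIDTH = 16  # dense tile width a TCU fragment would consume


def translate_window(rows):
    """Condense a window of sparse adjacency rows into a dense tile.

    rows: list of lists of non-zero column indices per row.
    Returns (col_map, tile): col_map remaps the sparse global column
    ids to compact local positions; tile is a dense 0/1 matrix of
    shape len(rows) x (width padded to a multiple of TILE_WIDTH).
    """
    # 1. Gather the unique non-zero columns across the whole window.
    uniq = sorted({c for r in rows for c in r})
    col_map = {c: i for i, c in enumerate(uniq)}
    # 2. Pad the compact width up to a multiple of the tile width.
    width = ((len(uniq) + TILE_WIDTH - 1) // TILE_WIDTH) * TILE_WIDTH
    # 3. Emit a dense 0/1 tile: far fewer columns than the full graph,
    #    so a dense MMA over it does little wasted work.
    tile = [[0] * width for _ in rows]
    for i, r in enumerate(rows):
        for c in r:
            tile[i][col_map[c]] = 1
    return col_map, tile


# Example: two rows whose neighbors are scattered across a large graph
# collapse into a single 16-wide dense tile.
col_map, tile = translate_window([[3, 9800], [3, 512]])
print(col_map)       # {3: 0, 512: 1, 9800: 2}
print(len(tile[0]))  # 16
```

The design choice this illustrates: rather than forcing the TCU to consume the original, mostly-empty adjacency columns, the translation step shrinks the column space first, so each dense tile fed to the hardware is nearly full.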