GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design

Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art graph learning model. However, running GCN inference over large graph datasets can be notoriously challenging, which limits their application to large real-world graphs and hinders the exploration of deeper and more sophisticated GCN graphs. This is because real-world graphs can be extremely large and sparse. Furthermore, the node degree of GCNs tends to follow a power-law distribution, so the adjacency matrices are highly irregular. This results in prohibitive inefficiencies in both data processing and data movement, and thus substantially limits the achievable GCN acceleration efficiency. To this end, this paper proposes a GCN algorithm and accelerator Co-Design framework, dubbed GCoD, which can largely alleviate the aforementioned GCN irregularity and boost GCNs' inference efficiency. Specifically, on the algorithm level, GCoD integrates a split and conquer GCN training strategy that polarizes the graphs to be either denser or sparser in local neighborhoods without compromising model accuracy. The resulting graph adjacency matrices (mostly) have merely two levels of workload and enjoy largely enhanced regularity, and are thus easier to accelerate. On the hardware level, we further develop a dedicated two-pronged accelerator with a separate engine to process each of the aforementioned denser and sparser workloads, further boosting the overall utilization and acceleration efficiency. Extensive experiments and ablation studies validate that our GCoD consistently reduces the number of off-chip accesses, leading to speedups of 15286x, 294x, 7.8x, and 2.5x as compared to CPUs, GPUs, and the prior-art GCN accelerators HyGCN and AWB-GCN, respectively, while maintaining or even improving task accuracy.
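The idea of two workload levels can be illustrated with a minimal sketch. It assumes a toy rule that is not taken from the paper: an edge counts as "denser" work when both endpoints fall in the same fixed-size diagonal block of the adjacency matrix, and as "sparser" work otherwise. The function names (`split_workloads`, `aggregate`, `add`) and the example graph are hypothetical; GCoD's actual training strategy and accelerator dataflow are not shown here. The sketch only demonstrates that neighbor aggregation can be computed separately on the two parts and summed to recover the full result.

```python
# Illustrative sketch only: the block-based split rule and all names below are
# assumptions for this example, not GCoD's training strategy or hardware design.

def split_workloads(edges, block_size):
    """Split edges into a 'denser' part (both endpoints in the same
    diagonal block) and a 'sparser' part (all remaining edges)."""
    dense, sparse = [], []
    for src, dst in edges:
        same_block = (src // block_size) == (dst // block_size)
        (dense if same_block else sparse).append((src, dst))
    return dense, sparse


def aggregate(edges, features, num_nodes):
    """Sum source-node features into each destination node
    (equivalent to A @ X for a 0/1 adjacency matrix A)."""
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for src, dst in edges:
        for k in range(dim):
            out[dst][k] += features[src][k]
    return out


def add(a, b):
    """Element-wise sum of two equally shaped feature matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy graph: two tightly connected clusters plus a few cross-cluster links.
edges = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1),  # cluster {0, 1, 2}
         (3, 4), (4, 3), (4, 5), (5, 4),                  # cluster {3, 4, 5}
         (2, 3), (3, 2), (0, 5), (5, 0)]                  # cross-cluster edges
features = [[float(i), 1.0] for i in range(6)]

dense_edges, sparse_edges = split_workloads(edges, block_size=3)

# Each part could be handled by a separate engine; the partial results add up.
combined = add(aggregate(dense_edges, features, 6),
               aggregate(sparse_edges, features, 6))
full = aggregate(edges, features, 6)
```

In this toy graph most edges land in the diagonal blocks, so the "denser" part has a regular, predictable shape, while the few remaining edges form a small irregular part that can be processed on its own.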
