BlockGNN: Towards Efficient GNN Acceleration Using Block-Circulant Weight Matrices

In recent years, Graph Neural Networks (GNNs) have emerged as state-of-the-art algorithms for analyzing non-Euclidean graph data. By applying deep learning to extract high-level representations from graph structures, GNNs achieve high accuracy and strong generalization ability in various tasks. However, with ever-increasing graph sizes, increasingly complicated GNN layers, and higher feature dimensions, the computational complexity of GNNs grows exponentially. Performing GNN inference in real time has become a challenging problem, especially on resource-limited edge-computing platforms. To tackle this challenge, we propose BlockGNN, a software-hardware co-design approach to realize efficient GNN acceleration. At the algorithm level, we propose to leverage block-circulant weight matrices to greatly reduce the complexity of various GNN models. At the hardware design level, we propose a pipelined CirCore architecture, which supports efficient computation on block-circulant matrices. Based on CirCore, we present a novel BlockGNN accelerator to compute various GNNs with low latency. Moreover, to determine the optimal configurations for diverse deployed tasks, we also introduce a performance and resource model that helps choose the optimal hardware parameters automatically. Comprehensive experiments on the ZC706 FPGA platform demonstrate that, on various GNN tasks, BlockGNN achieves up to $8.3\times$ speedup compared to the baseline HyGCN architecture and $111.9\times$ energy reduction compared to the Intel Xeon CPU platform.
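To make the algorithm-level idea concrete, the following is a minimal sketch of a block-circulant matrix-vector product. It assumes each $b \times b$ block of the weight matrix is circulant and is stored only as its first column, so a block needs $b$ values instead of $b^2$, and the product with a block reduces to a circular convolution that can be evaluated with FFTs in $O(b \log b)$ rather than $O(b^2)$. The function names (`fft`, `ifft`, `block_circulant_matvec`, `dense_matvec`) and the data layout are illustrative assumptions, not the paper's implementation, and the sketch does not model the CirCore pipeline.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(a):
    """Inverse FFT: run the transform with the opposite sign, then scale by 1/n."""
    n = len(a)
    return [v / n for v in fft(a, invert=True)]


def block_circulant_matvec(blocks, x, b):
    """Compute y = W x where W is block-circulant.

    blocks[i][j] is the first column (length b) of circulant block (i, j);
    x has length b * len(blocks[0]). Layout is an assumption of this sketch.
    """
    q = len(x) // b
    # Transform each input sub-vector once and reuse it for every block row.
    x_hat = [fft(x[j * b:(j + 1) * b]) for j in range(q)]
    y = []
    for row in blocks:
        acc = [0j] * b
        for w, xf in zip(row, x_hat):
            wf = fft(w)
            for k in range(b):
                # Element-wise product in the frequency domain
                # equals circular convolution in the original domain.
                acc[k] += wf[k] * xf[k]
        # One inverse FFT per block row, after accumulating all blocks.
        y.extend(v.real for v in ifft(acc))
    return y


def dense_matvec(blocks, x, b):
    """Reference result: expand each block as C[r][c] = w[(r - c) % b]."""
    y = []
    for row in blocks:
        for r in range(b):
            s = 0.0
            for j, w in enumerate(row):
                for c in range(b):
                    s += w[(r - c) % b] * x[j * b + c]
            y.append(s)
    return y
```

In practice the FFTs of the weight vectors would be precomputed offline, since the weights are fixed at inference time; the sketch recomputes them only to stay short.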
