Dynamic Graph Neural Networks (DGNNs) are becoming increasingly popular due to their effectiveness in analyzing and predicting the evolution of complex, interconnected graph-based systems. However, hardware deployment of DGNNs remains a challenge. First, DGNNs do not fully utilize hardware resources because temporal data dependencies limit hardware parallelism. Second, generic DGNN hardware accelerator frameworks are lacking, and existing GNN accelerator frameworks have limited ability to handle dynamic graphs with changing topologies and node features. To address these challenges, we propose DGNN-Booster, a novel Field-Programmable Gate Array (FPGA) accelerator framework for real-time DGNN inference using High-Level Synthesis (HLS). It includes two FPGA accelerator designs with different dataflows that can support the most widely used DGNNs. We showcase the effectiveness of our designs by implementing and evaluating two representative DGNN models on a ZCU102 board and measuring the end-to-end performance. The experimental results demonstrate that DGNN-Booster achieves a speedup of up to 5.6x over the CPU baseline (6226R), 8.4x over the GPU baseline (A6000), and 2.1x over an FPGA baseline without the optimizations proposed in this paper. Moreover, DGNN-Booster achieves over 100x and over 1000x higher runtime energy efficiency than the CPU and GPU baselines, respectively. Our implementation code and on-board measurements are publicly available at https://github.com/sharc-lab/DGNN-Booster.
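The energy-efficiency figures can be much larger than the speedups because energy depends on both runtime and power draw. The sketch below shows that relationship, assuming energy efficiency is compared as total energy for the same workload (average power times runtime); the abstract does not state the exact definition used. The function name and the input values are hypothetical and are not measurements from this work.

```python
def energy_efficiency_gain(speedup: float,
                           baseline_power_w: float,
                           accelerator_power_w: float) -> float:
    """Ratio of baseline energy to accelerator energy for the same workload.

    Energy = power * time, so the ratio is
    (P_base * T_base) / (P_acc * T_acc) = speedup * (P_base / P_acc).
    """
    if speedup <= 0 or baseline_power_w <= 0 or accelerator_power_w <= 0:
        raise ValueError("all inputs must be positive")
    return speedup * (baseline_power_w / accelerator_power_w)


# Hypothetical inputs for illustration only; NOT figures from the paper.
example_gain = energy_efficiency_gain(speedup=2.0,
                                      baseline_power_w=100.0,
                                      accelerator_power_w=5.0)
print(example_gain)
```

Under this assumption, a modest speedup combined with a much lower power draw multiplies into a large energy-efficiency gain.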