Graph neural networks (GNNs) have recently exploded in popularity thanks to their broad applicability to graph-related problems such as quantum chemistry, drug discovery, and high-energy physics. However, meeting the demand for novel GNN models and fast inference simultaneously is challenging because of the gap between developing efficient accelerators and the rapid creation of new GNN models. Prior art focuses on accelerating specific classes of GNNs, such as Graph Convolutional Networks (GCNs), but lacks the generality to support a wide range of existing or new GNN models. Furthermore, most works rely on graph pre-processing to exploit data locality, making them unsuitable for real-time applications. To address these limitations, we propose a generic dataflow architecture for GNN acceleration, named FlowGNN, which generalizes to the majority of message-passing GNNs. The contributions are three-fold. First, we propose a novel and scalable dataflow architecture that supports a wide range of GNN models with a message-passing mechanism. The architecture features a configurable dataflow, applicable across models, that is optimized for the simultaneous computation of node embeddings, edge embeddings, and message passing. We also provide a rich library of model-specific components. Second, we deliver ultra-fast, real-time GNN inference without any graph pre-processing, making the design agnostic to dynamically changing graph structures. Third, we verify our architecture on the Xilinx Alveo U50 FPGA board and measure the end-to-end performance on-board. We achieve a speedup of up to 24-254x against CPU (6226R) and 1.3-477x against GPU (A6000) with batch sizes from 1 to 1024; we also outperform the SOTA GNN accelerator I-GCN by 1.26x in speedup and 1.55x in energy efficiency across four datasets. Our implementation code and on-board measurements are publicly available on GitHub.
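For readers unfamiliar with the computation pattern the architecture targets, the following is a minimal, illustrative sketch (in plain Python with NumPy) of one generic message-passing step over node and edge embeddings. It is not the FlowGNN dataflow or FPGA implementation; the function and parameter names (message_passing_layer, message_fn, update_fn) are hypothetical placeholders used only to show the node-embedding, edge-embedding, and message-passing computations the abstract refers to.

```python
# Illustrative sketch of generic message passing (not the FlowGNN implementation).
import numpy as np

def message_passing_layer(node_emb, edge_emb, edges, message_fn, update_fn):
    """One generic message-passing step.

    node_emb : (N, D) array of node embeddings
    edge_emb : (E, D) array of edge embeddings
    edges    : (E, 2) array of (src, dst) node indices
    """
    num_nodes, dim = node_emb.shape
    aggregated = np.zeros((num_nodes, dim))
    # 1) Compute a message for each edge from the source node and edge embeddings.
    # 2) Aggregate messages at each destination node (sum aggregation here).
    for (src, dst), e in zip(edges, edge_emb):
        aggregated[dst] += message_fn(node_emb[src], e)
    # 3) Update every node embedding using its aggregated messages.
    return update_fn(node_emb, aggregated)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    node_emb = rng.normal(size=(4, 8))
    edge_emb = rng.normal(size=(5, 8))
    edges = np.array([[0, 1], [1, 2], [2, 3], [3, 0], [1, 3]])
    out = message_passing_layer(
        node_emb, edge_emb, edges,
        message_fn=lambda h_src, e: h_src + e,   # toy message function
        update_fn=lambda h, m: np.tanh(h + m),   # toy update function
    )
    print(out.shape)  # (4, 8)
```

Different GNN models instantiate message_fn, the aggregation, and update_fn differently; FlowGNN's claim of generality is that its dataflow covers this family of computations rather than one fixed model.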