FlowGNN: A Dataflow Architecture for Universal Graph Neural Network Inference via Multi-Queue Streaming

From arXiv. 13 pages, 10 figures. Submitted to MICRO 2022. Accelerator source code will be released on GitHub upon acceptance. arXiv admin note: text overlap with arXiv:2201.08475

Graph neural networks (GNNs) have recently exploded in popularity thanks to their broad applicability to graph-related problems such as quantum chemistry, drug discovery, and high energy physics. However, meeting the demand for novel GNN models and fast inference simultaneously is challenging because accelerator development lags behind the rapid creation of new GNN models. Prior art focuses on accelerating specific classes of GNNs, such as Graph Convolutional Networks (GCNs), but lacks the generality to support a wide range of existing or new GNN models. Meanwhile, most prior work relies on graph pre-processing to exploit data locality, making it unsuitable for real-time applications. To address these limitations, we propose a generic dataflow architecture for GNN acceleration, named FlowGNN, which can flexibly support the majority of message-passing GNNs. Our contributions are threefold. First, we propose a novel and scalable dataflow architecture that flexibly supports a wide range of GNN models with a message-passing mechanism. The architecture features a configurable dataflow optimized for simultaneous computation of node embedding, edge embedding, and message passing, which is generally applicable to all models. We also propose a rich library of model-specific components. Second, we deliver ultra-fast real-time GNN inference without any graph pre-processing, making the architecture agnostic to dynamically changing graph structures. Third, we verify our architecture on the Xilinx Alveo U50 FPGA board and measure the on-board end-to-end performance. We achieve speed-ups of 51-254x over a CPU (6226R) and 1.3-477x over a GPU (A6000), with batch sizes from 1 to 1024; we also outperform the SOTA GNN accelerator I-GCN by 1.03x and 1.25x across two datasets. Our implementation code and on-board measurements are publicly available on GitHub.
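To make the idea of message passing without graph pre-processing more concrete, the following is a minimal software sketch, not the FlowGNN implementation. It assumes a sum aggregator, an identity message function, and a simple rule that routes each destination node to a queue by its index modulo the number of queues; the function name `message_passing_layer` and the parameter `num_queues` are hypothetical. In hardware, the producer and the queue consumers would presumably run concurrently, whereas this sketch runs them one after another.

```python
from collections import deque


def message_passing_layer(num_nodes, edges, features, num_queues=2):
    """One sum-aggregation message-passing step over an unsorted edge list."""
    # One FIFO per hypothetical message-passing unit; a destination node
    # is assigned to a queue by its index modulo num_queues (an assumption).
    queues = [deque() for _ in range(num_queues)]

    # Producer side: stream edges in whatever order they arrive and
    # enqueue (destination, message) pairs. No sorting or partitioning.
    for src, dst in edges:
        message = features[src]  # identity message function, for simplicity
        queues[dst % num_queues].append((dst, message))

    # Consumer side: each queue is drained independently. Queues own
    # disjoint sets of destination nodes, so their updates never conflict.
    aggregated = [[0.0] * len(features[0]) for _ in range(num_nodes)]
    for queue in queues:
        while queue:
            dst, message = queue.popleft()
            for k, value in enumerate(message):
                aggregated[dst][k] += value

    # Node update: add each node's own feature to its aggregated messages.
    return [[f + a for f, a in zip(features[v], aggregated[v])]
            for v in range(num_nodes)]
```

Because the edge list is consumed as a stream, reordering the edges or changing `num_queues` should not change the result, which is the property that makes a design of this kind indifferent to graph structure changes between inferences.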
