As real-world graphs grow in size, training Graph Neural Networks (GNNs) has become time-consuming and calls for acceleration. Previous works have demonstrated the potential of FPGAs for accelerating GNN training, but few have targeted multiple FPGAs, because doing so requires hardware expertise and substantial development effort. To this end, we propose HitGNN, a framework that enables users to map GNN training workloads onto a CPU+Multi-FPGA platform with little effort. In particular, HitGNN takes a user-defined synchronous GNN training algorithm, a GNN model, and platform metadata as input, determines the design parameters from the platform metadata, and automatically performs hardware mapping onto the CPU+Multi-FPGA platform. HitGNN consists of the following building blocks: (1) high-level application programming interfaces (APIs) that allow users to specify various synchronous GNN training algorithms and GNN models in only a handful of lines of code; (2) a software generator that produces a host program that performs mini-batch sampling, manages CPU-FPGA communication, and balances the workload among the FPGAs; (3) an accelerator generator that produces GNN kernels with an optimized datapath and memory organization. We show that existing synchronous GNN training algorithms such as DistDGL and PaGraph can be easily deployed on a CPU+Multi-FPGA platform using our framework while achieving high training throughput. Compared with state-of-the-art frameworks that accelerate synchronous GNN training on a multi-GPU platform, HitGNN achieves up to 27.21x the bandwidth efficiency and up to 4.26x speedup, while using much less compute power and memory bandwidth than the GPUs. In addition, HitGNN scales well to 16 FPGAs on a CPU+Multi-FPGA platform.
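The workload balancing handled by the generated host program can be pictured with a small sketch. The abstract does not state which balancing policy HitGNN uses, so the code below is only an illustration under assumptions: it supposes that each mini-batch has an estimated cost (for example, its sampled edge count) and assigns batches greedily to the least-loaded FPGA. The names `balance_batches` and `device_loads` are hypothetical and are not part of HitGNN's API.

```python
import heapq


def balance_batches(batch_costs, num_fpgas):
    """Greedily assign mini-batches to FPGAs (illustrative policy only).

    batch_costs: estimated cost per mini-batch (assumed, e.g. edge count).
    Returns one list of batch indices per FPGA.
    """
    if num_fpgas <= 0:
        raise ValueError("num_fpgas must be positive")

    # Min-heap of (current load, device id); the least-loaded device pops first.
    heap = [(0, dev) for dev in range(num_fpgas)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_fpgas)]

    # Place the most expensive batches first so smaller ones fill the gaps.
    order = sorted(range(len(batch_costs)),
                   key=lambda i: batch_costs[i], reverse=True)
    for batch_id in order:
        load, dev = heapq.heappop(heap)
        assignment[dev].append(batch_id)
        heapq.heappush(heap, (load + batch_costs[batch_id], dev))
    return assignment


def device_loads(batch_costs, assignment):
    """Total estimated cost assigned to each FPGA."""
    return [sum(batch_costs[i] for i in batches) for batches in assignment]


if __name__ == "__main__":
    costs = [8, 7, 6, 5, 4]  # hypothetical per-batch cost estimates
    plan = balance_batches(costs, num_fpgas=2)
    print(plan, device_loads(costs, plan))
```

In synchronous training every FPGA must finish its share before the next model update, so the most heavily loaded device bounds the iteration time; that is why a balancing step of some kind matters, whatever policy the framework actually applies.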