While FPGA accelerator boards and their respective high-level design tools are maturing, there is still a lack of multi-FPGA applications, libraries, and, not least, benchmarks and reference implementations towards sustained HPC usage of these devices. As in the early days of GPUs in HPC, for workloads that can reasonably be decoupled into loosely coupled working sets, multi-accelerator support can be achieved by using standard communication interfaces like MPI on the host side. However, for performance and productivity, some applications can profit from a tighter coupling of the accelerators. FPGAs offer unique opportunities here when extending the dataflow characteristics to their communication interfaces. In this work, we extend the HPCC FPGA benchmark suite with multi-FPGA support and three missing benchmarks that particularly characterize or stress inter-device communication: b_eff, PTRANS, and LINPACK. With all benchmarks implemented for current boards with Intel and Xilinx FPGAs, we establish a baseline for multi-FPGA performance. Additionally, for the communication-centric benchmarks, we explore the potential of direct FPGA-to-FPGA communication with a circuit-switched inter-FPGA network that is currently only available for one of the boards. The evaluation with parallel execution on up to 26 FPGA boards makes use of one of the largest academic FPGA installations.