FPGA-based hardware accelerators have received increasing attention, mainly because of their ability to accelerate deeply pipelined applications, which yields higher computational performance and energy efficiency. Nevertheless, the resources available on even the most powerful FPGA are still not enough to speed up very large modern workloads. To handle such workloads, FPGAs need to be interconnected in a Multi-FPGA architecture capable of accelerating a single application. However, programming such an architecture is a challenging endeavor that still requires additional research. This paper extends the OpenMP task-based computation offloading model to enable a number of FPGAs to work together as a single Multi-FPGA architecture. Experimental results for a set of OpenMP stencil applications running on a Multi-FPGA platform consisting of six Xilinx VC709 boards interconnected through fiber-optic links show close-to-linear speedups as the number of FPGAs and the number of IP cores per FPGA increase.
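For readers unfamiliar with the task-based offloading model mentioned above, the following is a minimal sketch in standard OpenMP only; it does not show the extension proposed in this paper. It chains several sweeps of a 3-point averaging stencil as dependent tasks, each of which offloads one sweep through a `target` region. The names `stencil_step` and `run_sweeps`, the averaging kernel, and the ping-pong buffer scheme are assumptions made for illustration. When no offload device is configured, the `target` regions simply fall back to the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: one 3-point averaging sweep over n elements.
 * Boundary elements are copied unchanged. */
#pragma omp declare target
static void stencil_step(const float *in, float *out, int n) {
    out[0] = in[0];
    out[n - 1] = in[n - 1];
    for (int i = 1; i < n - 1; i++)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}
#pragma omp end declare target

/* Hypothetical driver: runs `steps` sweeps, ping-ponging between buffers
 * a and b. Each sweep is a task whose depend clauses order it after the
 * sweep that produced its input. Returns the buffer holding the result. */
static float *run_sweeps(float *a, float *b, int n, int steps) {
    assert(a != NULL && b != NULL && n >= 2 && steps >= 0);

    #pragma omp parallel
    #pragma omp single
    {
        for (int s = 0; s < steps; s++) {
            float *src = (s % 2 == 0) ? a : b;
            float *dst = (s % 2 == 0) ? b : a;

            /* The task reads src and writes dst; the runtime derives the
             * execution order from these dependences. */
            #pragma omp task depend(in: src[0:n]) depend(out: dst[0:n])
            {
                /* Offload one sweep; map clauses move the data. */
                #pragma omp target map(to: src[0:n]) map(from: dst[0:n])
                stencil_step(src, dst, n);
            }
        }
        /* Wait for all sweeps before leaving the single region. */
        #pragma omp taskwait
    }
    return (steps % 2 == 0) ? a : b;
}
```

Calling `run_sweeps(a, b, n, 3)` on two `n`-element buffers leaves the result in `b`. In this sketch every task targets the default device; distributing the tasks across several accelerators is the part that the work summarized here addresses.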