By providing highly efficient one-sided communication over a globally shared memory space, the Partitioned Global Address Space (PGAS) model has become one of the most promising parallel computing models in high-performance computing (HPC). Meanwhile, FPGAs are attracting attention as an alternative compute platform for HPC systems because they offer custom computing and design flexibility. However, unlike the traditional Message Passing Interface (MPI), PGAS has not yet been explored on FPGAs. This paper proposes FSHMEM, a software/hardware framework that enables the PGAS programming model on FPGAs. We implement the core functions of the GASNet specification on the FPGA for native PGAS integration in hardware, and we design its programming interface to be highly compatible with legacy software. Our experiments show that FSHMEM achieves a peak bandwidth of 3813 MB/s, which is more than 95% of the theoretical maximum and 9.5$\times$ higher than prior work. It records latencies of 0.35 $\mu$s and 0.59 $\mu$s for remote write and read operations, respectively. Finally, we conduct a case study on two Intel D5005 FPGA nodes that integrate Intel's deep learning accelerator. The two-node system programmed with FSHMEM achieves speedups of 1.94$\times$ for matrix multiplication and 1.98$\times$ for convolution, demonstrating its scalability potential for HPC infrastructure.