Considerable research effort has been devoted to FPGA-based acceleration because of its low latency and high energy efficiency. However, programming FPGAs with traditional low-level hardware description languages such as Verilog generally requires solid knowledge of hardware design details and hands-on experience. Fortunately, the FPGA community is working to address this low-programmability issue. For example, tools such as Vitis are offered with the intention that programming FPGAs be just as easy as programming GPUs. Even though Vitis has been shown to improve programmability, high performance cannot be obtained directly without careful design of the hardware pipeline and memory subsystem. In this paper, we focus on the memory subsystem, comprehensively and systematically benchmarking the effect of optimization methods on memory performance. Based on the benchmarking results, we quantitatively analyze the typical memory access patterns of a broad range of applications, including AI, HPC, and database workloads. We also provide a corresponding optimization direction for each memory access pattern to improve overall performance.