From Domain-Specific Languages to Memory-Optimized Accelerators for Fluid Dynamics

Many applications increasingly require numerical simulations to solve complex problems. Most of these numerical algorithms are massively parallel and are often implemented on parallel high-performance computers. However, classic CPU-based platforms struggle to keep up with the demand for higher resolutions and the exponential growth of data. FPGAs offer a powerful and flexible alternative that can host accelerators to complement such platforms. Developing such application-specific accelerators is still challenging because it is hard to provide efficient code for hardware synthesis. In this paper, we study the challenges of porting a numerical simulation kernel onto an FPGA. We propose an automated tool flow that starts from a domain-specific language (DSL) and generates accelerators for computational fluid dynamics on FPGAs. Our DSL-based flow simplifies the exploration of parameters and constraints such as on-chip memory usage. We also propose a decoupled optimization of memory and logic resources, which allows us to make better use of the limited FPGA resources. In our preliminary evaluation, this made it possible to double the number of parallel kernels, increasing the accelerator speedup over ARM execution from 7x to 12x.
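The reported figures can be put in perspective with a short calculation. The sketch below is illustrative only: it assumes the reported speedups (7 and 12) are exact and that "doubling" means exactly twice as many kernels; the function names are hypothetical and do not come from the paper.

```python
def relative_gain(speedup_before: float, speedup_after: float) -> float:
    """Ratio of the new speedup to the old one."""
    return speedup_after / speedup_before


def scaling_efficiency(gain: float, kernel_ratio: float) -> float:
    """Fraction of ideal linear scaling achieved when the
    number of parallel kernels grows by kernel_ratio."""
    return gain / kernel_ratio


# Figures taken from the abstract; 2.0 is the assumed kernel ratio.
gain = relative_gain(7.0, 12.0)
efficiency = scaling_efficiency(gain, 2.0)

print(f"gain: {gain:.2f}x, efficiency: {efficiency:.0%}")
```

Under these assumptions, doubling the kernels yields roughly 1.7 times the performance, which is somewhat below ideal linear scaling.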


