The impending end of Moore's law motivates the search for new forms of computing that can sustain the performance scaling we have grown accustomed to. Among the many emerging post-Moore computing candidates, perhaps none is as salient as the Field-Programmable Gate Array (FPGA), which offers the means of specializing and customizing the hardware to the computation at hand. In this work, we design a custom FPGA-based accelerator for a computational fluid dynamics (CFD) code. Unlike prior work, which often focuses on accelerating small kernels, we target the entire Poisson solver on unstructured meshes based on the high-fidelity spectral element method (SEM) used in modern state-of-the-art CFD systems. We model the performance of our accelerator analytically, based on the I/O cost of the algorithm. We empirically evaluate our accelerator on a state-of-the-art Intel Stratix 10 FPGA in terms of performance and power consumption and contrast it against existing solutions on general-purpose processors (CPUs). Finally, we propose a data-movement-reducing technique in which we compute geometric factors on the fly, which yields significant (700+ Gflop/s) single-precision performance and a reduction in runtime of upwards of 2x for the local evaluation of the Laplace operator. We end the paper by discussing the challenges and opportunities of using reconfigurable architectures in the future, particularly in light of emerging (not yet available) technologies.