Scientific computing is at the core of many High-Performance Computing (HPC) applications, including computational flow dynamics. Because simulating increasingly large computational models is of the utmost importance, hardware acceleration is receiving increased attention for its potential to maximize the performance of scientific computing. A Field-Programmable Gate Array (FPGA) is a reconfigurable hardware accelerator whose computational resources and memory storage can be fully customized to the requirements of an application throughout its lifetime. This makes it an ideal candidate for accelerating scientific computing applications, because the memory hierarchy can be fully customized, which matters in irregular applications such as the iterative linear solvers found in scientific libraries. In this paper, we study the potential of using FPGAs in HPC in light of the rapid advances in reconfigurable hardware, such as larger on-chip memories, a growing number of logic cells, and the integration of High-Bandwidth Memory (HBM) on board. To perform this study, we first propose a novel ILU0 preconditioner tightly integrated with a BiCGStab solver kernel, designed using a mixture of High-Level Synthesis (HLS) and hand-coded Register-Transfer Level (RTL) design. Second, we integrate the resulting preconditioned iterative solver into Flow from the Open Porous Media (OPM) project, a state-of-the-art open-source reservoir simulator. Finally, we perform a thorough evaluation of the FPGA solver kernel both in standalone mode and integrated into the reservoir simulator; the evaluation includes all the on-chip URAM and BRAM, on-board High-Bandwidth Memory, and off-chip CPU memory data transfers required in complex simulator software such as OPM's Flow. We evaluate the performance on the Norne field, a real-world reservoir model that uses a grid with more than 10^5 cells and 3 unknowns per cell.
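For readers unfamiliar with the numerical method named above, the following is a minimal CPU reference sketch of a BiCGStab solver preconditioned with ILU0, written in plain Python. It is not the FPGA kernel described in this paper: the function names (`ilu0`, `apply_ilu0`, `bicgstab_ilu0`) are hypothetical, the matrix is stored densely for brevity (a real solver would use a sparse block format), and the tolerance and iteration limit are arbitrary illustrative choices.

```python
import math


def ilu0(A):
    """ILU(0): LU factorization restricted to the nonzero pattern of A.
    Returns one matrix holding L (unit diagonal, strictly lower part) and U."""
    n = len(A)
    LU = [row[:] for row in A]
    for i in range(1, n):
        for k in range(i):
            if A[i][k] == 0.0:
                continue  # outside the sparsity pattern: no entry, no fill-in
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                if A[i][j] != 0.0:  # update only positions present in A
                    LU[i][j] -= LU[i][k] * LU[k][j]
    return LU


def apply_ilu0(LU, r):
    """Apply the preconditioner: solve L y = r, then U z = y."""
    n = len(r)
    y = r[:]
    for i in range(n):  # forward substitution (L has a unit diagonal)
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    for i in reversed(range(n)):  # backward substitution
        y[i] = (y[i] - sum(LU[i][j] * y[j] for j in range(i + 1, n))) / LU[i][i]
    return y


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicgstab_ilu0(A, b, tol=1e-10, max_iter=100):
    """Right-preconditioned BiCGStab; returns (x, iterations used)."""
    n = len(b)
    LU = ilu0(A)
    x = [0.0] * n
    r = b[:]                      # residual for the zero initial guess
    r_hat = r[:]                  # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    stop = tol * math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = apply_ilu0(LU, p)
        v = matvec(A, y)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) < stop:  # converged after the first half-step
            return [xi + alpha * yi for xi, yi in zip(x, y)], it
        z = apply_ilu0(LU, s)
        t = matvec(A, z)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * yi + omega * zi for xi, yi, zi in zip(x, y, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) < stop:
            return x, it
        rho = rho_new
    return x, max_iter


# Small 2x2-grid Laplacian-like system whose exact LU would create fill-in.
A = [[4.0, -1.0, -1.0, 0.0],
     [-1.0, 4.0, 0.0, -1.0],
     [-1.0, 0.0, 4.0, -1.0],
     [0.0, -1.0, -1.0, 4.0]]
b = matvec(A, [1.0, 2.0, 3.0, 4.0])
x, iters = bicgstab_ilu0(A, b)
```

The two triangular solves in `apply_ilu0` carry a data dependency from row to row, which is one reason the memory hierarchy matters for this kind of irregular kernel.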