The discontinuous Galerkin (DG) algorithm is a representative high-order method in Computational Fluid Dynamics (CFD), possessing considerable mathematical advantages such as high resolution and low dissipation and dispersion. However, DG is too computationally intensive for practical engineering problems. This paper discusses the implementation of our in-house practical DG application in three different programming models, along with optimization techniques, including grid renumbering and mixed precision, to maximize the performance improvements on a single-node system. Experiments on CPU and GPU show that our CUDA-, OpenACC-, and OpenMP-based codes obtain maximum speedups of 42.9x, 35.3x, and 8.1x, respectively, over serial execution of the original application. In addition, we systematically compare the programming models in two aspects: performance and productivity. Our empirical conclusions help programmers select the right platform and a suitable programming model for their target applications.