On the performance of GPU accelerated q-LSKUM based meshfree solvers in Fortran, C++, Python, and Julia

This report presents a comprehensive analysis of the performance of GPU accelerated meshfree CFD solvers for two-dimensional compressible flows in Fortran, C++, Python, and Julia. The GPU codes are developed using the CUDA programming model. The meshfree solver is based on the least squares kinetic upwind method with entropy variables (q-LSKUM). To assess the computational efficiency of the GPU solvers and to compare their relative performance, benchmark calculations are performed on seven levels of point distribution. To analyse the differences in their run-times, the computationally intensive kernel is profiled. Various performance metrics from the profiled data are investigated to determine the cause of the observed variation in run-times. To address some of the performance-related issues, various optimisation strategies are employed. The optimised GPU codes are compared with the naive codes, and conclusions are drawn from their performance.
