This paper explores strategies to transform an existing CPU-based high-performance computational fluid dynamics solver, HyPar, for compressible flow simulations on emerging exascale heterogeneous (CPU+GPU) computing platforms. The scientific motivation for developing a GPU-enhanced version of HyPar is to simulate canonical turbulent flows at the highest resolution possible on such platforms. We show that optimizing memory operations and thread blocks results in a 200x speedup of computationally intensive kernels compared with a single CPU core. Using multiple GPUs and CUDA-aware MPI communication, we demonstrate both strong and weak scaling of our GPU-based HyPar implementation on NVIDIA Volta V100 GPUs. We simulate the decay of homogeneous isotropic turbulence in a triply periodic box on grids with up to $1024^3$ points (5.3 billion degrees of freedom) and on up to 1,024 GPUs. We compare the wall times for CPU-only and CPU+GPU simulations. The results presented in the paper are obtained on the Summit and Lassen supercomputers at Oak Ridge and Lawrence Livermore National Laboratories, respectively.