We present new results on the strong parallel scaling of the OpenACC-accelerated implementation of Nek5000, a high-order spectral element fluid dynamics solver. The test case is a direct numerical simulation of fully developed turbulent flow in a straight pipe at two Reynolds numbers, $Re_\tau=360$ and $Re_\tau=550$, based on friction velocity and pipe radius. Strong scaling is tested on several GPU-enabled HPC systems, including the Swiss Piz Daint system, TACC's Longhorn, Jülich's JUWELS Booster, and Berzelius in Sweden. The performance results show that the GPU-accelerated version achieves a speed-up of 3 to 5 over the CPU version on these systems. For the $Re_\tau=550$ case on JUWELS Booster, the run time for 20 time steps decreases from 43.5 to 13.2 seconds as the number of GPUs increases from 64 to 512. This illustrates the potential of the GPU-accelerated version for high throughput. At the same time, the strong scaling limit is significantly larger for GPUs, at about $2000$ to $5000$ elements per rank, compared with about $50$ to $100$ for a CPU rank.
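As a rough check on what the JUWELS Booster numbers imply, the sketch below applies the standard strong-scaling definitions (speed-up $S = T_{64}/T_{512}$, parallel efficiency $E = S / (512/64)$) to the two run times quoted above. It uses only those two reported values; the helper name `strong_scaling` is hypothetical and not part of Nek5000.

```python
def strong_scaling(t_base, n_base, t, n):
    """Hypothetical helper: speed-up and parallel efficiency of a run
    on n devices (time t) relative to a baseline on n_base devices (time t_base)."""
    speedup = t_base / t   # how much faster the larger run is
    ideal = n / n_base     # ideal (linear) speed-up for this device ratio
    return speedup, speedup / ideal

# Re_tau = 550 on JUWELS Booster, 20 time steps: 64 GPUs -> 512 GPUs
s, e = strong_scaling(43.5, 64, 13.2, 512)
print(f"speed-up {s:.2f}x, efficiency {e:.0%}")  # speed-up 3.30x, efficiency 41%
```

An 8-fold increase in GPUs thus yields roughly a 3.3-fold reduction in run time, i.e. about 41% efficiency relative to the 64-GPU baseline, consistent with the runs approaching the GPU strong scaling limit described above.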