Acceleration of a production Solar MHD code with Fortran standard parallelism: From OpenACC to `do concurrent'

There is growing interest in using standard language constructs for accelerated computing, avoiding the need for (often vendor-specific) external APIs. These constructs hold the potential to be more portable and much more `future-proof'. For Fortran codes, the current focus is on the {\tt do concurrent} (DC) loop. While there have been some successful examples of GPU-acceleration using DC for benchmark and/or small codes, its widespread adoption will require demonstrations of its use in full-size applications. Here, we look at the current capabilities and performance of using DC in a production application called Magnetohydrodynamic Algorithm outside a Sphere (MAS). MAS is a state-of-the-art model for studying coronal and heliospheric dynamics, is over 70,000 lines long, and has previously been ported to GPUs using MPI+OpenACC. We attempt to eliminate as many of its OpenACC directives as possible in favor of DC. We show that using the NVIDIA {\tt nvfortran} compiler's Fortran 202X preview implementation, unified managed memory, and modified MPI launch methods, we can achieve GPU acceleration across multiple GPUs without using a single OpenACC directive. However, doing so results in a slowdown between 1.25x and 3x. We discuss what future improvements are needed to avoid this loss, and show how we can still retain close
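
As an illustration of the kind of substitution described above, the module below is a minimal sketch. It is not code taken from MAS. It shows a tightly nested OpenACC loop (kept as comments) rewritten as a single multi-index {\tt do concurrent} construct. The kernel, the module name `dc_sketch`, and the routine `update_dc` are hypothetical. The remark about offloading assumes nvfortran's `-stdpar=gpu` option, which maps DC loops to the GPU and relies on managed memory for array data in place of OpenACC data directives. This matches the abstract's use of unified managed memory, but exact flag behavior should be checked against the NVIDIA HPC SDK documentation.

```fortran
module dc_sketch
  implicit none
contains
  ! Hypothetical kernel: a(i,j) = b(i,j) + s*c(i,j) over an n-by-m grid.
  subroutine update_dc(n, m, s, b, c, a)
    integer, intent(in) :: n, m
    real, intent(in)    :: s
    real, intent(in)    :: b(n,m), c(n,m)
    real, intent(out)   :: a(n,m)
    integer :: i, j

    ! Directive-based (OpenACC) form that this replaces, for comparison:
    !   !$acc parallel loop collapse(2)
    !   do j = 1, m
    !     do i = 1, n
    !       a(i,j) = b(i,j) + s*c(i,j)
    !     end do
    !   end do

    ! Standard Fortran form: the iterations are declared independent, so a
    ! compiler is free to run them in parallel (e.g. on a GPU, assuming
    ! nvfortran with -stdpar=gpu); other compilers may run them serially.
    do concurrent (j = 1:m, i = 1:n)
      a(i,j) = b(i,j) + s*c(i,j)
    end do
  end subroutine update_dc
end module dc_sketch
```

Because the DC loop carries no vendor-specific directives, the same source still compiles with any Fortran 2008+ compiler. Whether it runs on a GPU then depends only on the compiler and its flags.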
