We present our experience with the modernization of the GR-MHD code BHAC, aimed at improving its novel hybrid (MPI+OpenMP) parallelization scheme. In doing so, we showcase the use of performance profiling tools available on x86 (Intel-based) architectures. Our performance characterization and threading analysis guided improvements to the concurrency, and thus the efficiency, of the OpenMP parallel regions. We assess scaling and communication patterns to identify and alleviate MPI bottlenecks, using both runtime switches and targeted code interventions. The performance of the optimized version of BHAC improved by $\sim28\%$, making it viable for scaling to several hundred supercomputer nodes. Finally, we test whether porting these optimizations to different hardware is likewise beneficial by running on ARM A64FX vector nodes.