The upcoming exascale computing systems Frontier and Aurora will draw much of their computing power from GPU accelerators. The hardware for these systems will be provided by AMD and Intel, respectively, each of which supports its own GPU programming model. The challenge for applications that harness one of these exascale systems will be to avoid vendor lock-in and to preserve performance portability. We report here on our results from using Kokkos to accelerate a real-world application on NERSC's Perlmutter Phase 1 (using NVIDIA A100 accelerators) and on the testbed system for OLCF's Frontier (using AMD MI250X accelerators). By porting to Kokkos, we were able to run the same X-ray tracing code on both systems and achieved speed-ups of between 13% and 66% compared to the original CUDA code. These results are a highly encouraging demonstration of using Kokkos to accelerate production science code.