High-performance computing (HPC) is a major driver of scientific research and discovery, from quantum simulations to medical therapeutics. A growing number of new HPC systems are coming online, furnished with diverse hardware components engineered by competing vendors, each with its own architecture and platform to be supported. While the increasing availability of these resources is in many cases pivotal to successful science, even the largest collaborations lack the computational expertise required to fully exploit current hardware capabilities. The need to maintain multiple platform-specific codebases further complicates matters, potentially constraining the number of machines that can be utilized. Fortunately, numerous programming models are under development that aim to facilitate software solutions for heterogeneous computing. In this paper, we leverage the SYCL programming model to demonstrate cross-platform performance portability across heterogeneous resources. We detail our NVIDIA and AMD random number generator extensions to the oneMKL open-source interfaces library. Performance portability is measured relative to platform-specific baseline applications executed on four major hardware platforms using two different compilers supporting SYCL. The utility of our extensions is exemplified in a real-world setting via a high-energy physics simulation application. We show that the performance of implementations that capitalize on SYCL interoperability is on par with that of native implementations, attesting to the cross-platform performance portability of a SYCL-based approach to scientific codes.