Porting a sparse linear algebra math library to Intel GPUs

With the announcement that the Aurora supercomputer will be composed of general-purpose Intel CPUs complemented by discrete high-performance Intel GPUs, and with the deployment of the oneAPI ecosystem, Intel has committed to entering the arena of discrete high-performance GPUs. A central requirement for the scientific computing community is the availability of production-ready software stacks, along with a glimpse of the performance they can expect to see on Intel high-performance GPUs. In this paper, we present the first platform-portable open-source math library supporting Intel GPUs via the DPC++ programming environment. We also benchmark some of the developed sparse linear algebra functionality on different Intel GPUs to assess how efficiently the DPC++ programming ecosystem translates raw performance into application performance. Aside from quantifying the efficiency within the hardware-specific roofline model, we also compare against routines providing the same functionality that ship with Intel's oneMKL vendor library.
