CUDA is one of the most popular choices for GPU programming, but it can only be executed on NVIDIA GPUs. Executing CUDA on non-NVIDIA devices not only benefits the hardware community, but also enables data-parallel computation in heterogeneous systems. To make CUDA programs portable, some researchers have proposed source-to-source translators that translate CUDA into portable programming languages that can be executed on non-NVIDIA devices. However, most CUDA translators require additional manual modification of the translated code, which imposes a heavy workload on developers. In this paper, we propose CuPBoP, which executes CUDA on non-NVIDIA devices without relying on any portable programming language. Unlike existing work that executes CUDA on non-NVIDIA devices, CuPBoP does not require manual modification of the CUDA source code, yet it achieves the highest coverage on the Rodinia benchmark (69.6%), well above that of existing frameworks (56.6%). In particular, for CPU backends, CuPBoP supports several ISAs (e.g., X86, RISC-V, AArch64) and delivers performance close to or higher than that of other projects. We also compare and analyze the performance of CuPBoP, manually optimized OpenMP/MPI programs, and CUDA programs on the latest Ampere architecture GPU, and outline future directions for supporting CUDA programs on non-NVIDIA devices with high performance.