Achieving high performance in GPU code requires developers to have significant knowledge of parallel programming and GPU architectures, as well as an in-depth understanding of the application. This combination makes it challenging to find performance optimizations for GPU-based applications, especially in scientific computing. This paper shows that the tool GEVO can achieve significant speedups over human-optimized GPU code on two quite different scientific workloads. GEVO uses evolutionary computation to find code edits that improve the runtime of a multiple sequence alignment kernel and a SARS-CoV-2 simulation by 28.9% and 29%, respectively. Further, when GEVO begins with an early, unoptimized version of the sequence alignment program, it finds a 30-fold speedup, a performance improvement similar to that of the hand-tuned version. This work presents an in-depth analysis of the discovered optimizations, revealing that the primary sources of improvement vary across applications; that most of the optimizations generalize across GPU architectures; and that several of the most important optimizations involve significant code interdependencies. The results showcase the potential of automated program optimization tools to help reduce the optimization burden for scientific computing developers and enhance performance portability for domain-specific accelerators.