Graphics processing units (GPUs) have become ubiquitous in scientific computing. However, writing efficient GPU kernels is challenging because the code must be carefully tuned. To explore the kernel optimization space automatically, several auto-tuning tools, such as Kernel Tuner, have been proposed. Unfortunately, these tools often do not address how tuning results are integrated back into applications, which places a significant implementation and maintenance burden on application developers. In this work, we present Kernel Launcher, an easy-to-use C++ library that simplifies the creation of highly tuned CUDA applications. With Kernel Launcher, programmers can capture kernel launches, tune the captured kernels for different setups, and integrate the tuning results back into applications using runtime compilation. To showcase the applicability of Kernel Launcher, we consider a real-world computational fluid dynamics code and tune its kernels for different GPUs, input domains, and precisions.
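To make the integration step concrete, the following is a minimal C++17 sketch of how stored tuning results might be matched to the current setup and turned into options for a runtime compiler. It is not Kernel Launcher's API: `TuningRecord`, `select_record`, `compile_options`, and the macro names `BLOCK_SIZE`, `TILE_FACTOR`, and `REAL` are hypothetical names introduced only for illustration, and the nearest-problem-size matching rule is an assumption rather than the library's documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record of one tuning result. These field names are
// illustrative assumptions and are not taken from Kernel Launcher.
struct TuningRecord {
    std::string device;        // GPU the kernel was tuned on
    std::size_t problem_size;  // size of the tuned input domain
    std::string precision;     // "float" or "double"
    int block_size;            // best threads-per-block found by the tuner
    int tile_factor;           // best per-thread work factor found by the tuner
};

// Select the record with the same device and precision whose problem size
// is closest to the requested one. Returns std::nullopt if nothing matches.
std::optional<TuningRecord> select_record(
    const std::vector<TuningRecord>& records,
    const std::string& device,
    const std::string& precision,
    std::size_t problem_size) {
    std::optional<TuningRecord> best;
    std::size_t best_distance = 0;
    for (const auto& r : records) {
        if (r.device != device || r.precision != precision) {
            continue;
        }
        // Absolute difference, computed without unsigned underflow.
        std::size_t distance = r.problem_size > problem_size
                                   ? r.problem_size - problem_size
                                   : problem_size - r.problem_size;
        if (!best || distance < best_distance) {
            best = r;
            best_distance = distance;
        }
    }
    return best;
}

// Convert a record into preprocessor definitions of the form -DNAME=VALUE,
// which a runtime compiler such as NVRTC accepts as compile options.
std::vector<std::string> compile_options(const TuningRecord& r) {
    assert(r.block_size > 0 && r.tile_factor > 0);
    return {
        "-DBLOCK_SIZE=" + std::to_string(r.block_size),
        "-DTILE_FACTOR=" + std::to_string(r.tile_factor),
        "-DREAL=" + r.precision,
    };
}
```

In such a scheme, an application would call `select_record` once per kernel at startup with the detected GPU name, precision, and domain size, then pass the resulting options to the runtime compiler so that the kernel is specialized for that setup without hard-coding tuned values in the source.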