Towards a Benchmarking Suite for Kernel Tuners

As computing systems become more complex, it is becoming harder for programmers to keep their code optimized as the hardware is updated. Autotuners try to alleviate this by hiding as many architecture-specific optimization details as possible from the user, so that the code can be used efficiently across different generations of systems. In this article we introduce a new benchmark suite for evaluating the performance of optimization algorithms used by modern autotuners targeting GPUs. The suite contains tunable GPU kernels that are representative of real-world applications, allowing for comparisons between optimization algorithms and the examination of code optimization, search space difficulty, and performance portability. Our framework facilitates easy integration of new autotuners and benchmarks by defining a shared problem interface. We evaluate the benchmark suite on five characteristics: convergence rate, local minima centrality, optimal speedup, Permutation Feature Importance (PFI), and performance portability. The results show that optimization parameters greatly impact performance and that global optimization is needed. The importance of each parameter is consistent across GPU architectures; however, the specific values need to be optimized for each architecture. Our portability study highlights the crucial importance of autotuning each application for a specific target architecture. The results reveal that simply transferring the optimal configuration from one architecture to another can yield anywhere from 58.5% to 99.9% of the optimal performance, depending on the GPU architecture. This highlights the importance of autotuning in modern computing systems and the value of our benchmark suite in facilitating the study of optimization algorithms and their effectiveness in achieving optimal performance for specific target architectures.
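
The portability figures quoted above can be read as the fraction of the target GPU's best achievable performance that is retained when a configuration tuned on a different GPU is reused unchanged. The following is a minimal sketch of that calculation, assuming the score is the ratio of the optimal runtime to the transferred configuration's runtime on the target; the abstract does not give the exact formula. The function names, the `(block_size, unroll_factor)` parameters, and all runtimes are hypothetical and are not taken from the benchmark suite.

```python
from typing import Dict, Tuple

# Hypothetical configuration: (block_size, unroll_factor). Not from the suite.
Config = Tuple[int, int]


def best_config(runtimes: Dict[Config, float]) -> Config:
    """Return the configuration with the lowest measured runtime."""
    return min(runtimes, key=runtimes.get)


def relative_performance(target_runtimes: Dict[Config, float],
                         transferred: Config) -> float:
    """Fraction of the target's optimal performance kept by `transferred`.

    Assumed definition: optimal runtime on the target divided by the
    runtime of the transferred configuration on the target (1.0 = optimal).
    """
    optimal = min(target_runtimes.values())
    return optimal / target_runtimes[transferred]


# Made-up runtimes in milliseconds for the same kernel on two GPUs.
gpu_a = {(64, 1): 4.0, (128, 2): 2.0, (256, 4): 3.0}
gpu_b = {(64, 1): 5.0, (128, 2): 2.5, (256, 4): 2.0}

tuned_on_a = best_config(gpu_a)                   # best configuration on GPU A
score = relative_performance(gpu_b, tuned_on_a)   # how it fares on GPU B
print(f"{tuned_on_a} retains {score:.1%} of optimal performance on GPU B")
```

A score of 1.0 means the transferred configuration is also optimal on the target; lower values quantify how much performance is lost by skipping a retuning step.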
