Power is increasingly a limiting resource in high-performance, GPU-accelerated computing systems. Understanding the range and sources of power variation is essential for setting realistic bounds on rack and system peak power and for developing techniques that minimize energy consumption. While variations arising from manufacturing and from other factors, such as the choice of algorithm, have been studied previously, this work shows that program inputs can also severely affect the power consumed, not only on GPUs but also on CPUs. Power variations of up to 67% were observed on an NVIDIA Ampere A100 GPU for the same algorithm (the DGEMM benchmark) and input size when only the matrix values differed. Our investigation shows that the values used as matrix elements, their positions, and their uniqueness strongly influence power consumption. We further discuss the implications of this result for supercomputer performance and energy efficiency.