With the continuous improvement of on-chip integrated voltage regulators (IVRs) and fast, adaptive frequency control, dynamic voltage-frequency scaling (DVFS) transition times have shrunk from the microsecond to the nanosecond regime, providing additional opportunities to improve energy efficiency. The key to unlocking the continued improvement in voltage-frequency circuit technology is the creation of new, smarter DVFS mechanisms that better adapt to rapid fluctuations in workload demand. It is particularly important to optimize fine-grain DVFS mechanisms for graphics processing units (GPUs) as these chips become ever more important workhorses in the datacenter. However, the massive amount of thread-level parallelism in GPUs makes it uniquely difficult to determine the optimal voltage-frequency state at run-time. Existing solutions, mostly designed for single-threaded CPUs and longer time scales, fail to consider the seemingly chaotic, highly varying nature of GPU workloads at short time scales. This paper proposes a novel prediction mechanism, PCSTALL, that is tailored for emerging DVFS capabilities in GPUs and achieves near-optimal energy efficiency. Using the insights from our fine-grained workload analysis, we propose a wavefront-level program counter (PC) based DVFS mechanism that improves program behavior prediction accuracy by 32% on average for a wide set of GPU applications at 1 microsecond DVFS time epochs. Compared to the current state of the art, our PC-based technique achieves a 19% average improvement when optimized for Energy-Delay-Squared Product at 50 microsecond time epochs, and reaches a 32% power-efficiency improvement when operated with 1 microsecond DVFS technologies.
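To make the Energy-Delay-Squared Product (ED²P) objective concrete, the following is a minimal sketch of how a per-epoch voltage-frequency state could be chosen once a stall estimate is available. It is not the PCSTALL mechanism itself: the analytical model (compute time scales with core frequency, stall time does not; dynamic power proportional to V²f), the `VFState` type, the three example states, and the constants `c_eff` and `p_static` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VFState:
    """Hypothetical voltage-frequency operating point."""
    freq_ghz: float
    volt: float


# Illustrative states only; real hardware exposes its own table.
STATES = [VFState(1.0, 0.8), VFState(1.5, 0.9), VFState(2.0, 1.0)]


def epoch_time_ns(compute_cycles: float, stall_ns: float, state: VFState) -> float:
    # Assumption: compute cycles scale with core frequency,
    # while stall (memory-bound) time is frequency-independent.
    return compute_cycles / state.freq_ghz + stall_ns


def epoch_energy(compute_cycles: float, stall_ns: float, state: VFState,
                 c_eff: float = 1.0, p_static: float = 0.2) -> float:
    # Assumption: dynamic power ~ C * V^2 * f, plus a constant static term.
    power = c_eff * state.volt ** 2 * state.freq_ghz + p_static
    return power * epoch_time_ns(compute_cycles, stall_ns, state)


def ed2p(compute_cycles: float, stall_ns: float, state: VFState) -> float:
    # Energy-Delay-Squared Product: E * D^2.
    t = epoch_time_ns(compute_cycles, stall_ns, state)
    return epoch_energy(compute_cycles, stall_ns, state) * t ** 2


def pick_state(states, compute_cycles: float, stall_ns: float) -> VFState:
    # Choose the state that minimizes predicted ED2P for the next epoch.
    return min(states, key=lambda s: ed2p(compute_cycles, stall_ns, s))


if __name__ == "__main__":
    print(pick_state(STATES, compute_cycles=1000, stall_ns=0.0))
    print(pick_state(STATES, compute_cycles=100, stall_ns=900.0))
```

Under this toy model, a compute-bound epoch favors the highest frequency and a stall-dominated epoch favors the lowest, which is why the accuracy of the per-epoch stall prediction matters for the ED²P achieved.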