Energy conservation in large data centers running high-performance computing workloads, such as deep learning on big data, is critically important: cutting electricity use by even a few percent translates into million-dollar savings. This work studies energy conservation on emerging CPU-GPU hybrid clusters through dynamic voltage and frequency scaling (DVFS). We aim to minimize the total energy consumed in processing a batch of offline tasks or a sequence of real-time tasks under deadline constraints. We derive a fast and accurate analytical model that computes the appropriate voltage/frequency setting for each task, and we assign multiple tasks to the cluster with heuristic scheduling algorithms. In particular, our model accounts for the nonlinear relationship between task execution time and processor speed in GPU-accelerated applications, which captures real-world GPU energy consumption more accurately. In a performance evaluation driven by real-world power measurement traces, our scheduling algorithm achieves energy savings close to the theoretical upper bound: for a GPU scaling interval in which analytically at most 36% of energy can be saved, we record savings of 33-35%. Our results are applicable to energy management on modern heterogeneous clusters.
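As a rough illustration of the per-task setting selection described in the abstract, the Python sketch below picks, from a discrete list of voltage/frequency settings, the one with the lowest energy that still meets a task's deadline. Everything specific here is an assumption made for illustration and is not the paper's model: the execution-time model (only a fraction `delta` of the run time scales with 1/f, which makes time nonlinear in frequency), the power model (static power plus a term proportional to V^2 * f), the names `Setting`, `exec_time`, `power`, `best_setting`, `p_static`, and `c`, and all example numbers.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Setting:
    """One hypothetical DVFS operating point."""
    voltage: float    # volts
    frequency: float  # MHz


def exec_time(t_at_fmax: float, delta: float, f: float, f_max: float) -> float:
    """Assumed time model: a fraction `delta` of the run time scales with 1/f,
    the remaining (1 - delta) is insensitive to the processor frequency."""
    return t_at_fmax * (delta * f_max / f + (1.0 - delta))


def power(s: Setting, p_static: float, c: float) -> float:
    """Assumed power model: static power plus dynamic power ~ c * V^2 * f."""
    return p_static + c * s.voltage ** 2 * s.frequency


def best_setting(
    settings: Sequence[Setting],
    t_at_fmax: float,
    delta: float,
    deadline: float,
    p_static: float,
    c: float,
) -> Optional[Tuple[Setting, float]]:
    """Return (setting, energy) with minimal energy among settings that meet
    the deadline, or None if no setting is fast enough."""
    f_max = max(s.frequency for s in settings)
    best: Optional[Tuple[Setting, float]] = None
    for s in settings:
        t = exec_time(t_at_fmax, delta, s.frequency, f_max)
        if t > deadline:
            continue  # this setting would miss the deadline
        energy = power(s, p_static, c) * t
        if best is None or energy < best[1]:
            best = (s, energy)
    return best


if __name__ == "__main__":
    # Made-up operating points and task parameters, for illustration only.
    candidates = [Setting(1.0, 1000.0), Setting(0.9, 800.0), Setting(0.8, 600.0)]
    print(best_setting(candidates, t_at_fmax=10.0, delta=0.5,
                       deadline=12.0, p_static=50.0, c=0.1))
```

In this toy setup the lowest frequency uses the least energy but is only chosen when the deadline is loose enough. That trade-off between energy and the deadline constraint is what the abstract describes, although the paper's own analytical model and scheduling heuristics are not reproduced here.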