Deep learning has experienced significant growth in recent years, resulting in increased energy consumption and carbon emissions from the use of GPUs for training deep neural networks (DNNs). Answering the call for sustainability, conventional solutions have attempted to move training jobs to locations or time frames with lower carbon intensity. However, moving jobs to other locations may not always be feasible due to large dataset sizes or data regulations. Moreover, postponing training can degrade application service quality because the DNNs backing the service are not updated in a timely fashion. In this work, we present a practical solution that reduces the carbon footprint of DNN training without migrating or postponing jobs. Specifically, our solution observes real-time carbon intensity shifts during training and controls the energy consumption of GPUs accordingly, thereby reducing the carbon footprint while maintaining training performance. Furthermore, to proactively adapt to shifting carbon intensity, we propose a lightweight machine learning algorithm that predicts the carbon intensity of the upcoming time frame. Our solution, Chase, reduces the total carbon footprint of training ResNet-50 on ImageNet by 13.6% while increasing training time by only 2.5%.
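The core mechanism the abstract describes, lowering GPU power consumption when grid carbon intensity is high and restoring it when the grid is clean, can be sketched as a simple control policy. The function below is an illustrative assumption, not the paper's actual algorithm: the thresholds, wattage bounds, and linear scaling are placeholder choices for exposition.

```python
def choose_power_cap(carbon_intensity_gco2_per_kwh: float,
                     max_watts: float = 300.0,
                     min_watts: float = 150.0,
                     low: float = 200.0,
                     high: float = 500.0) -> float:
    """Map the current grid carbon intensity (gCO2/kWh) to a GPU power cap.

    Below `low` intensity the GPU runs at full power; above `high` it is
    capped at `min_watts`; in between, the cap scales down linearly.
    All constants here are hypothetical, for illustration only.
    """
    if carbon_intensity_gco2_per_kwh <= low:
        return max_watts
    if carbon_intensity_gco2_per_kwh >= high:
        return min_watts
    frac = (carbon_intensity_gco2_per_kwh - low) / (high - low)
    return max_watts - frac * (max_watts - min_watts)

# In a real deployment, the chosen cap would be applied once per
# carbon-intensity update interval, e.g. via `nvidia-smi -pl <watts>`.
```

Because GPU power draw and training throughput are not linearly related, capping power during high-carbon periods trades a small slowdown for a disproportionately larger energy (and thus carbon) saving, which is consistent with the abstract's reported 13.6% carbon reduction at a 2.5% time cost.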