Deep learning has experienced significant growth in recent years, resulting in increased energy consumption and carbon emissions from the GPUs used to train deep neural networks (DNNs). Answering the call for sustainability, conventional solutions attempt to shift training jobs to locations or time frames with lower carbon intensity. However, moving jobs elsewhere is not always feasible due to large dataset sizes or data regulations. Moreover, postponing training can degrade application service quality because the DNNs backing the service are not updated in a timely fashion. In this work, we present a practical solution that reduces the carbon footprint of DNN training without migrating or postponing jobs. Specifically, our solution observes real-time shifts in carbon intensity during training and controls the energy consumption of GPUs accordingly, reducing the carbon footprint while maintaining training performance. Furthermore, to proactively adapt to shifting carbon intensity, we propose a lightweight machine learning algorithm that predicts the carbon intensity of the upcoming time frame. Our solution, Chase, reduces the total carbon footprint of training ResNet-50 on ImageNet by 13.6% while increasing training time by only 2.5%.
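The core idea of carbon-intensity-aware power control can be illustrated with a minimal sketch. This is an assumption for illustration only, not Chase's actual controller: the function name, thresholds, and the linear mapping are all hypothetical. It maps the current grid carbon intensity to a GPU power cap, running at full power when the grid is clean and throttling as intensity rises.

```python
def choose_power_limit(carbon_gco2_per_kwh: float,
                       low: float = 200.0, high: float = 500.0,
                       min_watts: float = 150.0, max_watts: float = 300.0) -> float:
    """Map grid carbon intensity (gCO2/kWh) to a GPU power cap (W).

    Illustrative sketch only: below `low` intensity, run at full power;
    above `high`, clamp to the minimum cap; interpolate linearly in
    between. All thresholds are hypothetical defaults, not values from
    the paper.
    """
    if carbon_gco2_per_kwh <= low:
        return max_watts
    if carbon_gco2_per_kwh >= high:
        return min_watts
    # Fraction of the way from the "clean" to the "dirty" threshold.
    frac = (carbon_gco2_per_kwh - low) / (high - low)
    return max_watts - frac * (max_watts - min_watts)
```

In practice, the chosen cap would be applied to the GPU via the driver's power-management interface, and the input would come from a live carbon-intensity feed (or, as the abstract describes, from a predictor for the upcoming time frame).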