The rapid rise of cloud computing has caused an alarming increase in data centers' carbon emissions, which now account for more than 3% of global greenhouse gas emissions, and immediate steps are needed to curb their mounting strain on the global climate. An important focus of this effort is improving resource utilization to reduce electricity usage. Our proposed Full Scaling Automation (FSA) mechanism dynamically adapts resources to changing workloads in large-scale cloud computing clusters, enabling the clusters in data centers to maintain their desired CPU utilization target and thus improve energy efficiency. Unlike previous autoscaling methods such as Autopilot or FIRM, which adjust computing resources using statistical models and expert knowledge, FSA uses deep representation learning to accurately predict the future workload of each service and automatically stabilize the corresponding target CPU usage level. Our approach achieves significant performance improvements over existing work on real-world datasets. We also deployed FSA on large-scale cloud computing clusters in industrial data centers; according to the certification of the China Environmental United Certification Center (CEC), it achieved a reduction of 947 tons of carbon dioxide, equivalent to a saving of 1,538,000 kWh of electricity, during the Double 11 shopping festival of 2022, marking a critical step toward our company's strategic goal of carbon neutrality by 2030.
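The idea of sizing a service so that it holds a target CPU utilization under a forecast workload can be illustrated with a minimal sketch. This is not the FSA mechanism itself: the function name `replicas_for_target`, its parameters, and the simple capacity model (identical replicas, load spread evenly) are assumptions made for illustration, and the forecast value stands in for the output of a learned workload predictor.

```python
import math


def replicas_for_target(
    predicted_cpu_cores: float,
    cores_per_replica: float,
    target_utilization: float,
    min_replicas: int = 1,
) -> int:
    """Return the smallest replica count whose expected CPU utilization
    stays at or below the target, given a forecast CPU demand.

    Hypothetical helper: assumes identical replicas and evenly spread load.
    """
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("target_utilization must be in (0, 1]")
    if cores_per_replica <= 0:
        raise ValueError("cores_per_replica must be positive")
    if predicted_cpu_cores < 0:
        raise ValueError("predicted_cpu_cores must be non-negative")

    # Each replica should carry at most target_utilization * cores_per_replica cores of load.
    usable_cores_per_replica = cores_per_replica * target_utilization
    needed = predicted_cpu_cores / usable_cores_per_replica
    return max(min_replicas, math.ceil(needed))


if __name__ == "__main__":
    # Illustrative numbers only: a forecast of 120 busy cores,
    # 4-core replicas, and a 50% utilization target.
    count = replicas_for_target(120.0, 4.0, 0.5)
    print(count, 120.0 / (count * 4.0))
```

Under this model, a more accurate forecast lets the cluster run closer to its utilization target with fewer idle cores, which is the link between workload prediction and energy savings drawn in the abstract.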