Thermal management in hyper-scale cloud data centers is a critical problem. Increased host temperature creates hotspots, which significantly increase cooling cost and affect reliability. Accurate prediction of host temperature is therefore crucial for managing resources effectively. Temperature estimation is a non-trivial problem due to thermal variations within the data center, and existing estimation techniques are inefficient because of their computational complexity and inaccurate predictions. Data-driven machine learning methods offer a promising alternative. To this end, we collect and study data from a private cloud and show the presence of thermal variations. We investigate several machine learning models to accurately predict host temperature and, specifically, propose a gradient boosting model for temperature prediction. The experimental results show that our model predicts temperature with an average RMSE of 0.05 (equivalently, an average prediction error of 2.38 °C), which is 6 °C lower than that of an existing theoretical model. In addition, we propose a dynamic scheduling algorithm to minimize the peak temperature of hosts. The results show that our algorithm reduces peak temperature by 6.5 °C and consumes 34.5% less energy compared to the baseline algorithm.
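To make the prediction approach concrete, the following is a minimal sketch of a gradient boosting regressor for host temperature, using scikit-learn on synthetic telemetry. The feature set (CPU utilization, memory utilization, fan speed, inlet temperature) and all hyperparameters are illustrative assumptions, not the paper's actual configuration or data.

```python
# Illustrative sketch only: feature names, data, and hyperparameters are
# assumptions for demonstration; the paper's real setup may differ.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for host telemetry: each row is one monitoring sample.
# Hypothetical columns: CPU utilization (%), memory utilization (%),
# fan speed (RPM), inlet temperature (deg C).
X = rng.uniform([0, 0, 1000, 18], [100, 100, 6000, 27], size=(5000, 4))
# Synthetic CPU temperature with a loose linear dependence plus noise.
y = (30 + 0.3 * X[:, 0] + 0.05 * X[:, 1] - 0.002 * X[:, 2]
     + 0.8 * X[:, 3] + rng.normal(0, 1.5, size=5000))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gradient boosting regressor: an ensemble of shallow trees fit
# sequentially, each correcting the residuals of the previous ones.
model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                  max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Evaluate with RMSE, the error metric reported in the abstract.
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"RMSE on held-out samples: {rmse:.2f} deg C")
```

On real data, the same pattern applies: train on historical host telemetry, then query the fitted model at scheduling time to estimate each candidate host's temperature before placing a workload.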