Predicting how the Global Mean Temperature (GMT) will evolve over the next few decades is important, and the ability to reproduce historical data is a necessary first step toward the actual goal of long-range forecasting. This paper examines the advantages of using statistical and simpler Machine Learning (ML) methods rather than turning directly to complex ML algorithms and Deep Learning Neural Networks (DNN). Data transformations applied before model fitting, a step that is often neglected, are used as a means of improving predictive accuracy. The GMT time series is treated both as a univariate time series and as a regression problem. Some of the data transformation steps were found to be effective. Several simple ML methods performed as well as or better than the better-known ones, which shows the merit of trying a large bouquet of algorithms as a first step. Fifty-six algorithms were run with Box-Cox, Yeo-Johnson, and first-order differencing transformations and compared with the same algorithms run without them. Predictions for the annual GMT test data were better than those published so far, with a lowest RMSE of 0.02 $^\circ$C. The RMSE for five-year mean GMT values on the test data ranged from 0.00002 to 0.00036 $^\circ$C.
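As a rough illustration of the preprocessing named in the abstract, the sketch below implements Box-Cox, Yeo-Johnson, and first-order differencing in plain Python, along with a lagged-feature helper showing one way a univariate series might be cast as a regression problem. It is not the paper's code. The fixed `lam` values, the `+ 1.0` offset, the lag count, the function names, and the ten-point `anomalies` series are all assumptions chosen for illustration; in practice the transformation parameter is usually estimated from the training data rather than fixed by hand.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform with a fixed lambda; every value must be > 0."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox needs strictly positive data")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def inv_box_cox(z, lam):
    """Map Box-Cox values back to the original scale."""
    if lam == 0:
        return [math.exp(v) for v in z]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in z]

def yeo_johnson(y, lam):
    """Yeo-Johnson transform with a fixed lambda; negative values are allowed."""
    out = []
    for v in y:
        if v >= 0:
            out.append(math.log1p(v) if lam == 0
                       else ((v + 1.0) ** lam - 1.0) / lam)
        else:
            out.append(-math.log1p(-v) if lam == 2
                       else -(((1.0 - v) ** (2.0 - lam) - 1.0) / (2.0 - lam)))
    return out

def difference(y):
    """First-order differencing: y[t] - y[t-1]."""
    return [b - a for a, b in zip(y, y[1:])]

def undifference(first, d):
    """Rebuild the series from its first value and its differences."""
    out = [first]
    for step in d:
        out.append(out[-1] + step)
    return out

def make_lagged(y, n_lags):
    """Regression framing: predict y[t] from the n_lags preceding values."""
    X = [y[i - n_lags:i] for i in range(n_lags, len(y))]
    target = y[n_lags:]
    return X, target

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

# Synthetic values for illustration only; NOT real GMT data.
anomalies = [-0.16, -0.08, 0.05, 0.12, 0.31, 0.45, 0.62, 0.54, 0.68, 0.90]

# Box-Cox needs positive inputs, so shift by a hypothetical offset first.
shifted = [v + 1.0 for v in anomalies]
bc = box_cox(shifted, 0.5)
yj = yeo_johnson(anomalies, 0.5)   # no shift needed
diffs = difference(anomalies)
X, target = make_lagged(anomalies, 3)

# Naive persistence baseline: next year's value equals this year's.
baseline_rmse = rmse(anomalies[1:], anomalies[:-1])
```

Any forecast made on a transformed scale has to be mapped back (here with `inv_box_cox` or `undifference`) before computing an RMSE in degrees Celsius, otherwise the errors of transformed and untransformed runs are not comparable.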