Weather forecasting is a long-standing computational challenge with direct societal and economic impact. The task involves continuous collection of large volumes of data and exhibits rich spatiotemporal dependencies over long periods, which makes it well suited to deep learning models. In this paper, we apply pre-training techniques to weather forecasting and propose W-MAE, a Weather model with Masked AutoEncoder pre-training for multi-variable weather forecasting. W-MAE is pre-trained in a self-supervised manner to reconstruct spatial correlations within meteorological variables. On the temporal scale, we fine-tune the pre-trained W-MAE to predict the future states of meteorological variables, thereby modeling the temporal dependencies present in weather data. We pre-train W-MAE on the fifth-generation ECMWF Reanalysis (ERA5) data, sampling every six hours and using only two years of data. Under the same training data conditions, W-MAE outperforms FourCastNet in precipitation forecasting. Even when trained on far less data than FourCastNet, our model still performs much better in precipitation prediction (0.80 vs. 0.98). Experiments also show that our model has a stable and significant advantage in short-to-medium-range forecasting (i.e., forecast lead times from 6 hours to one week), and that the advantage of W-MAE grows with the lead time, which further demonstrates its robustness.
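To make the pre-training objective concrete, the following is a minimal sketch of masked-patch reconstruction on a single multi-variable snapshot, written with NumPy. It is not the W-MAE implementation: the grid size, the number of variables, the patch size, and the 0.75 mask ratio are all illustrative assumptions that the abstract does not specify, and the helper names `patchify`, `random_mask`, and `masked_mse` are hypothetical.

```python
import numpy as np


def patchify(field, patch):
    """Split a (C, H, W) array into (N, C * patch * patch) non-overlapping patches."""
    c, h, w = field.shape
    assert h % patch == 0 and w % patch == 0, "grid must be divisible by patch size"
    gh, gw = h // patch, w // patch
    x = field.reshape(c, gh, patch, gw, patch)
    x = x.transpose(1, 3, 0, 2, 4)  # (gh, gw, C, patch, patch)
    return x.reshape(gh * gw, c * patch * patch)


def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean array that is True where a patch is hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


def masked_mse(pred, target, mask):
    """Mean squared reconstruction error, computed over the masked patches only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))


rng = np.random.default_rng(0)

# Assumed toy input: 20 variables on a 32 x 64 grid (not the real ERA5 resolution).
field = rng.standard_normal((20, 32, 64))

patches = patchify(field, patch=8)                        # all patches of the snapshot
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)
visible = patches[~mask]                                  # what an encoder would see

# Placeholder "reconstruction": all zeros stands in for a decoder's output.
loss = masked_mse(np.zeros_like(patches), patches, mask)
```

In an actual masked autoencoder, an encoder would process only `visible`, and a decoder would predict the hidden patches; the loss on those hidden patches is what pushes the model to learn spatial structure across variables. Fine-tuning for forecasting would then replace the reconstruction target with the state at a later time step.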