WeatherBench is a benchmark dataset for medium-range weather forecasting of geopotential, temperature and precipitation, consisting of preprocessed data, predefined evaluation metrics and a number of baseline models. WeatherBench Probability extends this to probabilistic forecasting by adding a set of established probabilistic verification metrics (continuous ranked probability score, spread-skill ratio and rank histograms) and a state-of-the-art operational baseline using the ECMWF IFS ensemble forecast. In addition, we test three different probabilistic machine learning methods: Monte Carlo dropout, parametric prediction, and categorical prediction, in which the probability distribution is discretized. We find that plain Monte Carlo dropout severely underestimates uncertainty. The parametric and categorical models both produce fairly reliable forecasts of similar quality. The parametric models have fewer degrees of freedom, while the categorical model is more flexible when it comes to predicting non-Gaussian distributions. None of the models is able to match the skill of the operational IFS model. We hope that this benchmark will enable other researchers to evaluate their probabilistic approaches.
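To make the three verification metrics named above concrete, the following is a minimal sketch of how they are commonly computed for an ensemble forecast. It is not taken from the benchmark's own code: the function names, the use of the sample variance (divisor M - 1), the omission of any finite-ensemble correction factor, and the strict-inequality treatment of ties in the rank histogram are all assumptions made for illustration.

```python
from math import sqrt
from statistics import fmean, variance


def crps_ensemble(members, obs):
    """Empirical CRPS for one case: E|X - y| - 0.5 * E|X - X'|.

    `members` is a list of ensemble values, `obs` the verifying value.
    Lower is better; for a single member this reduces to the absolute error.
    """
    m = len(members)
    accuracy = fmean(abs(x - obs) for x in members)
    pairwise = sum(abs(a - b) for a in members for b in members) / (m * m)
    return accuracy - 0.5 * pairwise


def spread_skill_ratio(ensembles, observations):
    """sqrt(mean ensemble variance) divided by the RMSE of the ensemble mean.

    Values below 1 indicate an underdispersive (overconfident) ensemble,
    values above 1 an overdispersive one. Assumes at least two members.
    """
    spread = sqrt(fmean(variance(e) for e in ensembles))
    rmse = sqrt(
        fmean((fmean(e) - y) ** 2 for e, y in zip(ensembles, observations))
    )
    return spread / rmse


def rank_histogram(ensembles, observations):
    """Count how often the observation falls at each rank among the members.

    Returns M + 1 counts for M members. A flat histogram suggests a
    reliable ensemble; a U shape suggests underdispersion.
    """
    m = len(ensembles[0])
    counts = [0] * (m + 1)
    for e, y in zip(ensembles, observations):
        rank = sum(x < y for x in e)  # members strictly below the observation
        counts[rank] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy data: two forecast cases with three members each.
    ens = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
    obs = [-1.0, 1.5]
    print("CRPS per case:", [crps_ensemble(e, y) for e, y in zip(ens, obs)])
    print("Spread-skill ratio:", spread_skill_ratio(ens, obs))
    print("Rank histogram:", rank_histogram(ens, obs))
```

In practice these quantities would be averaged over grid points (typically with latitude weighting) and forecast initialization times; the sketch leaves that aggregation out to keep the definitions visible.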