Post-processing the output of ensemble prediction systems can improve weather forecasts, especially for extreme events. In recent years, a range of machine learning models has been developed to improve the quality of this post-processing step. However, these models rely heavily on data, and generating ensemble members requires multiple runs of numerical weather prediction models at high computational cost. This paper introduces the ENS-10 dataset, consisting of ten ensemble members spread over 20 years (1998-2017). The ensemble members are generated by perturbing numerical weather simulations to capture the chaotic behavior of the Earth. To represent the three-dimensional state of the atmosphere, ENS-10 provides the most relevant atmospheric variables at 11 distinct pressure levels as well as at the surface, at 0.5-degree resolution. The dataset targets the prediction correction task at 48-hour lead time, that is, improving forecast quality by removing the biases of the ensemble members. To this end, ENS-10 provides the weather variables for forecast lead times T=0, 24, and 48 hours (two data points per week). We provide a set of baselines for this task on ENS-10 and compare their performance in correcting the predictions of different weather variables. We also assess our baselines on predicting extreme events using our dataset. The ENS-10 dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) licence.
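To make the prediction correction task concrete, the following is a minimal sketch of the simplest possible bias removal on a ten-member ensemble. It is not one of the paper's baselines and does not read the ENS-10 files: the data are synthetic, the grid is tiny, and the function names (`ensemble_stats`, `fit_mean_bias`, `correct`) are hypothetical names chosen for illustration.

```python
import numpy as np


def ensemble_stats(ens):
    """Return the mean and spread (sample std) across members.

    ens has shape (members, lat, lon).
    """
    return ens.mean(axis=0), ens.std(axis=0, ddof=1)


def fit_mean_bias(train_ens, train_obs):
    """Estimate a per-grid-point bias of the ensemble mean.

    train_ens has shape (time, members, lat, lon);
    train_obs has shape (time, lat, lon).
    """
    return (train_ens.mean(axis=1) - train_obs).mean(axis=0)


def correct(ens, bias):
    """Subtract the estimated bias from every member (broadcast over members)."""
    return ens - bias


# Synthetic stand-in for 48-hour forecasts of one variable (assumed values).
rng = np.random.default_rng(0)
n_time, n_members, n_lat, n_lon = 40, 10, 6, 12
truth = rng.normal(280.0, 5.0, size=(n_time, n_lat, n_lon))
noise = rng.normal(0.0, 1.0, size=(n_time, n_members, n_lat, n_lon))
forecasts = truth[:, None] + 1.5 + noise  # members share a +1.5 bias

# Fit the bias on the first 30 dates, evaluate on a held-out date.
bias = fit_mean_bias(forecasts[:30], truth[:30])
raw_mean, raw_spread = ensemble_stats(forecasts[30])
corrected_mean, _ = ensemble_stats(correct(forecasts[30], bias))

raw_rmse = float(np.sqrt(np.mean((raw_mean - truth[30]) ** 2)))
corrected_rmse = float(np.sqrt(np.mean((corrected_mean - truth[30]) ** 2)))
print(f"raw RMSE: {raw_rmse:.3f}, corrected RMSE: {corrected_rmse:.3f}")
```

A constant offset like this only removes systematic error in the ensemble mean; learned post-processing models would typically also adjust the ensemble spread.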