Post-processing ensemble prediction systems can improve the reliability of weather forecasting, especially for extreme event prediction. In recent years, various machine learning models have been developed to improve the quality of weather post-processing. However, these models require a comprehensive dataset of weather simulations to produce high-accuracy results, and such a dataset is computationally expensive to generate. This paper introduces the ENS-10 dataset, consisting of ten ensemble members spanning 20 years (1998-2017). The ensemble members are generated by perturbing numerical weather simulations to capture the chaotic behavior of the Earth. To represent the three-dimensional state of the atmosphere, ENS-10 provides the most relevant atmospheric variables at 11 distinct pressure levels and at the surface, at 0.5-degree resolution, for forecast lead times T=0, 24, and 48 hours (two data points per week). We propose the ENS-10 prediction correction task for improving the forecast quality at a 48-hour lead time through ensemble post-processing. We provide a set of baselines and compare their skill at correcting the predictions of three important atmospheric variables. Moreover, we measure the baselines' skill at improving predictions of extreme weather events using our dataset. The ENS-10 dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
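As an illustration of what ensemble post-processing typically starts from, the following minimal sketch reduces a ten-member ensemble to a per-grid-point mean, spread, and exceedance probability. It is not one of the paper's baselines and does not load ENS-10. The data are synthetic, the 361 x 720 grid is an assumed layout for a global 0.5-degree field, and the function names `summarize_ensemble` and `exceedance_probability` and the threshold value are hypothetical.

```python
import numpy as np

N_MEMBERS = 10            # ENS-10 provides ten ensemble members
N_LAT, N_LON = 361, 720   # assumed global 0.5-degree grid (both poles included)


def summarize_ensemble(ens):
    """Reduce an ensemble of shape (members, lat, lon) to mean and spread."""
    mean = ens.mean(axis=0)
    spread = ens.std(axis=0, ddof=1)  # sample standard deviation across members
    return mean, spread


def exceedance_probability(ens, threshold):
    """Fraction of members strictly above `threshold` at each grid point."""
    return (ens > threshold).mean(axis=0)


rng = np.random.default_rng(0)
# Synthetic stand-in for one 48-hour forecast of a scalar field (not real data).
ens = 280.0 + 3.0 * rng.standard_normal((N_MEMBERS, N_LAT, N_LON))

mean, spread = summarize_ensemble(ens)
p_extreme = exceedance_probability(ens, threshold=285.0)  # hypothetical threshold
print(mean.shape, spread.shape, float(p_extreme.max()))
```

A learned post-processing model would take summaries like these (or the raw members) as input and output a corrected forecast distribution. With only ten members, the exceedance probability can take just eleven distinct values per grid point, which is one reason extreme events are hard to predict from the raw ensemble alone.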