CloudCast: A Satellite-Based Dataset and Baseline for Forecasting Clouds

Forecasting the formation and development of clouds is a central element of modern weather forecasting systems. Because clouds play an intrinsic role in the Earth's climate system, incorrect cloud forecasts can introduce major uncertainty into the overall accuracy of weather forecasts. Few studies have tackled this challenging problem from a machine learning point of view, owing to a shortage of high-resolution datasets with a long record of historical observations at global scale. In this paper, we present a novel satellite-based dataset called "CloudCast". It consists of 70,080 images annotated at the pixel level with 10 different cloud types across multiple layers of the atmosphere. The spatial resolution of the dataset is 928 x 1530 pixels (3 x 3 km per pixel), with 15-minute intervals between frames over the period 2017-01-01 to 2018-12-31. All frames are centered on and projected over Europe. To supplement the dataset, we conduct an evaluation study with current state-of-the-art video prediction methods, such as convolutional long short-term memory networks, generative adversarial networks, and optical flow-based extrapolation methods. Because evaluating video prediction is difficult in practice, we aim for a thorough evaluation in both the spatial and the temporal domain. Our benchmark models show promising results but leave ample room for improvement. To the best of the authors' knowledge, this is the first publicly available global-scale dataset with high-resolution cloud types at a high temporal granularity.
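The dataset figures quoted above are mutually consistent. The Python sketch below recomputes them, assuming a gap-free 15-minute grid with the end date 2018-12-31 counted inclusively. The `num_windows` helper and its 4-in/4-out sequence lengths are hypothetical illustrations of how a video prediction model might slice the frames into training samples; they are not taken from the paper.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Dataset parameters as stated in the abstract.
START = datetime(2017, 1, 1)
# The period ends on 2018-12-31 (inclusive), so the exclusive upper
# bound is midnight on 2019-01-01 (assumption: no missing frames).
END_EXCLUSIVE = datetime(2019, 1, 1)
STEP = timedelta(minutes=15)
HEIGHT_PX, WIDTH_PX = 928, 1530
KM_PER_PX = 3


def expected_frame_count(start: datetime, end_exclusive: datetime, step: timedelta) -> int:
    """Frames on a gap-free grid with spacing `step` in [start, end_exclusive)."""
    # timedelta // timedelta yields an int in Python 3.
    return (end_exclusive - start) // step


def spatial_extent_km(height_px: int, width_px: int, km_per_px: int) -> Tuple[int, int]:
    """Nominal height and width covered by one frame, in kilometres."""
    return height_px * km_per_px, width_px * km_per_px


def num_windows(n_frames: int, n_input: int, n_output: int) -> int:
    """Hypothetical helper: count contiguous (input, target) sequences with stride 1."""
    return max(0, n_frames - (n_input + n_output) + 1)


if __name__ == "__main__":
    frames = expected_frame_count(START, END_EXCLUSIVE, STEP)
    print(frames)  # 70080 = 730 days x 96 frames per day
    print(spatial_extent_km(HEIGHT_PX, WIDTH_PX, KM_PER_PX))  # (2784, 4590)
    # Hypothetical setup: 4 input frames (1 h) predicting the next 4 frames (1 h).
    print(num_windows(frames, n_input=4, n_output=4))  # 70073
```

Under these assumptions the recomputed count matches the 70,080 images stated in the abstract (2017 and 2018 are both 365-day years), and each frame nominally spans about 2,784 km x 4,590 km of the projected grid.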

