One of the greatest sources of uncertainty in future climate projections comes from limitations in modelling clouds and in understanding how different cloud types interact with the climate system. A key first step in reducing this uncertainty is to accurately classify cloud types at high spatial and temporal resolution. In this paper, we introduce Cumulo, a benchmark dataset for training and evaluating global cloud classification models. It consists of one year of 1 km resolution MODIS hyperspectral imagery merged with pixel-width 'tracks' of CloudSat cloud labels. Bringing these complementary datasets together enables the machine learning community to develop new techniques that could greatly benefit the climate community. To showcase Cumulo, we provide a baseline performance analysis using an invertible flow generative model (IResNet), which further allows us to discover new sub-classes for a given cloud class by exploring the latent space. To compare methods, we introduce a set of evaluation criteria for identifying models that are not only accurate but also physically realistic. Cumulo can be downloaded from https://www.dropbox.com/sh/i3s9q2v2jjyk2it/AACxXnXfMF5wuIqLXqH4NJOra?dl=0 .
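Because the CloudSat labels cover only pixel-width tracks within each MODIS image, any accuracy score has to be restricted to the labelled pixels. The following is a minimal sketch of that masking step, not the evaluation code used for Cumulo: the tile layout, the `UNLABELLED` marker, the class indices, and the `track_accuracy` helper are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional

UNLABELLED = -1  # hypothetical marker for pixels outside the label track


def track_accuracy(pred: List[List[int]], labels: List[List[int]]) -> Optional[float]:
    """Accuracy over labelled pixels only; None if the tile has no labels."""
    hits = 0
    total = 0
    for pred_row, label_row in zip(pred, labels):
        for p, y in zip(pred_row, label_row):
            if y == UNLABELLED:
                continue  # no label available for this pixel, so skip it
            total += 1
            hits += int(p == y)
    return hits / total if total else None


# Toy 4x5 tile: only column 2 carries labels, mimicking a pixel-wide track.
labels = [[UNLABELLED] * 5 for _ in range(4)]
for row, cls in enumerate([0, 3, 3, 7]):  # illustrative class indices
    labels[row][2] = cls

# A dummy prediction that assigns class 3 to every pixel in the tile.
pred = [[3] * 5 for _ in range(4)]

acc = track_accuracy(pred, labels)
print(acc)
```

A score of this kind says nothing about the unlabelled pixels, which is one reason accuracy alone may be insufficient and additional criteria for physical realism are useful.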