Climate change has increased the intensity, frequency, and duration of extreme weather events and natural disasters across the world. Although the growing volume of data on natural disasters widens the scope for machine learning (ML) in this field, progress has been relatively slow. One bottleneck is the lack of benchmark datasets that would allow ML researchers to quantify their progress against a standard metric. The objective of this short paper is to explore the state of benchmark datasets for ML tasks related to natural disasters, categorizing them according to the disaster management cycle. We compile a list of existing benchmark datasets introduced in the past five years. We propose a web platform, NADBenchmarks, where researchers can search for benchmark datasets for natural disasters, and we develop a preliminary version of such a platform using our compiled list. This paper is intended to help researchers find benchmark datasets on which to train their ML models, and to provide general directions for topics where they can contribute new benchmark datasets.