Recent research in disaster informatics demonstrates a practical and important use case of artificial intelligence: using social media content (text and images) to help save lives and reduce human suffering in the aftermath of natural disasters. While notable progress has been made using text, research on exploiting images remains relatively under-explored. To advance the image-based approach, we propose MEDIC (available at: https://crisisnlp.qcri.org/medic/index.html), the largest social media image classification dataset for humanitarian response, consisting of 71,198 images to address four different tasks in a multi-task learning setup. It is the first dataset of its kind, bringing together social media images, disaster response, and multi-task learning research. An important property of this dataset is its high potential to contribute to research on multi-task learning, which has recently received much interest from the machine learning community and has shown remarkable results in terms of memory, inference speed, performance, and generalization capability. The proposed dataset is therefore an important resource for advancing image-based disaster management and multi-task machine learning research.