Recent research in disaster informatics demonstrates a practical and important use case for artificial intelligence: saving human lives and reducing suffering during natural disasters using social media content (text and images). While notable progress has been made using text, the use of images remains relatively under-explored. To advance image-based approaches, we propose MEDIC (available at https://crisisnlp.qcri.org/medic/index.html), the largest social media image classification dataset for humanitarian response, consisting of 71,198 images to address four different tasks in a multi-task learning setup. This is the first dataset of its kind, bringing together social media images, disaster response, and multi-task learning research. An important property of this dataset is its high potential to facilitate research on multi-task learning, which has recently received much interest from the machine learning community and has shown remarkable results in terms of memory, inference speed, performance, and generalization capability. The proposed dataset is therefore an important resource for advancing image-based disaster management and multi-task machine learning research. We experiment with different deep learning architectures and report promising results, which are above the majority baselines for all tasks. Along with the dataset, we also release all relevant scripts (https://github.com/firojalam/medic).
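For readers unfamiliar with the term, a majority baseline always predicts the most frequent training label for a task, and its score is the bar a trained model is expected to clear. The following is a minimal sketch of how such a baseline could be computed per task, not the released scripts: the task names (`task_a`, `task_b`), the label values, and the helper `majority_baseline` are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_baseline(train_labels, test_labels):
    """Return (majority_label, accuracy) for a classifier that always
    predicts the most frequent label seen in the training split."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for label in test_labels if label == majority_label)
    return majority_label, correct / len(test_labels)


# Hypothetical toy labels for two made-up tasks (not taken from MEDIC).
toy_train = {
    "task_a": ["yes", "no", "yes", "yes"],
    "task_b": ["low", "high", "low", "low", "mild"],
}
toy_test = {
    "task_a": ["yes", "no"],
    "task_b": ["low", "low", "high", "mild"],
}

# One baseline per task, computed independently of the other tasks.
baselines = {
    task: majority_baseline(toy_train[task], toy_test[task])
    for task in toy_train
}

for task, (label, accuracy) in baselines.items():
    print(f"{task}: always predict {label!r}, accuracy = {accuracy:.2f}")
```

Accuracy is used here only for simplicity. On tasks with skewed label distributions, other metrics may be more informative when comparing a model against this baseline.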