In recent years, mining important information from social media posts during crises has become a focus of research aimed at assisting emergency response (ES). The TREC Incident Streams (IS) track is a research challenge organised for this purpose. The track asks participating systems both to classify a stream of crisis-related tweets into humanitarian-aid-related information types and to estimate their criticality. The former is a multi-label information type classification task and the latter is a priority estimation task. In this paper, we report on the participation of the University College Dublin School of Computer Science (UCD-CS) in TREC-IS 2021. We explored a variety of approaches, including simple machine learning algorithms, multi-task learning techniques, text augmentation, and ensemble approaches. The official evaluation results indicate that our runs achieve the highest scores in many metrics. To aid reproducibility, our code is publicly available at https://github.com/wangcongcong123/crisis-mtl.
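To make the two tasks concrete, the following is a minimal sketch of how a system's raw outputs for one tweet might be decoded into a multi-label set of information types and a single priority level. It is not taken from our released code: the `decode` function, the `TweetPrediction` container, the three-label subset in `INFO_TYPES`, the 0.5 threshold, and the equal-width bucketing of a score in [0, 1] into four priority levels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative subset of information types (assumption: the real label set is larger).
INFO_TYPES = ["Request-SearchAndRescue", "Report-Weather", "Other-Irrelevant"]
# Priority levels ordered from least to most critical.
PRIORITY_LEVELS = ["Low", "Medium", "High", "Critical"]


@dataclass
class TweetPrediction:
    tweet_id: str
    info_types: List[str]  # multi-label output: zero, one, or several types may apply
    priority: str          # single priority level per tweet


def decode(tweet_id: str,
           type_probs: Dict[str, float],
           priority_score: float,
           threshold: float = 0.5) -> TweetPrediction:
    """Turn per-type probabilities and a priority score into final labels.

    Both the threshold and the bucketing scheme are hypothetical choices.
    """
    # Multi-label decision: keep every type whose probability reaches the threshold.
    labels = [t for t in INFO_TYPES if type_probs.get(t, 0.0) >= threshold]
    if not labels:
        # Fall back to the single most probable type so each tweet gets a label.
        labels = [max(INFO_TYPES, key=lambda t: type_probs.get(t, 0.0))]

    # Priority decision: clamp the score to [0, 1], then map it to one of four buckets.
    score = min(max(priority_score, 0.0), 1.0)
    index = min(int(score * len(PRIORITY_LEVELS)), len(PRIORITY_LEVELS) - 1)
    return TweetPrediction(tweet_id, labels, PRIORITY_LEVELS[index])


if __name__ == "__main__":
    example = decode(
        "t1",
        {"Request-SearchAndRescue": 0.9, "Report-Weather": 0.6, "Other-Irrelevant": 0.1},
        priority_score=0.8,
    )
    print(example)
```

In a multi-task setting, the per-type probabilities and the priority score would typically come from two output heads sharing one encoder; here they are passed in directly so that the decoding step can be read on its own.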