Social media has become an important information source for crisis management, providing quick access to ongoing developments and critical information. However, classification remains challenging because models suffer from event-related biases and highly imbalanced label distributions. To address these challenges, we propose combining entity-masked language modeling and hierarchical multi-label classification in a multi-task learning setup. We evaluate our method on tweets from the TREC-IS dataset and show an absolute F1-score gain of up to 10% for actionable information types. Moreover, we find that entity masking reduces overfitting to in-domain events and improves cross-event generalization.
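The abstract names the ingredients without detailing how they fit together. As a rough illustration only, the sketch below shows one possible way to combine entity masking with a multi-task objective. It assumes that entity spans come from an external named-entity tagger, that a BERT-style `[MASK]` token is used, that the label hierarchy is given as a simple child-to-parent map, and that the two losses are mixed with a fixed weight. All function names, the example labels, and the weighting scheme are hypothetical and are not the method evaluated in this work.

```python
import math
from typing import Dict, List, Set, Tuple

MASK = "[MASK]"  # hypothetical BERT-style mask token


def mask_entities(tokens: List[str], entity_spans: List[Tuple[int, int]]) -> Tuple[List[str], List[int]]:
    """Replace every token inside an entity span (start inclusive, end exclusive)
    with MASK. Returns the masked tokens and the sorted positions that become
    masked-language-modeling targets. Entity spans are assumed to come from
    an external NER tagger."""
    positions: Set[int] = set()
    for start, end in entity_spans:
        positions.update(range(start, end))
    masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    return masked, sorted(positions)


def expand_to_ancestors(labels: Set[str], parent: Dict[str, str]) -> Set[str]:
    """Add every ancestor of each fine-grained label so that targets are
    consistent across hierarchy levels (one simple hierarchical scheme,
    assumed here for illustration)."""
    expanded = set(labels)
    for label in labels:
        node = label
        while node in parent:
            node = parent[node]
            expanded.add(node)
    return expanded


def bce(probs: List[float], labels: List[int]) -> float:
    """Mean binary cross-entropy over independent labels (multi-label setting)."""
    eps = 1e-12  # avoids log(0)
    total = sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels))
    return -total / len(labels)


def multitask_loss(mlm_loss: float, label_probs: List[float],
                   label_targets: List[int], weight: float = 0.5) -> float:
    """Weighted sum of the MLM loss and the multi-label classification loss.
    `weight` is a hypothetical fixed mixing coefficient."""
    return weight * mlm_loss + (1 - weight) * bce(label_probs, label_targets)


if __name__ == "__main__":
    tokens = ["Flooding", "in", "Houston", "near", "Buffalo", "Bayou"]
    masked, targets = mask_entities(tokens, [(2, 3), (4, 6)])
    print(masked, targets)
    # Illustrative label names; the parent map is an assumption.
    print(expand_to_ancestors({"Request-SearchAndRescue"},
                              {"Request-SearchAndRescue": "Request"}))
    print(multitask_loss(1.0, [0.9, 0.2], [1, 0]))
```

Masking entity tokens such as place names hides event-specific cues from the model, which is one plausible reading of why entity masking could help cross-event generalization; in a real system the MLM loss would be produced by the language model rather than passed in as a number.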