In recent years, social media has been widely explored as a potential source of communication and information in disasters and emergency situations. Several works and case studies in disaster analytics have already examined different aspects of natural disasters. Along with its great potential, disaster analytics comes with several challenges, mainly due to the nature of social media content. In this paper, we explore one such challenge and propose a text classification framework to deal with noisy Twitter data. More specifically, we employ several transformers, both individually and in combination, to differentiate between relevant and non-relevant Twitter posts, achieving a highest F1-score of 0.87.
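As an illustration of what using models "in combination" and scoring by F1 can look like, the following is a minimal sketch only. It assumes late fusion by averaging each model's relevance probability and a 0.5 decision threshold; the abstract does not state how the transformers are combined, so both choices, along with all function names and the toy probabilities, are hypothetical.

```python
from typing import List, Sequence


def fuse_probabilities(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-post relevance probabilities across models.

    Assumption: simple late fusion by mean; the paper may combine models differently.
    """
    n_models = len(model_probs)
    return [sum(per_post) / n_models for per_post in zip(*model_probs)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (relevant = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical relevance probabilities from three models for four posts.
probs_a = [0.9, 0.2, 0.6, 0.4]
probs_b = [0.8, 0.4, 0.3, 0.7]
probs_c = [0.7, 0.1, 0.7, 0.1]

fused = fuse_probabilities([probs_a, probs_b, probs_c])
# Assumed decision threshold of 0.5 on the fused probability.
predictions = [1 if p >= 0.5 else 0 for p in fused]
labels = [1, 0, 1, 1]  # toy ground truth: 1 = relevant, 0 = non-relevant
score = f1_score(labels, predictions)
```

In practice the per-model probabilities would come from fine-tuned transformer classifiers; the toy values here only show the fusion and scoring steps.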