In this system paper, we present our contribution to the Constraint 2021 COVID-19 Fake News Detection Shared Task, which poses the challenge of classifying COVID-19-related social media posts as either fake or real. We address this challenge by applying classical machine learning algorithms together with several linguistic features, such as n-grams, readability, emotional tone, and punctuation. For pre-processing, we experiment with steps such as stop word removal, stemming/lemmatization, and link removal, among others. Our best-performing system is based on a linear SVM; it obtains a weighted average F1 score of 95.19% on the test data, which places it in the middle of the leaderboard (rank 80 of 167).
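To make the pre-processing and feature steps named above more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the stop word list, the regular expressions, the choice of punctuation marks, and all function names (`preprocess`, `word_ngrams`, `punctuation_counts`, `extract_features`) are assumptions made for illustration. Stemming/lemmatization, readability, and emotional tone are left out, and the resulting feature counts would still have to be vectorized and passed to a classifier such as a linear SVM.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "for"}

# Assumed patterns: a simple link matcher and a token matcher that keeps
# hyphenated terms such as "covid-19" together.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def preprocess(post):
    """Lowercase the post, strip links, tokenize, and drop stop words."""
    text = URL_PATTERN.sub(" ", post.lower())
    tokens = TOKEN_PATTERN.findall(text)
    return [t for t in tokens if t not in STOP_WORDS]


def word_ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def punctuation_counts(post):
    """Count a few punctuation marks in the raw post, ignoring links."""
    text = URL_PATTERN.sub(" ", post)
    return {mark: text.count(mark) for mark in "!?#"}


def extract_features(post):
    """Combine unigram/bigram counts with punctuation counts."""
    tokens = preprocess(post)
    features = Counter()
    for n in (1, 2):
        for gram in word_ngrams(tokens, n):
            features["ngram=" + " ".join(gram)] += 1
    for mark, count in punctuation_counts(post).items():
        features["punct=" + mark] = count
    return features


if __name__ == "__main__":
    example = "The cure is HERE!!! Read more: https://example.com/cure #COVID-19"
    for name, value in sorted(extract_features(example).items()):
        print(name, value)
```

Punctuation is counted on the raw post rather than on the tokens, because tokenization discards it; whether the original system orders these steps the same way is not stated in the abstract.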