Checkovid: A COVID-19 misinformation detection system on Twitter using network and content mining perspectives

During the COVID-19 pandemic, social isolation and quarantine made social media platforms an ideal channel for communication. They were also the primary source of large-scale misinformation dissemination, a phenomenon referred to as the infodemic. Automatically debunking misinformation is therefore a crucial problem. To tackle it, we present two COVID-19-related misinformation datasets on Twitter and propose a misinformation detection system comprising a network-based process and a content-based process, both built on machine learning algorithms and NLP techniques. The network-based process focuses on social properties, network characteristics, and users. The content-based process classifies misinformation directly from the content of the tweets and contains text classification models (paragraph-level and sentence-level) and similarity models. In the evaluation of the network-based process, the artificial neural network model performs best, with an F1 score of 88.68%. In the content-based process, our novel similarity models obtain an F1 score of 90.26%, an improvement over the network-based models. Among the text classification models, the stacking ensemble-learning model achieves the best result, with an F1 score of 95.18%. Furthermore, we test our content-based models on the Constraint@AAAI2021 dataset and improve on the baseline results with an F1 score of 94.38%. Finally, we develop a fact-checking website called Checkovid that uses each process to detect misinformative and informative claims in the COVID-19 domain from different perspectives.
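To make the idea of a content-based similarity model concrete, the following is a minimal sketch, not the method used in Checkovid, whose similarity models are not specified in this abstract. It assumes a simple bag-of-words cosine similarity against a small set of already-labeled reference claims. The function names, the `REFERENCE` claims, and the `threshold` value are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_by_similarity(claim, labeled_claims, threshold=0.3):
    """Return (label, score) of the most similar reference claim.

    If no reference claim reaches `threshold` (a hypothetical cutoff),
    the label is "unknown" instead of a forced guess.
    """
    query = Counter(tokenize(claim))
    best_label, best_score = "unknown", 0.0
    for text, label in labeled_claims:
        score = cosine(query, Counter(tokenize(text)))
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return "unknown", best_score
    return best_label, best_score


# Hypothetical reference claims, invented for this illustration only.
REFERENCE = [
    ("drinking bleach cures covid-19", "misinformative"),
    ("vaccines reduce the risk of severe covid-19", "informative"),
]

if __name__ == "__main__":
    print(classify_by_similarity("Bleach cures COVID-19 if you drink it", REFERENCE))
    print(classify_by_similarity("the weather is nice today", REFERENCE))
```

A real system would likely replace the raw term counts with learned sentence embeddings and use a much larger set of fact-checked claims; the threshold-based "unknown" outcome is one possible design choice for claims that resemble nothing in the reference set.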
