The outbreak of COVID-19 has led to a global surge of Sinophobia, driven in part by the spread of misinformation, disinformation, and fake news about China. In this paper, we report on the creation of a novel classifier that detects whether Chinese-language posts on Twitter are related to fake news about China. The classifier achieves an F1 score of 0.64 and an accuracy of 93%. We release the final model and a new training dataset of 18,425 tweets so that researchers can study Chinese-language fake news during the COVID-19 pandemic. We also introduce a new dataset, generated by our classifier, that tracks the dynamics of Chinese-language fake news during the early stage of the pandemic.
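An accuracy of 93% reported alongside an F1 score of 0.64 is the kind of gap one would expect if fake-news tweets are a minority of the labeled data. Accuracy rewards true negatives, while F1 = 2TP / (2TP + FP + FN) ignores them. The minimal sketch below illustrates this with invented confusion-matrix counts: a hypothetical 1,000-tweet test set in which 10% of tweets are positive. The `Confusion` class and all counts are illustrative assumptions, not the paper's evaluation results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts; 'positive' = fake-news-related tweet (hypothetical)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def accuracy(c: Confusion) -> float:
    # Share of all predictions that are correct; dominated by the majority (negative) class.
    return (c.tp + c.tn) / c.total


def f1(c: Confusion) -> float:
    # Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
    # True negatives never enter, so a large negative class cannot inflate it.
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


# Hypothetical test set: 1,000 tweets, 100 of them positive (10%).
model = Confusion(tp=60, fp=30, fn=40, tn=870)
# Trivial baseline that labels every tweet "not fake news".
all_negative = Confusion(tp=0, fp=0, fn=100, tn=900)

for name, c in [("model", model), ("all-negative", all_negative)]:
    print(f"{name}: accuracy={accuracy(c):.3f}, F1={f1(c):.3f}")
# model: accuracy=0.930, F1=0.632
# all-negative: accuracy=0.900, F1=0.000
```

The all-negative baseline shows why F1 is the more informative of the two figures under imbalance. It reaches 90% accuracy on these made-up counts while detecting no fake news at all.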