The COVID-19 pandemic poses a great threat to global public health. Meanwhile, a massive amount of misinformation associated with the pandemic advocates unfounded or unscientific claims. Even though major social media platforms and news outlets have made an extra effort to debunk COVID-19 misinformation, most of the fact-checking information is in English, whereas some unmoderated COVID-19 misinformation is still circulating in other languages, threatening the health of less-informed people in immigrant communities and developing countries. In this paper, we make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) using only fact-checked news in a high-resource language (English). We start by curating a Chinese real and fake news dataset based on existing fact-checking information. Then, we propose a deep learning framework named CrossFake that jointly encodes cross-lingual news body texts to capture as much of the news content as possible. Empirical results on our dataset demonstrate the effectiveness of CrossFake in the cross-lingual setting, and it also outperforms several monolingual and cross-lingual fake news detectors. The dataset is available at https://github.com/YingtongDou/CrossFake.