Misleading information spreads on the Internet at incredible speed, which in some cases can lead to irreparable consequences. Developing fake news detection technologies is therefore becoming essential. While substantial work has been done in this direction, one limitation of current approaches is that their models focus on only one language and do not use multilingual information. In this work, we propose Multiverse, a new feature based on multilingual evidence that can be used for fake news detection and to improve existing approaches. We first confirm the hypothesis that cross-lingual evidence can serve as a feature for fake news detection through a manual experiment on a set of known true and fake news. We then compare a fake news classification system based on the proposed feature with several baselines on two multi-domain datasets of general-topic news and one COVID-19 fake news dataset, showing that, in combination with linguistic features, it yields significant improvements.