Evaluation of Fake News Detection with Knowledge-Enhanced Language Models

Recent advances in fake news detection have exploited the success of large-scale pre-trained language models (PLMs). The predominant state-of-the-art approaches are based on fine-tuning PLMs on labelled fake news datasets. However, large-scale PLMs are generally not trained on structured factual data and hence may not possess priors that are grounded in factually accurate knowledge. The use of existing knowledge bases (KBs) with rich human-curated factual information thus has the potential to make fake news detection more effective and robust. In this paper, we investigate the impact of knowledge integration into PLMs for fake news detection. We study several state-of-the-art approaches for knowledge integration, mostly using Wikidata as the KB, on two popular fake news datasets: LIAR, a politics-based dataset, and COVID-19, a dataset of social media posts relating to the COVID-19 pandemic. Our experiments show that knowledge-enhanced models can significantly improve fake news detection on LIAR, where the KB is relevant and up-to-date. The mixed results on COVID-19 highlight the reliance on stylistic features and the importance of domain-specific and current KBs.
