The rapid advancement of online communication via social media platforms has led to a sharp rise in the spread of misinformation and fake news. Fake news is especially rampant during the current COVID-19 pandemic, leading people to believe false and potentially harmful claims and stories. Detecting fake news quickly can alleviate the spread of panic, chaos and potential health hazards. We developed a two-stage automated pipeline for COVID-19 fake news detection using state-of-the-art machine learning models for natural language processing. The first model leverages a novel fact-checking algorithm that retrieves the most relevant facts concerning a user's claim about COVID-19. The second model verifies the level of truth in the claim by computing the textual entailment between the claim and the true facts retrieved from a manually curated COVID-19 dataset. The dataset is based on a publicly available knowledge source consisting of more than 5000 COVID-19 false claims and verified explanations, a subset of which was internally annotated and cross-validated to train and evaluate our models. We evaluate a series of models, ranging from those based on classical text-based features to more contextual Transformer-based models, and observe that a pipeline using BERT for the first stage and ALBERT for the second stage yields the best results.
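The two-stage flow described above (retrieve the most relevant verified fact, then judge whether that fact supports the claim) can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation: TF-IDF cosine similarity stands in for the BERT-based retriever, a crude negation-polarity check stands in for the ALBERT-based entailment model, and the `FACTS` list and the names `retrieve`, `toy_entailment` and `check_claim` are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base of verified facts (illustrative only,
# not taken from the dataset described in the abstract).
FACTS = [
    "Drinking alcohol does not protect against COVID-19 infection.",
    "5G mobile networks do not spread the coronavirus.",
    "Antibiotics work against bacteria, not against viruses such as the coronavirus.",
]

NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(claim, facts, k=1):
    """Stage 1 stand-in: rank facts by TF-IDF cosine similarity to the claim."""
    vecs, idf = tfidf_vectors(facts)
    query = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(claim)).items()}
    ranked = sorted(zip(facts, vecs), key=lambda fv: cosine(query, fv[1]), reverse=True)
    return [fact for fact, _ in ranked[:k]]


def toy_entailment(claim, fact):
    """Stage 2 stand-in: treat the fact as supporting the claim only when
    both have the same negation polarity. A trained entailment model
    would replace this heuristic in practice."""
    claim_negated = bool(set(tokenize(claim)) & NEGATIONS)
    fact_negated = bool(set(tokenize(fact)) & NEGATIONS)
    return claim_negated == fact_negated


def check_claim(claim, facts=FACTS):
    """Run both stages: return the best-matching fact and whether it supports the claim."""
    best_fact = retrieve(claim, facts, k=1)[0]
    return best_fact, toy_entailment(claim, best_fact)


if __name__ == "__main__":
    fact, supported = check_claim("5G networks spread the coronavirus")
    print("Retrieved fact:", fact)
    print("Claim supported:", supported)
```

Keeping retrieval and verification as separate functions mirrors the pipeline's structure: either stand-in could be swapped for a learned model without changing `check_claim`.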