We present a comprehensive study of automated veracity assessment, spanning dataset creation and the development of novel methods based on Natural Language Inference (NLI), with a focus on misinformation related to the COVID-19 pandemic. We first describe the construction of PANACEA, a novel dataset of heterogeneous claims on COVID-19 and their respective information sources. Dataset construction includes work on retrieval techniques and similarity measurements to ensure that the claims in the set are unique. We then propose novel NLI-based techniques for automated veracity assessment, including graph convolutional networks and attention-based approaches. We carry out experiments on evidence retrieval and veracity assessment on the dataset using the proposed techniques, find them competitive with state-of-the-art (SOTA) methods, and provide a detailed discussion.