We propose end-to-end multimodal fact-checking and explanation generation, where the input is a claim and a large collection of web sources, including articles, images, videos, and tweets. The goal is to assess the truthfulness of the claim by retrieving relevant evidence, predicting a truthfulness label (i.e., support, refute, or not enough information), and generating a ruling statement that explains the reasoning and verification process. To support this research, we construct Mocheg, a large-scale dataset consisting of 21,184 claims, each annotated with a truthfulness label and a ruling statement, together with 58,523 pieces of evidence in the form of text and images. To establish baseline performance on Mocheg, we experiment with several state-of-the-art neural architectures on three pipelined subtasks: multimodal evidence retrieval, claim verification, and explanation generation. Our results demonstrate that the current state-of-the-art performance on end-to-end multimodal fact-checking is still far from satisfactory. To the best of our knowledge, we are the first to build a benchmark dataset and solutions for end-to-end multimodal fact-checking and justification.
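The three pipelined subtasks can be sketched as a minimal Python skeleton. This is an illustrative assumption, not the paper's actual models or API: the function names, the toy keyword-overlap retrieval, and the trivial verification rule are all placeholders standing in for the neural components.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Evidence:
    modality: str  # "text" or "image"
    content: str   # text body, or an image reference for image evidence

# Truthfulness labels used in the task definition.
LABELS = ("support", "refute", "not enough information")

def retrieve_evidence(claim: str, corpus: List[Evidence]) -> List[Evidence]:
    """Stage 1 (multimodal evidence retrieval): keep sources relevant to the
    claim. Toy relevance: keyword overlap for text; images are kept as-is."""
    claim_words = set(claim.lower().split())
    return [
        e for e in corpus
        if e.modality == "image"
        or claim_words & set(e.content.lower().split())
    ]

def verify_claim(claim: str, evidence: List[Evidence]) -> str:
    """Stage 2 (claim verification): predict a label from claim + evidence.
    Toy rule: no evidence -> "not enough information", else "support"."""
    return LABELS[2] if not evidence else LABELS[0]

def generate_explanation(claim: str, evidence: List[Evidence], label: str) -> str:
    """Stage 3 (explanation generation): produce a ruling statement."""
    return (f'The claim "{claim}" is labeled {label!r} '
            f"based on {len(evidence)} piece(s) of retrieved evidence.")

def fact_check(claim: str, corpus: List[Evidence]) -> Tuple[str, str]:
    """End-to-end pipeline: retrieval -> verification -> explanation."""
    evidence = retrieve_evidence(claim, corpus)
    label = verify_claim(claim, evidence)
    return label, generate_explanation(claim, evidence, label)
```

In a real system each stage would be a learned model (e.g., a multimodal retriever and a neural classifier/generator); the skeleton only fixes the interfaces between the stages.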