We present Claim-Dissector: a novel latent variable model for fact-checking and fact-analysis which, given a claim and a set of retrieved provenances, jointly learns (i) which provenances are relevant to the claim and (ii) what the veracity of the claim is. We propose to disentangle the per-provenance relevance probability and its contribution to the final veracity probability in an interpretable way: the final veracity probability is proportional to a linear ensemble of per-provenance relevance probabilities. This makes it possible to identify clearly how much the relevance of each source contributes to the final probability. We show that our system achieves state-of-the-art results on the FEVER dataset, comparable to those of the two-stage systems typically used in traditional fact-checking pipelines, while often using significantly fewer parameters and less computation. Our analysis shows that the proposed approach further allows the model to learn not just which provenances are relevant, but also which provenances support the claim and which refute it, without direct supervision. This not only adds interpretability, but also allows claims with conflicting evidence to be detected automatically. Furthermore, we study whether our model can learn fine-grained relevance cues while using coarse-grained supervision. We show that our model can achieve competitive sentence-level recall while using only paragraph-level relevance supervision. Finally, moving to the finest granularity of relevance, we show that our framework is capable of identifying relevance at the token level. To do this, we present a new benchmark focusing on token-level interpretability: humans annotate the tokens in relevant provenances that they considered essential when making their judgement. We then measure how similar these annotations are to the tokens our model focuses on. Our code and dataset will be released online.
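To make the linear-ensemble idea concrete, the following is a minimal toy sketch in Python, not the authors' implementation. It assumes each provenance already comes with a non-negative relevance probability per veracity class; the two-class setup, the function names, and the `min_share` threshold are all hypothetical choices made for illustration. The veracity probability of a class is taken as the normalised sum of the per-provenance relevance probabilities for that class, so each provenance's share of the final probability can be read off directly.

```python
from typing import Dict, List

# Hypothetical two-class setup for illustration only; a real fact-checking
# system would typically also model a "not enough information" outcome.
CLASSES = ("SUPPORTS", "REFUTES")

Relevance = Dict[str, float]  # per-class relevance probability of one provenance


def veracity_from_relevance(rel: List[Relevance]) -> Dict[str, float]:
    """Veracity probability per class: normalised linear sum over provenances."""
    totals = {c: sum(p[c] for p in rel) for c in CLASSES}
    z = sum(totals.values())
    if z <= 0:
        raise ValueError("at least one provenance must have non-zero relevance")
    return {c: totals[c] / z for c in CLASSES}


def contributions(rel: List[Relevance]) -> List[Dict[str, float]]:
    """Share of the final probability mass contributed by each provenance."""
    z = sum(p[c] for p in rel for c in CLASSES)
    if z <= 0:
        raise ValueError("at least one provenance must have non-zero relevance")
    return [{c: p[c] / z for c in CLASSES} for p in rel]


def has_conflict(rel: List[Relevance], min_share: float) -> bool:
    """Flag a claim when both classes receive at least `min_share` probability.

    `min_share` is an assumed, illustrative threshold.
    """
    veracity = veracity_from_relevance(rel)
    return all(veracity[c] >= min_share for c in CLASSES)


# Toy input: the first provenance only supports the claim, the second is mixed.
example = [
    {"SUPPORTS": 0.6, "REFUTES": 0.0},
    {"SUPPORTS": 0.2, "REFUTES": 0.2},
]
veracity = veracity_from_relevance(example)
shares = contributions(example)
```

Because the combination is linear, the per-provenance entries in `shares` add up exactly to the class probabilities in `veracity`, which is the property that makes the attribution readable.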