As part of an automated fact-checking pipeline, the claim veracity classification task consists of determining whether a claim is supported by an associated piece of evidence. The difficulty of gathering labelled claim-evidence pairs leads to a scarcity of datasets, particularly when dealing with new domains. In this paper, we introduce SEED, a novel vector-based method for few-shot claim veracity classification that aggregates pairwise semantic differences for claim-evidence pairs. We build on the hypothesis that we can simulate class-representative vectors that capture the average semantic differences of the claim-evidence pairs in a class, which can then be used to classify new instances. We compare the performance of our method with competitive baselines, including fine-tuned BERT/RoBERTa models as well as the state-of-the-art few-shot veracity classification method that leverages language model perplexity. Experiments conducted on the FEVER and SCIFACT datasets show consistent improvements over these baselines in few-shot settings. Our code is available.
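The core idea (averaging per-pair semantic differences into one representative vector per class, then labelling a new pair by its closest class vector) can be illustrated with a short sketch. This is not the authors' implementation. It assumes that a pair's "semantic difference" is the evidence embedding minus the claim embedding, that class vectors are plain means, and that new pairs are assigned by cosine similarity. The helper names (`difference`, `build_class_vectors`, `classify`) and the 2-D toy embeddings are hypothetical. In practice the embeddings would come from a sentence encoder.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def difference(claim_vec: Sequence[float], evidence_vec: Sequence[float]) -> Vector:
    """Assumed pairwise semantic difference: evidence embedding minus claim embedding."""
    return [e - c for c, e in zip(claim_vec, evidence_vec)]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_class_vectors(examples: List[Tuple[Vector, Vector, str]]) -> Dict[str, Vector]:
    """Average the difference vectors of the few labelled pairs in each class."""
    diffs: Dict[str, List[Vector]] = {}
    for claim_vec, evidence_vec, label in examples:
        diffs.setdefault(label, []).append(difference(claim_vec, evidence_vec))
    return {label: mean(vs) for label, vs in diffs.items()}


def classify(claim_vec: Vector, evidence_vec: Vector, class_vectors: Dict[str, Vector]) -> str:
    """Assign the label whose class vector is most similar to the pair's difference."""
    d = difference(claim_vec, evidence_vec)
    return max(class_vectors, key=lambda label: cosine(d, class_vectors[label]))


if __name__ == "__main__":
    # Hypothetical few-shot support set: (claim embedding, evidence embedding, label).
    support = [
        ([1.0, 0.0], [2.0, 0.0], "SUPPORTS"),
        ([0.0, 1.0], [1.2, 1.0], "SUPPORTS"),
        ([1.0, 0.0], [1.0, -1.0], "REFUTES"),
        ([0.0, 0.0], [0.1, -0.9], "REFUTES"),
    ]
    centroids = build_class_vectors(support)
    print(classify([0.5, 0.5], [1.5, 0.4], centroids))  # prints SUPPORTS
```

Because each class is summarised by a single averaged vector, the approach needs only a handful of labelled pairs per class and no gradient updates, which is what makes this kind of method attractive in few-shot settings.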