An important component of an automated fact-checking system is claim check-worthiness detection, which ranks sentences according to how much they need to be checked. Despite a body of research on the task, previous work has overlooked the difficulty of identifying check-worthy claims across different topics. In this paper, we assess and quantify the challenge of detecting check-worthy claims for new, unseen topics. Having highlighted the problem, we propose the AraCWA model to mitigate the performance deterioration observed when detecting check-worthy claims across topics. AraCWA improves performance on new topics by incorporating two components: few-shot learning and data augmentation. Using a publicly available dataset of Arabic tweets covering 14 different topics, we demonstrate that our proposed data augmentation strategy achieves substantial overall improvements across topics, although the extent of the improvement varies from topic to topic. Further, we analyse the semantic similarities between topics, suggesting that the similarity metric could serve as a proxy for estimating the difficulty of an unseen topic before undertaking the task of labelling its sentences.
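The abstract does not state how topic similarity is computed. As a minimal sketch, assuming a simple bag-of-words representation (a stand-in, not necessarily the representation used by AraCWA), each topic can be summarised by its aggregated term counts, and an unseen topic can be scored by its mean cosine similarity to the seen topics. The functions `topic_vector`, `cosine` and `similarity_to_seen` are hypothetical names for illustration only.

```python
import math
from collections import Counter


def topic_vector(sentences):
    """Aggregate whitespace-token counts over all sentences of a topic (bag-of-words)."""
    counts = Counter()
    for s in sentences:
        counts.update(s.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_to_seen(unseen, seen_topics):
    """Mean cosine similarity of an unseen topic to every seen topic.

    Assumption: a lower score suggests the unseen topic is harder, since it
    shares less vocabulary with the topics available for training.
    """
    u = topic_vector(unseen)
    sims = [cosine(u, topic_vector(s)) for s in seen_topics.values()]
    return sum(sims) / len(sims)
```

Because the score needs only the raw sentences of a topic, it can be computed before any of those sentences are labelled; candidate topics could then be ranked by this score to anticipate which ones are likely to be most difficult.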