To mitigate the impact of data scarcity on fact-checking systems, we focus on few-shot claim verification. Despite recent work on few-shot classification with increasingly capable language models, there is a dearth of research on data annotation prioritisation, i.e., on selecting which few shots to label for optimal model performance. We propose Active PETs, a novel weighted approach that uses an ensemble of Pattern Exploiting Training (PET) models, each based on a different language model, to actively select unlabelled data as candidates for annotation. Using Active PETs for data selection yields consistent improvements over the state-of-the-art active learning method on two technical fact-checking datasets and across six different pretrained language models. We show further gains with Active PETs-o, a variant that additionally integrates an oversampling strategy. Our approach enables effective selection of instances to be labelled in settings where unlabelled data is abundant but labelling resources are limited, leading to consistently improved few-shot claim verification performance. Our code will be available upon publication.
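The core idea of ensemble-based active data selection can be illustrated with a minimal sketch: a committee of models votes on each unlabelled instance, and instances where the weighted committee disagrees most are prioritised for annotation. This is a hypothetical illustration only; the function names, the weighting scheme, and the use of weighted vote entropy as the disagreement measure are assumptions for exposition, not the paper's actual implementation.

```python
# Illustrative sketch of weighted committee-based active selection.
# All names and design choices here are assumptions, not the paper's code.
import math
from typing import Callable, Sequence


def vote_entropy(votes: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted vote entropy: higher means the committee disagrees more."""
    total = sum(weights)
    mass: dict[int, float] = {}
    for label, w in zip(votes, weights):
        mass[label] = mass.get(label, 0.0) + w / total
    return -sum(p * math.log(p) for p in mass.values() if p > 0)


def select_for_annotation(
    pool: Sequence,
    committee: Sequence[Callable],
    weights: Sequence[float],
    budget: int,
):
    """Rank unlabelled instances by committee disagreement; pick the top `budget`."""
    scored = []
    for x in pool:
        votes = [model(x) for model in committee]
        scored.append((vote_entropy(votes, weights), x))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [x for _, x in scored[:budget]]
```

Under this scheme, instances on which all ensemble members agree score zero entropy and are deprioritised, while contested instances rise to the top of the annotation queue.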