To mitigate the impact of data scarcity on fact-checking systems, we focus on few-shot claim verification. Despite recent work on few-shot classification with increasingly capable language models, there is a dearth of research on data annotation prioritisation, i.e., on selecting which few shots to label for optimal model performance. We propose Active PETs, a novel weighted approach that uses an ensemble of Pattern Exploiting Training (PET) models, each based on a different language model, to actively select unlabelled data as candidates for annotation. Using Active PETs for data selection yields consistent improvements over the state-of-the-art active learning method on two technical fact-checking datasets and across six different pretrained language models. We show further gains with Active PETs-o, a variant that additionally integrates an oversampling strategy. Our approach enables effective selection of instances to be labelled in settings where unlabelled data is abundant but labelling resources are limited, leading to consistently improved few-shot claim verification performance. Our code will be available upon publication.
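The core idea of ensemble-based active data selection can be illustrated with a minimal sketch: a committee of models votes on each unlabelled instance, and instances where the weighted committee disagrees most are prioritised for annotation. This is a hypothetical illustration only; the function names, the weighting scheme, and the use of weighted vote entropy as the disagreement measure are assumptions for exposition, not the paper's actual implementation.

```python
# Illustrative sketch of weighted committee-based active selection.
# All names and design choices here are assumptions, not the paper's code.
import math
from typing import Callable, Sequence


def vote_entropy(votes: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted vote entropy: higher means the committee disagrees more."""
    total = sum(weights)
    mass: dict[int, float] = {}
    for label, w in zip(votes, weights):
        mass[label] = mass.get(label, 0.0) + w / total
    return -sum(p * math.log(p) for p in mass.values() if p > 0)


def select_for_annotation(
    pool: Sequence,
    committee: Sequence[Callable],
    weights: Sequence[float],
    budget: int,
):
    """Rank unlabelled instances by committee disagreement; pick the top `budget`."""
    scored = []
    for x in pool:
        votes = [model(x) for model in committee]
        scored.append((vote_entropy(votes, weights), x))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [x for _, x in scored[:budget]]
```

Under this scheme, instances on which all ensemble members agree score zero entropy and are deprioritised, while contested instances rise to the top of the annotation queue.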