Large pretrained Transformer-based language models like BERT and GPT have changed the landscape of Natural Language Processing (NLP). However, fine-tuning such models still requires a large number of training examples for each target task, so annotating multiple datasets and training these models on various downstream tasks becomes time-consuming and expensive. In this work, we propose a simple extension of Prototypical Networks for few-shot text classification. Our main idea is to replace the class prototypes with Gaussians and to introduce a regularization term that encourages the examples to cluster near the appropriate class centroids. Experimental results show that our method outperforms various strong baselines on 13 public and 4 internal datasets. Furthermore, we use the class distributions as a tool for detecting potential out-of-distribution (OOD) data points during deployment.
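The core idea can be illustrated with a minimal sketch: instead of a single point prototype per class, each class is summarized by a diagonal Gaussian estimated from the support set, queries are scored by per-class log-likelihood, and a regularizer pulls support embeddings toward their class centroids. This is not the authors' implementation; the function name, tensor layout, and the `reg_weight` hyperparameter are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): Gaussian class distributions in place of
# point prototypes, plus a centroid-clustering regularizer.
import torch
import torch.nn.functional as F

def gaussian_proto_loss(support_emb, support_labels, query_emb, query_labels,
                        num_classes, reg_weight=0.1, eps=1e-6):
    """support_emb: [Ns, d], query_emb: [Nq, d]; labels are class indices."""
    means, log_vars = [], []
    for c in range(num_classes):
        class_emb = support_emb[support_labels == c]              # [Nc, d]
        means.append(class_emb.mean(dim=0))                       # class centroid
        log_vars.append((class_emb.var(dim=0, unbiased=False) + eps).log())
    means = torch.stack(means)                                    # [C, d]
    log_vars = torch.stack(log_vars)                               # [C, d]

    # Per-class diagonal-Gaussian log-likelihood of each query embedding
    # (constant terms dropped), used as classification logits.
    diff = query_emb.unsqueeze(1) - means.unsqueeze(0)            # [Nq, C, d]
    log_probs = -0.5 * ((diff ** 2) / log_vars.exp() + log_vars).sum(dim=-1)
    ce_loss = F.cross_entropy(log_probs, query_labels)

    # Regularizer: encourage support examples to cluster near their class centroid.
    reg = ((support_emb - means[support_labels]) ** 2).sum(dim=-1).mean()
    return ce_loss + reg_weight * reg
```

In a sketch like this, the same per-class log-likelihoods can also serve as an OOD score at deployment time: a query whose maximum class likelihood falls below a threshold is flagged as a potential out-of-distribution example.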