Despite the huge and continuous advances in computational linguistics, the lack of annotated data for Named Entity Recognition (NER) remains a challenging issue, especially in low-resource languages and when domain knowledge is required for high-quality annotations. Recent findings in NLP show the effectiveness of cloze-style questions in enabling language models to leverage the knowledge acquired during pre-training. In our work, we propose a simple and intuitive adaptation of Pattern-Exploiting Training (PET), a recent approach that combines the cloze-question mechanism with fine-tuning for few-shot learning: the key idea is to rephrase the NER task with patterns. On three benchmark datasets, NCBI-disease, BC2GM and a private Italian biomedical corpus, our approach achieves considerably better performance than standard fine-tuning and results comparable to or better than other few-shot baselines, without relying on manually annotated data or distant supervision.
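To make the key idea concrete, the following is a minimal sketch of how an NER decision over a candidate span can be rephrased as a cloze question for a masked language model. It is not the paper's exact method: the pattern wording, the verbalizer words, and the `bert-base-uncased` checkpoint are illustrative assumptions, and the `cloze_label` helper is hypothetical.

```python
# A minimal sketch, assuming a PET-style setup: a candidate span is scored by
# asking a masked language model to fill a cloze pattern, and the predicted
# verbalizer token is mapped back to an entity label.
from transformers import pipeline

# Any masked language model works here; this checkpoint is just an example.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")


def cloze_label(sentence: str, span: str, verbalizers: dict) -> str:
    """Classify a candidate span via a cloze pattern.

    verbalizers maps each label (e.g. "disease") to the single vocabulary
    token the pattern should produce for that label.
    """
    mask = fill_mask.tokenizer.mask_token
    pattern = f'{sentence} In this sentence, "{span}" refers to a {mask}.'
    # `targets` restricts the MLM's predictions to the verbalizer tokens only.
    predictions = fill_mask(pattern, targets=list(verbalizers.values()))
    best = max(predictions, key=lambda p: p["score"])
    # Map the winning verbalizer token back to its label.
    return next(label for label, token in verbalizers.items()
                if token == best["token_str"])


label = cloze_label(
    "The patient was diagnosed with cystic fibrosis.",
    "cystic fibrosis",
    {"disease": "disease", "other": "thing"},
)
print(label)  # expected: "disease"
```

In a PET-style pipeline, the same pattern would also be used during fine-tuning, so the model learns to align its mask predictions with the verbalizer tokens rather than being queried zero-shot as above.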