Much named entity recognition (NER) research focuses on developing dataset-specific models, each based on data from a single domain of interest and covering a limited set of related entity types. This is frustrating, as each new dataset requires a new model to be trained and stored. In this work, we present a ``versatile'' model -- the Prompting-based Unified NER system (PUnifiedNER) -- that works with data from different domains and can recognise up to 37 entity types simultaneously; in principle, the number of entity types could be extended further. By using prompt learning, PUnifiedNER is a novel approach that can be trained jointly across multiple corpora and performs intelligent on-demand entity recognition. Experimental results show that PUnifiedNER yields significant prediction gains over dataset-specific models while substantially reducing model deployment costs. Furthermore, PUnifiedNER achieves performance competitive with, or even better than, state-of-the-art domain-specific methods on some datasets. We also perform comprehensive pilot and ablation studies to support in-depth analysis of each component of PUnifiedNER.