This paper describes the system developed by the USTC-NELSLIP team for SemEval-2022 Task 11 Multilingual Complex Named Entity Recognition (MultiCoNER). We propose a gazetteer-adapted integration network (GAIN) to improve the performance of language models for recognizing complex named entities. The method first adapts the representations of gazetteer networks to those of language models by minimizing the KL divergence between them. After adaptation, these two networks are then integrated for backend supervised named entity recognition (NER) training. The proposed method is applied to several state-of-the-art Transformer-based NER models with a gazetteer built from Wikidata, and shows great generalization ability across them. The final predictions are derived from an ensemble of these trained models. Experimental results and detailed analysis verify the effectiveness of the proposed method. The official results show that our system ranked 1st on three tracks (Chinese, Code-mixed and Bangla) and 2nd on the other ten tracks in this task.
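As a rough illustration of the adaptation step mentioned above, the following is a minimal sketch (not the authors' implementation) of minimizing a KL divergence between gazetteer-network and language-model token representations, assuming PyTorch; the module name GazetteerEncoder, the label inventory, and the hidden sizes are hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazetteerEncoder(nn.Module):
    """Hypothetical gazetteer network: embeds per-token gazetteer match
    labels (e.g. BIO tags from Wikidata lookups) and contextualizes them."""
    def __init__(self, num_gaz_labels: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(num_gaz_labels, hidden_size)
        self.encoder = nn.LSTM(hidden_size, hidden_size // 2,
                               batch_first=True, bidirectional=True)

    def forward(self, gaz_ids: torch.Tensor) -> torch.Tensor:
        out, _ = self.encoder(self.embed(gaz_ids))
        return out  # (batch, seq_len, hidden_size)

def adaptation_loss(gaz_repr: torch.Tensor,
                    lm_repr: torch.Tensor) -> torch.Tensor:
    """KL divergence pulling the gazetteer-network token distributions
    toward the (detached) language-model ones, over the hidden dimension."""
    log_p = F.log_softmax(gaz_repr, dim=-1)
    q = F.softmax(lm_repr.detach(), dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean")

# Usage sketch: adapt the gazetteer network before joint NER training.
gaz_net = GazetteerEncoder(num_gaz_labels=9, hidden_size=768)
optimizer = torch.optim.Adam(gaz_net.parameters(), lr=1e-4)

gaz_ids = torch.randint(0, 9, (2, 16))   # toy gazetteer label ids
lm_repr = torch.randn(2, 16, 768)        # e.g. hidden states from a Transformer LM

loss = adaptation_loss(gaz_net(gaz_ids), lm_repr)
loss.backward()
optimizer.step()
```

This sketch covers only the adaptation stage; in the system described here, the adapted gazetteer network and the language model are subsequently integrated for supervised NER training.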