Zero-shot cross-lingual named entity recognition (NER) aims to transfer knowledge from annotated, high-resource data in source languages to unlabeled, low-resource data in target languages. Existing mainstream methods based on the teacher-student distillation framework ignore the rich and complementary information in the intermediate layers of pre-trained language models, and domain-invariant information is easily lost during transfer. In this study, a mixture of short-channel distillers (MSD) method is proposed to enable full interaction among the rich hierarchical representations in the teacher model and to transfer knowledge to the student model sufficiently and efficiently. Concretely, a multi-channel distillation framework is designed for sufficient information transfer by aggregating multiple distillers as a mixture. In addition, an unsupervised method based on parallel domain adaptation is proposed to shorten the channels between the teacher and student models and thereby preserve domain-invariant features. Experiments on four datasets across nine languages demonstrate that the proposed method achieves new state-of-the-art performance on zero-shot cross-lingual NER and shows strong generalization and compatibility across languages and domains.
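The abstract describes the multi-channel mixture only at a high level. As a minimal illustrative sketch, not the authors' implementation, the PyTorch snippet below shows one way several layer-wise distillation channels over a teacher's intermediate layers could be aggregated into a single soft target for the student; the per-layer linear heads, the learned softmax mixture weights, and the token-level KL objective are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiChannelDistiller(nn.Module):
    """Hypothetical mixture of layer-wise distillation channels.

    Each channel maps one intermediate teacher layer to token-level label
    probabilities; channel outputs are combined with learned mixture weights.
    """

    def __init__(self, hidden_size: int, num_labels: int, num_channels: int):
        super().__init__()
        # One linear distiller head per selected teacher layer (channel).
        self.channels = nn.ModuleList(
            nn.Linear(hidden_size, num_labels) for _ in range(num_channels)
        )
        # Learnable mixture weights over channels (softmax-normalized).
        self.mix_logits = nn.Parameter(torch.zeros(num_channels))

    def forward(self, teacher_hidden_states):
        # teacher_hidden_states: list of [batch, seq_len, hidden] tensors,
        # one per selected intermediate layer of the teacher model.
        probs = torch.stack(
            [F.softmax(head(h), dim=-1)
             for head, h in zip(self.channels, teacher_hidden_states)],
            dim=0,
        )  # [num_channels, batch, seq_len, num_labels]
        weights = F.softmax(self.mix_logits, dim=0).view(-1, 1, 1, 1)
        # Mixture of channel distributions serves as the soft target.
        return (weights * probs).sum(dim=0)


def distillation_loss(student_logits, teacher_mixture_probs):
    # Token-level KL divergence between the student's predictions and the
    # aggregated teacher mixture (standard soft-label distillation).
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_mixture_probs,
                    reduction="batchmean")


# Usage with dummy tensors (sizes and layer count are assumptions):
if __name__ == "__main__":
    batch, seq_len, hidden, labels, layers = 2, 8, 768, 9, 4
    distiller = MultiChannelDistiller(hidden, labels, layers)
    teacher_states = [torch.randn(batch, seq_len, hidden)
                      for _ in range(layers)]
    soft_targets = distiller(teacher_states)
    student_logits = torch.randn(batch, seq_len, labels)
    print(distillation_loss(student_logits, soft_targets).item())
```

In this sketch the mixture weights are trained jointly with the distiller heads; how the channels are shortened via parallel domain adaptation is not specified by the abstract and is left out here.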