We present DualNER, a simple and effective framework that makes full use of both an annotated source-language corpus and unlabeled target-language text for zero-shot cross-lingual named entity recognition (NER). In particular, we combine two complementary learning paradigms of NER, i.e., sequence labeling and span prediction, into a unified multi-task framework. After obtaining an NER model sufficiently trained on the source data, we further train it on the target data in a {\it dual-teaching} manner, in which the pseudo-labels for one task are constructed from the predictions of the other task. Moreover, based on the span prediction, an entity-aware regularization is proposed to enhance the intrinsic cross-lingual alignment between the same entities in different languages. Experiments and analysis demonstrate the effectiveness of our DualNER. Code is available at https://github.com/lemon0830/dualNER.
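The dual-teaching idea above hinges on the two task formats being interconvertible: span predictions can be rewritten as BIO tag sequences to supervise the sequence-labeling head, and tag sequences can be decoded into spans to supervise the span-prediction head. The following is a minimal, hypothetical sketch of that label exchange; the function names and the BIO scheme are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the dual-teaching label exchange: predictions
# from one NER head are converted into pseudo-labels for the other head.

def spans_to_bio(spans, length):
    """Convert (start, end, label) spans (inclusive ends) into BIO tags."""
    tags = ["O"] * length
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"
    return tags

def bio_to_spans(tags):
    """Decode BIO tags back into (start, end, label) spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:
                spans.append((start, i - 1, label))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # inside the current entity
        else:
            if start is not None:
                spans.append((start, i - 1, label))
            start, label = None, None
    if start is not None:
        spans.append((start, len(tags) - 1, label))
    return spans

# Span-head predictions on an unlabeled target-language sentence ...
span_preds = [(0, 1, "PER"), (4, 4, "LOC")]
# ... become a pseudo-label tag sequence for the sequence-labeling head,
bio_pseudo = spans_to_bio(span_preds, 6)
# and tagging-head predictions become span pseudo-labels in the same way.
span_pseudo = bio_to_spans(bio_pseudo)
```

In the actual framework the pseudo-labels on each side would come from independent model predictions rather than a round trip, but the conversion layer is what lets each task teach the other.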