Neural methods have been shown to achieve high performance in Named Entity Recognition (NER), but they rely on costly, high-quality labeled training data, which is not always available across languages. While previous work has shown that unlabeled data in a target language can be used to improve cross-lingual model performance, we propose a novel adversarial approach (AdvPicker) to better leverage such data and further improve results. We design an adversarial learning framework in which an encoder learns entity domain knowledge from labeled source-language data and better shared features are captured via adversarial training, where a discriminator selects less language-dependent target-language data based on its similarity to the source language. Experimental results on standard benchmark datasets demonstrate that the proposed method benefits strongly from this data selection process and outperforms existing state-of-the-art methods without requiring any additional external resources (e.g., gazetteers or machine translation). The code is available at https://aka.ms/AdvPicker
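To make the adversarial data-selection idea concrete, the following is a minimal sketch in PyTorch. The small feed-forward encoder, the tag-set size, the gradient-reversal formulation, and the uncertainty-margin selection rule are all illustrative assumptions chosen to keep the example self-contained and runnable; they are not the paper's exact architecture or selection criterion.

```python
# Minimal sketch: adversarial training of a shared encoder with a language
# discriminator, followed by selection of less language-dependent target data.
# All module sizes and the selection margin below are assumptions for illustration.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated gradient backward."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad):
        return -grad

encoder = nn.Sequential(nn.Linear(768, 256), nn.ReLU())  # stand-in for a shared multilingual encoder
ner_head = nn.Linear(256, 9)                             # entity classifier (9 BIO tags, assumed)
lang_disc = nn.Linear(256, 1)                            # language discriminator: source vs. target

opt = torch.optim.Adam(
    [*encoder.parameters(), *ner_head.parameters(), *lang_disc.parameters()], lr=1e-4
)
bce = nn.BCEWithLogitsLoss()
ce = nn.CrossEntropyLoss()

def train_step(src_feats, src_tags, tgt_feats):
    """One step: NER loss on labeled source data plus a language-discrimination loss.
    Reversed gradients push the encoder toward language-independent features."""
    h_src, h_tgt = encoder(src_feats), encoder(tgt_feats)
    ner_loss = ce(ner_head(h_src), src_tags)
    logits = torch.cat([lang_disc(GradReverse.apply(h_src)),
                        lang_disc(GradReverse.apply(h_tgt))]).squeeze(-1)
    labels = torch.cat([torch.ones(len(h_src)), torch.zeros(len(h_tgt))])
    loss = ner_loss + bce(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def pick_target_data(tgt_feats, margin=0.1):
    """Select target-language examples the discriminator cannot confidently tell
    apart from the source (probability near 0.5), i.e. less language-dependent ones.
    The near-0.5 margin is one plausible criterion, not necessarily the paper's rule."""
    p = torch.sigmoid(lang_disc(encoder(tgt_feats))).squeeze(-1)
    return (p - 0.5).abs() < margin  # boolean mask over target examples
```

In this sketch, `train_step` would be called on mini-batches of pre-computed source and target sentence features, and `pick_target_data` would then filter the unlabeled target-language data used for further training.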