Recent advances in Named Entity Recognition (NER) show that document-level contexts can significantly improve model performance. In many application scenarios, however, such contexts are not available. In this paper, we propose to find external contexts of a sentence by retrieving and selecting a set of semantically relevant texts through a search engine, with the original sentence as the query. We find empirically that the contextual representations computed on the retrieval-based input view, constructed through the concatenation of a sentence and its external contexts, can achieve significantly improved performance compared to the original input view based only on the sentence. Furthermore, we can improve the model performance of both input views by Cooperative Learning, a training method that encourages the two input views to produce similar contextual representations or output label distributions. Experiments show that our approach can achieve new state-of-the-art performance on 8 NER data sets across 5 domains.
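The agreement objective behind Cooperative Learning can be illustrated with a minimal sketch. The function names and the use of a symmetric KL divergence here are illustrative assumptions, not the paper's exact formulation; the idea is only that the sentence-only view and the retrieval-augmented view are pushed toward similar output label distributions.

```python
import math

def kl_divergence(p, q):
    # KL(p || q) for two discrete label distributions over the same tag set.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cooperative_loss(p_original, p_retrieval):
    # Symmetric agreement term (an assumed instantiation): encourages the
    # two input views to produce similar output label distributions.
    return 0.5 * (kl_divergence(p_original, p_retrieval)
                  + kl_divergence(p_retrieval, p_original))

# Toy per-token label distributions over three NER tags, e.g. from the
# sentence-only view and the retrieval-augmented view of the same token.
p_sentence_view  = [0.7, 0.2, 0.1]
p_retrieval_view = [0.6, 0.3, 0.1]
loss = cooperative_loss(p_sentence_view, p_retrieval_view)
```

Minimizing this term alongside each view's own tagging loss would pull the two views' predictions together, which matches the abstract's description of encouraging similar output label distributions.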