Current state-of-the-art models in Named Entity Recognition (NER) are neural models with a Conditional Random Field (CRF) as the final network layer and pre-trained "contextual embeddings". The CRF layer is used to facilitate global coherence between labels, and the contextual embeddings provide a better representation of words in context. However, both of these improvements come at a high computational cost. In this work, we explore two simple techniques that substantially improve NER performance over a strong baseline at negligible cost. First, we use multiple pre-trained embeddings as word representations via concatenation. Second, we constrain the tagger, trained using a cross-entropy loss, to eliminate illegal label transitions during decoding. While training a tagger on CoNLL 2003, we find a $786$\% speed-up over a contextual embeddings-based tagger without sacrificing strong performance. We also show that the concatenation technique works across multiple tasks and datasets. We analyze aspects of similarity and coverage between pre-trained embeddings and the dynamics of tag co-occurrence to explain why these techniques work. We provide an open-source implementation of our tagger using these techniques in three popular deep learning frameworks --- TensorFlow, PyTorch, and DyNet.
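For concreteness, the minimal Python/NumPy sketch below illustrates the two techniques in isolation: representing a word by concatenating vectors from several pre-trained embeddings, and masking illegal IOB2-style transitions when decoding the output of a tagger trained with cross-entropy. The embedding tables, tag set, and greedy masked decoder are illustrative assumptions for this sketch, not the released implementation.

\begin{verbatim}
import numpy as np

# Two hypothetical pre-trained embedding tables (e.g. GloVe-style and
# word2vec-style); vocabularies and dimensions are made up for the sketch.
glove = {"paris": np.random.randn(100), "visited": np.random.randn(100)}
w2v = {"paris": np.random.randn(300), "visited": np.random.randn(300)}

def concat_embedding(word, tables):
    """Represent a word by concatenating its vectors from several
    pre-trained embeddings; unknown words fall back to zero vectors."""
    parts = []
    for table in tables:
        dim = len(next(iter(table.values())))
        parts.append(table.get(word, np.zeros(dim)))
    return np.concatenate(parts)

x = concat_embedding("paris", [glove, w2v])  # 400-dimensional representation

# Constrained decoding for IOB2-style tags: I-X may only follow B-X or I-X
# of the same entity type, so illegal transitions are skipped at inference.
TAGS = ["O", "B-LOC", "I-LOC", "B-PER", "I-PER"]

def legal(prev, curr):
    # I-X is only legal directly after B-X or I-X of the same type.
    if curr.startswith("I-"):
        return prev[2:] == curr[2:]
    return True

def constrained_decode(scores):
    """Greedily decode per-token scores (seq_len x n_tags) from a tagger
    trained with cross-entropy, choosing the best legal tag at each step."""
    prev, out = "O", []
    for row in scores:
        for idx in np.argsort(-row):  # best-scoring tags first
            if legal(prev, TAGS[idx]):
                prev = TAGS[idx]
                out.append(prev)
                break
    return out

print(constrained_decode(np.random.randn(3, len(TAGS))))
\end{verbatim}

In this sketch the concatenated vector simply grows to the sum of the individual embedding dimensions, and the decoding constraint is applied only at inference time, leaving training with the cross-entropy loss unchanged.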