Recently, many works have tried to augment the performance of Chinese named entity recognition (NER) using word lexicons. As a representative, Lattice-LSTM (Zhang and Yang, 2018) has achieved new benchmark results on several public Chinese NER datasets. However, Lattice-LSTM has a complex model architecture, which limits its application in many industrial areas where real-time NER responses are needed. In this work, we propose a simple but effective method for incorporating the word lexicon into the character representations. This method avoids designing a complicated sequence modeling architecture, and for any neural NER model it requires only a subtle adjustment of the character representation layer to introduce the lexicon information. Experimental studies on four benchmark Chinese NER datasets show that our method achieves an inference speed up to 6.15 times faster than that of state-of-the-art methods, along with better performance. The experimental results also show that the proposed method can be easily combined with pre-trained models like BERT.
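The abstract does not spell out the mechanism, but the core idea of folding lexicon information into character representations can be sketched as follows: for each character, collect the lexicon words that match at that position, group them by the character's role in the word (Begin, Middle, End, Single), pool each group, and concatenate the pooled vectors to the character embedding. This is a minimal illustration under assumptions; the function name `lexicon_features` and the use of plain mean pooling (rather than any frequency-based weighting) are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def lexicon_features(sentence, lexicon_vecs, dim=4):
    """For each character in `sentence`, collect lexicon words matching at
    that position, grouped by the character's role in the matched word:
    B(egin), M(iddle), E(nd), S(ingle).  Each group is mean-pooled and the
    four pooled vectors are concatenated, yielding a fixed-size lexicon
    feature that can be appended to the character embedding.

    `lexicon_vecs` maps each lexicon word to a length-`dim` vector.
    (Illustrative sketch; the actual method may weight words differently.)
    """
    n = len(sentence)
    # groups[i] holds four lists of word vectors: B, M, E, S for character i
    groups = [[[] for _ in range(4)] for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            vec = lexicon_vecs.get(sentence[i:j + 1])
            if vec is None:
                continue
            if i == j:
                groups[i][3].append(vec)          # S: single-character word
            else:
                groups[i][0].append(vec)          # B: word begins at i
                groups[j][2].append(vec)          # E: word ends at j
                for k in range(i + 1, j):
                    groups[k][1].append(vec)      # M: word covers k inside
    feats = np.zeros((n, 4 * dim))
    for i in range(n):
        for g in range(4):
            if groups[i][g]:
                feats[i, g * dim:(g + 1) * dim] = np.mean(groups[i][g], axis=0)
    return feats
```

Because the output is just a per-character feature vector, it can be concatenated with the character embedding and fed to any sequence encoder (LSTM, CNN, or BERT-style), which is what makes the approach architecture-agnostic.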