Named entity recognition (NER) in Chinese is essential but difficult because Chinese text lacks natural word delimiters. Chinese Word Segmentation (CWS) is therefore usually treated as the first step of Chinese NER. However, models based on word-level embeddings and lexicon features often suffer from segmentation errors and out-of-vocabulary (OOV) words. In this paper, we investigate a Convolutional Attention Network (CAN) for Chinese NER. CAN consists of a character-based convolutional neural network (CNN) with a local-attention layer and a gated recurrent unit (GRU) with a global self-attention layer, which together capture information from both adjacent characters and the sentence context. Moreover, unlike other models, ours does not depend on external resources such as lexicons and uses small character embeddings, which makes it more practical. Extensive experimental results show that, without using word embeddings or external lexicon resources, our approach outperforms state-of-the-art methods on datasets from different domains, including Weibo, MSRA, and the Chinese Resume NER dataset.
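The abstract contrasts two attention scopes: a local layer over adjacent characters and a global self-attention layer over the whole sentence. The following is a minimal sketch of that distinction only. It assumes plain scaled dot-product attention over precomputed character vectors. It omits the CNN and GRU encoders and all learned parameters, and it is not the paper's exact attention formulation. The function `attend` and its `window` parameter are hypothetical names chosen for illustration.

```python
import math
from typing import List, Optional

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(embeddings: List[Vector], window: Optional[int] = None) -> List[Vector]:
    """Scaled dot-product attention over character vectors (hypothetical sketch).

    window=k: each character attends only to neighbors within distance k
    (local attention over adjacent characters).
    window=None: each character attends to the whole sentence
    (global self-attention). No learned projections are used here.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    scale = 1.0 / math.sqrt(dim)
    outputs: List[Vector] = []
    for i in range(n):
        if window is None:
            lo, hi = 0, n
        else:
            lo, hi = max(0, i - window), min(n, i + window + 1)
        context = embeddings[lo:hi]
        # Attention weights of character i over its context.
        weights = softmax([dot(embeddings[i], c) * scale for c in context])
        # The output is the attention-weighted sum of the context vectors.
        outputs.append(
            [sum(w * c[d] for w, c in zip(weights, context)) for d in range(dim)]
        )
    return outputs


# Usage: four characters with toy 2-dimensional embeddings.
chars = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
local_out = attend(chars, window=1)  # each character sees its immediate neighbors
global_out = attend(chars)           # each character sees the whole sentence
```

With `window=1`, the first character's output depends only on the first two characters, so editing a distant character leaves it unchanged. With `window=None`, every output can reflect every character in the sentence.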