Gazetteers are widely used in Chinese named entity recognition (NER) to enhance span boundary detection and type classification. However, the NLP community still lacks a systematic analysis of gazetteer-enhanced NER models that would clarify how generalizable and effective gazetteers are. In this paper, we first re-examine the effectiveness of several common practices in gazetteer-enhanced NER models and carry out a series of detailed analyses of the relationship between model performance and gazetteer characteristics, which can guide the construction of a more suitable gazetteer. Our findings are as follows: (1) the gazetteer improves performance in most of the situations that traditional NER models find difficult to learn from the dataset; (2) model performance benefits greatly from high-quality pre-trained lexeme embeddings; (3) a good gazetteer should cover more entities that can be matched in both the training set and the test set.