Recent years have seen steady improvements in Chinese Named Entity Recognition (NER), driven by new frameworks and by the incorporation of word lexicons. However, the internal composition of entity mentions in character-level Chinese NER has rarely been studied. In fact, most mentions of regular types exhibit strong name regularity. For example, entities ending with indicator words such as "company" or "bank" usually belong to the organization type. In this paper, we propose a simple but effective method for investigating the regularity of entity spans in Chinese NER, dubbed the Regularity-Inspired reCOgnition Network (RICON). Specifically, the proposed model consists of two branches: a regularity-aware module and a regularity-agnostic module. The regularity-aware module captures the internal regularity of each span for better entity type prediction, while the regularity-agnostic module is employed to locate entity boundaries and to relieve excessive attention to span regularity. An orthogonality space is further constructed to encourage the two modules to extract different aspects of regularity features. To verify the effectiveness of our method, we conduct extensive experiments on three benchmark datasets and a practical medical dataset. The experimental results show that RICON significantly outperforms previous state-of-the-art methods, including various lexicon-based methods.
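To make the idea of pushing two branches toward different features concrete, the following is a minimal sketch of one common orthogonality penalty: the squared Frobenius norm of the cross-product between two feature matrices. This is an illustrative assumption, not the construction used in RICON, which the abstract does not specify. The function name `orthogonality_penalty` and the argument names `aware` and `agnostic` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def orthogonality_penalty(aware: Matrix, agnostic: Matrix) -> float:
    """Squared Frobenius norm of aware^T @ agnostic.

    Assumption (not from the paper): each matrix holds one feature vector
    per row, with rows aligned across the two branches. The penalty is 0
    when every feature column of `aware` is orthogonal to every feature
    column of `agnostic`, and grows as the two branches overlap.
    """
    if not aware or not agnostic:
        raise ValueError("feature matrices must be non-empty")
    if len(aware) != len(agnostic):
        raise ValueError("both feature matrices need the same number of rows")

    n_rows = len(aware)
    dim_aware = len(aware[0])
    dim_agnostic = len(agnostic[0])

    total = 0.0
    for i in range(dim_aware):
        for j in range(dim_agnostic):
            # Dot product between column i of `aware` and column j of `agnostic`.
            dot = sum(aware[n][i] * agnostic[n][j] for n in range(n_rows))
            total += dot * dot
    return total
```

In a training setup, a term of this kind would typically be added to the task loss with a small weight, so that minimizing it discourages the two branches from encoding the same directions.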