In this paper, we introduce the NameRec* task, which aims at highly accurate, fine-grained person name recognition. Traditional named entity recognition (NER) models perform well at recognizing well-formed person names in text with consistent and complete syntax, such as news articles. However, in a rapidly growing number of scenarios, such as user-generated content and academic homepages, sentences have incomplete syntax and names appear in various forms. To address person name recognition in this context, we propose a fine-grained annotation scheme based on anthroponymy. To take full advantage of the fine-grained annotations, we propose a Co-guided Neural Network (CogNN) for person name recognition. CogNN exploits both the intra-sentence context and the rich training signals provided by name forms. To better utilize the inter-sentence context and implicit relations, which are essential for recognizing person names in long documents, we further propose an Inter-sentence BERT Model (IsBERT). IsBERT consists of an overlapped input processor and an inter-sentence encoder with bidirectional overlapped contextual embedding learning and multi-hop inference mechanisms. Because documents differ in how much context they provide, we also propose an Adaptive Inter-sentence BERT Model (Ada-IsBERT), which dynamically adjusts the inter-sentence overlapping ratio for each document. We conduct extensive experiments on both academic homepages and news articles to demonstrate the superiority of the proposed methods.
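The idea of an overlapped input with an adjustable overlapping ratio can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `overlapped_windows`, the parameters `window_size` and `overlap_ratio`, and the choice of sentences as the unit of overlap are all assumptions made for illustration. The abstract does not say how IsBERT builds its inputs or how Ada-IsBERT chooses the ratio, so here the ratio is simply a fixed argument.

```python
from typing import List, Sequence


def overlapped_windows(
    sentences: Sequence[str], window_size: int, overlap_ratio: float
) -> List[List[str]]:
    """Split a document into sentence windows that overlap their neighbours.

    Hypothetical helper: consecutive windows share roughly
    window_size * overlap_ratio sentences, so each window also sees
    context from the adjacent one.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    if not 0.0 <= overlap_ratio < 1.0:
        raise ValueError("overlap_ratio must be in [0, 1)")

    # Number of sentences shared by two consecutive windows,
    # capped so that the stride below is always at least 1.
    overlap = min(int(window_size * overlap_ratio), window_size - 1)
    stride = window_size - overlap

    windows: List[List[str]] = []
    start = 0
    while start < len(sentences):
        windows.append(list(sentences[start:start + window_size]))
        if start + window_size >= len(sentences):
            break  # the last window already reaches the end of the document
        start += stride
    return windows
```

Calling `overlapped_windows(doc, window_size=4, overlap_ratio=0.5)` on a six-sentence document yields two windows that share the two middle sentences. An adaptive variant would pick `overlap_ratio` per document instead of fixing it.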