Named entity recognition (NER) is a fundamental task in natural language processing: identifying the span and category of entities in unstructured text. Traditional sequence labeling ignores nested entities, i.e., entities contained within other entity mentions. Many approaches attempt to handle nested entities, but most rely on complex structures or have high computational complexity. In this paper, we investigate representation learning on a heterogeneous star graph that contains text nodes and type nodes. In addition, we revise the graph attention mechanism into a hybrid form to address its unreasonable behavior in specific topologies. After updating the nodes in the graph, the model performs type-supervised sequence labeling. The annotation scheme extends single-layer sequence labeling and can handle the vast majority of nested entities. Extensive experiments on public NER datasets demonstrate the effectiveness of our model in extracting both flat and nested entities, and the method achieves state-of-the-art performance on both flat and nested datasets. The significant improvement in accuracy reflects the superiority of the multi-layer labeling strategy.
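One way to picture a multi-layer annotation scheme that extends single-layer sequence labeling is to keep one BIO tag row per entity type, so that entities of different types may overlap. The sketch below is only an illustration under that assumption, not the paper's implementation: the function names `per_type_bio` and `decode`, the span format, and the example sentence are all hypothetical. Under this reading, nesting between entities of the same type cannot be represented, which would be consistent with the scheme covering most, but not all, nested entities.

```python
from typing import Dict, List, Tuple

# A span is (start, end_exclusive, entity_type); this format is an assumption.
Span = Tuple[int, int, str]


def per_type_bio(num_tokens: int, spans: List[Span], types: List[str]) -> Dict[str, List[str]]:
    """Build one BIO tag row per entity type (hypothetical helper)."""
    layers = {t: ["O"] * num_tokens for t in types}
    for start, end, etype in spans:
        row = layers[etype]
        # Two overlapping entities of the same type do not fit in one row.
        if any(tag != "O" for tag in row[start:end]):
            raise ValueError(f"overlapping {etype} entities cannot share a layer")
        row[start] = "B"
        for i in range(start + 1, end):
            row[i] = "I"
    return layers


def decode(layers: Dict[str, List[str]]) -> List[Span]:
    """Recover spans from the per-type tag rows (hypothetical helper)."""
    spans: List[Span] = []
    for etype, row in layers.items():
        start = None
        # A trailing "O" sentinel closes any entity that ends at the last token.
        for i, tag in enumerate(row + ["O"]):
            if start is not None and tag != "I":
                spans.append((start, i, etype))
                start = None
            if tag == "B":
                start = i
    return sorted(spans)


# Illustrative sentence: a LOC mention nested inside an ORG mention.
tokens = ["The", "University", "of", "California", "campus"]
gold = [(1, 4, "ORG"), (3, 4, "LOC")]
layers = per_type_bio(len(tokens), gold, ["ORG", "LOC"])
for etype, row in layers.items():
    print(etype, row)
print(decode(layers))
```

In this example the ORG row tags "University of California" while the LOC row independently tags "California", so both mentions survive a round trip through the tags. A single-layer BIO sequence would have to drop one of them.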