Academic interest in recognizing nested named entities has been growing in many domains. We tackle the task with a novel local hypergraph-based method: we first propose start token candidates and generate corresponding queries with their surrounding context, then use a query-based sequence labeling module to form a local hypergraph for each candidate. An end token estimator then corrects the hypergraphs to produce the final predictions. Compared with span-based approaches, our method avoids the high computational cost of span sampling and the risk of losing long entities. Sequential prediction makes it easier to leverage word-order information inside nested structures, and the local hypergraph yields richer representations. Experiments show that our method outperforms all previous hypergraph-based and sequence labeling approaches by large margins on all four nested NER datasets. It achieves a new state-of-the-art F1 score on ACE 2004 and F1 scores competitive with previous state-of-the-art methods on the other three datasets: ACE 2005, GENIA, and KBP 2017.
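The comparison with span-based approaches can be made concrete with a small count. The sketch below is not the proposed model; it only illustrates, under assumptions, why exhaustive span enumeration grows quadratically with sentence length, why capping span length can drop long entities, and how nested entities can be grouped by their start token. The example sentence, the entity labels, and the helper names `enumerate_spans` and `group_by_start` are all hypothetical.

```python
def enumerate_spans(num_tokens, max_len=None):
    """List every (start, end) token span, end inclusive.

    If max_len is given, spans longer than max_len tokens are skipped,
    as a length-capped span-based model would do.
    """
    spans = []
    for start in range(num_tokens):
        if max_len is None:
            stop = num_tokens
        else:
            stop = min(num_tokens, start + max_len)
        for end in range(start, stop):
            spans.append((start, end))
    return spans


def group_by_start(entities):
    """Group (start, end, label) entities by start token index."""
    groups = {}
    for start, end, label in entities:
        groups.setdefault(start, []).append((end, label))
    return groups


# Hypothetical example sentence with nested entities (labels are illustrative).
tokens = "the president of the United States visited Paris".split()
entities = [
    (0, 5, "PER"),  # "the president of the United States"
    (3, 5, "GPE"),  # "the United States", nested inside the PER entity
    (7, 7, "GPE"),  # "Paris"
]

all_spans = enumerate_spans(len(tokens))               # n * (n + 1) / 2 spans
capped_spans = enumerate_spans(len(tokens), max_len=4)  # cheaper, but lossy
by_start = group_by_start(entities)

print(len(all_spans), len(capped_spans), len(by_start))
print((0, 5) in capped_spans)  # is the six-token entity still reachable?
```

With eight tokens, the uncapped enumeration scores every one of the n(n+1)/2 spans, while the capped version can no longer reach the six-token entity. Grouping by start token leaves only one unit of work per entity start, which is the intuition behind proposing start token candidates first, although the actual candidate proposal and labeling modules are learned components not reproduced here.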