End-to-End Entity Detection with Proposer and Regressor

Named entity recognition (NER) is a long-standing task in natural language processing. Nested entity recognition in particular has received extensive attention because nesting is widespread. Recent work adapts the well-established set-prediction paradigm from object detection to handle entity nesting. However, these approaches are limited by manually created query vectors, which fail to adapt to the rich semantic information in the context. This paper presents an end-to-end entity detection approach with a proposer and a regressor to address these issues. First, the proposer uses a feature pyramid network to generate high-quality entity proposals. Then, the regressor refines the proposals to produce the final prediction. The model adopts an encoder-only architecture and thus gains three advantages: rich query semantics, precise entity localization, and ease of training. Moreover, we introduce spatially modulated attention and progressive refinement for further improvement. Extensive experiments demonstrate that our model achieves advanced performance in flat and nested NER, achieving a new state-of-the-art F1 score of 80.74 on the GENIA dataset and 72.38 on the WeiboNER dataset.
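The two-stage idea in the abstract (propose candidate spans at several scales, then adjust their boundaries) can be illustrated with a toy sketch. This is not the paper's implementation: the function names `pyramid_proposals` and `refine`, the choice of span widths that double per level, and the hand-written offsets are all assumptions made for illustration. In the actual model the proposals and offsets would come from learned network layers.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def pyramid_proposals(n_tokens: int, n_levels: int = 3) -> List[Span]:
    """Enumerate candidate spans at several scales (hypothetical helper).

    Level k yields non-overlapping spans of width 2**k, loosely mimicking
    how each level of a feature pyramid halves the sequence resolution.
    """
    proposals: List[Span] = []
    for level in range(n_levels):
        width = 2 ** level
        for start in range(0, n_tokens - width + 1, width):
            proposals.append((start, start + width))
    return proposals


def refine(span: Span, offsets: Tuple[float, float], n_tokens: int) -> Span:
    """Shift both boundaries by predicted offsets (hypothetical helper).

    The offsets stand in for a regressor's output; the result is clamped
    so the span stays inside the sentence and is at least one token long.
    """
    start = min(max(span[0] + round(offsets[0]), 0), n_tokens - 1)
    end = min(max(span[1] + round(offsets[1]), start + 1), n_tokens)
    return (start, end)


if __name__ == "__main__":
    n = 8
    candidates = pyramid_proposals(n, n_levels=3)
    print(len(candidates), "proposals")
    # Pretend the regressor moved the left edge one token left
    # and the right edge one token right.
    print(refine((2, 4), (-1.2, 0.9), n))
```

Because spans at different levels overlap (for example `(0, 2)` at level 1 lies inside `(0, 4)` at level 2), a multi-scale candidate set of this kind can represent nested entities, which is the scenario the abstract targets.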

