Span-based models for the joint entity and relation extraction task have been studied extensively. However, these models sample a large number of negative entities and negative relations during training, which are essential but result in grossly imbalanced data distributions and in turn cause suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which classifies the entities and relations in the first phase and predicts the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which proves effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
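The imbalance that motivates the two-phase paradigm can be illustrated with a toy example. The sketch below is not the authors' implementation: the sentence, the gold annotations, the maximum span width, and the helper names (`enumerate_spans`, `single_phase_labels`, `two_phase_labels`) are all assumptions made for illustration. It only shows how splitting the decision into "is this span an entity?" followed by "which type is it?" might change the label distribution each classifier sees.

```python
from collections import Counter


def enumerate_spans(tokens, max_width):
    """Return all (start, end) spans, end exclusive, of at most max_width tokens."""
    n = len(tokens)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, min(i + max_width, n) + 1)
    ]


def single_phase_labels(spans, gold):
    """Hypothetical single-phase setup: one classifier over types plus 'None'."""
    return [gold.get(span, "None") for span in spans]


def two_phase_labels(spans, gold):
    """Hypothetical two-phase setup.

    Phase 1 labels every span as entity / non-entity.
    Phase 2 assigns a type only to spans that are entities.
    """
    phase1 = [span in gold for span in spans]
    phase2 = [gold[span] for span in spans if span in gold]
    return phase1, phase2


# Toy sentence and annotations (assumed, not taken from any dataset).
tokens = "Alice works for Acme Corp in Paris".split()
gold = {(0, 1): "PER", (3, 5): "ORG", (6, 7): "LOC"}

spans = enumerate_spans(tokens, max_width=3)
single = Counter(single_phase_labels(spans, gold))
phase1, phase2 = two_phase_labels(spans, gold)

print("candidate spans:", len(spans))
print("single-phase label counts:", dict(single))
print("phase 1 positives / negatives:", sum(phase1), "/", len(phase1) - sum(phase1))
print("phase 2 labels:", phase2)
```

In this toy setting, the single-phase classifier sees far more "None" spans than spans of any one type, whereas the second phase sees only typed spans. The same reasoning would apply to relations, with candidate entity pairs in place of candidate spans.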