Relation Extraction is an important task in Information Extraction that deals with identifying semantic relations between entity mentions. Traditionally, relation extraction is carried out after entity extraction in a "pipeline" fashion, so that relation extraction focuses only on determining whether any semantic relation exists between a pair of extracted entity mentions. This leads to the propagation of errors from the entity extraction stage to the relation extraction stage. Moreover, entity extraction is carried out without any knowledge of the relations. Hence, jointly performing entity and relation extraction has been observed to benefit both tasks. In this paper, we survey various techniques for jointly extracting entities and relations. We categorize techniques based on the approach they adopt for joint extraction, i.e., whether they employ joint inference, joint modelling, or both. We further describe some representative techniques for joint inference and joint modelling. We also describe two standard datasets, the evaluation techniques, and the performance of joint extraction approaches on these datasets. We present a brief analysis of applying a general-domain joint extraction approach to a biomedical dataset. By covering a broad landscape of joint extraction techniques, this survey should be useful to researchers as well as practitioners in the field of Information Extraction.