Extracting relational triples from unstructured text is an essential task in natural language processing and knowledge graph construction. Existing approaches usually involve two fundamental steps: (1) finding the boundary positions of head and tail entities; (2) concatenating specific tokens to form triples. However, nearly all previous methods suffer from error accumulation: a boundary recognition error for any entity in step (1) propagates into the final combined triples. To address this problem, we revisit the triple extraction task from a fresh perspective and propose a simple but effective model named DirectRel. Specifically, the model first generates candidate entities by enumerating token sequences in a sentence, and then transforms triple extraction into a linking problem on a "head $\rightarrow$ tail" bipartite graph. As a result, all triples can be extracted directly in a single step. Extensive experiments on two widely used datasets demonstrate that the proposed model outperforms state-of-the-art baselines.
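The two ideas in the abstract, span enumeration and head-to-tail linking, can be illustrated with a minimal sketch. This is not the authors' implementation. The maximum span length `max_len`, the decision threshold, the names `enumerate_spans` and `extract_triples`, and the `link_score` callback are all assumptions introduced for illustration; in a real system the scorer would be a learned model, whereas here a toy lookup stands in for it.

```python
from itertools import product
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def enumerate_spans(tokens: List[str], max_len: int = 3) -> List[Span]:
    """Return every contiguous token span of length 1..max_len (assumed cap)."""
    n = len(tokens)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, min(i + max_len, n) + 1)
    ]


def span_text(tokens: List[str], span: Span) -> str:
    """Join the tokens covered by a span into a surface string."""
    return " ".join(tokens[span[0]:span[1]])


def extract_triples(
    tokens: List[str],
    relations: List[str],
    link_score: Callable[[str, str, str], float],  # hypothetical scorer
    max_len: int = 3,
    threshold: float = 0.5,
) -> List[Tuple[str, str, str]]:
    """Treat candidate spans as both sides of a head -> tail bipartite graph
    and keep every (head, relation, tail) edge scored above the threshold."""
    spans = enumerate_spans(tokens, max_len)
    triples = []
    for head, tail in product(spans, spans):
        if head == tail:
            continue  # a span is not linked to itself in this sketch
        h, t = span_text(tokens, head), span_text(tokens, tail)
        for rel in relations:
            if link_score(h, rel, t) > threshold:
                triples.append((h, rel, t))
    return triples


# Toy stand-in for a trained linking model: a lookup in a known set.
KNOWN = {("Paris", "capital_of", "France")}


def toy_score(head: str, rel: str, tail: str) -> float:
    return 1.0 if (head, rel, tail) in KNOWN else 0.0


tokens = "Paris is the capital of France".split()
result = extract_triples(tokens, ["capital_of"], toy_score)
print(result)
```

Because every candidate span is scored directly as a head or tail, there is no separate boundary-detection stage whose mistakes could carry over into the linking decision, which is the error-accumulation issue the abstract describes. The trade-off is that the number of candidate pairs grows quadratically with the number of spans, which is why this sketch caps the span length.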